CABIOS Vol. 73 no. 6 1997 Pages 587-591 Q-RT-PCR: data analysis software for measurement of gene expression by competitive RT-PCR Peter A. Doris 2, Amanda Hayward-Lester and Jon K. Hays Sr 1 Department of Cell Biology and Biochemistry and 1 Department of Computer Services, Texas Tech University Health Sciences Center, Lubbock, TX 79430, USA Received on April 17, 1997, revised on June 23, 1997, accepted on July 1, 1997 Abstract Motivation: We have developed software to assist in the computation of gene expression from data obtained in competitive reverse transcription-polymerase chain reaction (RT-PCR). This report describes the mathematical basis of competitive RT-PCR and discusses the criteria which must be met to permit accurate estimations of gene expression to be obtained using this technique. Results: The software that has been developed assists in both the assessment of assay performance (specifically in establishing the equality of amplification efficiency of the native and competitor templates) and in the routine analysis of data obtained in quantitation of gene expression by competitive RT-PCR. The software is a 100 kb module which functions as a Microsoft Excel add-in. It is compatible with both Windows and Mac versions of Excel 5 and Excel 7 on the Windows 95 platform, and employs the spreadsheet, statistical and graphing capabilities incorporated into Excel. Availability: The software can be downloaded from http://www.grad.ttuhsc.edu/archive/. A brief summary in both HTML and Microsoft Word 6 format of the installation and use of the software is also located at this website. Contact: E-mail: pdoris@imm2.imm.uth.tmc.edu Introduction Quantitation of gene expression by competitive reverse transcription-polymerase chain reaction (RT-PCR) has been developed to the extent that it can now be considered, if appl ied rigorously, as a well-evaluated, robust, quantitative method capable of providing accurate and precise estimates of the abundance of specific transcripts in RNA preparations, even in very small sample sizes (Hayward-Lester et al., 1997). However, application of the method in a manner that permits the accuracy and precision of the method to be fully acquired 2 To whom correspondence should be addressed at: Institute tor Molecular Medicine, University of Texas Health Sciences Center. 2121 W. Holcombe Blvd, Houston, TX 77030, USA requires a mathematical approach to assay system validation and to routine data analysis. This approach, while routine, is not simple and has often been overlooked. Therefore, we have developed a software tool which provides users of this technique with a convenient means for the validation of assay performance and for routine data handling. System and methods The development of this quantitative method has required an approach founded on mathematical descriptions of the PCR amplification of templates in reactions in which two templates are being simultaneously amplified (Raeymaekers, 1993). Amplification of a single template can be described by the equation: N = N o {\ + E)" (1) where N is the final amount of reaction product, /Vn is the initial amount of DNA in the reaction, n is the number of cycles and E is the efficiency of the reaction. If variations in reaction efficiency of only 5% occur between two samples being compared in two different reaction tubes, changes in amount of product of >100% after 30 amplification cycles will result and make comparison of gene expression levels between the samples meaningless. Competitive RT-PCR is designed to overcome this limitation. The idea is to amplify not only the cdna of interest in the PCR reaction, but also another cdna template (competitor) which can be amplified by exactly the same primers. If the amount of the competitor cdna in the starting sample is known or is always constant between samples, it can be used as a quantitative reference providing that it shares necessary similarities with the native template for which it is providing reference. The use of RNA competitors to generate native and competitor cdn As simultaneously in the same reaction tube permits this technique to be used for accurately quantitating gene expression. Equation (1) can then be extended to describe the amplification of both templates. If the initial unknown amount of a gene U in a competitive RT-PCR reaction is U o and that of its specific competitor RNA is C o. and these are subject to /; Oxford University Press 587
P.A.Doris, A.Hayward-Lester and J.K.Hays Sr reaction cycles in which the efficiency of amplification is u and E c for the unknown and competitor, respectively, then from equation (1) (above) we can describe the amount of reaction products at the end of n cycles by: / = / (!+ )" C, = (1 +,)" Making a ratio of equations (2) and (3), and taking the logarithm, gives: \og(u,,/c n ) = log[/,,-logc u + n log[(l + )/(! + < )] (4) A basic assumption of competitive RT-PCR which we have recently proven for systems following our own design is that E u and E c remain equal to one another throughout the reaction. If true, equation (4) reduces to: (2) (3) \og(ujc n ) = \ogu a - logc,, (5) In calculating competitive RT-PCR reactions to obtain the unknown amount of a gene (t/ 0 ) present in a sample, a plot is made which relates \og(u n /C n ) to logc o, the known amount of starting competitor RNA. This allows \ogu o to be calculated. The estimated initial value for the amount of native gene expressed in a sample (U o ) is the antilog of this value. Equation (5) indicates that such a plot will form a straight line having a slope of -1 [or of 1 if \og(c n /U n ) is plotted] because it is of the form y = ax ± b where the value of a must be 1 or -1 to follow the model. The calculations to be performed in establishing a competitive RT-PCR system and demonstrating that it is in concordance with these mathematical constraints are complex. However, once such a system has been established and accordance with mathematical requirements has been demonstrated, the amount of unknown gene expressed in a sample can be estimated in single tube assays, removing the need for titrations, once the approximate abundance has been determined. However, even with such a simplified system, estimates of the initial abundance of the target template require laborious mathematics in which error can easily be introduced. Our own work in developing competitive RT-PCR systems has shown that the use of homologous competitors is essential if the competitive system is to produce accurate estimates (Hayward-Lestere/a/., 1995, 1996). Homologous competitors, however, have the capability of producing heteroduplex products formed by the annealing of the competitor amplicons with highly homologous native amplicons. In our experience, such heteroduplexes are usually not resolved by non-denaturing agarose gel electrophoresis. This may lead to the erroneous assumption that, since heteroduplexes are not visible by this technique, they are not present. The introduction of a novel, inexpensive and highly accurate PCR product analysis technique based on ion pair reversed-phase HPLC has resolved the important question of heteroduplex formation and permits accurate resolution of both homoduplex and heteroduplex reaction products (Hayward-Lester et al., 1995, 1996). In light of the insight gained in these experiments, it seems safest to assume that heteroduplexes are always present at the conclusion of competitive reactions unless positive evidence to the contrary exists. Since the mathematical method involves analysis of reaction product ratios and heteroduplex formation disturbs the final homoduplex product ratios, it is essential that heteroduplexes be identified, accurately quantitated and their contribution to final product ratios be included in the analysis. Implementation The data analysis system we have devised was produced using the visual basic programming tools in Microsoft Excel. The resulting program has been compiled as a Microsoft Excel add-in module. In this manner, the program runs within Excel and can make use of the spreadsheet capabilities, data analysis and graphic plotting tools in Excel. Because of the cross-platform functionality of Microsoft Excel, the addin module functions identically in versions of Excel 5 for both Windows and Mac operating systems, and Excel 7 for the Windows 95 operating system. The software performs two essential functions. One relates to the validation of a new quantification system, the other to routine data analysis from a validated system. As discussed earlier, competitive RT-PCR systems are founded on the assumption that the amplification efficiencies of the native and competitor templates in PCR are identical to each other throughout the reaction. This can be tested by performing a titration of various amounts of competitor against uniform amounts of native template. If E c and E u are not identical, the resulting titrations may vary from ideals of linearity and slopes of unity. Thus, one module of the software analyzes titration data and provides an opportunity to test whether a newly developed system is complying with mathematical constraints. Once such validation is performed, titration becomes unnecessary for routine quantification. So long as the approximate abundance of a target template in a sample is known, single tube competitions with only one dose of competitor can be performed using the second module in the software designed for single tube analysis. The software is dialog driven with data entry occurring in response to requests provided in the dialogs. The initial window (Figure 1) permits the user to select between options which provide a means to: (i) input information for use either in titration or single tube estimations concerning reaction product (native and competitor) and competitor RNA template sizes; (ii) enter data describing the accumulated product amounts present at the end of the reaction for either single 588
Q-RT-PCR s S t a r t i n g Uialinj Select one of the foilowl ng: (8: Set up template C- Input data for a titration which has a template C Input data for a single tube which hasatemplate [ Ok ~ I' Cancel ) C Edit a sheet Fig. 1. After selecting Q-RT-PCR from the Excel 'Tools' menu, the initial dialog allows the user to select from a variety of set-up, input and editing options. tube estimates or for titration-based estimates and the corresponding amount of competitor RNA present in the reaction; and (iii) edit existing data. If selection (i) above is made, dialogs follow which accumulate the information necessary to build a spreadsheet to provide a template for performing estimations in either titration or single tube modes (Figure 2). This template can be used for all future estimations based on the same parameters. Data entry is designed around the HPLC-based product analysis which we have developed. However, since the calculations are based on reaction product ratios, not the absolute value or the type of value (densitometry, absorbance or fluorescence units, for example), any numerical data accurately reflecting reaction product amounts can be used. Dialogs permit the entry of data for native, competitor and heteroduplex products because, in non-denaturing separation Natiue Size What is the size (bp) of gour native PCR product? systems, we believe that heteroduplexes must always be assumed to be present and the information contained in the heteroduplex product is essential to accurate calculation since the only occasion when the heteroduplex will not disturb the reaction product ratio is when the native and competitor transcripts fortuitously occur in exactly the same abundance in the reaction tube (Figure 3). For titration analysis, the plug-in module makes use of the data analysis toolpack available in a full Excel installation. The presence of this toolpack is verified by the Q-RT-PCR add-in module. The data analysis toolpack provides the program with a number of statistical analysis tools. Those for regression analysis are employed by the Q-RT-PCR add-in module. In addition to performing the regression analysis and displaying the statistics associated with it in a new spreadsheet, the Q-RT-PCR module also plots the titration line and displays the slope and other properties of the best fit line. This facility provides an opportunity to assess whether a newly developed competitive system is behaving in accordance with the requirements of equation (5) above, thereby permitting simplification of the system to a single tube assay. Data entry in both titration and single tube assay systems provides an immediate visual feedback by requesting verification that the numbers entered are correct and as intended. This reduces error and minimizes time spent locating and correcting erroneous data in completed spreadsheets. In single tube format, the estimation system completes a single row of the spreadsheet for each sample. The estimated initial copy number of the target is converted to molecules of RNA transcript from the information provided about the size of the competitor transcript in base pairs and from the amount of competitor transcript added to the reaction (derived from UV spectroscopy of the purified RNA transcript). Sample identification is requested and entered on the spreadsheet. The final element of the initial dialog permits editing of spreadsheets to allow for data correction or revision. In the s Competitor Size What is the size (bp) of your competitor PCR product S C o m p e t i t o r Transcript jiii What is the length (bp) of gour competitor transcription template measured from the RNA polgmerase start site to the restriction digest site where RNA transcription ends? Fig. 2. Information concerning the size of the PCR reaction products and the competitor RNA transcript is collected in response to dialogs. 589
P.A.Doris, A.Hayward-Lester and J.K.Hays Sr RT-PCR quantification of rat cyclophilin-like protein from kidney RNA. Analysis of reaction product amounts was by HPLC separation and UV detection. MIMUI I'lil.lUULI U l L ' j u i l l l i l -.-. Pleaae enter the value for the cell in column NATIVE PRODUCT ( a m units) Discussion The principal advantage of the software is that it performs a complete data analysis of competitive RT-PCR reactions. The mathematical routines involved are constant, so the user needs to know only basic numerical information concerning the size of the reaction products (so that reaction product ratios can be calculated after correction for molar absorbance, fluorescence or densitometric measurement), the amount of competitor added to each reaction, the size of the competitor transcript (so that the amount of competitor added in fg can be used to calculate the number of molecules of native transcript present in the sample) and the relative amount of each reaction product formed. Cross-platform compatibility is another important attribute. We hope that this software will be useful to many colleagues engaged in gene quantitation by competitive RT-PCR. The software is being freely distributed (by Internet) for non-profit use. It can be downloaded from http://www.grad.ttuhsc.edu/archive/. A brief summary in both HTML and Microsoft Word 6 format of the installation and use of the software is also located at this website. Please enter the value for the cell i n column COMPETITOR PRODUCT (area units) ' W H O PRODUCT l a t e a u n f-'lea'>" enter the value for the cell i n column HYBRID PRODUCT (area units) IMUIISI LUMPUITOR ( f ( iiue for tnecell in k'uw : column AMOUNT COMPETITOR (fg) Fig. 3. Reaction product accumulation (determined by HPLC as area units) is entered for each close level of competitor in response to dialogs (shown). Acknowledgements single tube assay mode, each row of data is recalculated immediately; in titration mode, the regression analysis is updated after all revisions have been made. A sample titration analysis is included in Figure 4. The data used to create this figure are real data obtained in competitive We are grateful to Chris Palmer for assistance in early development work. Research in this laboratory is supported in part by the NIH (DDK 45534). Workbook 1 i.. 445!10880 138660 251040 24751j 416' 338670 397750 263970 227320 I 64130 161490 169230 190860 221640 134210.4540 186705 201843 220071 357172 311718 488880 715512 0 471000 488017 363510 343028 234132 170750 123186 0 --' 2000 1333 1000 625 500 250 100 433764 446785 333435 314830 214755 184874 112001 - - - 2.60188 2,20863 1.4400 0.88002 0.68804 0.33879 0.15702 0.41631 0.34412 0.16134-0.05508-0.18182-0.47007 0.80157 3.30103 3 12483 3 2 70588 2.60887 2 30704 2-2.84170 : rut correct co AMOUNT finjlcorrtcntio nits (area unrts(fj) ( j r t j units) Jogritio 684.801 3402078 IjogstjndsrJogunknovt amountunl amount unknown j (fg) molacules t Fig. 4. (A) Sample of data obtained during competitive RT-PCR after entry into a titration template. (B) Automated statistical analysis (performed by the macro) of the titration using data in (A). (C) Automated plot of titration (performed by the macro) using data in (A). 590
Q-RT-PCR References RNA processing and genomic polymorphisms using reversed-phase HPLC. In Ferre.F. (ed.). Gene Quantification. Birkhauser, Boston. Hayward-Lester,A., Oefner,P.J., Sabatini.S. and Dons.P.A. (1995) Hayward-Lester.A., Oefner.P.J. and Doris.P.A. (1996) Rapid quanti- Accurate and absolute quantitative measurement of gene expression fication of gene expression by competitive RT-PCR and ion-pair by single tube RT-PCR and HPLC. Genome Res., 5, 494-499. reversed-phase HPLC. BioTechniques, 20, 250-257. Hayward-Lester.A., Chilton.B.S., Underhill,P.A., Oefner,P.J. and Raeymaekers,L. (1993) Quantitative PCR: theoretical considerations Doris.P.A. (1997) Quantification of specific nucleic acids, regulated with practical implications. Anal. Biochem., 214. 582-585. 591