A Statistical Analysis of Prescreening Alarms


Alan R. Liss, Inc. Cytometry (1986)

A Statistical Analysis of Prescreening Alarms in a Population of Normal and Abnormal Gynecologic Specimens¹

L.L. Wheeless, R.D. Robinson, C. Cox, T.K. Berkan, and J.E. Reeder

Analytical Cytology Unit, Department of Pathology (L.L.W., R.D.R., T.K.B., J.E.R.) and Division of Biostatistics (C.C.), University of Rochester Medical Center, Rochester, New York

Received for publication March 25, 1985; accepted September 18, 1985

A multidimensional slit-scan flow system has been developed to serve as an automated prescreening instrument for gynecological cytology. Specimens are classified abnormal based on the number of cells having elevated nuclear fluorescence (alarms). An alarm region in a bivariate histogram of nuclear fluorescence versus nuclear-to-cell-diameter ratio is defined. Alarm region probability arrays are calculated to estimate the probability that an alarm falling in a particular bin of the alarm region is either from a normal or an abnormal specimen. From these arrays, a weighted alarm index is generated. In addition, summary indices are derived that measure how the distribution of alarms in each specimen compares with the average distributions for the normal and abnormal specimen populations. These indices together with current features are evaluated with respect to their utility in specimen classification using a nonparametric classification technique known as recursive partitioning. Resulting classification trees are presented that suggest information in the distribution of alarms in the bivariate histogram. In addition, they validate the features and rules currently used for specimen classification. Recursive partitioning appears to be useful for multivariate classification and is seen as a promising technique for other applications.

Key terms: Slit-scan, flow cytometry, specimen classification, multiparameter analysis, gynecologic cytology, prescreening

¹This work was supported by the National Cancer Institute under Grant CA30582.

A multidimensional slit-scan system has been developed to serve as an automated prescreening instrument for gynecological cytology (2). This instrument provides low-resolution morphological information in three dimensions on cells in flow in addition to quantitative fluorescence features. It has displayed great sensitivity both in its ability to recognize abnormal cells and in dealing with cellular false alarms (misclassification of a normal cell or event as abnormal). Potential causes of such events in an automated flow system include overlapping cells, multinucleated cells, artifacts, and cells with improper orientation (9).

A 2-year single-blind clinical study was carried out to evaluate system performance (10). Cellular material was collected by scraping the uterine cervix and was stained in suspension with acridine orange. Seven hundred and forty specimens, including 156 abnormal specimens representing a broad spectrum of gynecological abnormality, were analyzed. Approximately 50,000 cells were analyzed for each specimen.

Nuclear fluorescence versus nuclear-to-cell-diameter (N/C) ratio feature space plots were generated for each specimen during analysis (Fig. 1). An abnormal cell decision boundary was set to define an alarm region. This decision boundary setting was based on the mean nuclear fluorescence of the intermediate squamous cell population (8). Cells with nuclear fluorescence above the decision boundary were defined as alarms. Normal specimens contained only false alarms, whereas abnormal specimens contained true alarms (abnormal cells) and false alarms. Specimens were classified abnormal based on the total proportion of alarms. In addition, specimens were classified inadequate based on a ratio of cells in regions N1 and N2 (N1/N2). N1 is the region of the feature space below the decision boundary having an N/C ratio of less than 50%. N2 is the region below the decision boundary having an N/C ratio of 50% or greater. Region N1 has been shown through static cell instrument analyses to contain mainly intermediate and superficial epithelial cells, whereas region N2 contains primarily polymorphonuclear leukocytes and stripped nuclei (8). Normal specimens were instrument-classified as inadequate if fewer than 17% of the total number of cells were contained in region N1.

FIG. 1. Isometric projection of two-dimensional feature space plot of nuclear fluorescence versus nuclear-to-cell-diameter ratio. Data from multidimensional slit-scan analysis of approximately 50,000 cells from a specimen containing cells derived from carcinoma-in-situ. Abnormal cell decision boundary depicted as a solid line.

The system false-positive rate was 17.6% while the false-negative rate was 2.8%, based on the current data base of 740 specimens. Approximately 6% of the specimens were instrument-classified as inadequate. All misclassified abnormal specimens contained cells with morphological changes consistent with slight dysplasia of nonkeratinizing type.

While these results were extremely encouraging, there was a desire to critically evaluate current specimen classification procedures and features, and to seek additional information that might be contained in the feature space of nuclear fluorescence versus nuclear-to-cell-diameter ratio. Toward this goal, a statistical study was undertaken to examine data from the single-blind clinical study. The objectives of this study were to 1) evaluate current classification features, 2) improve instrument performance in detecting abnormal cells, and 3) derive additional biological information from the distribution of cells in the two-dimensional feature space. A fundamental question was whether alarms were simply rare events or whether they contained meaningful biological information in their distribution.

Castleman and White, in a series of papers (4-6), considered optimum performance of specimen classification schemes based on a cell classifier that classified individual cells with fixed false-positive and false-negative error rates. Implicit in their scheme was the assumption that cells are clearly normal or abnormal and that an individual cell could be classified independently of the specimen from which it came. Such assumptions clearly do not hold for a specimen classification scheme in which the abnormality of a cell is based on its relationship to other subpopulations of cells in the specimen. For this study, a statistical approach was taken to this complex classification problem.

This paper presents the results of a study of the distribution of alarms for normal and abnormal specimens in the feature space of nuclear fluorescence versus nuclear-to-cell-diameter (N/C) ratio through the use of descriptive arrays. New features, called alarm array indices, were calculated, and a nonparametric classification technique, known as recursive partitioning, was used to evaluate these and other features derived from the specimen data base.

MATERIALS AND METHODS

Specimen Collection, Preparation, and Analysis

Techniques for collection, preparation, and staining of human gynecological material follow those reported previously (3). Cellular material was collected by scraping the uterine cervix with a plastic spatula and was then suspended in a preservative solution. Cell dispersal was accomplished by syringing (7). Cells were stained in suspension with a 0.01% acridine orange solution.

All specimens were analyzed on a multidimensional slit-scan (MDSS) flow system (2). Nuclear fluorescence, nuclear size, and nuclear-to-cell-diameter ratio were used in real time for cell classification. These data were also stored for subsequent re-analysis. Numerous other low-resolution morphological features were used for real-time recognition of false alarms. These other features were not stored.
Specimen Data Base

Data from 251 specimens were used for these studies. This represents data from the most recent year (1983) of the single-blind clinical study, enriched for abnormal specimens by adding abnormal specimens analyzed in [...]. This data base included 166 (66%) normal and 85 (34%) abnormal specimens. The abnormal specimens included 30 specimens containing cells derived from carcinoma-in-situ and dysplasia, 46 specimens containing cells derived from invasive squamous cell carcinoma, and nine specimens containing cells derived from uterine adenocarcinoma. Specimens diagnosed as dysplasia included those with cells derived from slight, moderate, and marked dysplastic lesions.

Data were recorded in list-mode files. Nuclear fluorescence, nuclear size, and nuclear-to-cell-diameter ratio were recorded for each of the approximately 50,000 (median 49,949 with 25% below 48,295 and 25% above 50,006) cells of each specimen.

Alarm Region

A horizontal abnormal cell decision boundary (constant value of nuclear fluorescence) was set to define an alarm region in the feature space of nuclear fluorescence versus nuclear-to-cell-diameter ratio (Fig. 1). The alarm region was defined as the subset of the feature space above the abnormal cell decision boundary. The placement of the decision boundary was established through static cell instrument studies. These studies demonstrated that the best internal reference for positioning the decision boundary was the mean nuclear fluorescence of the normal intermediate squamous cell population. This population was always present in adequate specimens and was easily identified in the feature space. A factor of 2.5 times the mean nuclear fluorescence of this population was documented to be useful (in terms of specimen classification) in setting the abnormal cell decision boundary (8).

Alarm Region Probability Array

Composite 64 x 64-element alarm region bivariate histograms were created for both the normal and abnormal specimens of the data base. Each specimen making up the composite was normalized for total number of cells. Only cells falling in the region of the feature space having nuclear fluorescence from 0.8 times the abnormal cell decision boundary to 3.0 times the decision boundary were included. The horizontal axis remained the nuclear-to-cell-diameter ratio. Each array was then normalized by the number of specimens to create the alarm region probability array (ARPA). These arrays estimated the probability that an alarm falling in a particular bin of the alarm region was from either a normal or abnormal specimen. A weighted alarm index for each specimen was produced by multiplying the number of counts in each alarm region bin by that bin's probability value and summing over all bins.

Ratio Array

A second index was based on a ratio of the abnormal probability to the total probability (abnormal plus normal) within each bin. Each bin in the alarm region bivariate histogram was assigned a value, R, such that:

R = (abnormal + K) / (abnormal + normal + 2K)

where abnormal and normal denote that bin's value in the abnormal and normal probability arrays, respectively. The parameter K is a sensitivity constant. For these studies, K was set to 10⁻⁶. This value was empirically set to smooth out bins having very few events while preserving the information in the array. Smoothing of the ratio array (RA) was performed using a 5 x 5-point moving average.
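The array construction described above can be sketched in a few lines of Python with NumPy. This is an illustrative reading of the text, not the authors' software: the function names, the N/C axis range of 0 to 1, and the edge handling of the 5 x 5 moving average (border values replicated) are all assumptions.

```python
import numpy as np

BINS = 64   # 64 x 64 alarm region histogram, as in the text
K = 1e-6    # sensitivity constant K, as in the text


def alarm_counts(fluor, nc_ratio, boundary):
    """Raw 64 x 64 alarm-region counts for one specimen.

    Rows span 0.8x to 3.0x the decision boundary in nuclear fluorescence.
    Columns span an N/C ratio of 0 to 1 (this axis range is an assumption).
    Cells outside these ranges are ignored.
    """
    counts, _, _ = np.histogram2d(
        np.asarray(fluor, dtype=float),
        np.asarray(nc_ratio, dtype=float),
        bins=BINS,
        range=[[0.8 * boundary, 3.0 * boundary], [0.0, 1.0]],
    )
    return counts


def probability_array(count_arrays, total_cells):
    """ARPA: normalize each specimen by its total cell count, then average
    over specimens (equivalent to summing and dividing by their number)."""
    normalized = [c / n for c, n in zip(count_arrays, total_cells)]
    return np.mean(normalized, axis=0)


def ratio_array(abnormal, normal, k=K, window=5):
    """R = (abnormal + K) / (abnormal + normal + 2K), then a moving average.

    Border bins are smoothed by replicating edge values (an assumption;
    the text does not say how edges were handled).
    """
    r = (abnormal + k) / (abnormal + normal + 2 * k)
    pad = window // 2
    padded = np.pad(r, pad, mode="edge")
    smoothed = np.zeros_like(r)
    for di in range(window):
        for dj in range(window):
            smoothed += padded[di:di + r.shape[0], dj:dj + r.shape[1]]
    return smoothed / window ** 2


def weighted_index(counts, weights):
    """Weighted alarm index: per-bin counts times per-bin weights, summed."""
    return float(np.sum(counts * weights))
```

Passing the abnormal ARPA, the normal ARPA, or the ratio array as `weights` would give indices analogous to those described in the text. Note that a bin with no events in either array yields R = K / 2K = 0.5.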
Alarm Array Indices

A number of alarm array indices were calculated using normal, abnormal, and ratio arrays. These indices together with current features were then evaluated with respect to their utility in specimen classification. A complete listing of all features is presented in Table 1.

Table 1. Classification Features

Alarm rate: the percent of cells in a specimen having nuclear fluorescence values above the abnormal decision boundary
N1/N2 ratio: the ratio of the number of cells in a specimen below the abnormal cell decision boundary having a nuclear-to-cell-diameter ratio less than 50% to those below the decision boundary having a ratio greater than 50%
HiN (number of high N size cells): the number of cells in a specimen having a nuclear size greater than 42 units
HiNAlarms: the number of cells in a specimen above the abnormal decision boundary whose nuclear size is greater than 42 units
ABNPRB (abnormal probability): the summation of the alarms for each specimen, each multiplied by the value of the abnormal ARPA for that particular bin of the alarm array
NORPRB (normal probability): the summation of the alarms for each specimen, each multiplied by the value of the normal ARPA for that particular bin of the alarm array
ABNRAT (abnormal ratio index): the summation of the alarms for each specimen, each multiplied by the value of the ratio array for that particular bin of the alarm region

A nonparametric classification technique known as recursive partitioning was employed to select the best features for specimen classification. This approach was chosen first because of its nonparametric nature. Originally, a standard stepwise logistic discrimination analysis was used. This did not provide adequate discrimination between the two specimen classes. Therefore, an approach was chosen that made fewer parametric assumptions. Secondly, the basic idea of recursive partitioning is to choose cutpoints for individual features.
This is quite similar to the use of a decision boundary based on nuclear fluorescence.

Recursive Partitioning

Recursive partitioning is a classification technique that is based on the concept of finding the cutpoint for a feature that best separates the two populations (i.e., a decision boundary) (1). For multivariate data, the technique is applied recursively (hence the name). That is, given a number of features measured on individuals in each of two populations (e.g., normal and abnormal specimens), the method first calculates an optimal cutpoint for each feature and then chooses the feature that gives the best separation. The data are then divided according to this cutpoint. Following this division, the feature selection algorithm is again applied to each remaining subset of the data. This process is continued as long as sufficient data remain. The result of this recursive splitting of the data may be summarized by a tree whose nodes are identified with cutpoints of successively chosen features. To eliminate overfitting, this full tree is then pruned to a best tree using a cross-validation method. The final classification rule is obtained by identifying each of the terminal nodes of this best tree with one of the two populations.

RESULTS AND DISCUSSION

Alarm region probability arrays are created from the specimen data base. The normal ARPA is depicted in Figure 2 and the abnormal ARPA in Figure 3. Each is presented as a contour plot and a three-dimensional (isometric) projection. These distributions represent the probability of an alarm coming from an individual normal or abnormal specimen. From these arrays, two new features are derived for each specimen. The abnormal probability index (ABNPRB) is a summation of the alarms for each specimen, each multiplied by the value of the abnormal ARPA for the corresponding bin of the alarm region. Likewise, the normal probability index (NORPRB) is the same calculation using the normal ARPA. It was anticipated that these weighted alarm count features would provide better classification of normal and abnormal specimens than a simple summing of alarms.

FIG. 2. Normal specimen alarm region probability array. A. Contour plot. B. Three-dimensional (isometric) projection. Horizontal axis: nuclear-to-cell-diameter ratio.

FIG. 3. Abnormal specimen alarm region probability array. A. Contour plot. B. Three-dimensional (isometric) projection. Horizontal axis: nuclear-to-cell-diameter ratio.

The ratio array for the data base is presented in Figure 4. The ratio equals 0.5 where no information exists. The sensitivity constant K smooths very small bin values that would drive the ratios toward extremes (infinity or zero), and instead pushes them toward 0.5. Hence, ratio values greater than 0.5 indicate a greater likelihood of specimen abnormality and values less than 0.5 indicate a greater likelihood of normality. This plot suggests the abnormal cell decision boundary might be lowered, or made nonlinear, to provide greater sensitivity in the detection of abnormal cells.

FIG. 4. Ratio array for flow data base. A. Contour plot. B. Three-dimensional (isometric) projection. Ratio values above the 0.5 plane indicate a greater likelihood of abnormality. Horizontal axis: nuclear-to-cell-diameter ratio.

Additional features are calculated from the ratio arrays. The abnormal ratio index (ABNRAT) is the summation of the alarms for each specimen, each multiplied by the value of the ratio array for the corresponding bin of the alarm region. This index is calculated for a ratio array including all abnormal specimens. In addition, a ratio array is calculated for each class of abnormality and an abnormal ratio index generated for each specimen in each class.

Recursive partitioning is used to evaluate all features from the specimen data base (Table 1). The resulting classification tree is presented in Figure 5. This tree is derived for a relatively high loss factor (20:1). The loss factor is a measure of the loss incurred by a false-negative specimen. The greater the loss factor, the more the recursive partitioning technique will reduce the number of false-negative specimens at the expense of an increase in the specimen false-positive rate. In effect, this permits selection of an operating point on the receiver operating characteristic (ROC) curve. The dotted line separates nodes of the tree that are pruned to form a best tree using a cross-validation methodology (1). This prevents overfitting of the data and eliminates variables unlikely to be useful in classification of future specimens. Figure 6 depicts the best tree for a lower loss factor (6.7:1).

Given the features in Table 1 for evaluation, the recursive partitioning technique selected the features of abnormal probability (ABNPRB), alarm rate, N1/N2 ratio, number of high N size cells (HiN), and number of high N size cells in the alarm region (HiNAlarms) for use in classifying specimens. The tree structure, the features selected, and the cutpoints on individual features all clearly depend on the loss factor.
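The effect of the loss factor on a single split can be illustrated with a small sketch. It is not the procedure of reference 1 (which uses its own splitting criteria and cross-validated pruning); it only shows one cutpoint search on one feature, scored as loss_factor x false negatives + false positives. The function name, the scoring rule, and the toy data are assumptions made for illustration.

```python
def best_cutpoint(values, labels, loss_factor=20.0):
    """Find the cutpoint on one feature that minimizes weighted loss.

    A specimen is called abnormal when its value exceeds the cutpoint.
    labels: 1 = abnormal specimen, 0 = normal specimen.
    Loss = loss_factor * (false negatives) + (false positives).
    Returns (cutpoint, loss, false_positives, false_negatives).
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    # Candidates: below the minimum, midpoints between distinct values,
    # and above the maximum.
    candidates = [distinct[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    candidates.append(distinct[-1] + 1.0)

    best = None
    for cut in candidates:
        fn = sum(1 for v, y in pairs if y == 1 and v <= cut)
        fp = sum(1 for v, y in pairs if y == 0 and v > cut)
        loss = loss_factor * fn + fp
        if best is None or loss < best[1]:
            best = (cut, loss, fp, fn)
    return best


# Toy data (hypothetical): one abnormal specimen with a low feature value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
labels = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

high = best_cutpoint(values, labels, loss_factor=20.0)
low = best_cutpoint(values, labels, loss_factor=6.7)
```

With the high loss factor the search accepts eight false positives to avoid missing the low-valued abnormal specimen (cutpoint 1.5); with the lower loss factor it accepts one false negative and no false positives (cutpoint 10.5). A full tree would repeat this search over every feature and over each resulting subset.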
Empirically, it appears that the higher the penalty for a false-negative specimen, the more vigorously the tree is pruned and the simpler the final tree structure. Finally, the tree diagram of the current classification scheme used in the clinical study is presented in Figure 7.

For the high loss factor (Fig. 5), it is significant that the two features selected for the best tree are the same two features arrived at in early static cell instrument studies and used for specimen classification in the current clinical study. Although the cutoff points are slightly different, this verifies the features currently used for specimen classification. ABNPRB is selected for use but pruned off when reducing to a best tree. For the lower loss factor, ABNPRB is selected as the primary classification feature with alarm rate and N1/N2 ratio selected next. This again points to the utility of alarm rate and N1/N2 ratio in specimen classification. Importantly, the primary selection of ABNPRB as a classification feature suggests there is information in the distribution of cells in the alarm region of the feature space.

Specimen classification error rates for each tree are presented in Table 2. The current classification technique results in 26 false-positive specimens and five false-negative specimens. The best tree for a high loss factor reduces the number of false-negative specimens to one at the expense of an increase in the number of false-positive specimens to 40. By reducing the loss factor, the number of false-positive specimens is reduced to 13, and the number of false-negative specimens is increased to eight.

FIG. 5. A classification tree based on recursive partitioning analysis with high loss factor (166 normals, 85 abnormals). The dotted line separates nodes of the tree that are pruned to form a best tree using a cross-validation methodology. The pruned subtree becomes a single node (abnormal).

FIG. 6. A classification tree based on recursive partitioning analysis with low loss factor (166 normals, 85 abnormals). This tree has been pruned to form a "best" tree.

FIG. 7. A tree diagram of the current classification scheme for prescreening gynecologic specimens on the multidimensional slit-scan system. For this study, an inadequate specimen is considered to be abnormal.

Table 2. Specimen Classification Errors*: number and % error of false-positive and false-negative specimens for the high loss factor (20:1) entire tree and best tree, the low loss factor (6.7:1) best tree, and the current classification tree.
*Data base of 251 specimens including 166 (66%) normal and 85 (34%) abnormal specimens.
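As a worked check on the counts quoted in the text, the error percentages can be recomputed from the data base sizes. This sketch assumes that the false-positive percentage is taken over the 166 normal specimens and the false-negative percentage over the 85 abnormal specimens; the text does not state the denominators, and the dictionary keys and function name are illustrative.

```python
N_NORMAL = 166    # normal specimens in the data base
N_ABNORMAL = 85   # abnormal specimens in the data base

# (false positives, false negatives) as quoted in the text
TREES = {
    "current classification tree": (26, 5),
    "best tree, high loss factor (20:1)": (40, 1),
    "best tree, low loss factor (6.7:1)": (13, 8),
}


def error_rates(false_pos, false_neg):
    """Percent false positives among normals and false negatives among
    abnormals (denominators are an assumption)."""
    return 100.0 * false_pos / N_NORMAL, 100.0 * false_neg / N_ABNORMAL


for name, (fp, fn) in TREES.items():
    fp_pct, fn_pct = error_rates(fp, fn)
    print(f"{name}: FP {fp} ({fp_pct:.1f}%), FN {fn} ({fn_pct:.1f}%)")
```

Under this assumption the current scheme gives roughly 15.7% false positives and 5.9% false negatives on this data base, the high loss factor best tree about 24.1% and 1.2%, and the low loss factor best tree about 7.8% and 9.4%.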
Clearly, the recursive partitioning technique is useful in selecting an optimal classification tree for a given loss factor. However, missing from this analysis is a consideration of the classification of false-negative specimens. The five false-negative specimens resulting from the current classification tree are tolerable because the majority of the specimens contain cells derived from slight dysplasia.

Preliminary studies have been initiated to include abnormal specimen classification in the recursive partitioning analysis by considering abnormality as a graded response. Recursive partitioning can be used to do regression analysis in the sense that it will predict a continuous variable as a function of several independent variables. In a preliminary study, specimens are separated into three classes: normal, slight dysplasia, and abnormal. Each class is assigned a loss factor. The resulting recursive partitioning regression analysis selects ABNPRB as the primary prediction feature with alarm rate and N1/N2 ratio as the next two. The cutpoints are approximately the same as those used in the current classification tree.

In conclusion, these studies indicate information in the distribution of alarms in the feature space. It is recognized, however, that there is variability between specimens that tends to mask this information. Secondly, this work validates the features and rules currently used for specimen classification. Recursive partitioning is seen as a useful nonparametric technique for multivariate classification in this and other applications.

LITERATURE CITED

1. Breiman L, Friedman JH, Olshen RA, Stone CJ: Classification and Regression Trees. Wadsworth International Group, Belmont, California.
2. Cambier JL, Kay DB, Wheeless LL: A multidimensional slit-scan flow system. J Histochem Cytochem 27.
3. Cambier MA, Wheeless LL, Patten SF: A post-staining fixation technique for Acridine Orange: Quantitative aspects. Anal Quant Cytol 1:57-60, 1979.
4. Castleman KR, White BS: The tradeoff of cell classifier error rates. Cytometry 1.
5. Castleman KR, White BS: Optimizing cervical cell classifiers. Anal Quant Cytol 2.
6. Castleman KR, White BS: The effect of abnormal cell proportion on specimen classifier performance. Cytometry 2.
7. Mead JS, Horan PK, Wheeless LL: Syringing as a method of cell dispersal. Acta Cytol 22:86-90.
8. Wheeless LL, Patten SF, Onderdonk MA: Slit-scan cytofluorometry: Data base for automated cytopathology. Acta Cytol 19.
9. Wheeless LL, Cambier JL, Cambier MA, Kay DB, Wightman LL, Patten SF: False alarms in a slit-scan flow system: Causes and occurrence rates. Implications and potential solutions. J Histochem Cytochem 27.
10. Wheeless LL, Lopez PA, Berkan TK, Wood JCS, Patten SF: Multidimensional slit-scan flow prescreening system: Preliminary results of a single blind clinical study. Cytometry 5:1-8, 1984.