Diagnosis of epithelial mesothelioma using tree-based regression analysis and a



Similar documents
20 Diagnostic Cytopathology, Vol 36, No 1 ' 2007 WILEY-LISS, INC.

Practical Effusion Cytology

Diagnosis of Mesothelioma Pitfalls and Practical Information

Update on Mesothelioma

MALIGNANT MESOTHELIOMA UPDATE ON PATHOLOGY AND IMMUNOHISTOCHEMISTRY

MALIGNANT MESOTHELIOMA UPDATE ON PATHOLOGY AND IMMUNOHISTOCHEMISTRY

Académie internationale de Pathologie - Division arabe XX ème congrès novembre 2008 Alger. Immunohistochemistry in malignant mesotheliomas

The Value of Thyroid Transcription Factor-1 in Cytologic Preparations as a Marker for Metastatic Adenocarcinoma of Lung Origin

A Cytokeratin- and Calretinin-negative Staining Sarcomatoid Malignant Mesothelioma

Case of the. Month October, 2012

A 70-year old Man with Pleural Effusion

Cytopathology Case Presentation #8

Notice of Faculty Disclosure

The develpemental origin of mesothelium

Outline. Workup for metastatic breast cancer. Metastatic breast cancer

Seattle. Case Presentations. Case year old female with a history of breast cancer 12 years ago. Now presents with a pleural effusion.

264 Diagnostic Cytopathology, Vol 38, No 4 ' 2010 WILEY-LISS, INC.

Diagnostic Challenge. Department of Pathology,

Anatomic Pathology / PERITONEAL MESOTHELIOMA AND SEROUS CARCINOMA

Disclosures. Learning Objectives. Effusion = Confusion. Diagnosis Of Serous Cavity Effusions - Beware The Mesothelial Cell!

PATHOLOGY OF THE PLEURA: Mesothelioma and mimickers Necessity of Immunohistochemistry. M. Praet

3-F. Pathology of Mesothelioma

How To Test For Cancer

How To Use Calretinin

HKCPath Anatomical Pathology Peer Review and Scores : PDF version for download

Recommendations for the Reporting of Pleural Mesothelioma

Immunohistochemistry on cytology specimens from pleural and peritoneal fluid

Protocol applies to all primary borderline and malignant epithelial tumors, and malignant mesothelial neoplasms of the peritoneum.

Cytology : first alert of mesothelioma? Professor B. Weynand, UCL Yvoir, Belgium

The Use of Immunohistochemistry to Distinguish Reactive Mesothelial Cells From Malignant Mesothelioma in Cytologic Effusions

Immunohistochemical differentiation of metastatic tumours

Effusions: Mesothelioma and Metastatic Cancers

Ovarian tumors Ancillary methods

How To Distinguish Between A Metastatic Renal Cell Carcinoma And A Diffuse Malignant Mesothelioma

How To Diagnose And Treat A Tumour In An Effusion

The diagnostic usefulness of tumour markers CEA and CA-125 in pleural effusion

Malignant Mesothelioma Electron Microscopy

Distinguishing benign from malignant mesothelial

Pleural Mesothelioma: Diagnostic Problems and Evaluation of Prognostic Factors

Novocastra Liquid Mouse Monoclonal Antibody CD141 (Thrombomodulin)

Protocol for the Examination of Specimens From Patients With Tumors of the Peritoneum

Malignant Mesothelioma in Body Fluids - with Special Reference to Differential Diagnosis from Metastatic Adenocarcinoma -

Pleural Mesothelioma: An Institutional Experience of 66 Cases

PRIMARY SEROUS CARCINOMA OF PERITONEUM: A CASE REPORT

Lymphohistiocytoid Mesothelioma An Often Misdiagnosed Variant of Sarcomatoid Malignant Mesothelioma

Uses and Abuses of Pathology in Asbestos-exposed Populations

Product Datasheet and Instructions for Use

DESMOPLASTIC SMALL ROUND CELL TUMOR: A RARE PATHOLOGY PUZZLE

MOC-31 Exhibits Superior Reactivity Compared With Ber-EP4 in Invasive Lobular and Ductal Carcinoma of the Breast. A Tissue Microarray Study

Pathology of lung cancer

Cytopathology of Pleural Mesotheliomas

Video Microscopy Tutorial 5

Today s Topics. Tumors of the Peritoneum in Women

Case based applications part III

Case presentation. Awatif Al-Nafussi

Gerry Hobbs, Department of Statistics, West Virginia University

Ep-CAM/Epithelial Specific Antigen (MOC-31)

H istochem istry in the Diagnosis o f M alignant M esotheliom a *

Fine Needle Aspiration Cytologic Features of Well-Differentiated Papillary Mesothelioma in the Pleura

11. Analysis of Case-control Studies Logistic Regression

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Lung Carcinomas New 2015 WHO Classification. Spasenija Savic Pathology

Changes in Breast Cancer Reports After Second Opinion. Dr. Vicente Marco Department of Pathology Hospital Quiron Barcelona. Spain

Intraobserver and Interobserver Reproducibility of WHO and Gleason Histologic Grading Systems in Prostatic Adenocarcinomas

Malignant Mesothelioma Diagnosed by Bronchoscopic Biopsy

WORKPLACE SAFETY AND INSURANCE APPEALS TRIBUNAL DECISION NO. 1557/14

D-optimal plans in observational studies

Materials and Methods Specimens

Renal Cell Carcinoma: Advances in Diagnosis B. Iványi, MD

ISOLATED PANCREATIC METASTASIS OF A MALIGNANT PLEURAL MESOTHELIOMA

The Role of Genetic Testing in the Evaluation of Thyroid Nodules. Thyroid Cancer and FNA. Thyroid Cancer. Pure Follicular Cancers.

HER2 Status: What is the Difference Between Breast and Gastric Cancer?

INTERNATIONAL ASSOCIATION FOR THE STUDY OF LUNG CANCER Prospective Mesothelioma Staging Project

Fact sheet 10. Borderline ovarian tumours. The difficult cases. What is borderline ovarian cancer (BOC)?

The Diagnosis of Cancer in the Pathology Laboratory

The role of immunohistochemical evaluation in the diagnosis of malignant mesothelioma of the pleura

Carcinosarcoma of the Ovary

Case Report Epithelioid malignant mesothelioma presenting with features of gastric tumor in a child

Protocol for the Examination of Specimens From Patients With Malignant Pleural Mesothelioma

Comparison of tumour markers in malignant

Role of immunohistochemistry

General Rules SEER Summary Stage Objectives. What is Staging? 5/8/2014

TUMORS OF THE TESTICULAR ADNEXA and SPERMATIC CORD

International Journal of Case Reports in Medicine

Transcription:

Diagnosis of epithelial mesothelioma using tree-based regression analysis and a minimal panel of antibodies, Running title: Tree regression analysis for mesothelioma SONJA KLEBE*^, MARKKU NURMINEN, JAMES LEIGH AND DOUGLAS W. HENDERSON* *Department of Anatomical Pathology, Flinders Medical Centre, Adelaide, South Australia Finnish Institute of Occupational Health, Helsinki, and Department of Public Health, University of Helsinki, Finland, Centre for Occupational and Environmental Health, School of Public Health, University of Sydney, New South Wales, Australia ^To whom correspondence should be addressed

2 SUMMARY AIMS: Immunohistochemistry with panels of antibodies is standard procedure to distinguish between malignant mesothelioma and metastatic adenocarcinoma. Most studies assess only the sensitivity and specificity for single antibodies, even when the paper concludes by recommending an antibody panel. It was the aim of this study to use a novel statistical approach to identify a minimal panel of antibodies, which would make this distinction in the majority of cases. METHODS: Two hundred consecutive cases of pleural malignancy (173 pleural mesotheliomas of epithelial type and 27 cases of secondary adenocarcinoma) were investigated using a standard panel of 12 antibodies (CAM5.2, CK5/6, calretinin, HBME-1, thrombomodulin, WT-1, EMA, CEA, CD15, B72.3, BG8 and TTF-1). Regression and classification tree-based methods were applied to select the best combination of markers. The modelling procedures used employ successive, hierarchical predictions computed for individual cases to sort them into homogeneous classes. RESULTS: Labelling for calretinin and lack of labelling for BG8 were sufficient for definite correlation with a diagnosis of malignant mesothelioma. CD15 provided further differentiating information in some cases. CONCLUSION: A panel of three antibodies was sufficient in most cases to diagnose, or to exclude, epithelial mesothelioma. Calretinin exhibits the strongest correlative power of the antibodies tested. KEYWORDS: Pleural malignant mesothelioma, antibody, calretinin, panel, metastatic adenocarcinoma, tree regression analysis 2

3 INTRODUCTION The biopsy diagnosis of pleural malignant mesothelioma (MM) can be problematic and requires the use of ancillary techniques more frequently than most other epithelioid tumors. In most laboratories, immunohistochemistry is today the mainstay for the pathological diagnosis of MM. 1-4 Because no single antibody has been identified with 100% sensitivity and 100% specificity for a diagnosis of MM, panels of antibodies that include both positive and negative markers are routinely employed: yet even the most recent and comprehensive of the published studies focus on the sensitivity and specificity of single antibodies investigated independently of the others 4, and numerous recent reviews suggesting various panels of antibodies 5-13 are based on this type of analysis.5, 6, 14 A recent attempt at meta-analysis has been made to provide guidance,9 but because of heterogeneity in the raw data in the individual studies on which the meta-analysis was carried out, the validity of this approach is limited. The same principle applies to the website STATdxPathIq (formerly known as 'Immunoquery'),15 which provides sensitivities and specificities of antibodies, based on published studies, and suggests IHC panels for differential diagnosis based on those data. Only a handful of studies have attempted to use more advanced statistical methods, including logistic regression 1, 2, 16, but the only study attempting to include stepwise logistic regression employed pleural effusion fluids and used a panel of antibodies that included predominantly positive carcinoma markers, relying on lack of labelling for a diagnosis of mesothelioma. 1 In this current study, a panel of mesothelial-related antibodies and carcinoma-related antibodies, as well as two general epithelial antibodies was used (Table 1) and a multivariate 3

4 statistical analysis which involves observation and analysis of more than one variable at a time while taking simultaneously into account the effects of all variates on the endpoint of interest was carried out. Therefore, the present study investigated a comprehensive antibody panel jointly. Constructing regression trees may be seen as a type of variate selection procedure. The aim is to differentiate reliably between pleural epithelial MM and secondary adenocarcinoma affecting the pleura. The objective was to ascertain which specific minimal set of markers proved most reliable in making the distinction between these two malignancies and correlating with a diagnosis of MM. This approach allows a rational decision for a minimal set of antibodies as primary line of investigation, which is known to maximize the diagnostic yield. This may reduce the cost of the investigation and the time required for the pathologist to assess the sections an inconvenient but important rational consideration in a climate of ever increasing pressures associated with time constraints and cost savings. In other words, this approach aims to maximize the number of definitive diagnoses that can be made based on a specific limited panel of antibodies. If no definite diagnosis is reached after using the specified antibody panel, further immunohistochemical studies or other ancillary studies including electron microscopy may then be carried out as necessary. MATERIALS AND METHODS Histological Samples A data set of 200 consecutive cases of pleural malignancy, for which there were comprehensive immunohistochemical data, was investigated. The cases were sourced from routine surgical specimens submitted to the Department of Anatomical Pathology at Flinders Medical Centre, South Australia, and referral cases to one of the authors (DWH). There were 4

5 173 cases of definite epithelial MM and 27 cases of secondary adenocarcinoma. The bias of mesothelioma cases was due to the nature of our referral practice. This was taken into account in the statistical modelling approach (vide infra). The diagnosis of epithelial MM was defined according to currently accepted morphological criteria, 11, 17-22 including the radiological findings, histological appearances, mucin histochemistry, immunohistochemistry and, in many instances, electron microscopy including ultrastructural examination of deparaffinized tissue when available and when there remained any doubt about a definite diagnosis. EM was used in particular when the histological findings were atypical and/or whenever immunohistochemistry yielded one major discordant result (e.g., positive labelling with B72.3 in a suspected MM) or two or more minor discordancies (e.g., equivocal staining for B72.3 and CD15 in a suspected MM). All tissues had been fixed in 10% buffered formalin and had undergone standard processing and embedding in paraffin wax. Sections were cut 4µm thick, deparaffinized and rehydrated, if blocks had been received. For some referral cases, only unstained spare slides were received (instead of a paraffin block) and immunohistochemical studies were performed on those cases as necessary. Some of the referral cases were sent including the immunohistochemical preparations that were done in the source laboratory, but with insufficient spare slides and no paraffin blocks, such that only limited immunohistochemical studies could be carried out. This accounts for the missing data (vide infra). The panel of antibodies used included the low molecular weight cytokeratin antibody CAM5.2 as a general epithelial marker, antibodies against epithelial membrane antigen (EMA), five mesothelial cell markers and five carcinoma-related markers (Table 1). Antigen retrieval was individualized for each antibody, and incubation with all primary antibodies was overnight (see Table 1 for details of dilutions of antibodies, source and antigen retrieval used). For the 5

6 first 69 cases the streptavidin-biotin-peroxidase complex method was used (Ultra Strepatavidin Detection System, Signet Laboratories) as a detection system, whilst for the remaining cases the Dako Cytomation EnVision + Dual Link System (DAKO) was used. This system was introduced into the Department because it reduces problems with background staining relating to endogenous biotin, particularly in small biopsies. The primary antibodies were still used at the same dilution as before and the retrieval systems remained unchanged. All sections were assessed independently by two investigators (SK and DWH) and immunohistochemical labelling was scored on a 3-point ordinal scale: positive labeling = 1, equivocal labeling = 0.5, no labeling = 0. Equivocal labelling was defined as positive labelling in less than 2% of tumour cells or when it was uncertain whether trace staining represented genuine labelling or simply high background staining. Statistical methods Regression and classification tree-based modelling was applied in the statistical data analysis. This is an explanatory technique for uncovering structure in data, useful for both diagnostic and prognostic regression problems. 23, 24 In growing a regression tree, the process is continued until the terminal node is homogeneous enough or it contains too few cases ( 5 by default). The model is fitted using a binary recursive partitioning algorithm, whereby the data are successively split along coordinate axes of the predictor variates (presence of labelling with different antibodies) so that, at any node, the split that maximally distinguishes the response variate (diagnosis of MM) in the left and the right branches is selected. The decision to branch was based on whether a so-called deviance statistically exceeded a cut-off value, which was chosen to optimize the modelling. 6

7 Deviance measures the fit of a statistical model to the data when the parameter estimation is likelihood-based i.e., it is a measure of node heterogeneity. The fit is inferred to be good when the deviance equals its expected value which is the number of degrees of freedom (df), that is the number of cases available for estimating the model minus the number of model parameters. The deviance can also be employed to quantitate the significance of individual antibody markers. This is achieved by computing twice the log-likelihood of the ratio of the best model to that of the current model. The modelling approach followed was first to fit an overly complex tree, and then pruning the tree down to a suitable size. The tree-construction process has to be seen as a hierarchical refinement of probability modeling. If the sample numbers are sufficiently large, the study yields unbiased estimates of the disease classification probabilities (and misclassification rates). Any split which did not improve the model fit by a default factor of the complexity parameter was pruned off by (10-fold) cross-validations employed to ensure the numerical stability of the terminal nodes. An attractive property of the method is that at each node of a classification tree there is a probability distribution over the classes. The prediction probability available for each terminal node is constant, but remains dependent on the structure of the tree (viz. its depth). It follows that the interpretation of this probability may not be exactly the same as the one provided by the logistic (discriminant analysis) risk model, which was also fitted to the data. The action taken for handling missing values was to retain cases with partially unavailable marker values (because this most closely represents the clinical situation where, for example in referral practice, complete sets of immunohistochemical studies are not always available). The method used surrogate rules if the splitting variate was unavailable. The strategy was to pass a case down the tree as far as it will go. If it reached a terminal node, a predicted probability 7

8 of 'caseness' (mesothelioma or adenocarcinoma) was computed for it. Otherwise the distribution at the node reached was employed to predict the outcome. This is perhaps the single most useful feature of the applied modeling approach, considering there were generally some (2 to 6%) missing values on the antibody markers. For some markers the number of missing observations was very large (over 70% for TTF-1) so as to render them practically useless for the tree analysis. In the application, the recursive partitioning method was used as implemented in the rpart routine (http://mayoresearch.mayo.edu/mayo/research/biostat/splusfunctions.cmf) of the S- PLUS system.25 We also evaluated the statistical efficacy of the investigated panel of markers. For the malignant mesothelioma markers, we report sensitivity (Se) and specificity (Sp) rates, whereas for the anti-markers or adenocarcinoma markers we use their complement values (i.e., 1-Se and 1-Sp). The cutpoint for a classification was chosen based on the tree regression for the individual markers, with scoring: 1 = positive, 0.5 = equivocal, 0 = negative. The binary cutpoint was decided by the algorithm, that is whether to align the indeterminate score 0.5 with either 0 or 1. In clinical practice, however, clear positive staining is required for a result to be regarded as positive. In situations where this decision resulted in a discrepancy with the algorithmic split, the computations were also done with equivocal staining scored as negative. The clinical approach to diagnosis has the effect of lowering the Se- and raising the Sp-values. RESULTS 8

9 Sensitivity and Specificity Table 2 presents in the conventional way the results of the sensitivities and specificities of the antibodies used. The highest sensitivities of mesothelioma markers were observed for calretinin (Se = 98%), while in the case of adenocarcinoma markers the highest sensitivity was for CEA (1-Se = 100%, indicating that none of the mesotheliomas showed positive labeling for this antigen) and B72.3 (1-Se = 98%). The scoring of equivocal labelling (0.5) as positive altered these results only marginally. Logistic Regression Analysis Table 3 presents the results of fitting a logistic model with mesothelioma as the clinical endpoint. The intercept is the log odds for the subpopulation with all the other terms set equal zero. The null model with only the intercept term had a deviance of 138 on 179 (= 200-20 - 1) df; 20 cases were deleted due to missing values. When this deviance was subtracted from the deviance of the updated model with four additional significant parameters it yielded a residual deviance of 20 on 175 (= 179-4) df. This indicates a very good model fit as the value of the deviance statistic is much less than its expected value, viz. the df. The identified set included the positive marker calretinin plus three carcinoma-related markers which generally fail to label MMs (with negative coefficients, indicating that lack of labeling would support a diagnosis of mesothelioma) for epithelial MM. The change in deviance, that is 138-20 = 118 with four df, proves that the four predictors jointly form a statistically significant panel of antibodies. The sensitivity of the logistic model to predict MM was almost 99%, while the specificity was 88%. Regression and Classification Tree Analyses 9

10 Classification tree is the most common use of tree-based methods whose endpoint is a factor giving the diagnostic categorisation of the patients. It is also possible to construct regression trees in which terminal node gives the predicted (numerical) value. The ideas for classification and regression tree stucture are quite similar, but the terminology differs. The justification for regression analysis is that it serves as the basis for the classification treeconstruction procedure, so we consider it first. Table 4 presents a regression tree-based analysis of antibody combinations for predicting the clinical outcome of the 200 patients (no cases were excluded in the final model because of no missing values in these four markers). At the root (i.e. top node of the tree) the predicted probability of MM (coded as a binary outcome variable) was 86.5%. This percentage represents the actual relative frequency of mesothelioma cases in the combined study series. Starting from this empirical setting, the ranking of the markers in the order of their improvement of the deviance in separate primary splits was the following: 1) calretinin, 2) CD15, 3) BG8, 4) B73.2, and 5) HBME-1. The data partition ended up in four terminal nodes defined by the three markers calretinin, CD15 and HBME-1. To be 99% sure of a mesothelioma endpoint, the model established that the following three conditions were sufficient (splitwise probability in parenthesis): calretinin > 0 (97%), CD15 = 0 (99%). In the leftmost branch, a case correlated almost certainly with a 'not MM' diagnosis, based on the indications of calretinin = 0 and HBME-1 < 1. The prediction based on regression tree was almost the same as with logistic regression. These results can be explained by the fact that calretinin was so important factor that it dominated the role of the other antigens. In a less clear circumstance, the different approaches would probably give a more disrepant outcome. 10

11 Table 5 presents the corresponding classification tree-based analysis for a differential diagnosis. With an equal prior probability distribution (0.5, 0.5) for MM and adenocarcinoma cases (defined as categorical outcome variable), the branching terminated in a structure depicted in Figure 1 with four terminal nodes. In the selection of three markers, the predominant role of calretinin remained the same, but now CD15 complemented BG8, both of which are adenocarcinoma-related markers ('negative' MM markers). The model predicted probability of MM diagnosis based on the three antibodies was estimated to be 100% (correct). We note parenthetically that using the priors proportional to the empirical class counts produced a classification tree very similar to the regression tree of Table 3. Therefore, the bias towards MM cases in our collection of cases did not affect the validity of the modeling process. DISCUSSION Comparison of the findings in this study with those in the published literature Although many studies have evaluated the differential diagnostic value of antibody sets in the differential diagnosis of epithelial MM versus adenocarcinoma, no conclusive shortlist of antibodies has been identified to date, and there is no consensus on the optimal antibody panel. There are numerous problems with the published series available: in the directly comparable series, the case numbers have not always been large. For example, one study that concluded that the sensitivity of the mesothelioma-related antibody HBME-1 for malignant mesothelioma was 100% 26 based its findings on a series of only 17 cases. Other studies have attempted meta-analysis of all published studies, but this approach suffers from the inability to control the quality of the primary data. Such difficulties include the diversity of material included, with cytology specimens, autopsy material and surgically removed tissue being compared indiscriminately. Furthermore, different studies have used different sources 11

12 and dilutions of antibodies, different antigen retrieval techniques and different secondary antibodies. Different researches have followed different criteria for assessing positive labeling, and so forth.9 Some of these difficulties are particularly apparent with two of the antibodies that we found to be valuable in the diagnosis of MM, namely HBME-1 and calretinin. Calretinin: Our results confirm some of the observations made by other investigators. A considerable number of studies over recent years have identified calretinin as a particularly useful marker, with a number of studies identifying 100% of MM cases as positive for calretinin 13, 27-30. This may be in part due the fact that we restricted our carcinoma-group to adenocarcinomas, and we applied strict criteria for assessing positive labelling, in that positive labelling of nuclei was required for a result to be designated as positive, whereas some previous publications have accepted cytoplasmic staining only, resulting in higher numbers of lung adenocarcinomas labelling with calretinin. 2, 30 Like us, other investigators have found high specificity for mesotheliomas, if nuclear labelling was required for a positive result, irrespective of cytoplasmic staining.7 It is worth noting that the same clone of antibody was used for all of our cases (Table 1). Previously, one group31 considered calretinin to be "useless" when a Chemicon guinea pig antibody was used but conceded that, when a different antibody was used, it showed "the highest sensitivity for mesothelial cells."31 The high sensitivity and specificity we found with calretinin may of course in part related to the selection of our cases: some tumours known to show positive labeling for calretinin, such as squamous cell carcinoma of lung origin and which may label for calretinin in up to 40% of cases, 32, 33 were not included in our study which assessed only the value of antibody panels for the distinction between mesothelioma and adenocarcinomas metastatic to the pleura. 12

13 The diagnostic utility of calretinin, amongst other markers, was reviewed in two recent publications 34, 35 which were based essentially on the same published reports and data, and although both studies drew slightly different conclusions in their assessment of diagnostic usefulness of this antibody, they both regarded calretinin as a 'useful' positive mesothelioma marker. Interestingly, one of those studies then concluded by recommending panels of antibodies 'based on their sensitivities and specificities' but, like so many other studies, not taking into account the complex relations between antibody reactions. In contrast to those papers, which attempted to analyze data collected over many years, from many different laboratories and assessed and scored by many different investigators, our study enrolled 173 consecutive cases of epithelial MM and applied a recursive partitioning algorithm to select a probabilistically-founded array of reliable correlative markers for the type of malignancy present. There is only one other study that attempted standard multiple logistic regression analysis, which was performed on various indicator combinations:2. That analysis pinpointed calretinin as an important marker, and can thus be compared to a selected list of markers in our corresponding logistic analysis. It is reassuring that our more sophisticated statistical approach confirms the role of calretinin as a useful positive marker for the differential diagnosis between epithelial mesothelioma and adenocarcinoma. In fact, in our series calretinin emerged as the premier correlate, and this had the effect of rendering marginal the supplementary information provided by the two next-best selected markers BG8 and CD15. BG8 has been found useful in the distinction of adenocarcinoma from epithelioid MM.5, 9, 36 In a study investigating 12 antibodies and using regression analysis, BG8 was found to be one of the three most useful antibodies.7 However, in contrast to our study cytology samples were used, and the statistical approach used also differed from the approach used by us, discussed in detail below, with regards to calretinin. Nonetheless, it is reassuring that others have also proven the value of this antibody. 13

14 Apart from the majority of adenocarcinomas, BG8 labels 80% of squamous cell carcinomas33. It does not label renal cell carcinomas, which were not included in our study, and this may in part account for the high sensitivity and specificity we found with this antibody.37 CD15 has been established for over 20 years as a positive adenocarcinoma marker for the distinction from mesothelioma 38, 39, despite some authors reporting positivity in up to 32% of malignant mesotheliomas.28, 40 Like us, others have found that MM is only rarely positive for CD15 12, 41-43. Again, the relative high sensitivity and specificity for this antibody may be in part related to the fact that we excluded squamous cell carcinomas, which are generally not positive.44 The majority of recent studies 38, 39, 45 have found CD15 to be undetectable in most or all mesotheliomas investigated, while it was found to be expressed in all or most adenocarcinomas, particularly those of pulmonary origin.43 It has been noted that CD15 expression is often focal and that false-negative reactions can be related to small biopsies, and the high sensitivity and specificity seen in our study may be repeated to the fact that our study consisted predominantly of larger video-assisted thoracoscopy (VAT) biopsies. 45 Interestingly, the one study that used logistic regression on a panel that included CD15 found other antibodies more suitable for the diagnosis of adenocarcinoma, despite high sensitivity and specificity of CD15.7 As mentioned above, that study used cytology samples and used an antibody panel, which was heavily weighted towards carcinoma-related markers with only one mesothelial-related antibody (EMA with a membranous pattern of staining). In addition, the statistical approach used also differs from the approach used by ourselves: the authors applied a stepwise discriminant logistic regression analysis to select the 'parameters' of the model function for a differential diagnosis between mesothelioma and adenocarcinoma cases, 14

15 which is essentially a backwards-selection approach. In contrast, our study used a forward variable selection approach and based the intercept on the distribution of diagnoses in our collection of cases. Particular versus General Case Series Our alternative classification or regression tree-based analysis began at the root of the tree with an a priori probability of a case being an epithelial mesothelioma case at 0.865. From this starting point, the graph set forth a hierarchical construct for the correlative values of the antibody probe markers calretinin, CD15, and BG8 or HBME-1. However, the prevalence of mesotheliomas in our series does not represent the relative frequency of mesotheliomas versus adenocarcinomas in general clinical practice (about seven secondary adenocarcinomas for every mesothelioma). 46 On the other hand, not all cases of metastatic carcinoma come to biopsy, due to the known clinical history. We pooled all the adenocarcinoma cases, regardless of primary tumour site, which for obvious reasons does not provide a homogenous immunohistochemical profile, but the scenario of a 'malignant pleural effusion secondary to metastatic carcinoma of unknown primary site' represents a common problem. This approach to the unknown primary tumour resembles the true clinical dilemma. One of the strengths of our study is that we used only sections of the actual pleural deposits of secondary carcinomas, as opposed to performing immunohistochemical studies on the sections of primary carcinoma which has been done in some other studies, 47, 48 because there is always a possibility that the immunohistochemical profile for the pleural deposits is a little different from that of the primary tumour. Therefore, one may ponder whether the specified assumption of prior probabilities of 'caseness' affects the tree analysis qualitatively, or how much it changes the quantitative 15

16 values of the nodal markers. The rpart model allows the specification of optional prior parameters. For the case of equal priors (i.e., the probability of being a mesothelioma or a adenocarcinoma is 0.5), produce a different result, both in terms of the hierarchical ordering of the markers and the magnitude of their prediction probabilities. Basically, the statistical inference question is whether the study aims to address a particularistic (descriptive) problem or a general (causal) one.49 In other words, any such case series that come from hospitalbased and referral populations are somewhat specific place-specific and time-specific. However, this is no different to the variability between sensitivities and specificities between the same antibody in the hands of different investigators. Also, because the settings in particular places (say, Adelaide, Australia v. Houston, Texas) are never absolutely the same, it is not surprising that different studies, using representative samples from the study populations, can produce discrepant results and inferences. Therefore, in investigating the problem of the role of the various antibodies in the diagnosis of mesothelioma, the relative sizes of the compared series is a matter of study design, which the investigator can choose. Assuming an equal (or ignorance ) distribution is theoretically appropriate when little is known beforehand about the tumour in question. Stability of the Statistical Model Both the logistic regression and the tree-based methods identified the same three antibodies (calretinin, CD15 and BG8) that can be used to diagnose or exclude epithelial MM. However, the constructed tree can extract complementary information over the traditional regression analysis. This is conveyed, especially, by the hierarchical ordering of the included variates that lends itself to evaluation of the marker's predictive importance. The basic rule is that the closer to the root node a factor appears, the more important is its effect on the outcome. 16

17 Moreover, the dichotomous representation of markers allows convenient characterizations of individuals falling below or above a certain cutpoint. On the other hand, the relation between a marker and the outcome may not be natural in the sense that the cutpoint divides the group essentially into two homogeneous subgroups regarding the marker's influence on the outcome. In these circumstances, deciding on the existence of a suitable cutpoint is problematic. The recursive partition algorithm is a data-driven procedure and as such it behaves, by and large, as a black-box system concerning the choice of an adequate cutpoint. In the present study, the antibody labelling used only three score values. Therefore, the constructed tree is likely to be stable. Generally speaking, the process of model selection may be viewed as a pattern-recognition process. To assert that a model fits the data well using some appropriate criterion in the ordinary sense, means that the data appear consistent with the pattern predicted by the model.50 Conversely, to assert that the model fit is poor means that the data appear to deviate appreciably from the model pattern. Model correctness cannot be inferred from the fact that the fit of the model to the data is good, since there are alternative models that may also provide a good fit. A good fit is, however, a necessary condition for inference. In the context of predictive modelling, where the objective is interpretation, given specific states of knowledge, the function of automatic procedures for the selection of variates into a model (such as stepwise forward selection in logistic regression) seems limited. In principle, all simple models adequately fitting the original data set (statistical validation) should be listed, and any choice between them should be made on how successfully a model performs on a new case series (clinical validation). Issues of interaction between variates are handled implicitly; there is no need to enter product terms in the model formula. The questions are 17

18 reduced to which variates to divide on, and how to achieve the split. The justification for the tree-based methodology is to view the tree as providing a probability model. The decision on a split is made, based on a change in a deviance measure for the tree under a likelihood function that is conditioned on a fixed set of observed random variates. Note that once fixed by observation, a random variate is in no sense variable, so it might better be called a correlative or predictive marker. The unknown prediction probabilities are estimated from the proportions in the split node. The tree construction process chooses the split according to the maximum reduction in the deviance measure. When the objective is prediction, classification error rate is the appropriate criterion for judging any particular model form. Problem of overfitting predictive models There are deficiencies in the standard modelling methods. It is well known that analyses that are not pre-specified but are data-dependent and liable to lead to overoptimistic conclusions. Many applications involve a large number of variates to be modelled using a relatively small patient sample. In the case of very expressive models such as regression trees, there is the danger that the models come up with chance idiosyncrasies of the particular data, which are not true in general. It is also important that the minimum size of the terminal nodes is large enough. Problems of overfitting and of identifying important markers are exacerbated in predictive modelling, because the accuracy of a model is more a function of the number of events than of the total sample size. Experience indicates that a complex model is more likely to give overoptimistic prediction when extensive variate selection has been done. A related problem of variable selection is multicollinearity. We encountered a high correlation within clusters of positive and negative mesothelioma antigens and vis-à-vis the clusters. 18

19 Thus it was difficult to disentangle their individual effects and impossible to identify a unique solution for the regression tree. This means that correlated markers, epitomised by r(cea, WT-1) = -0.9, may be essentially measuring the same underlying pathology or construct. In statistical and predictive terms, they both convey essentially the same information, although this may obviously not be true in a biological sense. Comparison to other statistical models Our type of tree-modeling logistic regression analysis is different from the standard multiple logistic regression analysis by Carella et al.2, which was performed on various indicator combinators. It is nonetheless reassuring that their analysis also pinpointed calretinin as an important marker, and their approach can be compared to a selected list of markers in our corresponding logistic analysis. As mentioned above, Dejmek and Hjerpe 1 applied a backwards selection based regression analysis approach, whilst our tree-construction process is essentially a forward variable selection. We favour the forward selection approach on intuitive grounds. Apart from the direction of selection, the adjustment of the intercept parameter (simply, the average proportion of mesothelioma) was adjusted in a manner to represent the proportion in the general population, instead of that in the study series. An attractive property of our statistical approach is that at each node of a classification tree there is a probability distribution over the classes. The prediction probability available for each terminal node is constant, but remains dependent on the structure of the tree (related to its depth). It follows that the interpretation of this probability may not be exactly the same as the one provided by the logistic regression analysis risk model, which was also fitted to the data. 19

20 The action taken for handling missing values was to retain cases with partially unavailable marker values. The method used surrogate rules if the splitting variate was unavailable. The strategy was to pass a case down the tree as far as it will go. If it reached a terminal node, a predicted probability of caseness was computed for it. CONCLUSIONS The factors underpinning the selection of antibodies in large panels of antibodies for the distinction between MM of epithelial type and metastatic adenocarcinoma are often based on individual antibody's characteristics rather than the behavior of the whole panel. This study provides a novel and unique approach to this situation, because advanced tree-based partitioning methods are decidedly different from customary regression methods for predicting class membership on a binary response. Unlike any other statistical approach applied to this problem previously, our approach employs hierarchical modeling, with successive predictions being applied to particular cases, to sort the cases into homogeneous classes. Traditional methods use simultaneous techniques to make one and only one prediction for each and every case. Our findings, based on a large prospective study encompassing 200 consecutive pleural biopsies, indicate that a panel consisting of three antibodies, namely calretinin, BG8, and CD15 are jointly sufficient to accurately diagnose or exclude a case as a primary epithelial MM in the majority of cases. This correlates quite closely with the recommendation from the International Mesothelioma Panel that at least two mesothelial cell markers and two carcinoma-related markers be used for the immunohistochemical investigation of suspected epithelial mesothelioma. However, we emphasise that this study represents a retrospective 20

21 statistical correlation using cases with a known diagnosis of mesothelioma. We will now use the three markers (calretinin, CD15, BG8) as a prospective first-line approach to mesothelioma diagnosis and then ascertain if the use of additional markers influences the confidence index for a diagnosis of mesothelioma. However, this approach is only suitable for a differntial diagnosis of epithelial mesothelioma versus adenocarcinoma. In clinical practice, often a much wider differnial diagnosis including epithelioid hemanagioendothelioma and biphasic synovial sarcoma needs to be considered. We also emphasise that the findings in this study are valid for our department, but because of the differences in methodology for immunohistochemistry (for example variable tissue processing, antigen retrieval, dilutions of primary antibodies and different detection systems, even when using the same antibodies produced by the same manufacturer), other laboratories may record different findings. Ideally, each laboratory should establish their own optimal panel of markers for mesothelioma diagnosis. Our rational and systematic approach to mesothelioma diagnosis indicates that overly exhaustive panels of antibodies may not improve the diagnostic confidence, and in our laboratory we will proceed with the prospective study of a limited panel of three markers, as indicated above. We believe that this approach deserves wider application for the immunohistochemical diagnosis of a variety of tumours. 21

22 REFERENCES 1. Dejmek A, Hjerpe A: Reactivity of six antibodies in effusions of mesothelioma, adenocarcinoma and mesotheliosis: stepwise logistic regression analysis, Cytopathology 2000, 11:8-17 2. Carella R, Deleonardi G, D'Errico A, Salerno A, Egarter-Vigl E, Seebacher C, Donazzan G, Grigioni WF: Immunohistochemical panels for differentiating epithelial malignant mesothelioma from lung adenocarcinoma: a study with logistic regression analysis, Am J Surg Pathol 2001, 25:43-50. 3. Adachi Y, Aoki C, Yoshio-Hoshino N, Takayama K, Curiel DT, Nishimoto N: Interleukin-6 induces both cell growth and VEGF production in malignant mesotheliomas, Int J Cancer. 2006 Apr 26;. 2006, 4. King J, Thatcher N, Pickering C, Hasleton P: Sensitivity and specificity of immunohistochemical antibodies used to distinguish between benign and malignant pleural disease: a systematic review of published reports, Histopathology. 2006, 49:561-568. 5. Riera JR, Astengo-Osuna C, Longmate JA, Battifora H: The immunohistochemical diagnostic panel for epithelial mesothelioma: a reevaluation after heat-induced epitope retrieval [see comments], Am J Surg Pathol 1997, 21:1409-1419 6. Kushitani K, Takeshima Y, Amatya VJ, Furonaka O, Sakatani A, Inai K: Immunohistochemical marker panels for distinguishing between epithelioid mesothelioma and lung adenocarcinoma, Pathol Int 2007, 57:190-199 7. Yaziji H, Battifora H, Barry TS, Hwang HC, Bacchi CE, McIntosh MW, Kussick SJ, Gown AM: Evaluation of 12 antibodies for distinguishing epithelioid mesothelioma from adenocarcinoma: identification of a three-antibody immunohistochemical panel with maximal sensitivity and specificity, Mod Pathol 2006, 19:514-523 8. Ordonez NG: The immunohistochemical diagnosis of epithelial mesothelioma. [Review] [107 refs], Human Pathology 1999, 30:313-323 9. King JE, Thatcher N, Pickering CA, Hasleton PS: Sensitivity and specificity of immunohistochemical markers used in the diagnosis of epithelioid mesothelioma: a detailed systematic analysis using published data, Histopathology 2006, 48:223-232 10. Ordonez NG: The immunohistochemical diagnosis of mesothelioma: a comparative study of epithelioid mesothelioma and lung adenocarcinoma, Am J Surg Pathol 2003, 27:1031-1051 11. Ordonez NG: Immunohistochemical diagnosis of epithelioid mesothelioma: an update, Arch Pathol Lab Med 2005, 129:1407-1414 12. Ordonez NG: What are the current best immunohistochemical markers for the diagnosis of epithelioid mesothelioma? A review and update, Hum Pathol. 2007, 38:1-16. Epub 2006 Oct 2023. 13. Comin CE, Novelli L, Boddi V, Paglierani M, Dini S: Calretinin, thrombomodulin, CEA, and CD15: a useful combination of immunohistochemical markers for differentiating pleural epithelial mesothelioma from peripheral pulmonary adenocarcinoma, Hum Pathol 2001, 32:529-536. 14. Yaziji H, Battifora H, Barry TS, Hwang HC, Bacchi CE, McIntosh MW, Kussick SJ, Gown AM: Evaluation of 12 antibodies for distinguishing epithelioid mesothelioma from adenocarcinoma: identification of a three-antibody immunohistochemical panel with maximal sensitivity and specificity, Mod Pathol 2006, 19:514-523 15. Frisman DM: immunoquery. Edited by 2006, p. 22

23 16. Dejmek A, Hjerpe A: Immunohistochemical reactivity in mesothelioma and adenocarcinoma: A stepwise logistic regression analysis, APMIS 1994, 102:255-264 17. Battifora H, McCaughey WTE: Tumors of the Serosal Membranes. Atlas of Tumor Pathology, 3rd series, fascicle 15. Edited by Washington D.C., Armed Forces Institute of Pathology, 1995, p 18. Henderson DW, Shilkin KB, Langlois SL, Whitaker D: Malignant Mesothelioma. Edited by New York, Hemisphere, 1992, p. 412 19. Churg A, Cagle PT, Roggli VL: Tumors of the Serosal Membranes. Edited by Washington DC, American registry of Pathology/Armed forces Institute of Pathology, 2006, p 20. Sporn TA, Roggli VL: Mesothelioma. Edited by Roggli VL, Oury TD, Sporn TA. New York, Springer, 2004, p. pp. 104-168 21. Churg A, Roggli V, Galateau-Salle F, Cagle PT, Gibbs AR, Hasleton PS, Henderson DW, Vignaud JM, Inai K, Praet M, Ordonez NG, Hammar SP, Testa JR, Gazdar AF, Saracci R, Pugatch R, Samet JM, Weill H, Rusch V, Colby TV, Vogt P, Brambilla E, Travis WD: Mesothelioma. Edited by Travis WD, Brambilla E, Müller-Hermelink HK, Harris C. Lyon, IARC, 2004, p. pp. 128-136 22. Galateau-Sallé F: International Mesothelioma Panel: Brambilla E., Cagle PT, Churg AM, Colby, TV, Gibbs AR, Hammar SP, Hasleton PS, Henderson DW, Inai K, Praet M, Roggli VL, Travis WD, Vignaud JM: Pathology of Malignant Mesothelioma. Edited by London, Springer, 2006, p. 23. Breiman L, Friedman JH, Olshen RA, Stone CJ: Classification and Regression Trees. Edited by Belmont CA. Wadswoth & Brooks, 1984, p 24. Nurminen M: Prognostic models for delayed onset of renal allograft function., The internet journal of epidemiology 2003, 1: 25. Venables W, Ripley B: Modern applied statistics with S-Plus. Edited by New York, Springer Verlag, 2002, p 26. Bateman AC, Al-Talib RK, Newman T, Williams JH, Herbert A: Immunohistochemical phenotype of malignant mesothelioma: predictive value of CA125 and HBME-1, Histopathology 1997, 30:49-56 27. Barberis MC, Faleri M, Veronese S, Casadio C, Viale G: Calretinin. A selective marker of normal and neoplastic mesothelial cells in serous effusions, Acta Cytol 1997, 41:1757-1761 28. Leers MP, Aarts MM, Theunissen PH: E-cadherin and calretinin: a useful combination of immunochemical markers for differentiation between mesothelioma and metastatic adenocarcinoma, Histopathology 1998, 32:209-216 29. Ordonez NG: Value of calretinin immunostaining in differentiating epithelial mesothelioma from lung adenocarcinoma, Modern Pathology 1998, 11:929-933 30. Abutaily AS, Addis BJ, Roche WR: Immunohistochemistry in the distinction between malignant mesothelioma and pulmonary adenocarcinoma: a critical evaluation of new antibodies, J Clin Pathol 2002, 55:662-668 31. Fetsch PA, Simsir A, Abati A: Comparison of antibodies to HBME-1 and calretinin for the detection of mesothelial cells in effusion cytology, Diagn Cytopathol. 2001, 25:158-161. 32. Miettinen M, Sarlomo-Rikala M: Expression of calretinin, thrombomodulin, keratin 5, and mesothelin in lung carcinomas of different types: an immunohistochemical analysis of 596 tumors in comparison with epithelioid mesotheliomas of the pleura, Am J Surg Pathol 2003, 27:150-158 33. Ordonez NG: The diagnostic utility of immunohistochemistry in distinguishing between epithelioid mesotheliomas and squamous carcinomas of the lung: a comparative study, Mod Pathol 2006, 19:417-428. doi:410.1038/modpathol.3800544 23

24 34. King J, Thatcher N, Pickering C, Hasleton P: Sensitivity and specificity of immunohistochemical antibodies used to distinguish between benign and malignant pleural disease: a systematic review of published reports, Histopathology 2006, 49:561-568 35. Ordonez NG: What are the current best immunohistochemical markers for the diagnosis of epithelioid mesothelioma? A review and update, Hum Pathol 2007, 38:1-16 36. Ordonez NG: Value of thyroid transcription factor-1, E-cadherin, BG8, WT1, and CD44S immunostaining in distinguishing epithelial pleural mesothelioma from pulmonary and nonpulmonary adenocarcinoma, Am J Surg Pathol 2000, 24:598-606 37. Ordonez NG: The diagnostic utility of immunohistochemistry in distinguishing between mesothelioma and renal cell carcinoma: a comparative study, Hum Pathol 2004, 35:697-710 38. Sheibani K, Battifora H, Burke JS: Antigenic phenotype of malignant mesotheliomas and pulmonary adenocarcinomas. An immunohistologic analysis demonstrating the value of Leu M1 antigen, American Journal of Pathology 1986, 123:212-219 39. Sheibani K, Battifora H, Burke JS, Rappaport H: Leu-M1 antigen in human neoplasms: An immunohistologic study of 400 cases, American Journal of Surgical Pathology 1986, 10:227-236 40. Miettinen M, Limon J, Niezabitowski A, Lasota J: Calretinin and other mesothelioma markers in synovial sarcoma: analysis of antigenic similarities and differences with malignant mesothelioma, Am J Surg Pathol 2001, 25:610-617. 41. Abutaily AS, Addis BJ, Roche WR: Immunohistochemistry in the distinction between malignant mesothelioma and pulmonary adenocarcinoma: a critical evaluation of new antibodies, J Clin Pathol. 2002, 55:662-668. 42. Ordóñez NG: The immunohistochemical diagnosis of mesothelioma. Differentiation of mesothelioma and lung adenocarcinoma, American Journal of Surgical Pathology 1989, 13:276-291 43. Wick MR, Loy T, Mills SE, al e: Malignant epithelioid pleural mesothelioma versus peripheral pulmonary adenocarcinoma: A histochemical, ultrastructural, and immunohistologic study of 103 cases, Human Pathology 1990, 21:759-766 44. Ordonez NG: Mesothelioma with rhabdoid features: an ultrastructural and immunohistochemical study of 10 cases, Mod Pathol 2006, 19:373-383 45. Battifora H: The pleura. Edited by Sternberg SS. New York, Raven Press, 1989, p. pp. 829-855 46. DiBonito L, Falconieri G, Colautti I, Bonifacio D, Dudine S: The positive pleural effusion. A retrospective study of cytopathologic diagnoses with autopsy confirmation, Acta Cytol 1992, 36:329-332 47. Ordonez NG: Value of the MOC-31 monoclonal antibody in differentiating epithelial pleural mesothelioma from lung adenocarcinoma, Human Pathology 1998, 29:166-169 48. Ordonez NG: Value of the Ber-EP4 antibody in differentiating epithelial pleural mesothelioma from adenocarcinoma. The M.D. Anderson experience and a critical review of the literature. [Review] [21 refs], American Journal of Clinical Pathology 1998, 109:85-89 49. Miettinen O: Theoretical Epidemiology.Principles of Occurrence Research in Medicine.. Edited by New York, Wiley, 1985, p. Section 1.6. 50. Greenland S: Summarization, smoothing, and inference in epidemiologic analysis., Scandinavian Journal of Social Medicine 1993, 21:227-231 51. Ordonez NG: Immunohistochemical diagnosis of epithelioid mesotheliomas: a critical review of old markers, new markers, Hum Pathol. 2002, 33:953-967. 52. Ordonez NG: Value of cytokertin5/6 in distinguishing epithelial mesothelioma of pleura from lung adenocarcinoma., Am J Surg Pathol 1998, 22:1215-1221 24

25 53. Ordonez NG: Role of immunohistochemistry in distinguishing epithelial peritoneal mesotheliomas from peritoneal and ovarian serous carcinomas. [Review] [85 refs], American Journal of Surgical Pathology 1998, 22:1203-1214 54. Clover J, Oates J, Edwards C: Anti-cytokeratin 5/6: a positive marker for epithelioid mesothelioma, Histopathology 1997, 31:140-143 55. Attanoos RL, Webb R, Dojcinov SD, Gibbs AR: Value of mesothelial and epithelial antibodies in distinguishing diffuse peritoneal mesothelioma in females from serous papillary carcinoma of the ovary and peritoneum, Histopathology 2002, 40:237-244. 56. Foster MR, Johnson JE, Olson SJ, Allred DC: Immunohistochemical analysis of nuclear versus cytoplasmic staining of WT1 in malignant mesotheliomas and primary pulmonary adenocarcinomas, Arch Pathol Lab Med 2001, 125:1316-1320 57. Oates J, Edwards C: HBME-1, MOC-31, WT1 and calretinin: an assessment of recently described markers for mesothelioma and adenocarcinoma, Histopathology 2000, 36:341-347 58. Gulyas M, Hjerpe A: Proteoglycans and WT1 as markers for distinguishing adenocarcinoma, epithelioid mesothelioma, and benign mesothelium, Journal of Pathology 2003, 199:479-487 59. Amin KM, Litzky LA, Smythe WR, Mooney AM, Morris JM, Mews DJ, Pass HI, Kari C, Rodeck U, Rauscher FJ, 3rd, et al.: Wilms' tumor 1 susceptibility (WT1) gene products are selectively expressed in malignant mesothelioma, Am J Pathol. 1995, 146:344-356. 60. Renshaw AA, Pinkus GS, Gorson JM: HBME-1 aids in distinguishing mesotheliomas and adenocarcinomas of the lung and breast., Mod Pathol 1995, 8:152A 61. Gonzalez-Lois C, Ballestin C, Sotelo MT, Lopez-Rios F, Garcia-Prats MD, Villena V: Combined use of novel epithelial (MOC-31) and mesothelial (HBME-1) immunohistochemical markers for optimal first line diagnostic distinction between mesothelioma and metastatic carcinoma in pleura, Histopathology 2001, 38:528-534. 62. Attanoos RL, Goddard H, Thomas ND, Jasani B, Gibbs AR: A comparative immunohistochemical study of malignant mesothelioma and renal cell carcinoma: the diagnostic utility of Leu-M1, Ber EP4, Tamm-Horsfall protein and thrombomodulin, Histopathology. 1995, 27:361-366. 63. Ordonez NG: Role of immunohistochemistry in differentiating epithelial mesothelioma from adenocarcinoma. Review and update. [Review] [144 refs], American Journal of Clinical Pathology 1999, 112:75-89 64. Skov BG, Lauritzen AF, Hirsch FR, Skov T, Nielsen HW: Differentiation of adenocarcinoma of the lung and malignant mesothelioma: predictive value and reproducibility of immunoreactive antibodies, Histopathology. 1994, 25:431-437. 65. Szpak CA, Johnston WW, Roggli V, al e: The diagnostic distinction between malignant mesothelioma of the pleura and adenocarcinoma of the lung as defined by a monoclonal antibody (B72.3), American Journal of Pathology 1986, 122:252-260 66. Jordan DA, Jagirdar J, Kaneko M: Blood group antigens, Lewisx and Lewisy in the diagnostic discrimination of malignant mesothelioma versus adenocarcinoma, American Journal of Pathology 1989, 135:931-938 25

26 Table 1. Details of antibodies used, including source, antigen retrieval and dilutions. General Pattern of labeling Antigen retrieval Dilution Markers Positive in Malignant Mesothelioma Calretinin (Zymed) Currently regarded by many as the most sensitive and specific marker for MM.2, 12, 13, 51 Accept ONLY nuclear labelling (+/- cytoplasmic) Patchy cytoplasmic may be present in some carcinomas Trypsin 1:500 CK5/6 (Chemicon) Positive in most epithelial MM (negative in most adenocarcinomas), but positive in ovarian serous carcinomas 52-55 Membrane labelling Citric acid ph6 1:100 WT-1 (Dako) A protein expressed by some fetal tissues and adult mesothelium. Reportedly good sensitivity and specificity for epithelial mesotheliomas;7, 40, 56-59 not usually reactive with renal cell carcinoma Nuclear labelling Citric acid ph6 1:100 Thrombomo dulin (Dako) Avoids missing an epithelioid hemangioendotheliom a Membrane labelling Trypsin 1:400 HBME-1 (Dako) Raised from human mesothelial cell line, exact antigen not known, appears to be associated with microvilli. Variably regarded 9, 12, 26, 57, 60, 61 Membrane labelling No retrieval 1:15,000 Please note: data sheet suggests 1:50-100 26

27 60, 61 1:50-100 Markers Positive in Carcinoma CEA (Zymed) Strong, diffuse, linear labelling supports diagnosis of malignancy CD15 (Leu- Well M1) characterized12, 38, 39, 41-43useful for (Dako) distinction from renal cell carcinoma (most are positive)37, 62 B72.3 (kind gift of Dr Cant) BG-8 (Signet) TTF-1 (Dako) Others EMA (clone E29) CAM5.2 (Becton- Dickinson) Complex glycoprotein expressed in breast Ca, well established in literature 10, 11, 28, 33, 63, 64 but variable reports 43, 65 LewisY antigen, labels adenocarcinoma,5, 7, 9, 36, 66 and 80% of squamous cell carcinomas,33 no labelling of renal cell carcinomas37 Very specific for primary lung carcinoma (or carcinomas of thyroid follicular epithelium) Membrane-bound glycosylated phosphoprotein anchored to the apical surface of many epithelia. General epithelial marker Membrane +/- cytoplasmic labelling Membrane +/- cytoplasmic labelling Predominantly membrane labelling Predominantly membrane labelling Nuclear labeling No retrieval No retrieval No retrieval No retrieval Alkali retrieval ph9 1:200 1:50 1:4000 1:200 1:200 Strong diffuse membrane labeling supports dx of MM, but cytoplasmic +/- membrane labeling in some carcinomas No retrieval 1:100 Membrane Trypsin 1:100 27

28 Table 2. Individual sensitivity (Se) and specificity (Sp) of immunohistochemical markers for epithelial mesothelioma and adenocarcinoma, based on 173 cases of epithelial mesothelioma and 27 cases of adenocarcinoma. Discrepant percentages resulting from scoring equivocal marker staining as negative diagnosis are given in parenthesis. Epithelial marker Sensitivity Se, % Specificity Sp, % Total number of tests CAM 5.2 100.0 0.0 194 Mesothelioma marker Sensitivity Se, % Specificity Sp, % Total number of tests Calretinin 98.2 (95.2) 81.5 (85.2) 194 CK 5.6 96.6 57.9 137 EMA 90.9 (86.0) 7.7 (15.4) 190 HBME-1 89.2 (86.7) 76.0 (80.0) 191 Thrombomodulin 89.6 (79.1) 56.0 (68.0) 188 WT-1 77.8 88.9 126 Adenocarcinoma marker 1 Se, % 1 Sp, % Total number of tests B72.3 98.2 (98.8) 54.2 (50.0) 187 BG8 83.2 88.5 193 CD15 68.2 73.1 196 CEA 100.0 63.0 193 Ber-EP4 82.4 83.3 57 TTF-1 92.9 52.9 59 28

29 Table 3. A logistic regression analysis of multiple antibodies for the diagnosis of epithelial malignant mesothelioma. Statistical significance Regression term Estimate Standard error t value p value Intercept 2.02 1.21 1.62 0.11 Calretinin 4.53 1.63 2.77 0.006 BG.8-4.47 1,99-2.25 0.026 B72.3-3.55 1.62-2.19 0.017 CD.15-4.05 1.68-2.41 0.030 29

30 Table 4. A regression tree-based analysis of the deviance of multiple antibodies for the diagnosis of epithelial malignant mesothelioma. *denotes a terminal node. Criterion of partition Number Deviance Predicted Terminal for nodes of the of statistic within probability1 of node regression tree cases the node caseness, % * Top node of the tree 200 23.4 86.5 Left branch Calretin = 0 26 3.4 15.4 HBME-1 < 1 19 0.9 5.3 * HBME-1 = 1 7 1.7 1.7 * Right branch Calretin > 0 174 4.9 97.1 Left branch CD15 > 0 9 2.0 88.7 * Right branch CD15 = 0 165 2.0 98.8 * 1Probability at the top of the tree equals the relative frequency of mesothelioma cases in the series of patients with pleural malignancies. 30

31 Table 5. A classification tree-based analysis of the deviance of multiple antibodies for a diagnosis of epithelial mesothelioma. *denotes a terminal node. Order and criterion of node partition Number of cases Prediction misclassification1 rate,% Probability1 of mesothelioma caseness, %% 0) Root node 200 50 50 / 50 Terminal node * 1 Left branch) calretinin = 26 8.9 97.2 / 2.8 * 0 1 Right branch) calretinin 174 10.6 15.9 / 84.1 > 0 2 L) BG8 > 0 32 48.8 54.3 / 45.7 3 L) CD15 > 0 7 33.0 82.8 / 17.2 * 3 R) CD15 = 0 25 28.6 35.8 / 64.2 * 2 R) BG8 = 0 142 0 0 / 100 * 1Probability means relative frequency (empirical or theoretical), as in "the clinically observed relative frecuency of mesothelioma cases in our series of patients with pleural malignancies was 0.865" or "the model predicted probability of mesothelioma diagnosis based on the three antibodies was estimated to be 100% (correct), that is the misclassificaton rate (frequency in the particular patient population experience) was zero". 31

32 Classification Tree Model 27/173 0.5 22/4 0.028 Calretinin = 0 Calretinin > 0 5/169 0.841 BG8 > 0 BG8 = 0 5/27 0.457 0/142 1 3/4 0.172 CD15 > 0 CD15 = 0 2/23 0.642 Figure 1. A decision tree (based on Table 5) for the immunohistochemical assessment of epithelial mesothelioma, assuming equal likelihood of the tumour representing mesothelioma or adenocarcinoma (0.5 likelihood of being a malignant mesothelioma at the first node). Each of the following nodes then provides the allocation ratio of cases as either tumour (adenocarcinoma / mesothelioma) and provides the likelihood of a lesion being a MM sbased on the immunohistochemical labelling. The inferior nodes (ellipses) reached by those initial distinctions are further analysed by left and right splits. The terminal nodes (rectangles) give the final differentiation between the cases. The decimal figure in each node is the predicted probability (ranging from 0 to 1) that an individual case represents an epithelial MM. For example, if a tumour was negative for calretinin at the first node, there was a 2.8% chance (probablity of 0.028) of the tumour being finally diagnosed as epithelial MM. 32

33 Marker scoring: 1 = positive, 0.5 = equivocal, 0 = negative. The binary cutpoint was decided by the algorithm (i.e., whether to align the indeterminate score 0.5 with either 0 or 1). 33