International Journal of Mathematical Sciences, Vol. 10, No. 3-4, July-December 2011, Serials Publications

PREDICTION OF INDIVIDUAL CELL FREQUENCIES IN THE COMBINED 2 × 2 TABLE UNDER NO CONFOUNDING IN STRATIFIED CASE-CONTROL STUDIES

A. K. Dixit & F. Ansari

Abstract: In stratified case-control studies, the individual cell frequencies in the combined table are predicted such that the simple Odds Ratio (OR) computed from this table approximates the Mantel-Haenszel odds ratio (ORMH). This provides useful information about the distribution of cell frequencies in the combined table under no confounding, which is of much use in health promotion, planning and management. The procedure first nullifies the effective strength of association of the two components of the confounder. It then uses linear regression between the cell counts of matched pathways, drawn through the presence and absence of the confounder to connect the attributes that define each cell frequency in the combined table. When applied to the 30 case-control situations considered, the procedure produced an OR close to ORMH in terms of the ratio of ORMH to it. On average, this ratio was (SD = 0.10). Underestimation was, however, noted (in cases with ORMH > 1), but with an average of the considered ratio of (SD = 0.08).

Keywords: Case-control studies, Cell frequencies, Distribution, Confounding, Pathways, Prediction, Regression.

1. INTRODUCTION

In a stratified case-control study, the combined 2 × 2 table (combined table) is the 2 × 2 table whose cell frequencies are the sums of the respective cell frequencies in the 2 × 2 tables formed across the strata. The distribution of cell frequencies in this table is their distribution under confounding. A measure such as the Mantel-Haenszel Odds Ratio (ORMH) provides a reasonable overall estimate of the Odds Ratio (OR) by addressing confounding through stratification, but such measures do not provide the distribution of cell frequencies in the combined table under no confounding.
Today, when we talk more of health than of disease, it is of much interest to know the distribution of cell frequencies in the combined table in the absence of confounding. Health promotion targets, unlike disease prevention targets, aim not only at reducing the prevalence of the risk factor among the diseased but simultaneously at enhancing the absence of the risk factor among the non-diseased. For example, in one health promotion programme, the targets set were to reduce tobacco consumption by 50% among the diseased as well as to increase non-smoking to 80% among the non-diseased.
For proper target setting, it becomes important to assess beforehand the distribution of cell frequencies in the combined table under no confounding. This paper describes a procedure to predict the individual cell frequencies in the combined table such that the OR simply computed from this table approximates the ORMH well. It thereby gives an idea of the distribution pattern of cell frequencies in the combined table, had there been no confounding. The procedure first derives the effective strength of association of the two components of the confounder and nullifies it. The individual cell frequencies in the combined table are then predicted using linear regression between the cell counts of matched pathways, drawn through the presence/absence of the confounder to connect the attributes that define each cell frequency in the combined table.

2. MATERIAL AND METHOD

Thirty case-control situations were considered, each with only two categories of a confounder, for the purpose of demonstration. Thus, each study considered was described by two given 2 × 2 tables, one for each category of the confounder. The cell counts in these tables, usually denoted by the letters a, b, c and d, correspond to the (R+, D+), (R+, D−), (R−, D+) and (R−, D−) combinations of the presence/absence of the disease (D+/D−) and the risk factor (R+/R−). The thirty situations included examples of stratified case-control studies from the literature as well as simulated situations, so as to cover a wide variation in ORMH values ( ), which figure in almost all practical situations.

First, the strength of association of the two components of the confounder, in terms of OR, was deduced from the two given 2 × 2 tables in each study by constructing two more 2 × 2 tables: one to provide OR 2, the strength of association of the confounder with the disease, and the other to provide OR 3, its strength of association with the risk factor.
For this, the given table with the larger number of diseased (D+) was designated as presence of the confounder (C+), and the other as absence of the confounder (C−). The a, b, c, d counts of the first constructed 2 × 2 table correspond respectively to the (C+, D+), (C+, D−), (C−, D+) and (C−, D−) combinations. The first row of this constructed table therefore consists of the column totals of the given 2 × 2 table designated C+, and its second row consists of the column totals of the given 2 × 2 table designated C−. The OR of this constructed table provided OR 2. Similarly, for the second constructed 2 × 2 table, the a, b, c, d counts correspond respectively to (R+, C+), (R+, C−), (R−, C+) and (R−, C−). The first column of the second constructed table consists of the row totals of the given table designated C+, and its second column consists of the row totals of the given table designated C−. The OR of the second constructed table provided OR 3.
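The construction of the two tables described above can be sketched in a few lines of Python. This is only an illustrative reading of the text, not the authors' code: the function names (`odds_ratio`, `constructed_tables`) are hypothetical, each table is assumed to be stored as ((a, b), (c, d)), and the stratum counts are those of the worked example at the end of the paper (300, 75, 50, 25 and 50, 25, 75, 75).

```python
def odds_ratio(table):
    """Odds ratio (a*d)/(b*c) of a 2x2 table stored as ((a, b), (c, d))."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)


def column_totals(table):
    """(D+ total, D- total) of a given table with rows R+/R- and columns D+/D-."""
    (a, b), (c, d) = table
    return (a + c, b + d)


def row_totals(table):
    """(R+ total, R- total) of a given table with rows R+/R- and columns D+/D-."""
    (a, b), (c, d) = table
    return (a + b, c + d)


def constructed_tables(stratum_1, stratum_2):
    """Build the two constructed tables that provide OR 2 and OR 3.

    The stratum with more diseased (larger D+ column total) is designated C+.
    """
    if column_totals(stratum_1)[0] >= column_totals(stratum_2)[0]:
        c_pos, c_neg = stratum_1, stratum_2
    else:
        c_pos, c_neg = stratum_2, stratum_1
    # OR 2 table: rows C+/C-, columns D+/D- (column totals of each given table).
    table_or2 = (column_totals(c_pos), column_totals(c_neg))
    # OR 3 table: rows R+/R-, columns C+/C- (row totals of each given table).
    r_pos, r_neg = row_totals(c_pos), row_totals(c_neg)
    table_or3 = ((r_pos[0], r_neg[0]), (r_pos[1], r_neg[1]))
    return table_or2, table_or3


# Counts (R+,D+), (R+,D-), (R-,D+), (R-,D-) of the paper's illustration.
stratum_1 = ((300, 75), (50, 25))
stratum_2 = ((50, 25), (75, 75))
table_or2, table_or3 = constructed_tables(stratum_1, stratum_2)
print("OR 2 =", odds_ratio(table_or2))
print("OR 3 =", odds_ratio(table_or3))
```

With these counts the sketch should reproduce the values OR 2 = 2.8 and OR 3 = 10 reported in the illustration.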
In the state of no confounding, both components of the confounder's association should vanish; that is, the confounder should be independent of the disease and of the risk factor. Accordingly, for the no-confounding state, the individual cell frequencies in the tables providing OR 2 and OR 3 must be replaced by their expected frequencies under independence. With this replacement, two more 2 × 2 tables were obtained, for which OR 2 and OR 3 are both equal to unity.

It was noted from the 2 × 2 tables constructed to provide OR 2 and OR 3 that, to reach from R+ to D+ in the combined table, there are two pathways, through the presence (C+) and the absence (C−) of the confounder, viz. R+ → C+ → D+ and R+ → C− → D+; the same holds under the state of no confounding. To get the predicted frequency of the (R+, D+) count in the combined table, the counts on similar pathways under the confounding and no-confounding states were paired. That is, the count (R+, C+) from the constructed table under confounding and the same count from the corresponding table under no confounding were paired as an (x, y) pair, and the counts (C+, D+), (R+, C−) and (C−, D+) were paired likewise. A linear regression was then fitted to these pairs. When the observed (R+, D+) value of the combined table was entered in this regression as the x value, the y value it yielded was taken as the predicted (R+, D+) frequency in the combined table. The predicted frequencies of the (R+, D−), (R−, D+) and (R−, D−) counts in the combined table were obtained likewise. The simple OR of these predicted frequencies in the combined table approximated ORMH well. The procedure is illustrated with an example at the end.

3. RESULTS

The simple OR computed from the predicted cell frequencies in the combined table closely estimated the ORMH in each case, in terms of the ratio of ORMH to it. The average of this ratio was found to be (SD = 0.10).
This means that the relative odds provided by ORMH and by its estimate obtained from the predicted cell frequencies were close. In 57% of cases, however, underestimation was noted, but with an average of the considered ratio of (SD = 0.08). The underestimation was greater when ORMH > 1, with an average ratio of (SD = 0.09), compared to (SD = 0.03) when ORMH < 1.

4. DISCUSSIONS

The procedure reported here predicts the individual cell frequencies in the combined table, had there been no confounding, so that the simple OR computed from this table approximates ORMH. Earlier, this table provided the crude rate ratio (CRR) only. In other words, the distribution of cell frequencies in the original combined table (under the state of confounding) is changed through the predicted cell frequencies so that it becomes the distribution under the state of no confounding. The approximation is assessed in terms of the ratio of ORMH to its obtained estimate, since the concern is how the two assess the strength of association of the risk factor with the disease. The average value of the ratio of ORMH to its estimate is found to be (SD = 0.10), which shows the closeness of the two in this respect. There is some underestimation, in particular when ORMH > 1, but the magnitude of underestimation is just 0.08 times only. The procedure thus reasonably approximates ORMH, in the described sense. Among the situations studied, there was none in which the ratio of ORMH to its estimate lay beyond the interval ( ); however, as the procedure uses regression, outliers may occur at times.

The procedure described has various applications. An immediate one is to comprehend the distribution pattern of the cell frequencies in the combined table, had there been no confounding. This is of much use in health promotion, where the targets are simultaneously set to reduce the prevalence of the risk factor among the diseased and to enhance its absence among the non-diseased. The proportion of diseased with the risk factor (exposure) when there is no confounding can also be assessed from here, which is used in public health planning. Likewise, the proportion of non-exposed with the disease could be assessed (in the absence of confounding), which has its uses in risk management.

In obtaining the predicted cell frequencies in the combined table, the described procedure first derives the strength of association of the two components of the confounder, which give rise to confounding, and then nullifies it.
Linear regression is then used between the (x, y) values, where x is a path value in the presence of the operative strength of the confounder and y is the corresponding value in its absence. Since a linear regression between the confounding and no-confounding states yields these results, it suggests a linear relationship between the two states. In the literature, regression is also used to address confounding, but it does not address the operative strength of the two components of the confounder; rather, it directly regresses the disease-exposure responses along with the values of the confounder. As such, given an OR adjusted for confounding, we are unable to predict back the individual cell frequencies in the combined table that would produce this OR.

5. CONCLUSION

The procedure described predicts well the distribution pattern of the cell frequencies in the combined table in the absence of confounding, as the simply computed OR approximates the ORMH well. Knowing this distribution is of much use to health planners.
ILLUSTRATION OF THE PROCEDURE

An example of a stratified case-control study cited here presents the two 2 × 2 tables of the disease (D) and risk factor (R) association for two strata, defined by the two categories of a considered confounder. The (R+, D+), (R+, D−), (R−, D+) and (R−, D−) counts, for the presence (+) and absence (−) of D and R, are respectively 300, 75, 50, 25 for stratum I and 50, 25, 75, 75 for stratum II, as depicted on the left of Diagram 1. The respective counts in the combined table (given beneath) are then 350, 100, 125, 100. The row totals are 375, 75 in stratum I and 75, 150 in stratum II. The respective column totals are 350, 100 and 125, 100. The value of CRR computed is (350 × 100)/(100 × 125) = 2.8 and that of ORMH (1) is (300 × 25/450 + 50 × 75/225)/(75 × 50/450 + 25 × 75/225) = 2.0, where 450 and 225 are the totals of all the cell counts in the respective 2 × 2 tables.

Diagram 1: Schematic Presentation of Given and Constructed 2 × 2 Tables. Left: given 2 × 2 tables for stratum I (C+) and stratum II (C−), with the combined table beneath (CRR = 2.8, ORMH = 2.0). Middle: constructed 2 × 2 tables in the confounding state (OR 2 = 2.8, OR 3 = 10.0). Right: constructed 2 × 2 tables in the no-confounding state (OR 2 = 1, OR 3 = 1) and the combined table with predicted frequencies (estimated ORMH = 1.95).
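The two summary measures quoted for this example can be checked with a short Python sketch. The function names are hypothetical, each stratum is assumed to be stored as a tuple (a, b, c, d) in the order (R+, D+), (R+, D−), (R−, D+), (R−, D−), and the Mantel-Haenszel estimate is written in its usual form, the sum of a·d/n over strata divided by the sum of b·c/n.

```python
def crude_odds_ratio(strata):
    """OR of the combined table obtained by summing cells across strata."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel OR: sum(a*d/n) / sum(b*c/n) over the strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Stratum I and stratum II counts of the illustration.
strata = [(300, 75, 50, 25), (50, 25, 75, 75)]
print("CRR  =", crude_odds_ratio(strata))
print("ORMH =", mantel_haenszel_or(strata))
```

For these counts the crude value should come out as 2.8 and the stratified value as 2.0, matching the figures given in the text.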
Two more 2 × 2 tables are now constructed to get OR 2 and OR 3. As the D+ count is larger in stratum I (350) than in stratum II (125), the 2 × 2 table of stratum I is designated as presence of the confounder (C+) and the table of stratum II as absence of the confounder (C−). These tables are accordingly marked C+ and C− in parentheses. The two constructed 2 × 2 tables are presented in the middle of Diagram 1. Of these, the table that provides OR 2 contains the counts (C+, D+), (C+, D−), (C−, D+) and (C−, D−), which are obtained from the two original 2 × 2 tables (given on the left of Diagram 1). Its first row is 350, 100, which are the column totals of the given 2 × 2 table in stratum I, and its second row is 125, 100, which are the column totals of the given table in stratum II. The other table, which provides OR 3, contains the counts (R+, C+), (R+, C−), (R−, C+) and (R−, C−), which are also obtained from the original 2 × 2 tables. Its first column is 375, 75, which are the row totals of the given table in stratum I, and its second column is 75, 150, which are the row totals of the given table in stratum II. The ORs of the newly constructed tables are OR 2 = 2.8 and OR 3 = 10, respectively. The cell frequencies in the constructed tables are then replaced by their expected frequencies under independence (no confounding). This replacement is shown on the extreme right of Diagram 1. The ORs computed from these tables are OR 2 = OR 3 = 1.

Diagram 2: Pathways Through Presence and Absence of Confounder. Left: confounding state. Right: no-confounding state. Each panel shows the pathways from R+ or R− to D+ or D− through C+ and C−, with the respective cell counts marked on each path.
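The replacement by expected frequencies and the pathway regression can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation. The function names are hypothetical. The pairing rule for the (R+, D−), (R−, D+) and (R−, D−) cells is inferred by analogy with the (R+, D+) case, which is the only one the text spells out. The expected counts are left unrounded here, whereas the paper appears to round them (for example 317 and 158). Tables are stored as nested tuples: the OR 2 table has rows C+/C− and columns D+/D−, the OR 3 table has rows R+/R− and columns C+/C−.

```python
def expected_under_independence(table):
    """Expected 2x2 counts (row total * column total / n), so the OR becomes 1."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    return tuple(tuple(r * k / n for k in cols) for r in rows)


def fit_line(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_combined(table_or2, table_or3, combined):
    """Predict each combined-table cell from its two pathways through C+ and C-."""
    exp_or2 = expected_under_independence(table_or2)
    exp_or3 = expected_under_independence(table_or3)
    predicted = [[0.0, 0.0], [0.0, 0.0]]
    for r in (0, 1):          # 0: R+, 1: R-
        for d in (0, 1):      # 0: D+, 1: D-
            xs, ys = [], []
            for c in (0, 1):  # 0: C+, 1: C-
                # x: path counts under confounding; y: same paths, no confounding.
                xs += [table_or3[r][c], table_or2[c][d]]
                ys += [exp_or3[r][c], exp_or2[c][d]]
            intercept, slope = fit_line(xs, ys)
            predicted[r][d] = intercept + slope * combined[r][d]
    return predicted


table_or2 = ((350, 100), (125, 100))   # rows C+/C-, columns D+/D-
table_or3 = ((375, 75), (75, 150))     # rows R+/R-, columns C+/C-
combined = ((350, 100), (125, 100))    # rows R+/R-, columns D+/D-

predicted = predict_combined(table_or2, table_or3, combined)
estimated_ormh = (predicted[0][0] * predicted[1][1]) / (predicted[0][1] * predicted[1][0])
print("predicted table:", predicted)
print("estimated ORMH :", estimated_ormh)
```

Under these assumptions the predicted (R+, D+) count should come out at roughly 300, and the OR of the predicted table should land near the estimate of 1.95 reported in Diagram 1; small differences are to be expected because the expected counts are not rounded in this sketch.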
In Diagram 2, the pathways from R+ to D+, R+ to D−, R− to D+ and R− to D−, through the presence (C+) and absence (C−) of the confounder, are presented with the respective cell counts marked, for the case with confounding (left) and the case with no confounding (right). To reach from R+ to D+, there are two pathways through the presence and absence of the confounder, viz. (a) R+ → C+ → D+ and (b) R+ → C− → D+. The count (R+, C+) is 375 under confounding, as read from the table providing OR 3 (300 in the corresponding no-confounding table); it is marked on the path R+ → C+, and all the other markings are made likewise. To get the predicted (R+, D+) frequency in the original combined 2 × 2 table, the counts marked on similar pathways under confounding and no confounding, i.e. on the left and right of Diagram 2, are paired, viz. (375, 300), (350, 317), (75, 150) and (125, 158). These (x, y) pairs were regressed using the linear regression y = a + bx. The regression constants estimated were a = 98.20 and b = 0.575. When the (R+, D+) count of the original combined table (350) was entered as x in the estimated regression, it yielded y = (approx. 300), which is the predicted (R+, D+) cell count in the original combined table to be used to estimate ORMH. Following this procedure for the remaining cells, the combined table with predicted cell frequencies is given in Diagram 1, to the right of the combined table used to get CRR. The OR computed from the combined table with predicted cell frequencies is 1.95, which is the estimated ORMH value obtained from the combined table. The ratio of ORMH to its estimated value is 1.02, which indicates that the estimate is quite close to ORMH in the sense discussed in the text.

REFERENCES

Mantel N., and Haenszel W., (1959), Statistical Aspects of the Analysis of Data from Retrospective Studies of Diseases, J Natl. Cancer Inst., 22:

Abelin T., Brzezinski Z. J., and Carstairs Vera D. L., (1987), Measurement in Health Promotion and Protection, WHO Regional Pub., European Series No. 22.

Saracci R., and Vineis P., (2007), Disease Proportions Attributable to Environment, Environmental Health, 6: 38.

Coggon D., Rose G., and Barker D., (1997), Epidemiology for the Uninitiated, (Fourth Ed.), BMJ Publishing Group.

McNamee R., (2005), Regression, Modelling and Other Methods to Control Confounding, Occup Environ Med., 62:

Ahlbom A., and Norell S., (1990), Introduction to Modern Epidemiology, (Second Ed.), Epidemiology Resources Inc.

A. K. Dixit, Desert Medicine Research Centre, Jodhpur, Rajasthan, India.
F. Ansari, Banasthali University, Rajasthan, India.