Multiple Regression Analysis of Twin Data

Similar documents
Chapter 4. Quantitative genetics: measuring heritability

The Human Genome. Genetics and Personality. The Human Genome. The Human Genome 2/19/2009. Chapter 6. Controversy About Genes and Personality

Litteratur. Lärandemål för undervisningstillfälle. Lecture Overview. Basic principles The twin design The adoption design

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

8 th European Conference on Psychological Assessment

Investigating the genetic basis for intelligence

Genetic and environmental contributions to alcohol dependence risk in a national twin sample: consistency of findings in women and men

AUTISM SPECTRUM RATING SCALES (ASRS )

2 The Use of WAIS-III in HFA and Asperger Syndrome

Risk Factors in the Development of Anxiety Disorders: Negative Affectivity. Peter J. Norton, Ph.D.

Heritability: Twin Studies. Twin studies are often used to assess genetic effects on variation in a trait

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

VALIDATION OF A TEACHER MEASURE OF SCHOOL READINESS WITH PARENT AND CHILD-CARE PROVIDER REPORTS

Temperament and Character Inventory R (TCI R) and Big Five Questionnaire (BFQ): convergence and divergence 1

A twin study of non-alcohol substance abuse

MATHEMATICS AS THE CRITICAL FILTER: CURRICULAR EFFECTS ON GENDERED CAREER CHOICES

Measurement and Measurement Scales

ALCOHOL CONSUMPTION AMONG HOSPITALITY AND NONHOSPITALITY MAJORS: IS IT AN ISSUE OF PERSONALITY

Family environment and the malleability of cognitive ability: A Swedish national home-reared and adopted-away cosibling control study

Encyclopedia of School Psychology Neuropsychological Assessment

Canonical Correlation Analysis

Estimate a WAIS Full Scale IQ with a score on the International Contest 2009

research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other

interpretation and implication of Keogh, Barnes, Joiner, and Littleton s paper Gender,

Mode and Patient-mix Adjustment of the CAHPS Hospital Survey (HCAHPS)

Genetic and Environmental Influences on Vocabulary IQ: Parental Education Level as Moderator

Longitudinal Genetic Analysis of Problem Behaviors in Biologically Related and Unrelated Adoptees

The Relationship between IQ Phenotypic Variance and IQ Heritability as a Function of. Environment

PARTIAL LEAST SQUARES IS TO LISREL AS PRINCIPAL COMPONENTS ANALYSIS IS TO COMMON FACTOR ANALYSIS. Wynne W. Chin University of Calgary, CANADA

Estimated Impacts of Number of Years of Preschool Attendance on Vocabulary, Literacy and Math Skills at Kindergarten Entry

A study of the Singapore math program, Math in Focus, state test results

WISC IV and Children s Memory Scale

The Genetic Epidemiology of Substance Abuse

CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there

Genetic basis for comorbidity of alcohol and marijuana dependence

Missing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random

Applications of Structural Equation Modeling in Social Sciences Research

This is an electronic reprint of the original article. This reprint may differ from the original in pagination and typographic detail.

Research on Adoption and Post-Adoption Services and Supports (PASS)

UNDERSTANDING THE TWO-WAY ANOVA

Running head: ASPERGER S AND SCHIZOID 1. A New Measure to Differentiate the Autism Spectrum from Schizoid Personality Disorder

A PUBLICATION OF THE NATIONAL COUNCIL FOR ADOPTION ADOPTION USA: SUMMARY AND HIGHLIGHTS OF A CHARTBOOK ON THE NATIONAL SURVEY OF ADOPTIVE PARENTS

THE CHALLENGES OF BUSINESS OWNERSHIP: A COMPARISON OF MINORITY AND NON-MINORITY WOMEN BUSINESS OWNERS

LOGISTIC REGRESSION ANALYSIS

ESTIMATION OF THE EFFECTIVE DEGREES OF FREEDOM IN T-TYPE TESTS FOR COMPLEX DATA

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

Normal and Abnormal Personality Traits: Evidence for Genetic and Environmental Relationships in the Minnesota Study of Twins Reared Apart

Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

Session 7 Bivariate Data and Analysis

twin studies: what can they tell us about nature and nurture?

Reception baseline: criteria for potential assessments

Additional sources Compilation of sources:

Adult Child Ratio in Child Care

Handling attrition and non-response in longitudinal data

Woodcock Reading Mastery Tests Revised, Normative Update (WRMT-Rnu) The normative update of the Woodcock Reading Mastery Tests Revised (Woodcock,

Chapter 7 COGNITION PRACTICE 240-end Intelligence/heredity/creativity Name Period Date

Chapter 7 Factor Analysis SPSS

National Academy of Sciences Committee on Educational Interventions for Children with Autism

National Disability Authority Resource Allocation Feasibility Study Final Report January 2013

Algebra 1 Course Information

Lean Six Sigma Analyze Phase Introduction. TECH QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY

Behavioral Entropy of a Cellular Phone User

The Early Development Instrument: A Tool for Monitoring Children s Development and Readiness for School

A Review of Methods. for Dealing with Missing Data. Angela L. Cool. Texas A&M University

Binary Logistic Regression

The Developing Person Through the Life Span 8e by Kathleen Stassen Berger

Psychology Courses (PSYCH)

Accountability Brief

Missing data in randomized controlled trials (RCTs) can

ACT Research Explains New ACT Test Writing Scores and Their Relationship to Other Test Scores

THE SELECTION OF RETURNS FOR AUDIT BY THE IRS. John P. Hiniker, Internal Revenue Service

Courses Descriptions. Courses Generally Taken in Program Year One

MULTIPLE REGRESSION WITH CATEGORICAL DATA

General Symptom Measures

AGE AND EMOTIONAL INTELLIGENCE

Undergraduate Catalog

West Virginia Children and Families Funding Study

Virtual Child Written Project Assignment. Four-Assignment Version of Reflective Questions

The test uses age norms (national) and grade norms (national) to calculate scores and compare students of the same age or grade.

Guidelines for Documentation of Attention Deficit/Hyperactivity Disorder In Adolescents and Adults

Module 3: Correlation and Covariance

In the past decade, U.S. secondary schools have

Stages of development

Master of Arts, Counseling Psychology Course Descriptions

Integration of Children with Developmental Disabilities in Social Activities. Abstract

Early Childhood Study of Language and Literacy Development of Spanish-Speaking Children

ORIGINAL ATTACHMENT THREE-CATEGORY MEASURE

Courses in the College of Letters and Sciences PSYCHOLOGY COURSES (840)

UNIVERSITY OF SOUTH ALABAMA PSYCHOLOGY

Student Course Evaluations: Common Themes across Courses and Years

Comprehensive Reading Assessment Grades K-1

A Test Of The M&M Capital Structure Theories Richard H. Fosberg, William Paterson University, USA

WORK DISABILITY AND HEALTH OVER THE LIFE COURSE

The Intergenerational Transmission of Schooling Are Mothers Really Less Important than Fathers?

Descriptive Statistics

Social determinants of mental health in childhood a gene-environment perspective.

Feifei Ye, PhD Assistant Professor School of Education University of Pittsburgh

Chapter 10 Personality Name Period Date. MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Calculating, Interpreting, and Reporting Estimates of Effect Size (Magnitude of an Effect or the Strength of a Relationship)

Long-term impact of childhood bereavement

Transcription:

Behavior Genetics, Vol. 15, No. 5, 1985 Multiple Regression Analysis of Twin Data J. C. DeFries 1 and D. W. Fulker 1 Received 18 Feb. 1985--Final 10 May 1985 A multiple regression model for the analysis of twin data is described in which a cotwin's score is predicted from a proband's score and the coefficient of relationship (R = 1.0 and 0.5 for identical and fraternal twin pairs, respectively). This model is especially appropriate for the analysis of data on twins in which one member of each pair has been selected because of a deviant score, e.g., low reading performance. When the model is fitted to such data, the partial regression of the cotwin's score on the coefficient of relationship provides a powerful test of the extent to which the difference between the mean for probands and that for the unselected population is heritable, i.e., a test for genetic etiology. By fitting an augmented model containing an interaction term to either selected or unselected data sets, direct estimates of heritability and the proportion of variance due to shared environmental influences can also be obtained (subject, of course, to the usual assumptions underlying twin analyses, e.g., a linear polygenic model, little or n~o assortative mating, and equal shared environmental influences for identical and fraternal twins). KEY WORDS: heritability; multiple regression; reading disability; twins. INTRODUCTION Twin studies of psychopathology typically ascertain affected index cases (i.e., probands) and then assess the status of their cotwins. For categorical variables, such as the presence or absence of a psychiatric illness, a simple comparison of identical (MZ) and fraternal (DZ) concordance rates provides a sufficient test for genetic etiology. However, if the probands have This work was supported in part by a program project grant from the NICHD (HD-11681). 1 Institute for Behavioral Genetics, University of Colorado, Boulder, Colorado 80309. 467 0001-8244/85/0900-0467504.50/0 9 1985 Plenum Publishing Corporation

468 DeFries and Fulker been selected because of deviant scores on a continuous variable, the differential regression of cotwin's scores toward the mean of the unselected population is a more appropriate test. For example, when probands have been selected on the basis of low test scores, cotwins ofdz probands are expected to have higher scores than cotwins of MZ probands if the condition is heritable. Thus, a simple t test of the difference between the means for cotwins of MZ and DZ probands would suffice when the probands have identical mean scores. However, the simultaneous regression of the cotwin's score on the proband's score and coefficient of relationship (R = 1.0 and 0.5 for MZ and DZ pairs, respectively) yields a more general test. Furthermore, when the interaction between the proband's score and the coefficient of relationship is added to the regression equation during a second step in the analysis of these data, direct estimates of heritability (h e ) and the proportion of variance due to common or shared environmental influences (c z) potentially relevant to the unselected population are also obtained. In general, regression analyses of attenuated data are more appropriate than correlation analyses because regression coefficients are less influenced by restriction of range of the dependent variables (Cohen and Cohen, 1975, p. 65; Morton, 1982, p. 60). The primary objective of the present short communication is to illustrate how multiple regression models may be fitted to twin data to provide tests of the relative importance of genetic and environmental influences. Although multiple regression analysis provides a test for genetic etiology when applied to data on twins in which one member of each pair has been selected because of a deviant score, it can also be used to obtain direct estimates of h 2 and c 2 from both selected and unselected samples. The method that we propose for the analysis of twin data provides a simple and flexible alternative to more conventional model-fitting approaches (e.g., Fulker, 1981). Moreover, it can be applied to data on other genetic relationships and may be generalized to accommodate any linear model of genetic and environmental influences. MODELS The basic multiple regression model for the analysis of selected twin data is as follows: C = B1P + BzR + A, (1) where C is a cotwin's predicted score, P is the proband's score, R is the coefficient of relationship, and A is the regression constant. B1 is the partial regression of cotwin's score on proband's score and is a measure of twin resemblance that is independent of zygosity; Bz is the partial

Regression Analysis 469 regression of cotwin's score on the coefficient of relationship and equals twice the difference between the mean for MZ and that for DZ cotwins after covariance adjustment for any difference that may exist between MZ and DZ probands. Therefore, B2 provides a test of significance for genetic etiology analogous to that of differential twin concordance. Furthermore, the ratio of B2 to the difference between the mean for probands and that for the unselected population yields an index of the extent to which the condition is heritable, symbolized hg 2. When probands are selected because of deviant scores, the first step in our analysis is to fit the regression model in Eq. (1) to the data and then evaluate the significance of B2. Subsequently, direct estimates of h 2 and c 2 can be obtained when the following augmented regression model is fitted to the same data set: C = B3P + B4R + BsPR + A, (2) where PR is the product of the proband's score and the coefficient of relationship. Thus, during the second step in the analysis, the cotwin's score is simultaneously regressed on the proband's score, the coefficient of relationship, and their product. B5, the coefficient corresponding to the interaction term (PR), is equal to twice the difference between the MZ and the DZ regression coefficients; thus, B5 is a direct estimate of h 2 (assuming an additive model, little or no assortative mating, and equal shared environmental influences for MZ and DZ pairs) equivalent to that obtained using other estimation procedures. Moreover, when the augmented regression model [Eq. (2)] is fitted to twin data, B3 is a direct estimate of c 2 because it is a measure of twin resemblance that is independent of genetic resemblance as indexed by the interaction term. B4, on the other hand, does not have the simple expectation that B2 has when estimated from the basic model. Multiple regression analysis of twin data using the augmented model is similar to that previously reported by Ho et al. (1980) and Rose and Ditto (1983). However, they indexed zygosity using a categorical dummy variable instead of the coefficient of relationship and employed a hierarchical regression approach that focused largely on the significance of interaction terms. Although their method yields a significance test for genetic influence, it does not provide direct estimates of either h 2 or c a. Thus, a very simple modification of their approach facilitates estimates of genetic and environmental parameters and tests of their significance from data on both selected and unselected samples. APPLICATIONS To illustrate our method, the basic and augmented regression models were fitted to two different sets of data: a simulated data set and reading

470 DeFries and Fulker Table I. Simulated Twin Data to Illustrate Multiple Regression Analysis Data Cotwin Proband Coefficient of relationship 62 54 1.0 62 64 52 50 1.0 1.0 58 48 1.0 54 46 1.0 72 74 54 52 0.5 0.5 64 50 0.5 74 48 0.5 66 46 0.5 Summary statistics ~ Zygosity P C Sp 2 Sc 2 rcp bcp MZ 50 60 10 16 0.79 1.0 DZ 50 70 10 22 0.40 0.6 _P, proband mean; C, cotwin mean; s 2, variance; rcp, correlation; bop, regression of cotwin's score on proband's score. performance data from reading-disabled probands and their cotwins. First, consider the simulated MZ and DZ twin data presented in Table I. Note that the means for MZ and DZ probands are equal, whereas the mean for MZ cotwins is l0 units lower than that for DZ cotwins. As expected for a heritable disorder, a greater regression toward the mean of the unselected population has occurred for DZ cotwins than for MZ cotwins. Thus, when the basic model of Eq. (1) is fitted to these data, B2 = -20.0 -+ 4.8 (P = 0.004). B2 is exactly equal to twice the difference between the mean for MZ and that for DZ cotwins because the means for the MZ and DZ probands are equal in this example. If we assume that the mean of the unselected population is 100, then hg 2 = B2/(50-100) = 0.4 is an estimate of the extent to which the depressed scores of probands are due to heritable causes. It is important to note that this index may differ from h 2, a measure of the extent to which individual differences in the unselected population are heritable. Clearly, the etiology of the difference between the mean for probands and that for the unselected population may differ from that of differences among individuals within the normal range. For example, deviant scores may be due to a major-gene effect, a chromosomal anomaly, a special environmental insult, etc., whereas variation among individuals within

Regression Analysis 471 the normal range may be multifactorial in origin. If, on the other hand, probands merely represent the lower end of a normal distribution of individual differences, hg 2 would be expected to equal h 2 as estimated during the second step in our analysis. From Table I it may also be seen that the regression of the cotwin's score on the proband's score for MZ twins is 1.0, whereas that for DZ pairs is 0.6. Doubling the difference between these two regression coefficients yields an estimate for h 2 of 0.8, and subtracting this estimate from the MZ regression coefficient results in an estimate for c a of 0.2. As expected, when the augmented regression model [Eq. (2)] is fitted to these twin data, exactly the same results are obtained: B5 = 0.8 and B3 = 0.2. Additionally, when the analysis is performed using any available package of statistical computer programs, standard errors for these parameter estimates are automatically provided: 1.80 and 1.42 for h 2 and c 2, respectively. As with any other analysis of these data from five pairs of MZ and five pairs of DZ twins, the parameter estimates for h 2 and c ~ do not approach statistical significance. Given the summary statistics shown in Table I, a sample 20 times larger would be required to a yield a significant h 2. However, it is of some special interest to note that B2 was significant when the basic model of Eq. (1) was fitted to this small data set. This finding suggests that our method may be powerful for detecting a genetic etiology of a disorder even with a relatively small sample of MZ and DZ twin pairs. As expected with attentuated data, the correlations between cotwins' and probands' scores reported in Table I are lower than the corresponding regression coefficients. If we had doubled the difference between the MZ and the DZ correlations to estimate h 2, the resulting value (0.78) would have been similar to that obtained from our regression analysis. However, the estimate for c 2 of 0.01 would have been considerably different. This comparison demonstrates the advantage of regression methods over conventional correlation methods for the analysis of attenuated data sets. We have also recently used the regression models of Eqs. (1) and (2) to conduct some preliminary analyses of data on 29 MZ and 20 DZ twin pairs in which at least one member of each pair is reading disabled. As part of a study of the etiology of reading disability (DeFries, 1985), a newly developed test battery that includes measures of cognitive abilities, reading and language processes, and patterns of electrophysiological activity has been administered to this sample. [See Decker and Vandenberg (1985) for an overview of the twin component of this program project and alternative analyses of data from a smaller sample.] With regard to the Reading Recognition subtest of the Peabody Individual Achievement Test (Dunn and Markwardt, 1970), mean z scores of MZ and DZ probands are

472 DeFries and Fulker - 1.64 and - 1.68, respectively, whereas those for cotwins of the MZ and DZ probands are -1.34 and -0.93. When the basic regression model of Eq. (1) was fitted to these twin data, BE = --0.88 ----- 0.20. This result suggests that the difference between disabled and normal readers is due at least in part to heritable influences. If it is assumed that the mean for control subjects tested in this study (0.0) is equal to that for the unselected population, then he 2 = -0.88/(0.0-1.66) = 0.53. The regression of the MZ cotwin's Reading Recognition score on the proband's score is 0.83, whereas that for DZ pairs is 0.37. Thus, application of the augmented regression model [Eq. (2)] to the twin data yields estimates for h z and c z of 0.92 _+ 0.44 and -0.09 -+ 0.38, respectively. Although hg 2 is somewhat lower than h 2 in this example, they are not significantly different. In fact, the estimate for hg 2 may be too low because the control mean in this study probably exceeds that of the unselected population. DISCUSSION The multiple regression analysis of twin data outlined above has several distinct advantages over conventional model-fitting procedures. First, our method is easy to understand and apply. Multiple regression computer programs are readily available and are familiar to researchers in the behavioral sciences. Small data sets are even amenable to analysis on personal computers using currently available statistical packages such as SYSTAT (Wilkinson, 1984). Second, our models may also be fitted to data on other genetic relationships. For example, when data from nonadoptive and adoptive sibling pairs (coefficients of relationship of 0.5 and 0.0, respectively) are analyzed, B5 and B3 again provide direct estimates of h z and c 2. Third, it is possible to analyze data from more than two relationships simultaneously (e.g., MZ and DZ twins, siblings, parentoffspring pairs, etc.). However, it would be necessary to include additional coefficients as dummy variables producted by probands' scores to model shared environmental influences that vary as a function of relationship. Fourth, as originally proposed by Ho et al. (1980), the method can be easily extended to facilitate age adjustment and/or developmental analyses by including age as another independent variable. Finally, other independent variables, such as ethnic group, gender, socioeconomic status, various environmental indices, etc., can also be readily incorporated in a comprehensive multiple regression analysis. Our method is general for a variety of linear models and may be applied to either selected or unselected data sets but is especially applicable to data on twins in which one member of each pair is selected

Regression Analysis 473 because of a deviant score, e.g., tow reading performance. When the basic model [Eq. (1)] is fitted to such data, the partial regression of the cotwin's score on the coefficient of relationship (B2) provides a test of significance for genetic etiology. When the augmented model [Eq. (2)] is fitted to the same data set, B5 and B3 yield direct estimates of h 2 and c 2, respectively. This estimate of h 2 should be similar to our index of the extent to which the difference between probands and the unselected population is heritable (hg 2) if affected individuals represent the lower end of a normal distribution of individual differences. ACKNOWLEDGMENTS We thank Sadie N. Decker and Steven G. Vandenberg for recruiting the twin sample, George P. Vogler for analyzing the reading data, Robert Plomin for making many helpful suggestions, and Rebecca G. Miles for providing expert editorial assistance. REFERENCES Cohen, J., and Cohen, P. (1975). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences, Lawrence Erlbaum Associates, Hillsdale, N.J. Decker, S. N., and Vandenberg, S. G. (1985). Colorado Twin Study of Reading Disability. In Gray, D., and Kavanagh, J. (eds.), Biobehavioral Measures of Dyslexia, York Press, Parkton, Md. (in press). DeFries, J. C. (1985). Colorado Reading Project. In Gray, D., and Kavanagh, J. (edso), Biobehavioral Measures of Dyslexia, York Press, Parkton, Md. (in press). Dunn, L. M., and Markwardt, F. C. (1970). Examiner's Manual: Peabody Individual Achievement Test, American Guidance Service, Circle Pines, Minn. Fulker, D. W. (1981). The genetic and environmental architecture of psychoticism, extraversion, and neuroticism. In Eysenck, H. J. (ed.), A Model for Personality, Springer- Verlag, New York. Ho, H.- Z., Foch, T. T., and Plomin, R. (1980). Developmental stability of the relative influence of genes and environment on specific cognitive abilities daring childhood. Dev. Psychol. 16:340-346. Morton, N. E. (1982). Outline of Genetic Epidemioiogy, Karger, New York. Rose, R. J., and Ditto, W. B. (1983). A developmental-genetic analysis of common fears from early adolescence to early adulthood. Child Dev. 54:361-368. Wilkinson, L. (1984). SYSTAT: The System for Statistics, SYSTAT, Evanston, Ill Edited by C. Robert Cloninger