Size: px
Start display at page:



1 Annu. Rev. Genet : Copyright c 2001 by Annual Reviews. All rights reserved THE GENETIC ARCHITECTURE OF QUANTITATIVE TRAITS TrudyF.C.Mackay Department of Genetics, Box 7614, North Carolina State University, Raleigh, North Carolina 27695; trudy Key Words quantitative trait loci (QTL), quantitative trait nucleotides (QTN), QTL mapping, linkage disequilibrium mapping, single nucleotide polymorphisms (SNPs) Abstract Phenotypic variation for quantitative traits results from the segregation of alleles at multiple quantitative trait loci (QTL) with effects that are sensitive to the genetic, sexual, and external environments. Major challenges for biology in the post-genome era are to map the molecular polymorphisms responsible for variation in medically, agriculturally, and evolutionarily important complex traits; and to determine their gene frequencies and their homozygous, heterozygous, epistatic, and pleiotropic effects in multiple environments. The ease with which QTL can be mapped to genomic intervals bounded by molecular markers belies the difficulty in matching the QTL to a genetic locus. The latter requires high-resolution recombination or linkage disequilibrium mapping to nominate putative candidate genes, followed by genetic and/or functional complementation and gene expression analyses. Complete genome sequences and improved technologies for polymorphism detection will greatly advance the genetic dissection of quantitative traits in model organisms, which will open avenues for exploration of homologous QTL in related taxa. CONTENTS INTRODUCTION IDENTIFYING GENES AFFECTING QUANTITATIVE TRAITS Mutagenesis QTL Mapping From QTL to Gene From QTL to QTN PROPERTIES OF GENES AFFECTING QUANTITATIVE TRAITS Distribution of Effects Interaction Effects Pleiotropy FUTURE PROSPECTS /01/ $

2 304 MACKAY INTRODUCTION Most observable variation between individuals in physiology, behavior, morphology, disease susceptibility, and reproductive fitness is quantitative, with population variation often approximating a statistical normal distribution, as opposed to qualitative, where individual phenotypes fall into discrete categories. Since quantitative variation is so prevalent, an understanding of the genetic and environmental factors causing this variation is important in a number of biological contexts, including medicine, agriculture, evolution, and the emerging discipline of functional genomics. Quantitative genetics theory invokes an explicit underlying genetic model to describe the genetic architecture of a quantitative trait. Assuming that two alleles segregate at each of multiple QTL affecting variation in the trait, individual genotypes are fully specified by the homozygous (a) and heterozygous (d) effects of each locus, and pair-wise and higher-order interactions (epistasis) between loci (57, 150). However, in contrast to traits controlled by one or a few loci with large effects, variation in quantitative traits is caused by segregation at multiple QTL with individually small effects that are sensitive to the environment. For complex traits, the relationship between genotype and phenotype is not a simple ratio, and QTL genotypes cannot be determined from segregation of phenotypes in controlled crosses or pedigrees. Although individual QTL genotypes cannot be inferred from observations of the phenotype, deductions about the net effects of all loci affecting the trait can be made by partitioning the total phenotypic variance into components attributable to additive, dominance and epistatic genetic variance, variance of genotype-environment interactions, and other environmental variance. These variances are specific to the population being studied, due to the dependence of the genetic terms on allele frequencies at each of the contributing loci, and real environmental differences between populations. Thus, until recently, our knowledge of the genetic architecture of quantitative traits was confined to estimates of the heritability (the fraction of the total phenotypic variance attributable to additive genetic variance) and other variance components from correlations between relatives and response to selection, estimates of average degree of dominance from changes of mean on inbreeding, estimates of net pleiotropic effects from genetic correlations, and estimates of the total mutation rate from phenotypic divergence between inbred lines. If we are to determine the molecular genetic basis of susceptibility to common complex diseases and individual variation in drug response, more effectively select domestic crop and animal species for improved production traits, and understand the genetic basis of adaptation, we need to go beyond these statistical descriptions and to overcome the very impediment that necessitated the biometrical approach to the analysis of quantitative traits: We need to identify and determine the properties of the individual genes underlying variation in complex traits (200). A comprehensive understanding of the genetic architecture of any quantitative trait requires knowledge of: (a) the numbers and identities of all genes in

3 GENETIC ANALYSIS OF COMPLEX TRAITS 305 the developmental, physiological and/or biochemical pathway leading to the trait phenotype; (b) the mutation rates at these loci; (c) the numbers and identities of the subset of loci that are responsible for variation in the trait within populations, between populations, and between species; (d) the homozygous and heterozygous effects of new mutations and segregating alleles on the trait; (e) all two-way and higher-order epistatic interaction effects; ( f ) the pleiotropic effects on other quantitative traits, most importantly reproductive fitness; (g) the extent to which additive, dominance, epistatic, and pleiotropic effects vary between the sexes, and in a range of ecologically relevant environments; (h) the molecular polymorphism(s) that functionally define QTL alleles; (i) the molecular mechanism causing the differences in trait phenotype; and ( j) QTL allele frequencies. No quantitative trait has yet been described at this level of resolution, requiring as it does the integration of classical genetics, evolutionary and ecological genetics, molecular population genetics, molecular genetics, and developmental biology. Here, I review recent progress toward these goals, problems to be overcome, and future prospects. As it is not possible to summarize all of the vast literature on quantitative genetics in a few pages, I apologize in advance to authors whose work is not cited due to space constraints. IDENTIFYING GENES AFFECTING QUANTITATIVE TRAITS Mutagenesis Annotation of completed eukaryotic genome sequences (1, 75, 95, 225, 226, 231) has revealed how few loci have been characterized genetically, even in model organisms, and how many predicted genes exist with unknown functions. Mutagenesis is the classical genetic approach for determining the numbers and identities of all genes in the developmental, physiological, and/or biochemical pathway leading to the wild-type expression of a trait. Is it possible that the functions of the predicted, but unknown, genes have not been elucidated because mutational screens are typically biased towards mutations with large qualitative effects? The answer to this question depends on the nature of genetic variation for quantitative traits, the subject of considerable debate in the past (161, 162, 196, 203). Mather (161, 162) proposed that polygenes affecting quantitative variation were a different class of genetic entity from the oligogenes of Mendelian inheritance. The polygene theory can be phrased in modern parlance as the hypothesis that there is a category of loci at which the mutational spectrum is constrained to alleles with subtle, quantitative effects, as might be expected for genes in a pathway exhibiting functional redundancy. Support for the existence of such a class of genes comes from a mammoth experiment (242), in which 6925 strains of S. cerevisiae were constructed, each containing a precise deletion of 1 of 2026 open reading frames (about one third of the yeast genome). Only 17% of the deleted open reading frames were essential for viability under standard conditions, and 40% showed quantitative growth defects.

4 306 MACKAY In contrast, Robertson & Reeve (203) resisted the idea that polygenes were a different entity from oligogenes, and proposed that it is more parsimonious to consider that so-called oligogenes in fact have a distribution of pleiotropic mutational effects, from lethality to major morphological mutations to virtually indistinguishable isoalleles with quantitative effects. Quantitative variation could then arise as a pleiotropic side effect of segregating mutations with large effects on fitness (e.g., homozygous lethals) or on another trait. Experimental support for this contention comes from the repeated observation that response to artificial selection (and subsequent selection limits) in Drosophila is often attributable to lethal genes with quantitative heterozygous effects on the selected trait (30, 63). Alternatively, quantitative variation could result from the segregation of isoalleles with subtle phenotypic effects at the same loci at which major mutations affecting the trait occur. This possibility was later articulated as the candidate gene hypothesis (152, 201), but the nomination of candidate genes is biased by the (limited) pre-existing understanding of the underlying pathways leading to phenotypic expression of the trait. One part of the answer to these long-standing questions must come from mutagenesis studies where subtle effects of induced mutations on quantitative trait phenotypes are assessed. Such studies are much more labor-intensive than traditional F 1 mutant screens for dominant mutations with large effects that can be scored unambiguously on single individuals. Mutations with quantitative effects that are sensitive to environmental variation can only be detected if multiple individuals bearing the same mutation are evaluated for the trait phenotype. This in turn requires that the presence of a mutation can be recognized independently of its effect, or otherwise assured; that stocks are established for each mutagenized gamete (or chromosome); and that mutagenesis is conducted in a highly inbred (homozygous) genetic background. The sensitivity of the screen is proportional to the number of replicate individuals measured and is determined by the patience of the experimenter. Mobilizing transposons to generate insertional mutations in an inbred background satisfies these criteria. Further, the transposon acts as a molecular tag, enabling the cloning of the affected QTL (47, 123, 251). Given these constraints, few direct screens for mutations with quantitative phenotypic effects have been performed to date. Genetic variation for mouse body weight was significantly increased in lines harboring multiple retrovirus insertions relative to their co-isogenic control lines (103), but the genes responsible were not identified. In Drosophila, highly significant quantitative mutational variation was found for activities of enzymes involved in intermediary metabolism (27), sensory bristle number (148), and olfactory behavior (5) among single P transposable element insertions that were co-isogenic in a common inbred background. Insertion sites were not determined in the enzyme activity study, but statistical arguments suggested that the insertions were highly unlikely to be in enzyme-coding loci (27). Of the 50 insert lines with significant effects on bristle number, 9 were hypomorphic mutations at loci known to affect nervous system development, whereas the remaining 41 inserts did not map to cytogenetic regions containing loci with

5 GENETIC ANALYSIS OF COMPLEX TRAITS 307 previously described effects on adult bristle number (148). Most of the mutational variance for olfactory behavior was attributable to P element inserts in 14 novel smell-impaired (smi) loci (5). Several of the smi loci disrupted by the P element insertions are predicted genes with hitherto unknown olfactory functions, as well as candidate genes with more obvious roles in olfactory behavior (6). Screening for quantitative effects of induced mutations is a highly efficient method both for discovering new loci affecting quantitative traits and determining novel pleiotropic effects of known loci on these traits. The value of this approach has been explicitly touted for phenotype-driven mutagenesis screens in the mouse (174). It has been suggested (174) that mutagenesis can supplant mapping segregating QTL to determine the genetic basis of complex traits. This suggestion has merit if by genetic basis one is referring to identification of loci and pathways important for the normal expression of the complex trait. However, most of the applications of quantitative genetics require that we understand what subset of these loci affect variation of the trait in natural populations. In this case, QTL mapping is currently a necessary first step, since many loci at which mutations affecting the trait can occur will be functionally invariant in nature. QTL Mapping Since the effects of individual QTL are too small to be tracked by segregation in pedigrees, QTL are mapped by linkage disequilibrium with molecular markers that do exhibit Mendelian segregation. The principle of QTL mapping is simple, and was noted 80 years ago (208): If a QTL is linked to a marker locus, there will be a difference in mean values of the quantitative trait among individuals with different genotypes at the marker locus. More formally, consider an autosomal marker locus, M, and quantitative trait locus, Q, each with two alleles (M 1, M 2, Q 1, Q 2 ). The additive and dominance effects of the QTL are a and d, respectively, and the QTL is located c centimorgans (cm) away from the marker locus. If one crosses individuals with genotype M 1 M 1 Q 1 Q 1 to those with genotype M 2 M 2 Q 2 Q 2, and allows the F 1 progeny to breed at random, then the difference in the mean value of the quantitative trait between homozygous marker classes in the F 2 is a(1 2c). Similarly, the difference between the average mean phenotype of the homozygous marker classes and the heterozygote is d(1 2c) 2. If the QTL and marker locus are unlinked, c = 0.5 and the mean value of the quantitative trait will be the same for each of the marker genotypes. The closer the QTL and marker locus, the larger the difference in trait phenotype between the marker genotypes, with the maximum difference when the marker genotypes coincide exactly with the QTL. This suggests the basic principle of a genome scan for QTL. Given a population that is genetically variable for the quantitative trait and a polymorphic marker linkage map, one can proceed to test for differences in trait means between marker genotypes for each marker in turn. The marker in a local region exhibiting the greatest difference in the mean value of the trait is thus the one closest to the QTL. In such a single marker analysis, the estimates of the QTL effects are

6 308 MACKAY confounded by the map distance between the marker and QTL. This problem is not great with a dense marker map, and is avoided by localizing the QTL to intervals between adjacent linked markers (120, 228). The primary limitation for mapping QTL until relatively recently has been the dearth of marker loci. The availability of multiple visible Mendelian mutations in Drosophila facilitated early efforts to map QTL in this species (16, 210, 214, 228, 243); however, such markers were limiting even in Drosophila, and interpretation of the QTL effects was compromised by the potential direct effects of the mutations on the quantitative traits. The discovery of abundant polymorphism in natural populations at the level of electrophoretic mobility of proteins (82, 132) was the leading edge of a wave of studies uncovering genetic variation at the molecular level. We now know that there is abundant molecular polymorphism segregating in most species, at the level of variation at single nucleotides (single nucleotide polymorphisms, or SNPs), short di-, tri-, or tetra-nucleotide tandem repeats (microsatellites), longer tandem repeats (minisatellites), small insertions/deletions, and insertion sites of transposable elements. Methods of detection of molecular variation have evolved from restriction fragment length polymorphism analysis using Southern blots (121, 129), molecular cytogenetic methods to identify transposable element insertion site polymorphism (167) and direct sequencing (115), to high-throughput methods for both polymorphism discovery and genotyping (116). The recent discovery of over two million SNPs in the human genome (227, 231) will no doubt spur the further development of rapid, high-throughput, accurate and economical methods for genotyping molecular markers, including hybridization to high-density oligonucleotide arrays (241). Linkage disequilibrium (LD), or a correlation in gene frequencies, between invisible QTL alleles and visible marker alleles is generated experimentally by crossing lines with divergent gene frequencies at the QTL and markers. LD between marker and QTL alleles also occurs in naturally outbreeding populations within families or extended pedigrees that are segregating for the QTL genotypes. Traditionally, linkage mapping of QTL relies on the LD between markers and trait values that occurs within mapping populations or families. However, LD also occurs in outbred natural populations in which QTL alleles are in driftrecombination equilibrium (88, 238), in populations resulting from an admixture between two populations with different gene frequencies for the marker and mean values of the trait (57, 83, 238), and in populations that have not reached driftmutation equilibrium due to a recent mutation at the trait locus, or a recent founder event followed by population expansion (150, 238). Association mapping of QTL relies on marker-trait linkage disequilibria in such populations. The precision with which a QTL can be localized relative to a marker locus is directly proportional to the number of meioses sampled, which dictates the number of opportunities for recombination between the marker and the trait locus. Linkage analyses, which are typically restricted in the number of individuals or families sampled, thus have a lower level of resolution than do association studies, where many generations of recombination have occurred in nature. QTL mapping is thus

7 GENETIC ANALYSIS OF COMPLEX TRAITS 309 usually an iterative process, whereby an initial genome scan is performed using linkage analysis, followed by higher resolution confirmation studies of detected QTL, and culminating with recombination or association mapping to identify candidate genes. This stepwise approach, whereby one chooses populations for mapping in which the extent of LD is just right relative to the scale with which one wishes to localize the QTL, is not strictly necessary. Association studies at candidate genes can be conducted without a priori evidence for linkage. In the future, given dense polymorphic marker maps and the development of high-throughput genotyping technology, whole-genome scans with sufficient resolution to detect individual loci should be possible, in both linkage and association mapping (197) frameworks. GENOME SCANS Linkage mapping of QTL in organisms amenable to inbreeding begins by choosing parental inbred strains that are genetically variable for the trait of interest. Usually the parent strains will have different mean values for the trait, but this is not necessary, as two strains with the same mean phenotypic value can vary genetically owing to complementary patterns of positive and negative QTL allelic effects. A mapping population is then derived by back-crossing the F 1 progeny to one or both parents, mating the F 1 inter se to create an F 2 population, or constructing recombinant inbred lines (RIL) by breeding F 2 sublines to homozygosity. These methods are very efficient for detecting marker-trait associations, since crosses between inbred lines generate maximum LD between QTL and marker alleles, and ensure that only two QTL alleles segregate, with known linkage phase. The choice of method depends on the biology of the organism, and the power of the different methods given the heritability of the trait of interest (39, 57, 150). Traits with low heritabilities benefit from methods that allow assessment of the phenotype on multiple individuals of the same genotype, as is possible using RIL, or by progeny testing (120, 210, 211, 228). Because RIL are a permanent genetic resource, they need be genotyped only once and can be subsequently used to map a number of traits and to examine QTL by environment interactions. The map expansion in RIL due to the increased number of recombination events during their construction affords more precision of mapping than a similar number of backcross or F 2 individuals, but this advantage is offset by the increased difficulty and time necessary to construct RIL populations. A variant of line cross analysis is to artificially select lines from the F 2 of a cross between divergent inbred strains, and to map QTL by linkage to marker loci that systematically change in gene frequency across replicate selection lines (102, 104, 182, 183, 219). Concomitant with the increasing availability of abundant polymorphic molecular markers in recent years has been the improvement in statistical methods for mapping QTL and development of guidelines for experimental design and interpretation. Least-squares (LS) methods test for differences between marker class means using either ANOVA or regression (212). LS methods have the advantage that they can easily be extended to cope with QTL interactions and fixed effects

8 310 MACKAY using standard statistical packages, but the disadvantage that assumptions of homogeneity of variance may be violated. Maximum likelihood (ML) (110, 120) uses full information from the marker-trait distribution, and explicitly accounts for QTL data being mixtures of normal distributions. However, ML methods are less versatile and computationally intensive, and require specialized software packages. There is, in fact, little difference in power between LS and ML (80, 120), and ML interval mapping can be approximated using regressions (80, 160). Many QTL mapping protocols are one-at-a-time methods, whereby one evaluates the association of a QTL with a marker or marker interval while ignoring the effects of other segregating QTL in the mapping population. The presence of multiple linked QTL biases both single marker and interval mapping analysis (112), and segregation of unlinked QTL inflates the within-marker class phenotypic variance, thus reducing the power of QTL detection. Composite interval mapping (CIM) methods (97, 247) combine ML interval mapping with multiple regression, using marker cofactors to reduce the bias in estimates of QTL map positions and effects introduced by multiple linked QTL, and to increase the power to detect QTL by decreasing the within marker-class phenotypic variation. The principle of CIM has been extended to multiple traits (98), enabling the evaluation of the main QTL effects as well as QTL by trait and QTL by environment interactions. Strictly speaking, CIM methods are not multiple QTL methods, in that the model for evaluating the effects of each interval depends on the marker cofactors included, which varies across intervals. A true multiple interval mapping (MIM) method (99) has been developed that converges to a stable model providing estimates of positions and main and interaction effects of multiple QTL. An important caveat regarding CIM and MIM methods is that estimates of QTL positions and effects are highly model dependent, and can vary given different numbers of marker cofactors and window sizes (the region to either side of the test interval within which no marker cofactors are fitted) (187, 247, 249). These factors are under the control of the investigator, who must bear in mind that the best fitting model, identifying the most QTL, is not necessarily the closest approximation to reality. Outbred populations pose additional challenges for QTL mapping. In experimental populations derived from a cross of two inbred lines, all individuals are informative for both markers and QTL segregation, there are only two QTL alleles at each segregating locus, and the linkage phase between marker and QTL alleles is known. In outbred populations, one is restricted to obtaining information from existing families. Only parents that are heterozygous for both markers and linked QTL provide linkage information, and individuals may differ in QTL-marker linkage phase. Further, not all families will be segregating for the same QTL affecting the trait, and in the presence of genetic heterogeneity different families may show different associations. In outbred populations marker-trait associations are evaluated through the effects of the marker on the genetic variance of the trait, as the variance attributable to a marker is proportional to the additive variance of a linked QTL (150, 178). Methods that utilize pairs of relatives (typically sibs) have been developed for human populations (70, 84, 118, 150).

9 GENETIC ANALYSIS OF COMPLEX TRAITS 311 Two important statistical considerations transcend the experimental design and method of statistical analysis used to map QTL: power and significance threshold. In cases where power is low, not all QTL will be detected, leading to poor repeatability of results, and the effects of those that are detected can be overestimated (12). The second problem pertains to the multiple tests for marker-trait associations inherent in a genome scan. To maintain the conventional experiment-wise significance level of 0.05, one needs to set a more stringent significance threshold for each test performed based on the number of independent tests. With multiple correlated markers per chromosome, the number of independent tests in a genome scan is clearly greater than the number of chromosomes, but less than the total number of tests. Permutation (25, 48) or other resampling methods (249) are widely accepted as providing appropriate significance thresholds. Genome scans for QTL have become a cottage industry since the first report of QTL localizations using a high-resolution molecular marker map (190). Consistent with the economic incentive to improve production traits in agriculturally important crop and animal species, QTL have been mapped for traits of agronomic importance that differ between wild relatives and domestic, selected strains of tomato (188, 190); heterotic yield traits in crosses between elite inbred maize strains (218); growth and fatness in pigs (4, 113, 234); and milk production traits in dairy cattle (72, 213, 250). Knowledge of the loci underpinning variation in susceptibility to complex human disease is essential if the promise of personalized medicine is to be fulfilled. QTL have been mapped in humans for susceptibility to type 1 (41) and type 2 (81) diabetes mellitus, schizophrenia (18), obesity (33), Alzheimer s disease (54, 172), cholesterol levels (111), and dyslexia (21). Given the evolutionary conservation of genes and pathways across species, QTL for medically important and behavioral traits have been mapped in model organisms, with the hope that homologous loci and/or pathways will be implicated in humans. Examples include body size (17, 23, 104, 169, 194, 230), acute alcohol withdrawal (19) and emotionality in mice (59), and longevity in C. elegans (9, 209) and Drosophila (130, 184, 232). The best-case scenario for understanding the genetic basis of quantitative variation is for phenotypes that can be assessed with essentially no measurement error and for which the developmental basis is well understood. Consequently, our laboratory has mapped QTL for sensory bristle number in D. melanogaster (78, 79, 142, 180). Steps toward understanding the genetic basis of speciation have been taken by mapping QTL that affect morphological changes accompanying species divergence in maize (45, 46), monkeyflowers (15), and Drosophila (128, 136, 248). The numbers of QTL mapped in these studies are in all cases minimum estimates of the total number of loci that potentially contribute to variation in the traits (154). Increasing the sample size would enable mapping QTL with smaller effects and enable the separation of linked QTL by virtue of the larger number of recombinant events. The number of QTL detected in crosses between inbred lines is also a minimum because one can only map QTL at which different alleles are fixed in the two parent strains, which are a limited sample of the existing genetic

10 312 MACKAY variation. Methods to widen this genetic net include four-way (96, 246) and eightway (170, 221) cross designs, combining multiple line cross experiments (245), and utilizing parent strains derived by divergent artificial selection from a large base population (79, 142, 180, 237). QTL as defined by genome scans are not genetic loci, but relatively large chromosome regions containing one or more loci affecting the trait. For example, the average size of intervals containing significant QTL from seven recent studies in Drosophila (78, 79, 130, 142, 180, 184, 232) was 8.9 cm and 4459 kb, with a range from 0.1 to 44.7 cm and 98 to 19,284 kb. There are at least 13,600 genes in the 120 Mb of Drosophila eukaryotic DNA (1); an average gene is thus 8.8 kb. Therefore, an average QTL in these studies encompasses 507 genes, with a range from 11 to 2191 genes. The large number of genes in intervals to which QTL map, limited genetic inferences that can be drawn from analysis of most mapping populations, and poor repeatability of many initial QTL mapping efforts that had low power due to small samples have engendered some pessimism about the value of this approach (174). However, understanding the genetic basis of naturally occurring variation is too important a problem to ignore the challenge of mapping QTL to the level of genetic locus, and there are success stories to serve as beacons guiding us through the fog. HIGH-RESOLUTION MAPPING The size of the genomic region within which a QTL can be located is determined by the scale of LD between the gene corresponding to the QTL and marker loci flanking the QTL. High-resolution mapping entails successively reducing the size of the region exhibiting LD with the QTL ultimately to a single gene, either by generating recombinant genotypes between the two strains of interest, or by screening for historical recombination events in natural populations. Given a high-density polymorphic molecular marker map spanning the interval in which the QTL is located, and a population of recombinant genotypes with breakpoints between adjacent markers, it is a simple matter in principle to define the QTL position relative to a pair of flanking markers. The challenge posed for high-resolution QTL mapping, in contrast to mapping Mendelian loci, is that individual QTL are expected to have small effects that are sensitive to the environment, and therefore the phenotype of a single individual is not a reliable indicator of the QTL genotype. Successful high-resolution QTL mapping is thus contingent on increased recombination in the QTL interval, and accurately determining the QTL genotype (39). One way to increase the precision of mapping in line crosses is to use more advanced generations of the cross than the F 2 for the initial genome scan, since the number of recombinations increases proportionately with the number of generations (40). In organisms amenable to genetic manipulation, the effects of individual QTL can be magnified by constructing strains that are genetically identical except for a defined region surrounding the QTL. These strains are a permanent genetic resource from which multiple measurements of the trait phenotype can be obtained to increase the accuracy of determining the QTL genotype. The region surrounding

11 GENETIC ANALYSIS OF COMPLEX TRAITS 313 the QTL can be a whole chromosome (chromosome substitution lines) or a smaller interval [variously called introgression lines, interval-specific congenic strains (38), and near-isoallelic lines (NIL)]. NIL are constructed by recurrent backcrosses to one parent, accompanied by selection for the marker associated with the QTL and against other markers associated with the genotype of the nonrecurrent strain. QTL can be mapped with high resolution by constructing multiple independent introgressions, ascertaining recombination breakpoints for each using a molecular marker map of the introgressions, and determining the phenotypic effect of each introgression genotype (189). In theory (38), effort spent in systematically generating informative recombinants across a QTL interval, using molecular markers to track recombination breakpoints, can result in localizing QTL to 1-cM intervals. If constructing introgression lines is necessary for high-resolution mapping, why not dispense with the traditional low-resolution genome scan and develop methods that both map QTL and produce the first-generation introgression lines simultaneously? In Drosophila, the availability of balancer chromosomes enables the rapid construction of chromosome substitution lines. This technique has been adopted to substitute single homozygous chromosomes from a high-scoring strain into the homozygous genetic background of a low-scoring strain, and to subsequently map QTL one chromosome at a time (79, 142, 210, 237). Chromosome substitution strains have been constructed in mice (175), enabling chromosomal localization of QTL by screening just 20 lines. Interval-specific introgression was first used to map QTL for Drosophila bristle number (16). More recently, backcrossing with selection for molecular markers was used to construct introgression lines for mapping QTL for tomato fruit traits (55) and morphological traits associated with divergence between two Drosophila species (128). Segregating variation in intervals surrounding several Drosophila bristle number candidate QTL (154) was demonstrated by constructing NIL for approximately 50 different naturally occurring alleles of each candidate gene (140, 147, 149). Introgression can also be accomplished by selecting on the quantitative trait, rather than markers, to identify QTL of large effect while producing NILs for further analysis (87). When combined with progeny testing of recombinant genotypes within intervals (55, 210, 214, 228, 243), linked QTL with small effects within each interval can be individually identified. There are several examples of successful fine-scale recombination mapping of QTL. (a) Idd3, a susceptibility allele for type 1 diabetes mellitus in the mouse, maps to a 145-kb interval containing a single known gene, Il2 (151). (b) NIDDM1, a susceptibility gene for human type 2 diabetes mellitus, was mapped by linkage analysis to a 5-cM region that, fortuitously, is a region of high recombination corresponding to 1.7 Mb, containing 7 known genes and 15 ESTs (91). (c) The tomato fruit weight QTL, fw2.2, was localized to a 1.6-cM interval containing four unique transcripts (65). (d) A tomato fruit-specific apoplastic invertase gene Lin5 has been shown unambiguously to correspond to the Brix9-2-5 QTL, which affects fruit glucose and fructose contents, since recombinants in a 484-bp region within this gene cosegregate with the QTL phenotype (67).

12 314 MACKAY Drosophila geneticists have an additional tool for mapping QTL to sub-cm intervals: deficiency chromosomes. To fine map Mendelian mutations, the chromosome containing the mutant allele is crossed to the set of deficiencies whose breakpoints span the interval to which the gene has been localized, and complementation or failure to complement is recorded for the progeny containing the deficiency chromosomes. The location of the mutation is then delineated by the region of nonoverlap of deficiencies complementing the mutant phenotype with those that fail to complement the mutant. Deficiency complementation mapping has been extended to mapping QTL with small, additive effects and to control for the effects of other QTL outside the region uncovered by the deficiency (154, 187). This method has been used to resolve QTL for Drosophila life span that were initially detected by recombination mapping (184) into multiple linked QTL, one of which mapped to a 50-kb interval containing one known and two predicted genes (187). One caveat when interpreting quantitative failure to complement deficiencies is that, as for all complementation tests, the failure to complement could be attributable to an epistatic, rather than an allelic, interaction. Thus, QTL inferred using this method must be subsequently confirmed using a different approach. In organisms where the genetic background cannot be manipulated and controlled crosses cannot be made (e.g., humans), the only option for high-resolution mapping is LD mapping, which utilizes historical recombinations, as discussed in the next section. The level of difficulty in detecting QTL in natural populations is greater than the already considerable challenge posed in genetic model organisms, since the existence of segregating genetic variation at other QTL affecting the trait translates operationally to a smaller marginal effect of any one QTL, and thus very large samples are needed. FromQTLtoGene The high-resolution mapping methods discussed above will map QTL to genetic intervals containing several genes. How can one determine which genes correspond to the QTL? Co-segregation of intragenic recombinant genotypes in a candidate gene with the QTL phenotype, as demonstrated for the tomato apoplastic invertase gene and fruit sugar content QTL (67), constitutes clear genetic proof that the QTL corresponds to the candidate gene. Functional complementation, in which the trait phenotype is rescued in transgenic organisms, is another gold standard for gene identification. The fw2.2 tomato fruit weight QTL was shown to correspond to the ORFX gene by transforming the large-fruit domestic tomato with a cosmid containing the ORFX transcript and observing a significant reduction in fruit weight in the transformants (65). The multiplicity and size of intestinal tumors induced by the dominant Apc Min mutation in mice are modified by the Mom1 mutation. Intestinal tumors are suppressed in Apc Min mutant mice that are heterozygous for a cosmid transgene overexpressing the Mom1 candidate gene Pla2g2a (encoding a secretory phospholipase), strongly suggesting that Mom1 is an allele of Pla2g2a (34). These studies were facilitated

13 GENETIC ANALYSIS OF COMPLEX TRAITS 315 by the large effects of the QTL, and their dominant gene action. If the effect of the QTL is not large, it is necessary to construct multiple independent transgenic lines to account for changes in expression of the transgene due to different insertion sites. Not all QTL will be identifiable with the standard of rigorous proof provided by intragenic recombination and functional complementation. The only option in these cases is to collect multiple corroborating pieces of evidence, no single one of which is convincing, but which together consistently point to a candidate gene. These lines of evidence include genetic complementation tests, LD mapping, quantitative differences of gene expression and/or protein function, and tests for orthologous QTL in other species. GENETIC COMPLEMENTATION Perhaps the simplest method to infer identity of a candidate gene and a QTL is genetic complementation to a mutant allele of the candidate gene. For example, a QTL explaining most of the difference in inflorescence morphology between maize and teosinte mapped to a region including the maize gene, teosinte-branched1 (tb1), and the QTL effect was similar to the mutant phenotype of tb1 (44, 45). A NIL containing the teosinte QTL in a maize background failed to complement the maize tb1 mutant allele, suggesting that tb1 is the QTL (46). Genetic complementation of QTL and mutant alleles of candidate genes need not be restricted to recessive QTL with a qualitative phenotypic effect. The logic of a quantitative complementation test is the same as quantitative deficiency complementation mapping (79, 141, 155). QTL for Drosophila sensory bristle number quantitatively failed to complement mutations at candidate genes affecting peripheral nervous system development (79, 141, 147, 149, 155). Currently, quantitative complementation is only feasible for organisms in which controlled crosses can be made. An unbiased application of the method also requires the existence of (preferably null) mutant stocks at all loci in the region to which the QTL maps. The examples above were for cases where obvious candidate genes colocalized with the QTL. In many cases, QTL will map to regions where there are no obvious candidate genes (180), and, for some traits, one can implicate almost any locus as a candidate gene. For such regions, there is no option but to test all possible genetically defined loci. Ultimately, this must be done even for those regions containing obvious candidates, since some QTL may correspond to loci with undescribed and unexpected pleiotropic effects on the trait. Practical considerations dictate that the QTL interval must first be delimited to a manageable number of genes before a systematic quantitative complementation analysis of each gene in the region is attempted. Strains containing single gene knock-outs of all known and predicted genes in model organisms (42, 66, 76, 179, 216, 242) will be an exceptionally valuable resource for quantitative complementation tests in the future. One caveat regarding genetic complementation tests, whether qualitative or quantitative, is that failure to complement is indicative of a genetic interaction, but cannot discriminate whether the interaction is allelic or epistatic. Thus, failure

14 316 MACKAY to complement imputes candidate genes for further study, but does not constitute proof that the QTL and the candidate gene are the same entity. Corroborating evidence includes gene expression patterns (47) and associations of molecular polymorphisms in the candidate gene with phenotypic variation in the trait (139, 147). LINKAGE DISEQUILIBRIUM (ASSOCIATION) MAPPING When a new mutation occurs in a population at a locus affecting a quantitative trait, all other polymorphic alleles in that population will initially be in complete LD with the mutation. Over time, however, recombination between the mutant allele and the other loci will create the missing haplotypes and restore linkage equilibrium between the mutant allele at all but closely linked loci. The length of the genomic fragment (in cm) surrounding the original mutation in which LD between the QTL and other loci still exists depends on the average amount of recombination per generation experienced by that region of the genome, the number of generations that have passed since the original mutation, and the population size (57, 83, 88, 238). For old mutations in large equilibrium populations, strong LD is only expected to extend over distances of the order of kilobases or less. Larger tracts of LD are expected in expanding populations derived from a recent founder event or in population isolates with small effective size. Nevertheless, the scale of LD in outbred populations is expected to be small enough that only very closely linked markers will be associated, such that screening for LD between QTL alleles and polymorphic molecular markers in outbred populations is a natural solution to the problem of generating informative recombinants within loci and between adjacent loci. Association mapping can be used to systematically screen candidate loci in an interval defined by linkage mapping, or to evaluate associations at candidate loci even in the absence of linkage information. It has been suggested that the two million SNPs that are the fruit of the human genome project (227, 231) will herald a new era of detecting loci for susceptibility to complex human diseases using genome-wide LD mapping with a dense panel of SNP markers and a sample of individuals from a natural population, dispensing with traditional linkage mapping entirely (197). The simplest designs for evaluating association between markers at a candidate gene and a quantitative trait only require a sample of individuals from the population of interest, each of whom has been genotyped for the marker loci and evaluated for the trait phenotype. In the case-control design for dichotomous traits, such as disease susceptibility, the population sample is stratified according to disease status, and LD between a marker and the trait is revealed by a significant difference in marker allele frequency between cases and controls (20). For continuously distributed traits, the population sample is stratified by marker genotype, and marker-trait LD is inferred if there is a significant difference in trait mean between marker genotype classes. Testing for trait associations of multiple markers again requires that an appropriate downward adjustment of the significance threshold be made. Permutation tests developed in the context of other single-marker genome scans are ideal in this regard (25, 48, 139, 147).

15 GENETIC ANALYSIS OF COMPLEX TRAITS 317 Unfortunately, several factors complicate this picture. First, marker-trait associations in natural populations are not necessarily attributable to linkage, but can be caused by admixture between populations that have different gene frequencies at the marker loci and different values of the trait (57, 83). This problem can be alleviated by experimental designs that control for population structure and jointly test for linkage and association. For example, the transmission-disequilibrium test (TDT) (215) for dichotomous traits utilizes population samples of trios of affected offspring and their parents. Alleles at polymorphic marker loci linked to a gene affecting the trait will be preferentially transmitted to affected offspring. This design has been extended to various sampling designs for quantitative traits (3), larger nuclear families (195), genotypic data with multiple alleles (177), and half-sib families (244). Experimental designs to control for the potential confounding effects of admixture are more complicated and less attractive than simple samples from the population. Isolated populations descended from a small numbers of founders are thus thought to be advantageous for association studies, since LD should extend over larger distances, requiring a lower density of marker loci, and admixture can be assumed to be absent or negligible. For the case of rare disease susceptibility alleles, an additional advantage of isolated populations is that most disease-bearing chromosomes are likely to descend from a single ancestral chromosome in the founding population. A successful application of LD mapping in isolated populations is the positional cloning of the human sulfate transporter locus, associated with autosomal recessive diastrophic dysplasia (DTD) in Finland (85). If the genotypes at all polymorphic sites in the region of interest are determined, one of them must correspond to the site causing the phenotypic effect (the QTN, or quantitative trait nucleotide) (138, 139). The power to detect an association between the QTN and the trait phenotype is much higher than linkage studies, even after correcting for multiple tests for association (197). Thus, a diseasesusceptibility allele with a relative risk of 2 can be detected in association study with 1000 individuals. However, genotyping all polymorphic sites in 1 Mb (the size of interval to which fine-mapping can localize a QTL) for 1000 individuals is currently laborious and expensive. For a typical level of nucleotide heterozygosity of 0.001, the population genetic expectation is that there will be 5200 segregating sites per Mb in a sample of only 100 individuals (236). Consequently, association studies are typically based on a subset of markers in LD with, but probably not including, the QTN polymorphism. Analogous to single-marker analysis of segregating generations derived from line crosses, the estimate of the effect of a linked marker on the trait underestimates the effect of the polymorphism at the QTN (122, 177). Therefore, if one evaluates marker-trait associations for several linked marker loci in the region of a QTL, the basic theory predicts that the marker associated with the largest effect on the trait is closest to the casual polymorphic site. The consequence of conducting association studies with a subset of marker loci is that the power to detect an association depends not only on the number of

16 318 MACKAY individuals sampled, but also on the density of polymorphic markers. Samples of at least 500 individuals are necessary to detect a QTN contributing 5% of the total phenotypic variance with 80% power (138, 146). Further, potentially important associations will be missed unless the polymorphic markers are spaced such that adjacent markers show some degree of LD. In theory, this will be the physical distance corresponding to 4Nc, where N is the effective population size (117, 138), which can be estimated from plots of all possible pair-wise LD between markers in a genomic region against their physical distance (86, 94). Unfortunately from the perspective of developing general guidelines, the average physical distance corresponding to 4Nc varies by at least an order of magnitude among different gene regions, both in humans (20, 138) and in Drosophila (2, 165, 166). Further, the association between LD and physical distance breaks down over short physical distances, with some closely linked sites in linkage equilibrium and others in strong disequilibrium (28, 49, 165, 166, 176, 222). While some of the region-to-region and within-region variation in the relationship of LD to distance is attributable to statistical sampling error, much of the variation reflects the very real differences in recombination rates (2, 13), natural selection, genetic drift, marker mutations, and other genetic processes (e.g., gene conversion), both on a regional and local scale (89, 90, 238, 239). Optimally, one should adjust the spacing of markers in an association study relative to variation in LD across the region of interest. For most regions, multiple markers will be required per gene. In Drosophila regions of high polymorphism and recombination, the optimal spacing of markers could be as small as 200 base pairs (139). These considerations seriously compromise the feasibility of whole-genome LD mapping (117, 240), especially given simulation (117) and empirical results (49, 222) demonstrating that mean levels of LD and the relationship between LD of markers at intermediate frequency and physical distance are not significantly different in mixed and isolated populations. An example of the successful application of LD mapping is the positional cloning of the human calpain-10 (CAPN10) gene, a putative susceptibility gene for type 2 diabetes in the interval containing the NIDDM1 QTL (91). Associations between 21 SNPs and diabetes susceptibility in the 5-cM region to which NIDDM1 mapped narrowed the search to three genes in a 66-kb interval: CAPN10, G-protein coupled receptor 35 (GPR35), and a gene encoding a protein with homology to aminopeptidase B (RNPEPL1). Additional SNPs in this interval were detected by re-sequencing in 10 diabetic patients. Association tests for 63 SNPs indicated that 16 SNPs in CAPN10 and GPR35 were associated with diabetes, but that only 1 SNP in CAPN10 was associated both with the disease and evidence for linkage, implicating CAPN10 and not GPR35 as the disease-susceptibility locus. This conclusion is supported by a haplotype analysis indicating that haplotypes at three polymorphic SNPs in CAPN10 were associated with diabetes in the study population, a finding that was confirmed for two unrelated populations. ORTHOLOGOUS QTL Prospects for describing variation for quantitative traits in terms of contributions of individual loci are bright if the same loci affect variation

17 GENETIC ANALYSIS OF COMPLEX TRAITS 319 for the trait in different populations and across taxa. Classical quantitative genetic analyses and QTL mapping both suggest that many of the same loci cause variation in bristle number in different geographical populations. If different loci caused variation for bristle number in different populations, one would expect that the limit to selection from any one population would be less than limits achieved in synthetic populations derived by crossing; this is not the case (143, 144). Further, the same genomic regions containing QTL for Drosophila sensory bristle number recur in studies that span a period of nearly 40 years, for populations of diverse origin, and using different mapping methodologies (78, 180, 210, 214, 243). An intriguing observation of early QTL mapping studies was that QTL for the same traits mapped to similar locations in different species (188, 191). This suggests a strategy whereby genes corresponding to QTL can be identified in genetically tractable model organisms, then evidence for linkage and association for the orthologous locus sought in other species. The non-obese diabetic (NOD) mouse is a model for human type 1 diabetes. Interleukin-12 is functionally associated with disease progression in the NOD mouse, which motivated a search for association of the human homolog, IL12B, with susceptibility to type 1 diabetes (168). There was strong evidence for linkage in the region containing IL12B, and significant LD and linkage was confined to a 30-kb region in which IL12B is the only known gene. Conversely, one could map a QTL with high resolution in one species and conduct functional assays for correspondence of the genes in the interval with the QTL in a model organism. A human asthma QTL was mapped to 1-Mb interval containing 6 cytokine genes and 17 partially characterized genes. Quantitative changes in asthma-related phenotypes were observed in transgenic mice expressing either of two human candidate genes, interleukin 4 (IL4) and interleukin 13 (IL13), but not in mice transgenic for the other candidate genes, implicating IL4 and IL13 as the loci corresponding to the asthma QTL (220). FromQTLtoQTN A description of the genetic architecture of quantitative traits is not complete until we can specify which polymorphic sites in the gene we have identified as corresponding to the QTL actually cause the difference in the trait phenotype the quantitative trait nucleotides (QTN). To put this problem in perspective, consider the levels of polymorphism observed in the candidate genes discussed above. Direct sequence comparisons have revealed 42 nucleotide differences distinguishing the 2 parental alleles of the tomato fruit weight QTL, ORFX/fw2.2 (65); 11 molecular variants in 484 bp differentiate the parental alleles of the tomato QTL affecting fruit sugar content, Lin5/Brix9-2-5 (67); 88 variant sites in 71 individuals for the human lipoprotein lipase gene (176); and there are at least 108 variant sites in CAPN10 (91). Polymorphism at the Drosophila bristle number QTL scabrous (sca) and Delta (Dl) was determined using low-resolution restriction map surveys of approximately 50 alleles derived from nature. There were 27 polymorphisms

18 320 MACKAY in the sca gene region (122) and 53 polymorphisms at Dl (139). In contrast, there are no fixed differences between the maize and teosinte alleles of tb1, despite considerable polymorphism at this locus within each species (235). The same LD mapping methods used to focus on a candidate gene can be used to determine which of the polymorphic sites at the candidate gene is associated with the quantitative trait phenotype. Again, samples of 500 individuals or more are required to detect QTN accounting for 5% of the phenotypic variance or less. Ideally, one would obtain complete sequences of the candidate gene for each of these individuals. However, until re-sequencing technology becomes more rapid and cost-effective, associations within a candidate gene will be based on a subset of the polymorphic sites. The choice of sites should be motivated by the pattern of LD in the candidate gene region, and not by preconceived notions of what the causal sites might be (e.g., polymorphisms in coding regions). Three contrasting scenarios can be imagined. (a) If all polymorphic sites in the candidate gene are in complete LD, it is only necessary to examine associations with one of the sites. However, if an association is detected, it will not be possible to use LD mapping in that population to make inferences about which of the sites is causal. Either another population with a different evolutionary history must be sought, or alternate methods developed. (b) If there is no significant LD between any of the polymorphic sites in the candidate gene, all sites must be utilized in the association study, but associations detected at any one site are more likely to be causal. However, one can never preclude the existence of a true causal site in strong LD with the site associated with variation in the trait that is outside the surveyed region. (c) Some sites in the candidate gene are in LD, others are not. This is the usual case, for which sequence information for a sample of alleles would greatly inform the choice of markers. To date, no study of genotype-phenotype associations at candidate genes has utilized the optimal density of markers. Indeed, many studies in humans have been conducted with a single marker, leading to difficulty in replicating associations due to variation among populations in the degree of LD between the marker and the putative causal variant (20). The few studies in which multiple polymorphic markers within candidate genes have been examined for association with phenotypic variation reveal some interesting features. All kinds of molecular variation (transposable element insertions, small insertions/deletions, and SNPs) of Drosophila bristle number candidate genes have been associated with quantitative variation in bristle number, and all significant associations have been for molecular polymorphisms in introns and noncoding flanking regions (122, 139, 140, 147, 157). Polymorphisms in noncoding regions of the CAPN10 putative diabetes susceptibility locus in humans are associated with disease risk (91). It is thus possible that variation in regulatory sequences causes slight differences in message levels, timing and tissue-specificity of gene expression, and protein stability that lead to subtle quantitative differences in phenotype. The CAPN10 story is unexpectedly complex. Two haplotypes of three SNPs in this gene have significantly increased risk of diabetes, but the at-risk genotype is homozygous for one SNP and

19 GENETIC ANALYSIS OF COMPLEX TRAITS 321 heterozygous for the others; homozygotes for either haplotype do not have increased risk (91). This suggests that at least two interacting risk factors within a single gene are required to affect susceptibility to diabetes. Correlation is not causality, and the ultimate proof that a molecular variant is functionally associated with differences in phenotype will be biological, not statistical. One such functional test is to construct by in vitro mutagenesis alleles that differ for each of the putative QTN, separately and in combination, and to assess their phenotypic effects by germ-line transformation into a null mutant background (24, , 217). Mimicking complex effects such as those observed at CAPN10 could prove to be challenging. PROPERTIES OF GENES AFFECTING QUANTITATIVE TRAITS Distribution of Effects The mathematical formalization of Mather s polygene hypothesis is the infinitesimal model, which assumes genetic variation for quantitative traits is caused by a very large number of QTL with very small and equal allelic effects (57, 150). Although convenient for theoretical modeling of quantitative variation, this model makes little sense genetically, as it predicts all loci equally (and trivially) affect all conceivable quantitative traits. Robertson (200) proposed that the distribution of allelic effects should be more nearly exponential, whereby a few loci have large effects and cause most of the variation in the traits, with increasingly larger numbers of loci with increasingly smaller effects making up the remainder. If Robertson s model is correct, it will be feasible to understand the salient features of the genetic architecture of quantitative traits by a detailed study of those relatively few loci that contribute most of the genetic variation. If the infinitesimal model is largely correct, understanding quantitative genetic variation in terms of individual QTL is a hopeless enterprise. Under this model, the effect of each locus contributing to variation in a quantitative trait is so small that it cannot, by definition, be measured. Several lines of evidence indicate that variation for quantitative traits is not accounted for by a very large number of loci with equal and small effects. First, only a small fraction of the theoretical maximum response to artificial selection predicted by the infinitesimal model (199) is achieved experimentally (57). Second, distributions of effects of new P element insertional mutations on Drosophila quantitative traits are highly skewed and leptokurtic (148, 159), with most mutational variance attributable to a few mutations with large effects. Third, observed distributions of QTL effects are also in accord with Robertson s prediction (15, 50, 188, 191, 210, 223). This observation on its own is not a strong refutation of the infinitesimal model, however, since undetected QTL with small effects and overestimation of effects of detected QTL could be an artifact of small sample size (12) or LD between QTL, low heritability, and epistatic interactions between QTL (14). Further, observed unequal distributions of QTL effects could reflect variation

20 322 MACKAY in the precision of estimating the QTL positions. If this were true, QTL effects would be directly proportional to the fraction of the genome embraced by the confidence limits marking the location of the QTL. This is not generally true; in many cases, QTL with the largest effect map to the smallest genetic regions (180). Of course, high-resolution recombination mapping (47, 65, 67) and transformation of cloned QTL (65, 220) prove the existence of at least some QTL with large effects, and significant associations of molecular polymorphisms at candidate genes with moderate-to-large QTL effects is consistent with an exponential distribution of effects. Finally, simulation studies have shown that the distribution of effects of genes that are fixed during adaptation is exponential (185, 186). Interaction Effects QTL effects typically vary according to the genetic, sexual, and external environment. If the magnitude and/or the rank order of the differences in phenotypes associated with QTL genotypes also changes depending on the genetic or environmental context, then that QTL exhibits genotype by genotype interaction (epistasis), genotype by sex interaction (GSI), or genotype by environment interaction (GEI) (154). Extensive epistasis, GSI, and GEI have practical and theoretical consequences. First, estimates of QTL positions and effects are relevant only to the sex and environment in which the phenotypes were assessed and may not replicate across sexes and in different environments. Second, estimates of main QTL effects will be biased in the presence of epistasis. Third, genetic variation for quantitative traits can be maintained at loci exhibiting GSI and GEI (68, 74, 131). GENOTYPE BY GENOTYPE INTERACTIONS (EPISTASIS) Any developmental or biochemical pathway that culminates in the expression of a quantitative trait is comprised of networks of loci that interact at the genetic and molecular levels. Genetic variation at some of these loci is causally connected to phenotypic variation in the trait. To what extent does this translate to interaction between loci at the level of the phenotype, i.e., epistasis? In classical Mendelian genetics, epistasis between loci with alleles of large effect refers to the masking of genotypic effects at one locus by genotypes of another, as reflected by distorted segregation ratios in a di-hybrid cross (192). The usage of the term in quantitative genetics is much broader, and refers to any statistical interaction between genotypes at two (or more) loci (22, 31, 106, 163). Epistasis can refer to a modification of the homozygous or heterozygous effects of the interacting loci. Epistasis can be synergistic, whereby the phenotype of one locus is enhanced by genotypes at another locus; antagonistic, in which the difference between genotypes at a locus is suppressed in the presence of genotypes at the second locus; or even produce novel phenotypes. The theory for estimating the contribution of epistatic interactions to the total phenotypic variance of a trait in outbred populations and crosses among inbred lines is well developed. However, epistasis is difficult to detect in these designs, partly because even strong epistatic interactions contribute little to the epistatic

Review. Which genetic marker for which conservation genetics issue? Nucleic acids. 1 Introduction. Contents. Electrophoresis 2004, 25, 2165 2176 2165

Review. Which genetic marker for which conservation genetics issue? Nucleic acids. 1 Introduction. Contents. Electrophoresis 2004, 25, 2165 2176 2165 Electrophoresis 2004, 25, 2165 2176 2165 Review Qiu-Hong Wan 1 Hua Wu 1 Tsutomu Fujihara 2 Sheng-Guo Fang 1 1 College of Life Sciences, State Conservation Center for Gene Resources of Endangered Wildlife,

More information

Data Quality Assessment: A Reviewer s Guide EPA QA/G-9R

Data Quality Assessment: A Reviewer s Guide EPA QA/G-9R United States Office of Environmental EPA/240/B-06/002 Environmental Protection Information Agency Washington, DC 20460 Data Quality Assessment: A Reviewer s Guide EPA QA/G-9R FOREWORD This document is

More information

Guidance for Industry

Guidance for Industry Guidance for Industry Adaptive Design Clinical Trials for Drugs and Biologics DRAFT GUIDANCE This guidance document is being distributed for comment purposes only. Comments and suggestions regarding this

More information


106 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 11, NO. 2, SECOND QUARTER 2009. survivability. 106 IEEE COMMUNICATIONS SURVEYS & TUTORIALS, VOL. 11, NO. 2, SECOND QUARTER 2009 A Comparative Analysis of Network Dependability, Fault-tolerance, Reliability, Security, and Survivability M. Al-Kuwaiti,

More information

Molecular Genetic Analysis of Single Cells

Molecular Genetic Analysis of Single Cells 268 Molecular Genetic Analysis of Single Cells Francesco Fiorentino, Ph.D. 1 1 GENOMA - Molecular Genetics Laboratory, Rome, Italy Semin Reprod Med 2012;30:268 283 Address for correspondence and reprint

More information

Chapter 2 Survey of Biodata Analysis from a Data Mining Perspective

Chapter 2 Survey of Biodata Analysis from a Data Mining Perspective Chapter 2 Survey of Biodata Analysis from a Data Mining Perspective Peter Bajcsy, Jiawei Han, Lei Liu, and Jiong Yang Summary Recent progress in biology, medical science, bioinformatics, and biotechnology

More information

Introduction to r andomized c linical t rials in c ardiovascular d isease

Introduction to r andomized c linical t rials in c ardiovascular d isease CHAPTER 1 Introduction to r andomized c linical t rials in c ardiovascular d isease Tobias Geisler, 1 Marcus D. Flather, 2 Deepak L. Bhatt, 3 and Ralph B. D Agostino, Sr 4 1University Hospital Tübingen,

More information

Profiling Deployed Software: Assessing Strategies and Testing Opportunities. Sebastian Elbaum, Member, IEEE, and Madeline Diep

Profiling Deployed Software: Assessing Strategies and Testing Opportunities. Sebastian Elbaum, Member, IEEE, and Madeline Diep IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 31, NO. 8, AUGUST 2005 1 Profiling Deployed Software: Assessing Strategies and Testing Opportunities Sebastian Elbaum, Member, IEEE, and Madeline Diep Abstract

More information


CHALLENGES IN THE CLINICAL USE OF GENOME SEQUENCING CHALLENGES IN THE CLINICAL USE OF GENOME SEQUENCING Chapter in Ginsburg and Willard (eds) Genomic and Personalized Medicine, 2 nd Edition, 2012, in press. Robert C. Green, MD, MPH Heidi L. Rehm, PhD, FACMG

More information

Stochastic universals and dynamics of cross-linguistic distributions: the case of alignment types

Stochastic universals and dynamics of cross-linguistic distributions: the case of alignment types Stochastic universals and dynamics of cross-linguistic distributions: the case of alignment types Elena Maslova & Tatiana Nikitina 1. Introduction This paper has two goals. The first is to describe a novel

More information

Genetics then and now: breeding the best and biotechnology

Genetics then and now: breeding the best and biotechnology Rev. sci. tech. Off. int. Epiz., 2005, 24 (1), 31-49 Genetics then and now: breeding the best and biotechnology P.K. Basrur (1) & W.A. King (2) (1) Professor Emeritus, Department of Biomedical Sciences,

More information

C3P: Context-Aware Crowdsourced Cloud Privacy

C3P: Context-Aware Crowdsourced Cloud Privacy C3P: Context-Aware Crowdsourced Cloud Privacy Hamza Harkous, Rameez Rahman, and Karl Aberer École Polytechnique Fédérale de Lausanne (EPFL),,

More information

Revisiting the Edge of Chaos: Evolving Cellular Automata to Perform Computations

Revisiting the Edge of Chaos: Evolving Cellular Automata to Perform Computations Revisiting the Edge of Chaos: Evolving Cellular Automata to Perform Computations Melanie Mitchell 1, Peter T. Hraber 1, and James P. Crutchfield 2 In Complex Systems, 7:89-13, 1993 Abstract We present

More information

THE TWO major steps in applying any heuristic search algorithm

THE TWO major steps in applying any heuristic search algorithm 124 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 3, NO. 2, JULY 1999 Parameter Control in Evolutionary Algorithms Ágoston Endre Eiben, Robert Hinterding, and Zbigniew Michalewicz, Senior Member,

More information

The MIT Press Journals

The MIT Press Journals The MIT Press Journals This article is provided courtesy of The MIT Press. To join an e-mail alert list and receive the latest news on our publications, please visit:

More information

The Evaluation of Treatment Services and Systems for Substance Use Disorders 1,2

The Evaluation of Treatment Services and Systems for Substance Use Disorders 1,2 Artigos originais The Evaluation of Treatment Services and Systems for Substance Use Disorders 1,2 Dr. Brian Rush, Ph.D.* NEED FOR EVALUATION Large numbers of people suffer from substance use disorders

More information

Impact of Gene Expression Profiling Tests on Breast Cancer Outcomes

Impact of Gene Expression Profiling Tests on Breast Cancer Outcomes Evidence Report/Technology Assessment Number 160 Impact of Gene Expression Profiling Tests on Breast Cancer Outcomes Prepared for: Agency for Healthcare Research and Quality U.S. Department of Health and

More information

THE PROBLEM OF finding localized energy solutions

THE PROBLEM OF finding localized energy solutions 600 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 3, MARCH 1997 Sparse Signal Reconstruction from Limited Data Using FOCUSS: A Re-weighted Minimum Norm Algorithm Irina F. Gorodnitsky, Member, IEEE,

More information

Managing fatigue: It s about sleep

Managing fatigue: It s about sleep Sleep Medicine Reviews (2005) 9, 365 380 THEORETICAL REVIEW Managing fatigue: It s about sleep Drew Dawson*, Kirsty McCulloch Centre for Sleep Research, University of South

More information

Can political science literatures be believed? A study of publication bias in the APSR and the AJPS

Can political science literatures be believed? A study of publication bias in the APSR and the AJPS Can political science literatures be believed? A study of publication bias in the APSR and the AJPS Alan Gerber Yale University Neil Malhotra Stanford University Abstract Despite great attention to the

More information

Misunderstandings between experimentalists and observationalists about causal inference

Misunderstandings between experimentalists and observationalists about causal inference J. R. Statist. Soc. A (2008) 171, Part 2, pp. 481 502 Misunderstandings between experimentalists and observationalists about causal inference Kosuke Imai, Princeton University, USA Gary King Harvard University,

More information

Causes that Make a Difference

Causes that Make a Difference Forthcoming in: The Journal of Philosophy Preprint Causes that Make a Difference Causes that Make a Difference C. Kenneth Waters Minnesota Center for Philosophy of Science, Department of Philosophy, University

More information


EVALUATION OF GAUSSIAN PROCESSES AND OTHER METHODS FOR NON-LINEAR REGRESSION. Carl Edward Rasmussen EVALUATION OF GAUSSIAN PROCESSES AND OTHER METHODS FOR NON-LINEAR REGRESSION Carl Edward Rasmussen A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate

More information

Algorithms for the multi-armed bandit problem

Algorithms for the multi-armed bandit problem Journal of Machine Learning Research 1 (2000) 1-48 Submitted 4/00; Published 10/00 Algorithms for the multi-armed bandit problem Volodymyr Kuleshov Doina Precup School of Computer Science McGill University

More information

E. F. Allan, S. Abeyasekera & R. D. Stern

E. F. Allan, S. Abeyasekera & R. D. Stern E. F. Allan, S. Abeyasekera & R. D. Stern January 2006 The University of Reading Statistical Services Centre Guidance prepared for the DFID Forestry Research Programme Foreword The production of this

More information

Quality-Based Pricing for Crowdsourced Workers

Quality-Based Pricing for Crowdsourced Workers NYU Stern Research Working Paper CBA-13-06 Quality-Based Pricing for Crowdsourced Workers Jing Wang Panagiotis G. Ipeirotis Foster Provost

More information



More information

General Principles of Software Validation; Final Guidance for Industry and FDA Staff

General Principles of Software Validation; Final Guidance for Industry and FDA Staff General Principles of Software Validation; Final Guidance for Industry and FDA Staff Document issued on: January 11, 2002 This document supersedes the draft document, "General Principles of Software Validation,

More information

Does Automated White-Box Test Generation Really Help Software Testers?

Does Automated White-Box Test Generation Really Help Software Testers? Does Automated White-Box Test Generation Really Help Software Testers? Gordon Fraser 1 Matt Staats 2 Phil McMinn 1 Andrea Arcuri 3 Frank Padberg 4 1 Department of 2 Division of Web Science 3 Simula Research

More information


PHYLOGENETIC ANALYSIS Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, Second Edition Andreas D. Baxevanis, B.F. Francis Ouellette Copyright 2001 John Wiley & Sons, Inc. ISBNs: 0-471-38390-2 (Hardback);

More information