Genetics of Rheumatoid Arthritis Markey Lecture Series Al Kim akim@dom.wustl.edu 2012.09.06
Overview of Rheumatoid Arthritis
Rheumatoid Arthritis (RA) Autoimmune disease primarily targeting the synovium Polygenic autoimmune disease characterized by autoantibodies (rheumatoid factor, anti-CCP Abs) and inflammation of the synovium (synovitis) in numerous joints, leading to joint destruction McInnes, I.B. & Schett, G. NEJM, 365:2205 (2011)
Rheumatoid Arthritis (RA) If left untreated, can lead to permanent disability Patients with both RF and anti-CCP Abs are at highest risk of developing RA and of permanent disability 2012 American College of Rheumatology
Rheumatoid Arthritis (RA) Affects internal organs also, likely as a result of systemic inflammation Also 2-fold increased risk of non-Hodgkin lymphoma McInnes, I.B. & Schett, G. NEJM, 365:2205 (2011)
RA pathophysiology Autoantibodies, synovial inflammation, joint deformity characterize RA AutoAbs Rheumatoid factor Anti-CCP Ab Synovial inflammation Joint destruction McInnes, I.B. & Schett, G. NEJM, 365:2205 (2011)
RA etiology Multifactorial in nature, but specific contributions remain unclear McInnes, I.B. & Schett, G. NEJM, 365:2205 (2011)
How do we know RA has a genetic component to disease susceptibility?
Proband, twin studies confirm genetics contributes to RA
But concordance is not as high as one would think
Well established that the prevalence of RA in the general population is 0.24% to 1%
Initial studies suggested up to 50% concordance rate between monozygotic twins (Thymann, 1957)
Selection bias?
-Identified twins based on advertisement
-Birth registries from same region later interrogated
No concordance seen
Small numbers of twins analyzed
True monozygotic twin concordance rate ~15-30%, dizygotic twin ~5%
Estimated heritability of RA ~60%
Siblings of those with RA have a 2-4% prevalence rate
Wolfe, F., et al. J. Rheumatol. 15:400 (1988)
Deighton, C.M., et al. Clin. Genet. 36:178 (1989)
Hasstedt, S.J., et al. Am. J. Hum. Genet. 55:738 (1994)
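One common way to summarize the sibling figures above is the sibling recurrence risk ratio, i.e. prevalence in siblings divided by prevalence in the general population. The slide does not name this statistic, so treating it as the intended comparison is an assumption; the function and variable names below are illustrative only. A minimal sketch using the ranges quoted on the slide:

```python
def recurrence_risk_ratio(prevalence_in_relatives, population_prevalence):
    """Prevalence among relatives of affected individuals divided by
    prevalence in the general population (both given as fractions)."""
    if not 0 < population_prevalence <= 1:
        raise ValueError("population_prevalence must be in (0, 1]")
    if not 0 <= prevalence_in_relatives <= 1:
        raise ValueError("prevalence_in_relatives must be in [0, 1]")
    return prevalence_in_relatives / population_prevalence


# Ranges quoted on the slide (2-4% in siblings, 0.24-1% in the population).
SIBLING_PREVALENCE = (0.02, 0.04)
POPULATION_PREVALENCE = (0.0024, 0.01)

# Most conservative pairing: lowest sibling rate vs. highest population rate.
lambda_s_low = recurrence_risk_ratio(SIBLING_PREVALENCE[0], POPULATION_PREVALENCE[1])
# Most generous pairing: highest sibling rate vs. lowest population rate.
lambda_s_high = recurrence_risk_ratio(SIBLING_PREVALENCE[1], POPULATION_PREVALENCE[0])

print(f"sibling recurrence risk ratio between {lambda_s_low:.1f} and {lambda_s_high:.1f}")
```

Depending on which ends of the quoted ranges are paired, siblings come out at roughly 2 to 17 times the population prevalence, which shows how sensitive this summary is to the prevalence estimate used.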
The first RA risk allele: HLA-DRB1 Associated with odds ratio of 3-13 HLA-DRB1 is an HLA (MHC) class II molecule
The first RA risk allele: HLA-DRB1
What are HLAs? HLAs are human MHCs that display antigens to T cells
HLA (human leukocyte antigen) is the name of the major histocompatibility complex (MHC) in humans
HLAs serve to display antigens to T cells
There are two major classes of HLA: Class I and Class II
T cell receptors (TCRs) noncovalently interact with the antigen:MHC complex to initiate T cell activation
Class I interacts with CD8+ T cells
Class II interacts with CD4+ T cells
Garcia, K.C., et al. Science, 279:5354 (1998)
HLA-DRB1 is an HLA (MHC) class II molecule
The first RA risk allele: HLA-DRB1
HLA-DRB1 allele harboring the shared epitope associated with the highest risk of RA
Peter Gregersen
1) All alleles associated with RA shared R-A-A at positions 72-74
2) Positions 70 & 71 modulated risk of RA
-Position 71: K conferred highest risk, followed by R, then A/E
-Position 70: Q/R > D
Pos 70 interacts with T cell receptor
Pos 71 interacts with antigen
The first RA risk allele: HLA-DRB1
Unclear how the shared epitope contributes to disease though
Simplest hypothesis: alters antigen loading into HLA/MHC, which may alter
-Thymic selection (where T cell tolerance is established)
-Presentation of arthritogenic peptides to T cells
Failure to generate appropriate regulatory T cells (immunosuppressive)
Aberrant CD4+ T cell activation
Autoimmunity
No data have demonstrated the validity of this hypothesis
-Structure-function studies have been inconclusive
-No known antigen peptide has been identified
But the association is legitimate
-Mouse models also possess shared epitopes
-Association with antibodies to citrullinated peptides very high
Beyond HLA: Genomewide Analysis Experience
How do we identify variants in humans associated with phenotype? Four approaches have been used, each with varying success Four basic approaches to identifying genetic variants that contribute to human phenotype: 1) Candidate gene association studies 2) Linkage analysis in multiplex families 3) Genomewide association studies (GWAS) 4) Next-generation sequencing (whole genome, exome)
Candidate gene approach: What gene(s) are likely responsible for phenotype?
Gene of interest: sequence in affected and non-affected family members and compare to find variants unique to disease
Early studies suffered from:
1) Inadequate statistical power due to small sample sizes
2) Poor matching of cases and controls
PTPN22 is the only association identified through the candidate gene approach to be linked with any autoimmune disease
Next to HLA, PTPN22 possesses the next highest genetic association with RA
Odds ratio in RA 1.5-4
Identified initially in type 1 diabetes mellitus
Focused on nonsynonymous amino acid polymorphism (R620W) thought to have functional correlates
Later found to be associated with RA (especially CCP Ab-positive individuals); odds ratio increases to 3-4 in homozygous individuals
Susceptibility association also found in: Graves' disease, Hashimoto thyroiditis, myasthenia gravis, juvenile idiopathic arthritis, systemic lupus erythematosus
No association: multiple sclerosis
Protective association: Crohn's disease
PTPN22 is a tyrosine phosphatase Functions to attenuate immune cell activation PTPN22 (human gene name=lyp, mouse=pep) (protein tyrosine phosphatase nonreceptor type 22) (Lck) Siminovitch, K.A. Nat. Genetics, 36:1248 (2004) PEP KO mice demonstrated enhanced T cell activation
R620W PTPN22 variant further suppresses immune cell activation Is a gain-of-function variant
R620W PTPN22 variant reduces binding to Csk Contribution to RA susceptibility still unclear (Lck) Altered tolerance checkpoints? Siminovitch, K.A. Nat. Genetics, 36:1248 (2004)
R620W PTPN22 variant upregulated gene products critical for B cell activation Also noted higher levels of autoreactive B cells In summary, PTPN22 story is hypothesis generating, but exact mechanism for this association remains unclear
Linkage analysis: Hypothesis-free approach looking at families with high disease burden
Depends on cosegregation of chromosomal regions with phenotypic trait within families
[Figure labels: Paternal, Maternal]
If A is the disease allele (and B & C are genetic markers, aka polymorphisms), recombination is more likely to occur between A & C than between A & B
Disease genes are mapped by measuring recombination against a panel of different polymorphisms throughout the genome
Can narrow locus containing disease allele to a region of ~1-5 million bp
http://genome.wellcome.ac.uk/doc_wtd020778.html (accession date: 2012.09.04)
NOD2 (Crohn's disease) and STAT4 (RA, SLE) have been identified this way
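Measuring recombination between a disease locus and a marker is conventionally scored with a LOD score, which compares the likelihood of the observed meioses at a proposed recombination fraction against free recombination (0.5). The slide does not mention LOD scores, so this is an illustrative assumption; the sketch also assumes phase-known meioses, and the counts in the example are made up.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD = log10( L(theta) / L(0.5) ) for phase-known meioses, where
    L(theta) = theta**recombinants * (1 - theta)**(meioses - recombinants)."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    # Work in log10 space to avoid underflow with large pedigrees.
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1 - theta))
    log_l_unlinked = meioses * math.log10(0.5)
    return log_l_theta - log_l_unlinked


# Hypothetical family data: 1 recombinant observed in 10 informative meioses.
example_lod = lod_score(recombinants=1, meioses=10, theta=0.1)
print(f"LOD at theta = 0.1: {example_lod:.2f}")
```

A marker close to the disease allele shows few recombinants and a high LOD; scanning a panel of markers across the genome and looking for the peak is what narrows the locus.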
GWAS: Hypothesis-free approach looking at SNPs that are highly associated with disease
Foundation of GWAS: understanding the extent and pattern of variation in human genome
Variant sites rare overall, but SNPs explain bulk of diversity
2001: 1st human genome sequenced
HapMap project: 90 individuals in families from three racial groups; compiled initial SNP library
Catalogued hotspots for common (>5%) single nucleotide variation in human genome
2005: 2nd-generation HapMap identified 3.1 million validated SNPs
HapMap data opened up possibility to scan these common variants simultaneously for disease association
2003: Concept of tagging SNPs
To define most of the common variation among individuals, do not have to genotype all 3 million SNPs (linkage disequilibrium)
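Tagging works because SNPs in strong linkage disequilibrium carry nearly the same information. A standard measure is r², computed from the frequency of a two-SNP haplotype and the two allele frequencies. The sketch below is illustrative only: the function name and the example frequencies are assumptions, not HapMap data.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return (D, r_squared) for two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A at SNP 1 and B at SNP 2
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic (0 < frequency < 1)")
    return d, d * d / denominator


# Hypothetical case 1: alleles A and B always travel together.
d_tagged, r2_tagged = linkage_disequilibrium(p_ab=0.5, p_a=0.5, p_b=0.5)
# Hypothetical case 2: alleles assort independently.
d_indep, r2_indep = linkage_disequilibrium(p_ab=0.25, p_a=0.5, p_b=0.5)

print(r2_tagged, r2_indep)
```

When r² is close to 1, genotyping one SNP of the pair is enough to infer the other, so a much smaller set of tag SNPs can stand in for the full catalogue.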
Problems with GWAS
Unless sufficiently powered, data from GWAS may not be trusted
GWAS issues:
1) Does not directly identify variant associated with disease
-Identifies region (locus) where variant is
-Locus can harbor common variants with weak effect and rare variants with large effect
-Fine mapping of locus will yield more specific associations
2) Only detects variants that are common (>5% in general population)
-An issue for uncommon disease (RA is ~1%, yet variant 5x more common)
3) If matching of cases and controls is poor, forget about it (toss the data)
-Ethnicities, and subsets within ethnicities (population stratification)
4) Sample size and statistical power
-Must apply Bonferroni correction (the more comparisons you make, the higher the likelihood you will find a rare event by chance, i.e. type I error)
-Corrected threshold: $p < 0.05 / (\text{number of comparisons})$; for $10^6$ SNPs, $p < 0.05 / 10^6 = 5 \times 10^{-8}$
-How many samples (individuals) needed to reach that p value?
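The Bonferroni arithmetic and the sample-size question above can be sketched together. The sample-size part uses a textbook normal approximation for comparing two allele frequencies; that choice, the 80% power target, the 20% control allele frequency, and the odds ratios are all assumptions for illustration, not figures from the lecture.

```python
from statistics import NormalDist


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by a control frequency and an allelic OR."""
    return odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)


def alleles_per_group(control_freq, odds_ratio, alpha, power=0.8):
    """Approximate number of alleles needed in each group (cases, controls)
    for a two-sided test of two proportions. Individuals = alleles / 2."""
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p0 * (1 - p0) + p1 * (1 - p1)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p0) ** 2


threshold = bonferroni_threshold(0.05, 10**6)  # about 5e-8, as on the slide

# Hypothetical variant at 20% frequency in controls.
n_modest = alleles_per_group(0.20, odds_ratio=1.2, alpha=threshold)
n_strong = alleles_per_group(0.20, odds_ratio=2.0, alpha=threshold)
print(f"threshold = {threshold:.1e}")
print(f"OR 1.2: ~{n_modest / 2:,.0f} individuals per group")
print(f"OR 2.0: ~{n_strong / 2:,.0f} individuals per group")
```

Under these assumptions, a modest odds ratio requires far more individuals than a large one, which is why underpowered studies struggle to detect common variants with small effects.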
All autoimmune disease SNPs possess odds ratios of 1-2 (usually < 1.5) * Altshuler D. et al. Science, 322:881 (2008)
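For reference, an allelic odds ratio of the kind quoted above comes from a 2x2 table of allele counts in cases and controls. The sketch below computes the odds ratio with an approximate 95% confidence interval (Woolf's log method); the counts are hypothetical and the function name is illustrative.

```python
import math


def odds_ratio_with_ci(case_risk, case_other, control_risk, control_other, z=1.96):
    """Odds ratio for the risk allele plus an approximate confidence interval.

    Arguments are allele counts; z=1.96 gives a ~95% interval.
    """
    counts = (case_risk, case_other, control_risk, control_other)
    if min(counts) <= 0:
        raise ValueError("all four counts must be positive")
    estimate = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) from the four cell counts.
    standard_error = math.sqrt(sum(1 / count for count in counts))
    low = math.exp(math.log(estimate) - z * standard_error)
    high = math.exp(math.log(estimate) + z * standard_error)
    return estimate, low, high


# Hypothetical allele counts: 300/1000 risk alleles in cases, 200/1000 in controls.
estimate, low, high = odds_ratio_with_ci(300, 700, 200, 800)
print(f"OR = {estimate:.2f} (95% CI {low:.2f}-{high:.2f})")
```

An interval that excludes 1 suggests association in that single comparison; in a genomewide scan the multiple-testing threshold still applies.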
GWAS has identified several loci harboring genes that may associate with RA risk
Effect sizes for common variants modest (OR < 1.6)
Lessons from RA GWAS:
1) Common variants only lend minor contributions to disease burden
2) While GWAS can identify small regions for study, no identifiable causal variants have been uncovered
3) Most associations located in non-protein coding regions
4) GWAS can work, if powered correctly (more loci will be identified)
So if these loci contribute so little to the genetic heritability of RA, where is the rest of the heritability?
Missing heritability (GWAS data explain ~10-20% of the total heritability of RA)
McInnes, I.B. & Schett, G. NEJM, 365:2205 (2011)
Sequencing: Utilize next-gen sequencing approaches to identify rare variants associated with disease
Currently, no published data associating rare variants with RA
Has rare variant identification yielded associations with disease?
Mental retardation story is the most compelling reason to examine rare variants
10 unique non-synonymous de novo mutations in 9 genes were identified that likely explained mental retardation (no functional studies done)
Has rare variant identification yielded associations with disease?
Multiple rare variants in TREX1 associated with sporadic SLE
TREX1 is a DNA exonuclease originally implicated in Aicardi-Goutières syndrome
Variants caused mislocalization of TREX1
Conclusions
RA possesses a significant genetic component
Linkage and association studies have identified > 30 genes
Unclear what the functional consequences of these variants are
-Significance of intronic variants?
Key to future associations: powering the studies to achieve statistical significance
Look beyond common variants
-Rare variants
-Copy number variations, epigenetics, chromatin topology
Ultimate need: assigning function to variants