NGS and complex genetics Robert Kraaij Genetic Laboratory Department of Internal Medicine r.kraaij@erasmusmc.nl
Gene Hunting Rotterdam Study and GWAS Next Generation Sequencing
Gene Hunting
Mendelian gene hunting: linkage
Gregor Mendel (1822-1884)
Linkage analysis
Simple Disease vs Complex Disease
Simple Disease: severe phenotype; early onset; rare; Mendelian inheritance; e.g.: cystic fibrosis, osteogenesis imperfecta; mutations (< 1%)
Complex Disease: mild phenotype; late onset; common; complex inheritance; e.g.: diabetes, asthma, osteoporosis; polymorphisms (> 1%)
Single Nucleotide Polymorphism? (slide background: DNA sequence)
Re-sequencing (slide background: the same DNA sequence, containing a G-to-T substitution)
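The re-sequencing idea can be sketched as a position-by-position comparison of two aligned reads. This is only a minimal illustration: the two fragments are copied from the sequence shown on the slides, but calling the first one the "reference" is an assumption, and the function name `find_mismatches` is hypothetical. Real variant calling also has to handle alignment, indels and sequencing errors, which this sketch ignores.

```python
from typing import List, Tuple


def find_mismatches(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """Return (0-based position, reference base, sample base) for every mismatch.

    Assumes both sequences are already aligned and of equal length
    (substitutions only, no insertions or deletions).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have equal length")
    return [
        (pos, ref_base, sample_base)
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Fragments taken from the slide background; which one is the "reference"
# is an assumption made for this example.
reference = "GCTACCAGTCGATCGATCGATCGTAGCG"
resequenced = "GCTACCAGTCGATCTATCGATCGTAGCG"

mismatches = find_mismatches(reference, resequenced)
print(mismatches)
```

A single reported mismatch in one individual is only a candidate; it becomes a polymorphism in the sense of the earlier slide once it is seen at more than 1% frequency in a population.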
Human Genome Project
Re-sequencing (dbSNP)
HapMap Project
~ 12 million common DNA polymorphisms in human genome
Hypothesis: Common Variant, Common Disease
DNA differences cause phenotype differences
Twin studies demonstrate heritability
Heritable diseases and traits: Diabetes, Breast cancer, Osteoarthrosis, Menopause, Height, Infidelity, Entrepreneurship, Paget's Disease, Depression, Eye color, Osteoporosis, Longevity, Eye diseases, Rheumatoid arthritis, Lung cancer, BMI, Weight, Menarche, Cholesterol, Uric acid, Ankylosing spondylitis, Myocardial infarction, Skin colour, Stroke, Smoking behaviour, etc.
Complex Genetics
Simple: genome-wide linkage in families
Complex: genome-wide association in populations
Rotterdam Study and GWAS
ERGO: The Rotterdam Study
A single-centre, prospective, population-based cohort study, started 1990
Baseline cohort = 7,983 men and women aged 55+ yrs
In 2007: 4 follow-up measurements: ~1500 per subject each time
Ethnically homogeneous: 99% Caucasian
Computerized GP + pharmacy monitoring
Study determinants and prevalence/incidence of chronic and disabling disease in the elderly: CVD, neurodegenerative disease, endocrine diseases, locomotor disease (osteoporosis, osteoarthritis), eye
End 2004:
- 1200 coronary heart disease
- 800 stroke
- 1300 fractures
- 1000 maculopathy
- 800 dementia
~12,000 DNA samples available:
1990: ERGO baseline / RSI: n=7,000
2000: ERGO plus / RSII: n=3,000 (55+)
2004: ERGO young / RSIII: n=3,500 (45+)
GWAS data available per cohort (timeline axis: 1990, 1993, 1998, 2003, 2008)
RSI: ERGO Baseline; age 55-105; N = 6000; Illumina 550K; visits RSI-1, RSI-2, RSI-3, RSI-4; GWAS data available: Jan 2008
RSII: ERGO Plus; age 55-65; N = 2500; Illumina 550K; visits RSII-1, RSII-2; May 2009
RSIII: ERGO Young; age 45-55; N = 2800; Illumina 610K; visit RSIII-1; Jul 2009
ERF (isolate); age 18-95; N = 2600; Illumina 317K; April 2009
Generation R; age 5-15; N = 6000; Illumina 610K; visit GenR-1; Nov 2009
Genome-Wide Association Study (GWAS)
DNA collection: e.g. 1000 cases vs. 1000 controls
Genotyping arrays (Illumina, Affymetrix): genotype calls AA / AB / BB for SNP 1, SNP 2, SNP 3, ... SNP 550,000
Data analysis (e.g., PLINK): results plotted per chromosome (1 to 22, X); each dot is one SNP in, e.g., 2000 subjects
Select SNPs (p-value, frequency)
REPLICATION in other cohorts!
Meta-analysis of all data
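The per-SNP test behind each dot can be sketched as a chi-square test on allele counts in cases versus controls. This is a minimal illustration of one standard test, not the PLINK implementation; the function name and all counts are hypothetical. The 5 x 10^-8 cut-off is the genome-wide threshold shown on the lumbar spine BMD slides.

```python
import math

GENOME_WIDE_P = 5e-8  # threshold shown on the lumbar spine BMD slides


def allelic_chi2(case_a: int, case_b: int, ctrl_a: int, ctrl_b: int):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Rows: cases / controls. Columns: allele A / allele B.
    Returns (chi2 statistic, p-value).
    """
    n = case_a + case_b + ctrl_a + ctrl_b
    denominator = (
        (case_a + case_b) * (ctrl_a + ctrl_b) * (case_a + ctrl_a) * (case_b + ctrl_b)
    )
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (case_a * ctrl_b - case_b * ctrl_a) ** 2 / denominator
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical SNP: 1000 cases and 1000 controls, so 2000 alleles per group.
chi2, p_value = allelic_chi2(case_a=900, case_b=1100, ctrl_a=700, ctrl_b=1300)
print(f"chi2 = {chi2:.2f}, p = {p_value:.2e}, "
      f"genome-wide significant: {p_value < GENOME_WIDE_P}")
```

With 550,000 SNPs tested, a nominal p < 0.05 would produce tens of thousands of false positives, which is why the selection step uses a much stricter threshold and why replication in other cohorts follows.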
LUMBAR SPINE BMD (Rivadeneira et al., Nat Genet., 2009)
Cohorts: Rotterdam Study, ERF Study, Twins UK, deCODE Genetics, Framingham Study
Genome-wide significance threshold: 5 x 10^-8
Loci labelled on the plot as the sample size grows:
N=5,000: none labelled
N=6,200: LRP5
N=8,500: LRP5
N=15,000: RANKL, 1p36, MHC, C6orf10, OPG, LRP5
N=19,125: RANKL, 1p36, C6orf10, OPG, LRP5, SP7
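Why more loci cross the threshold as cohorts are added can be sketched with a fixed-effect, inverse-variance meta-analysis. This is one common way to combine cohorts and is not claimed to be the exact method of the cited study; the function name, effect sizes and standard errors below are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns (pooled effect, pooled standard error, two-sided p-value).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value


# Hypothetical SNP with the same modest effect in five cohorts.
betas = [0.05] * 5
standard_errors = [0.02] * 5

single = fixed_effect_meta(betas[:1], standard_errors[:1])
pooled = fixed_effect_meta(betas, standard_errors)
print(f"one cohort:   p = {single[2]:.2e}")
print(f"five cohorts: p = {pooled[2]:.2e}")
```

In this toy case the effect is far from genome-wide significance in any single cohort, but the pooled standard error shrinks by a factor of sqrt(5), which is enough to cross 5 x 10^-8.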
allowing an unprecedented leap in discoveries, with > 800 studies on 150 human traits published to date
and that is definitely the case for our group in Rotterdam!!
Publications: Nat Genet: 24; The Lancet: 6; Nature: 4; NEJM: 2; JAMA: 2; ... ~ 100 papers
N = 12,000; N = 6,000 children; other consortia / isolated efforts
What are next steps after the success of GWAS?
Unanswered questions: Causative SNP? Causative gene? Mechanism? Biologic pathways?
Limited explained variance per trait/disease: "dark matter"
The hunt for genetic "dark matter":
- More common variants
- Not-yet-assessed common variants
- Rare (less frequent) variants (<5%, <1%)
- Copy Number Variations (CNV)
- Gene-gene interaction (limited power)
- Gene-environment interaction (limited power, standardization)
- Epigenetics: methylation patterns of DNA
Next Generation Sequencing
The Human Genome Project
Bill Clinton, Tony Blair, Craig Venter, Francis Collins
* 26 June 2000: press conference by Bill Clinton & Tony Blair: "working draft", 95% sequenced
* 14 April 2003: finished: 99% sequenced
>> Cheaper and faster!!
Costs: $2.7 billion (instead of the estimated $3 billion)
Timing: 1990-2003 (instead of 2005)
Next Generation Sequencing Illumina HiSeq2000 2 flowcells per machine 2 x 100 bp reads 8 days 100 Gb per flowcell
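The slide figures translate into a rough capacity estimate. The sketch below uses only the numbers on the slide plus two assumptions that are not in the source: a haploid human genome size of about 3.1 Gb and a target of 30x average coverage. It ignores duplicates, low-quality reads and uneven coverage, so real yields would be lower.

```python
# Figures from the slide.
GB_PER_FLOWCELL = 100      # Gb of sequence per flowcell
FLOWCELLS_PER_RUN = 2      # flowcells per machine
RUN_DAYS = 8               # days per run
READ_PAIR_BP = 2 * 100     # 2 x 100 bp paired-end reads

# Assumptions (not from the slide).
GENOME_SIZE_GB = 3.1       # approximate haploid human genome size
TARGET_COVERAGE = 30       # desired average depth per genome


def genomes_per_run(gb_per_flowcell, flowcells, genome_gb, coverage):
    """Number of genomes at the given coverage that one run could yield."""
    return gb_per_flowcell * flowcells / (genome_gb * coverage)


run_yield_gb = GB_PER_FLOWCELL * FLOWCELLS_PER_RUN
read_pairs_per_flowcell = GB_PER_FLOWCELL * 1e9 / READ_PAIR_BP
genomes = genomes_per_run(
    GB_PER_FLOWCELL, FLOWCELLS_PER_RUN, GENOME_SIZE_GB, TARGET_COVERAGE
)

print(f"yield per run: {run_yield_gb} Gb ({run_yield_gb / RUN_DAYS} Gb/day)")
print(f"read pairs per flowcell: {read_pairs_per_flowcell:.1e}")
print(f"genomes per run at {TARGET_COVERAGE}x: {genomes:.2f}")
```

Under these assumptions one machine delivers roughly two deeply sequenced genomes per 8-day run, which puts the Human Genome Project timeline of 1990-2003 in perspective.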
Future plans
- GWAS on 2500 vertebral fracture cases from GENOMOS collection
- GWAS parents Generation R => imprinting
- Custom array LOCOMOTOR CHIP with already 50,000 candidate samples in GENOMOS collection
- GWAS / 1000 GENOMES (Metabo-, Immuno-chips concept)
- Rare CNVs
- Prioritization strategies (bioinformatics, eQTLs, animal models)
- Sequencing leads (regional, whole exome, whole genome)
- Whole genome sequencing Rotterdam Study and Generation R individuals
~ 30,000 individuals
EU-BBMRI-NL: Dutch Genome Project: Trio design
Full genome sequence of 250 trios
Caucasians with GWAS data, spread over NL
Rotterdam Study => 34 trios
ERF (Brabant), LifeLines (Groningen), Leiden Longevity Study, Netherlands Twin Register (Amsterdam/NL)
Currently run at BGI
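One practical benefit of a trio design is that the parents' genotypes constrain the child's, so sequencing errors and candidate de novo variants stand out. The check below is a minimal sketch for a single autosomal, biallelic-style site; the function name and genotype encoding (two-letter strings such as "AG") are hypothetical and not taken from the project's pipeline.

```python
from itertools import product


def is_mendelian_consistent(father: str, mother: str, child: str) -> bool:
    """Check whether a child's genotype can be inherited from both parents.

    Genotypes are unphased two-character strings, e.g. "AG".
    Assumes an autosomal site (one allele from each parent).
    """
    for genotype in (father, mother, child):
        if len(genotype) != 2:
            raise ValueError("genotypes must contain exactly two alleles")
    # Every combination of one paternal and one maternal allele.
    possible = {
        "".join(sorted(paternal + maternal))
        for paternal, maternal in product(father, mother)
    }
    return "".join(sorted(child)) in possible


print(is_mendelian_consistent("AA", "AG", "AG"))  # child allele G can come from the mother
print(is_mendelian_consistent("AA", "AA", "AG"))  # G is absent in both parents
```

A site that fails this check is either a genotyping or sequencing error or a candidate de novo variant, and needs follow-up before it is interpreted.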
Possibilities
Exome sequencing (CHARGE-S)
- Promising, but the focus is not identifying the real variant underlying GWAS
- Proof of principle of involvement of gene identified by GWAS signal
Targeted sequencing of identified loci (CHARGE-S)
Whole-genome sequencing (BBMRI, RS individuals)
- Low-pass sequencing
- Deep sequencing at Complete Genomics
Setup
Illumina Compute
Isilon storage: 180 TB raw, ~120 TB redundant
Erasmus MC network
Dell compute: 128 cores, 6 GB/core
Acknowledgements CHARGE