NGS and complex genetics


NGS and complex genetics
Robert Kraaij, Genetic Laboratory, Department of Internal Medicine
r.kraaij@erasmusmc.nl

Outline:
- Gene Hunting
- Rotterdam Study and GWAS
- Next Generation Sequencing

Gene Hunting

Mendelian gene hunting: linkage
Gregor Mendel (1822-1884)
Linkage analysis

Simple Disease vs Complex Disease
- Simple Disease: severe phenotype; early onset; rare; Mendelian inheritance; e.g. cystic fibrosis, osteogenesis imperfecta; Mutations (frequency < 1%)
- Complex Disease: mild phenotype; late onset; common; complex inheritance; e.g. diabetes, asthma, osteoporosis; Polymorphisms (frequency > 1%)
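The 1% cut-off between mutations and polymorphisms can be illustrated with a minimal sketch. The function names and the example counts are hypothetical, and the slide does not say how a variant at exactly 1% is labelled; here it is treated as a mutation, which is an assumption.

```python
def minor_allele_frequency(allele_count: int, total_alleles: int) -> float:
    """Return the frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0 or not 0 <= allele_count <= total_alleles:
        raise ValueError("allele_count must lie between 0 and total_alleles")
    freq = allele_count / total_alleles
    return min(freq, 1.0 - freq)


def classify_variant(maf: float, threshold: float = 0.01) -> str:
    """Apply the slide's rule of thumb: > 1% polymorphism, otherwise mutation.

    Assumption: a variant at exactly the threshold is labelled a mutation.
    """
    return "polymorphism" if maf > threshold else "mutation"


# Hypothetical example: 1000 diploid subjects carry 2000 alleles in total.
common = minor_allele_frequency(30, 2000)   # allele seen 30 times
rare = minor_allele_frequency(4, 2000)      # allele seen 4 times
print(classify_variant(common), classify_variant(rare))
```

Taking the minimum of the two allele frequencies makes the result independent of which allele happens to be counted.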

Single Nucleotide Polymorphism?
[Slide background: a full page of raw DNA sequence, including the stretch ...GCTACCAGTCGATCGATCGATCG...]

Re-sequencing
[Slide background: the same page of DNA sequence, with a G changed to a T at one position: ...GCTACCAGTCGATCTATCGATCG...]
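Re-sequencing finds a single nucleotide polymorphism by comparing a newly read sequence against a reference. The sketch below does this for two short, already aligned fragments taken from the slide backgrounds; the function name is hypothetical, and real pipelines must first align reads to the reference, which is not shown here.

```python
def find_mismatches(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, reference base, sample base) for each difference.

    Assumes both sequences are already aligned and of equal length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# Fragments copied from the two sequence slides.
reference = "GCTACCAGTCGATCGATCGATCGTAGCG"
resequenced = "GCTACCAGTCGATCTATCGATCGTAGCG"

for pos, ref_base, alt_base in find_mismatches(reference, resequenced):
    print(f"position {pos}: {ref_base} -> {alt_base}")
```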

Human Genome Project
Re-sequencing (dbSNP)
HapMap Project
~12 million common DNA polymorphisms in human genome
Hypothesis: Common Variant, Common Disease
[Slide background: the same page of DNA sequence]


DNA differences cause phenotype differences

Twin studies demonstrate heritability
Heritable diseases and traits: Diabetes, Breast cancer, Osteoarthrosis, Menopause, Height, Infidelity, Entrepreneurship, Paget's Disease, Depression, Eye color, Osteoporosis, Longevity, Eye diseases, Rheumatoid arthritis, Lung cancer, BMI, Weight, Menarche, Cholesterol, Uric acid, Ankylosing spondylitis, Myocardial infarction, Skin colour, Stroke, Smoking behaviour, etc.
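One common way to turn twin data into a heritability estimate is Falconer's formula, h^2 = 2 (r_MZ - r_DZ), where r_MZ and r_DZ are the trait correlations in identical and fraternal twin pairs. The slides do not state which estimator was used for the traits listed, so the sketch below is an illustration only, and the correlations in the example are made up.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Estimate heritability from twin correlations with Falconer's formula.

    r_mz: trait correlation between monozygotic (identical) twins
    r_dz: trait correlation between dizygotic (fraternal) twins
    """
    for name, value in (("r_mz", r_mz), ("r_dz", r_dz)):
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a correlation between -1 and 1")
    return 2.0 * (r_mz - r_dz)


# Hypothetical correlations, not taken from any study in the slides.
h2 = falconer_heritability(r_mz=0.8, r_dz=0.5)
print(f"estimated heritability: {h2:.2f}")
```

The formula relies on identical twins sharing all of their segregating DNA and fraternal twins sharing half on average, so doubling the difference in correlation approximates the genetic share of the variance.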

Complex Genetics
- Simple: genome-wide linkage in families
- Complex: genome-wide association in populations

Rotterdam Study and GWAS

ERGO: The Rotterdam Study
- A single-centre, prospective, population-based cohort study, started in 1990
- Baseline cohort: 7,983 men and women aged 55 years and older
- In 2007: 4 follow-up measurements, ~1500 per subject each time
- Ethnically homogeneous: 99% Caucasian
- Computerized GP + pharmacy monitoring
- Studies determinants and prevalence/incidence of chronic and disabling disease in the elderly: CVD, neurodegenerative disease, endocrine diseases, locomotor disease (osteoporosis, osteoarthritis), eye disease
End 2004:
- 1200 coronary heart disease
- 800 stroke
- 1300 fractures
- 1000 maculopathy
- 800 dementia
~12,000 DNA samples available:
- 1990: ERGO baseline / RSI: n=7,000
- 2000: ERGO plus / RSII: n=3,000 (55+)
- 2004: ERGO young / RSIII: n=3,500 (45+)

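GWAS data available per cohort (timeline axis: 1990, 1993, 1998, 2003, 2008)
- RSI: ERGO Baseline; age 55-105; N = 6000; Illumina 550K; rounds RSI-1, RSI-2, RSI-3, RSI-4; GWAS data available: Jan 2008
- RSII: ERGO Plus; age 55-65; N = 2500; Illumina 550K; rounds RSII-1, RSII-2; GWAS data available: May 2009
- RSIII: ERGO Young; age 45-55; N = 2800; Illumina 610K; round RSIII-1; GWAS data available: Jul 2009
- ERF (isolate); age 18-95; N = 2600; Illumina 317K; round ERF; GWAS data available: April 2009
- Generation R; age 5-15; N = 6000; Illumina 610K; round GenR-1; GWAS data available: Nov 2009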

Genome-Wide Association Study (GWAS)
- DNA collection: e.g. 1000 cases vs. 1000 controls
- Genotyping on arrays (Illumina, Affymetrix): every subject is called AA, AB or BB at SNP 1, SNP 2, SNP 3, ..., SNP 550,000
- DATA ANALYSIS (e.g., PLINK): plot of association results across the chromosomes (1 to X); each dot is one SNP in, e.g., 2000 subjects
- Select SNPs (p-value, frequency)
- REPLICATION in other cohorts!
- Meta-analysis of all data
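The per-SNP test in the analysis step can be sketched as a simple allelic chi-square test on AA/AB/BB genotype counts in cases and controls. This is one of several tests a tool such as PLINK offers and is not necessarily the one used in the studies described; the function names and the genotype counts are hypothetical. The threshold of 5 x 10^-8 is the conventional genome-wide significance level, roughly 0.05 divided by one million independent tests.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # conventional genome-wide significance level


def allele_counts(aa: int, ab: int, bb: int) -> tuple[int, int]:
    """Convert genotype counts into counts of the A and B alleles."""
    return 2 * aa + ab, 2 * bb + ab


def allelic_chi_square(cases: tuple[int, int, int],
                       controls: tuple[int, int, int]) -> tuple[float, float]:
    """Return (chi-square statistic, p-value) for a 2x2 allele table, 1 df."""
    a, b = allele_counts(*cases)      # A and B alleles in cases
    c, d = allele_counts(*controls)   # A and B alleles in controls
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("each row and column of the table needs a nonzero total")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts (AA, AB, BB) for 1000 cases and 1000 controls.
chi2, p = allelic_chi_square(cases=(360, 480, 160), controls=(250, 500, 250))
print(f"chi2 = {chi2:.2f}, p = {p:.2e}, significant: {p < GENOME_WIDE_THRESHOLD}")
```

In a real GWAS this test is repeated for each of the ~550,000 SNPs, which is why such a strict threshold is used before SNPs are taken forward to replication.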

LUMBAR SPINE BMD (Rivadeneira et al., Nat Genet., 2009)
Series of association plots with the genome-wide significance line at 5 x 10^-8, shown for increasing sample size.
Cohorts: Rotterdam Study, ERF Study, Twins UK, deCODE Genetics, Framingham Study
- N=5,000: no loci labelled
- N=6,200: LRP5
- N=8,500: LRP5
- N=15,000: RANKL, 1p36, MHC, C6orf10, OPG, LRP5
- N=19,125: RANKL, 1p36, C6orf10, OPG, LRP5, SP7
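Why more loci cross the 5 x 10^-8 line as cohorts are added can be illustrated with a fixed-effect, inverse-variance meta-analysis. The slides do not say which pooling method was used, so this is a sketch under that assumption, with made-up effect sizes and standard errors.

```python
import math


def inverse_variance_meta(betas: list[float],
                          std_errors: list[float]) -> tuple[float, float, float]:
    """Pool per-cohort effects; return (pooled beta, pooled SE, two-sided p)."""
    if len(betas) != len(std_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


# Hypothetical SNP with the same effect estimate in two equally sized cohorts.
_, _, p_single = inverse_variance_meta([0.10], [0.02])
_, _, p_pooled = inverse_variance_meta([0.10, 0.10], [0.02, 0.02])
print(f"one cohort: p = {p_single:.1e}; two cohorts pooled: p = {p_pooled:.1e}")
```

In this example a single cohort falls short of 5 x 10^-8 while the pooled estimate passes it, because the pooled standard error shrinks as cohorts are added.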

GWAS has allowed an unprecedented leap in discoveries, with > 800 studies on 150 human traits published to date

...and that is definitely the case for our group in Rotterdam!
Publications (~100 papers):
- Nat Genet: 24
- The Lancet: 6
- Nature: 4
- NEJM: 2
- JAMA: 2
- ...
Data sources: N = 12,000; N = 6,000 children; other consortia / isolated efforts

What are the next steps after the success of GWAS?
Unanswered questions:
- Causative SNP?
- Causative gene?
- Mechanism?
- Biologic pathways?
- Limited explained variance per trait/disease: "dark matter"
The hunt for genetic "dark matter":
- More common variants
- Not-yet-assessed common variants
- Rare (less frequent) variants (<5%, <1%)
- Copy Number Variations (CNV)
- Gene-gene interaction (limited power)
- Gene-environment interaction (limited power, standardization)
- Epigenetics: methylation patterns of DNA

Next Generation Sequencing

The Human Genome Project
Pictured: Bill Clinton, Tony Blair, Craig Venter, Francis Collins
- 26 June 2000: press conference by Bill Clinton & Tony Blair: "working draft", 95% sequenced
- 14 April 2003: finished, 99% sequenced
- Cheaper and faster! Costs: $2.7 billion (instead of the estimated $3 billion). Timing: 1990-2003 (instead of 2005)

Next Generation Sequencing: Illumina HiSeq2000
- 2 flowcells per machine
- 2 x 100 bp reads
- 8 days
- 100 Gb per flowcell
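The run figures above translate into a rough capacity estimate. The sketch below uses the numbers from the slide; the human genome size of about 3.1 Gb and the 30x target coverage are assumptions added for illustration, and the function names are hypothetical.

```python
FLOWCELLS_PER_MACHINE = 2   # from the slide
GB_PER_FLOWCELL = 100.0     # from the slide
RUN_DAYS = 8                # from the slide
HUMAN_GENOME_GB = 3.1       # assumption: approximate haploid genome size
TARGET_COVERAGE = 30.0      # assumption: a commonly used depth, not from the slide


def run_yield_gb(flowcells: int, gb_per_flowcell: float) -> float:
    """Total sequence output of one machine run, in gigabases."""
    return flowcells * gb_per_flowcell


def mean_coverage(yield_gb: float, genome_gb: float) -> float:
    """Average depth if the whole run is spent on a single genome."""
    return yield_gb / genome_gb


def genomes_per_run(yield_gb: float, genome_gb: float, coverage: float) -> int:
    """Number of whole genomes that fit in one run at the given depth."""
    return int(yield_gb // (genome_gb * coverage))


total_gb = run_yield_gb(FLOWCELLS_PER_MACHINE, GB_PER_FLOWCELL)
print(f"yield per run: {total_gb:.0f} Gb ({total_gb / RUN_DAYS:.0f} Gb per day)")
print(f"coverage of one genome: {mean_coverage(total_gb, HUMAN_GENOME_GB):.1f}x")
print(f"genomes per run at {TARGET_COVERAGE:.0f}x: "
      f"{genomes_per_run(total_gb, HUMAN_GENOME_GB, TARGET_COVERAGE)}")
```

These are raw-yield figures only; reads lost to quality filtering, duplicates and unmapped sequence would lower the usable coverage.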

Future plans
- GWAS on 2500 vertebral fracture cases from GENOMOS collection
- GWAS parents Generation R => imprinting
- Custom array "LOCOMOTOR CHIP", with already 50,000 candidate samples in GENOMOS collection
- GWAS / 1000 GENOMES (Metabo-, Immuno-chip concept)
- Rare CNVs
- Prioritization strategies (bioinformatics, eQTLs, animal models)
- Sequencing leads (regional, whole exome, whole genome)
- Whole genome sequencing of Rotterdam Study and Generation R individuals: ~30,000 individuals

EU-BBMRI-NL: "Dutch Genome Project": trio design
- Full genome sequence of 250 trios
- Caucasians with GWAS data, spread over NL
- Rotterdam Study => 34 trios
- ERF (Brabant), LifeLines (Groningen), Leiden Longevity Study, Netherlands Twin Register (Amsterdam/NL)
- Currently run at BGI

Possibilities
- Exome sequencing (CHARGE-S): promising, but the focus is not on identifying the real variant underlying GWAS; proof of principle of involvement of the gene identified by a GWAS signal
- Targeted sequencing of identified loci (CHARGE-S)
- Whole-genome sequencing (BB-MRI, RS individuals): low-pass sequencing; deep sequencing at Complete Genomics

Setup: Illumina Compute
- Isilon storage: 180 TB raw, ~120 TB redundant
- Erasmus MC network
- Dell compute: 128 cores, 6 GB/core

Acknowledgements CHARGE