Validation and Replication




Overview
- Definitions of validation and replication
- Difficulties and limitations
- Working examples from our group and others

Why?
- False-positive results still occur, even after stringent QC, data pre-processing, complex analyses and alpha adjustments
- The best ways of ensuring an observation is in fact real and meaningful are to:
  - validate and replicate the findings
  - perform longitudinal and functional studies to determine the true causal/biological effects
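To make the scale of the false-positive problem concrete, here is a minimal sketch of the arithmetic behind alpha adjustment. It assumes, purely for illustration, about 27,000 independent tests (roughly the size of a 27k array) with no true associations at all; the function names are hypothetical and not taken from the slides.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests passing alpha when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, alpha: float) -> float:
    """Per-test significance threshold under a Bonferroni adjustment."""
    return alpha / n_tests


def familywise_error_rate(n_tests: int, per_test_alpha: float) -> float:
    """Chance of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - per_test_alpha) ** n_tests


if __name__ == "__main__":
    n_sites = 27_000  # illustrative round number, an assumption
    print(expected_false_positives(n_sites, 0.05))
    adjusted = bonferroni_threshold(n_sites, 0.05)
    print(adjusted, familywise_error_rate(n_sites, adjusted))
```

Even with the adjusted threshold, the chance of at least one false positive stays just under 5% in every experiment, and real CpG sites are correlated rather than independent, which is one reason validation and replication remain necessary.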

Validation vs. Replication
Validation
- Verify that the methylation data generated are accurate and the results are reliable
- Ideally, by repeating the experiment in the same samples but using different laboratory techniques
- Several factors could result in erroneous data. For instance:
  - systematic errors associated with the laboratory methods
  - experimental design issues (e.g. cases and controls on separate plates)
  - handling errors (e.g. sample mix-ups)
- Validation enables you to ensure the findings are due to true biological variation and not some unknown experimental artefact

Replication vs. Validation
Replication
- Reproduce the findings in an independent dataset, i.e. different samples
- Replication enables:
  - verification of the findings in a different dataset
  - the findings to be generalised to the wider population
  - a more precise estimate of the effect to be obtained
  - further exploration

The ideal scenario
- Perform both
- Validation shows the results are reliable but not necessarily generalisable to the wider population
- Replication, if successful, shows the results are generalisable
- But if replication is unsuccessful, you will not know why. Possible reasons:
  - technical error in the first and/or second stage
  - lack of power in the second stage
  - subtle sample/phenotypic differences
  - quite simply, a false-positive finding due to chance in the first stage
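One of the reasons listed above, lack of power in the second stage, can be checked before a replication is attempted. The sketch below is an approximation under stated assumptions, not a method from the slides: it treats the replication as a two-group comparison of mean methylation with equal group sizes, a standardised effect size d, and a normal approximation to the test statistic. The function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def replication_power(effect_size_d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group comparison of means.

    Normal approximation: the test statistic is treated as N(shift, 1),
    where shift = d * sqrt(n_per_group / 2).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(effect_size_d) * sqrt(n_per_group / 2.0)
    # Probability of landing in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (20, 50, 100):  # hypothetical replication cohort sizes
        print(n, round(replication_power(0.5, n), 3))
```

Effect sizes from a discovery stage tend to be overestimated, so plugging in a smaller d than the one observed is the more cautious choice when sizing the replication cohort.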

In reality
- It's not always possible to do both
- Epigenetic techniques are expensive
- Sites of interest may not be feasible on certain platforms
- Limited access to tissue samples
- Limited access to similar phenotypic cohorts
- Application of different study designs (e.g. parent-offspring pairs, monozygotic twins, longitudinal studies) may not be possible
- Any attempt at validation and/or replication is better than nothing

Summary so far
- Validation: verify that the methylation data generated are accurate and the results are reliable (same samples, different method)
- Replication: reproduce the findings in an independent dataset (different samples)
- Validation and replication are not the same thing, but both are valuable tools

Examples from our group
We have utilised a number of different processes:
- Repeat the experiment in the same samples using a different methodology
- Repeat the experiment in the same samples using a different source of tissue but the same technique
- Include extra samples to increase robustness
- Assess different measures (e.g. expression, methylation, SNP genotypes)
- Independent replication, i.e. different samples but the same experimental method and study design

Example 1. Leber's Hereditary Optic Neuropathy (LHON)
Aim: identify methylation differences associated with Leber's hereditary optic neuropathy
- LHON is a common mitochondrial disorder characterised by loss of central vision
- Hypothesis: oxidative stress arising from mitochondrial dysfunction alters DNA methylation of the nuclear genome, with consequences for the regulation of gene expression
- We measured DNA methylation of the nuclear genome using the 27k array to identify differences between those with the LHON phenotype and unaffected carriers
- Samples came from four pedigrees from the North East of England

Example 1 workflow: identify methylation differences associated with Leber's hereditary optic neuropathy
UK family pedigrees with Leber's hereditary optic neuropathy
- Discovery (blood samples): 27k chip to identify differentially methylated CpG sites (n=28)
  - Result: 2 CpG sites selected to take forward (p<0.05)
- Validation (blood samples): bisulphite modification & pyrosequencing of 2 candidates (n=28)
  - Result: methylation levels strongly correlated (rho >0.6) between techniques, with trends in association for both genes (p<0.1)
- Validation/Replication (blood samples): bisulphite modification & pyrosequencing of 2 candidates (n=49)
  - Result: with an additional 19 samples, mainly from the same families, one candidate remained associated (p=0.006); the other did not (p>0.1)
French family pedigrees
- Replication (independent cohort): bisulphite modification & pyrosequencing
Hannah Elliott, ongoing

Example 2. Postnatal growth and DNA methylation are associated with differential gene expression of TACSTD2 and childhood fat mass
- Microarray expression analysis to identify genes with differential expression in preterm-born children defined as slow or rapid growers
- Identify potential candidates for methylation analysis

Example 2 workflow, children born preterm: Newcastle Preterm Birth Cohort
- Samples: blood at 11 years (RNA and DNA) and saliva at 11 years (DNA)
- RNA: expression microarray, slow vs rapid postnatal growth (n=20), followed by validation of the top hit using real-time PCR
- DNA: bisulphite modification, then pyrosequencing analysis of the candidate gene in each DNA sample set (n=94 and n=68)
- Analysis of the relationship between methylation, expression and phenotype at age 11 years
Alix Groom et al., Diabetes 2012

Example 2 workflow, children born at term: ALSPAC
- Samples: cord blood (DNA) and blood at 7 years (DNA)
- DNA: bisulphite modification, then pyrosequencing analysis of the candidate gene in each sample set (n=173 and n=178)
- Analysis of the relationship between methylation and phenotype at ages 9 and 15 years
Alix Groom et al., Diabetes 2012

Example 3 (not from our group). Smoking and methylation
- 177 individuals from the population-based epidemiological ESTHER study: current smokers, former smokers, and those who had never smoked
- Illumina HumanMethylation 27K BeadChip

Smoking and methylation: workflow
177 individuals from the ESTHER study
- Discovery (blood samples): 27k chip to identify differentially methylated CpG sites
  - Result: 1 CpG site selected to take forward
- Validation (blood samples): bisulphite modification & Sequenom EpiTYPER analysis of discovery samples (79 samples from the discovery study)
  - Result: Spearman correlation between methods rho = 0.82; smokers still hypomethylated at the CpG site (P smoking = 1.07 x 10^-28)
- Replication (blood samples): bisulphite modification & Sequenom EpiTYPER analysis of 328 non-overlapping subjects
  - Result: pronounced association with smoking remained
- Further discovery: looked at methylation in surrounding regions using Sequenom EpiTYPER
  - Result: only CpG sites immediately next to the main hit (41 bp away) were associated with smoking
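The between-method agreement quoted above is a Spearman rank correlation. As a minimal sketch of how such a figure could be computed for the same samples measured on two platforms, the code below ranks each set of measurements (averaging tied ranks) and takes the Pearson correlation of the ranks. The helper names and the measurements are invented for illustration; they are not data from the ESTHER study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical paired measurements for six samples (not real data).
    array_beta = [0.12, 0.35, 0.28, 0.61, 0.47, 0.80]   # array beta values
    second_method_pct = [15, 30, 33, 58, 50, 77]        # % methylation, second method
    print(round(spearman_rho(array_beta, second_method_pct), 3))
```

A rank-based measure suits this comparison because the two platforms report methylation on different scales, and agreement in the ordering of samples is what matters for validation.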

Smoking and methylation: a second cohort
- They then went on to test the same methylation site in a different cohort (better replication?)
- Sequenom EpiTYPER analysis
- This time looking at whether F2RL3 methylation was related to a clinical outcome
- 1206 individuals from the KAROLA prospective cohort study
  - Experienced acute coronary syndrome, myocardial infarction or coronary intervention
  - Active follow-up over 8 years

Smoking and methylation: outcome
- Methylation at F2RL3 was associated with mortality in patients in this cohort!
- The methylation data (CpG_4) reported in the main body of the paper are NOT from the same CpG site described in the original paper. The original site is CpG_2; see the supplementary data for its results
- The strongest signal from the first round wasn't the strongest association when linked to clinical outcome in a second cohort

Conclusions
- Validation and replication are different
- Ideally, attempt to do both
- Plan for further functional work or analysis to identify true causal/biological effects
- If you can, do it!

References
- Breitling LP et al. Smoking, F2RL3 methylation, and prognosis in stable coronary heart disease. Eur Heart J. 2012 Apr 17.
- Breitling LP et al. Tobacco-smoking-related differential DNA methylation: 27K discovery and replication. Am J Hum Genet. 2011 Apr 8;88(4):450-7. Epub 2011 Mar 31.
- Groom A et al. Postnatal growth and DNA methylation are associated with differential gene expression of the TACSTD2 gene and childhood fat mass. Diabetes. 2012 Feb;61(2):391-400. Epub 2011 Dec 21.
- Hirschhorn JN and Daly MJ. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet. 2005 Feb;6(2):95-108.
- Rakyan VK et al. Epigenome-wide association studies for common human diseases. Nat Rev Genet. 2011 Jul 12;12(8):529-41. doi: 10.1038/nrg3000.
