Fully powered polygenic prediction using summary statistics
Transcription
1 Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health October 7, 2015 To download slides of this talk: google Alkes HSPH
2 Summary statistics are widely available Nat Genet editorial, July 2012
3 Outline 1. A brief history of summary statistic genetics 2. Introduction to polygenic prediction using summary statistics 3. LDpred method for polygenic prediction using summary statistics 4. Application of LDpred to real data sets
5 Definition of summary statistics Definition: Summary statistics consist of: GWAS association z-scores for each typed or imputed SNP + sample sizes on which z-scores were computed (may vary by SNP). Note: Many applications also require LD information computed from a reference panel (e.g. 1000 Genomes or UK10K) using a population very similar to the target sample.
6 Meta-analysis can be performed using summary statistics Evangelou & Ioannidis 2013 Nat Rev Genet
7 Joint and conditional analysis can be performed using summary statistics Yang et al. 2012 Nat Genet
8 Imputation can be performed using summary statistics Lee et al. 2013 Bioinformatics; Pasaniuc et al. 2014 Bioinformatics; also see Park et al. 2015 Bioinformatics, Lee et al. 2015 Bioinformatics
9 Rare variant meta-analysis can be performed using summary statistics Lee et al. 2013 AJHG; Hu et al. 2013 AJHG; Liu et al. 2014 Nat Genet; also see Clarke et al. 2013 PLoS Genet, Tang & Lin 2015 AJHG
10 Genetic variance and covariance can be inferred using summary statistics Palla & Dudbridge 2015 AJHG; Bulik-Sullivan et al. 2015 Nat Genet
11 Functional enrichment can be inferred using summary statistics Pickrell 2014 AJHG; Kichaev & Pasaniuc 2015 AJHG; Finucane et al. 2015 Nat Genet
12 Many projects at ASHG 2015 using summary statistics Invited talks: Pickrell, Pasaniuc, Im (this session) Platform talks: 11 Gusev, 77 Cichonska, 0 Golan, 7 Park Posters: 791 Kichaev, 797 Shi, 807 Roytman, 860 Salem, 868 Pare, 1301 Wu, 1334 Zhu, 1357 Chatterjee, 1477 Brown, 1618 Li, 1668 Khawaja, 1686 Lee, 1687 Zhao, 178 Torres, 1867 O'Connor
13 Outline 1. A brief history of summary statistic genetics 2. Introduction to polygenic prediction using summary statistics 3. LDpred method for polygenic prediction using summary statistics 4. Application of LDpred to real data sets
14 Genetic prediction: why care? Erbe et al. 2012 J Dairy Sci; Goss et al. 2011 New Engl J Med
15 Using only genome-wide significant SNPs is a Stone Age genetic prediction method. "How should we conduct genetic prediction, Fred?" $\hat{\phi}_k = \sum_{i \in \text{published SNPs}} \hat{\beta}_i x_{ik}$, where $\phi_k$ = phenotype for sample k, $\beta_i$ = effect size for SNP i, $x_{ik}$ = genotype for SNP i, sample k. Prediction $r^2$ is less than half the $r^2$ attained by polygenic prediction. PGC-SCZ 2014 Nature; Vilhjalmsson et al. 2015 AJHG
16 Polygenic prediction can be performed using genome-wide summary statistics $\hat{\phi}_k = \sum_{i \in \text{all GWAS SNPs}} \hat{\beta}_i x_{ik}$, where $\phi_k$ = phenotype for sample k, $\beta_i$ = effect size for SNP i, $x_{ik}$ = genotype for SNP i, sample k
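The score on slides 15 and 16 is a weighted sum of genotypes. A minimal sketch, assuming genotypes are stored as a samples-by-SNPs NumPy array; the function name `polygenic_score` and the toy numbers are illustrative and not from the talk:

```python
import numpy as np

def polygenic_score(genotypes, effect_sizes):
    """Return phi_hat_k = sum_i beta_hat_i * x_ik for every sample k.

    genotypes: array of shape (n_samples, n_snps), entry [k, i] is x_ik.
    effect_sizes: array of shape (n_snps,), entry [i] is beta_hat_i.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    return genotypes @ effect_sizes

# Toy data (hypothetical): 2 samples, 3 SNPs coded as allele counts.
x = np.array([[0, 1, 2],
              [2, 0, 1]])
beta_hat = np.array([0.1, -0.2, 0.05])
scores = polygenic_score(x, beta_hat)
print(scores)
```

Restricting the sum to published SNPs versus all GWAS SNPs only changes which columns and effect sizes are passed in.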
17 Is polygenic prediction using raw genotypes more accurate than using summary statistics? Answer: slightly. Using summary statistics (fit each SNP individually): $r^2 = h_g^2 \cdot \frac{h_g^2}{h_g^2 + M/N}$. Using raw genotypes (fit all SNPs simultaneously; BLUP prediction, Henderson 1975 Biometrics): $r^2 = h_g^2 \cdot \frac{h_g^2}{h_g^2 + (1 - r^2) M/N}$. The summary-statistic $r^2$ is the smaller of the two. $h_g^2$ = heritability explained by SNPs, M = number of (unlinked) SNPs, N = number of training samples. Daetwyler et al. 2008 PLoS ONE; Wray et al. 2013 Nat Rev Genet; also see Speed & Balding 2014 Genome Res (MultiBLUP)
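To see how small the gap on slide 17 is, both expressions can be evaluated numerically. This is a sketch under the slide's assumptions (M unlinked SNPs, N training samples). The BLUP expression defines $r^2$ implicitly, so the code solves the equivalent quadratic $a r^4 - (h_g^2 + a) r^2 + h_g^4 = 0$ with $a = M/N$ and takes the root that does not exceed $h_g^2$. The function names are illustrative.

```python
import math

def r2_summary_stats(h2, m, n):
    """Expected r^2 when each SNP is fit individually (summary statistics)."""
    return h2 * h2 / (h2 + m / n)

def r2_raw_genotypes(h2, m, n):
    """Expected r^2 when all SNPs are fit jointly (BLUP).

    Solves r2 = h2^2 / (h2 + (1 - r2) * m / n), i.e. the quadratic
    a*r2^2 - (h2 + a)*r2 + h2^2 = 0 with a = m/n, taking the smaller root.
    """
    a = m / n
    b = h2 + a
    return (b - math.sqrt(b * b - 4.0 * a * h2 * h2)) / (2.0 * a)

# Hypothetical setting: h_g^2 = 0.5 and as many unlinked SNPs as samples.
summary_r2 = r2_summary_stats(0.5, 100_000, 100_000)
raw_r2 = r2_raw_genotypes(0.5, 100_000, 100_000)
print(round(summary_r2, 4), round(raw_r2, 4))
```

In this toy setting the two values differ by a few percentage points of variance explained, and both stay below $h_g^2$.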
18 Accounting for non-infinitesimal architectures can improve polygenic prediction Infinitesimal (Gaussian) architecture: $\beta_i \sim N(0, h_g^2/M)$, $\hat{\beta}_i \sim \beta_i + N(0, 1/N)$ => $E(\beta_i \mid \hat{\beta}_i) = \frac{h_g^2}{h_g^2 + M/N} \hat{\beta}_i$. Uniform shrink on estimated effect sizes $\hat{\beta}_i$ is appropriate.
19 Accounting for non-infinitesimal architectures can improve polygenic prediction Non-infinitesimal architecture (e.g. point-normal mixture, mixture of normals, etc.): Non-uniform shrink on estimated effect sizes $\hat{\beta}_i$ is appropriate.
20 Accounting for non-infinitesimal architectures can improve polygenic prediction Infinitesimal (Gaussian) architecture: $\beta_i \sim N(0, h_g^2/M)$, $\hat{\beta}_i \sim \beta_i + N(0, 1/N)$ => $E(\beta_i \mid \hat{\beta}_i) = \frac{h_g^2}{h_g^2 + M/N} \hat{\beta}_i$. Uniform shrink on estimated effect sizes $\hat{\beta}_i$ is appropriate. Non-infinitesimal architecture (e.g. point-normal mixture, mixture of normals, etc.): Non-uniform shrink on estimated effect sizes $\hat{\beta}_i$ is appropriate. Standard heuristic approach: P-value thresholding, $\hat{\phi}_k = \sum_{i:\, P\text{-value} < P_T} \hat{\beta}_i x_{ik}$ (Note: requires optimization of the $P_T$ threshold in validation samples). Purcell et al. 2009 Nature; Chatterjee et al. 2013 Nat Genet; Dudbridge 2013 PLoS Genet
21 Accounting for linkage disequilibrium can improve polygenic prediction Problem: $\hat{\phi}_k = \sum_{i:\, P\text{-value} < P_T} \hat{\beta}_i x_{ik}$ does not account for LD between SNPs. Standard heuristic approaches: Random LD-pruning: prune SNPs (e.g. $r^2 < 0.2$), removing one of each pair of linked SNPs (decide randomly which SNP to remove). Informed LD-pruning (LD-clumping): prune SNPs, removing one of each pair of linked SNPs (remove SNP with less significant P-value in training data). Purcell et al. 2009 Nature; Stahl et al. 2012 Nat Genet; also see Rietveld et al. 2013 Science (COJO)
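Informed LD-pruning followed by P-value thresholding can be sketched as a greedy pass over SNPs sorted by significance. This is a toy illustration that assumes a small, dense matrix of pairwise $r^2$ values is available. Real tools work in genomic windows rather than on a full matrix, and the function name and default thresholds here are assumptions for the example only.

```python
import numpy as np

def clump_and_threshold(p_values, r2_matrix, r2_max=0.2, p_threshold=1.0):
    """Greedy LD-clumping plus P-value thresholding.

    Visits SNPs from most to least significant. A SNP is kept if its
    P-value is below p_threshold and it is not in LD (r^2 >= r2_max)
    with a SNP that was already kept. Returns sorted kept indices.
    """
    p_values = np.asarray(p_values, dtype=float)
    r2_matrix = np.asarray(r2_matrix, dtype=float)
    removed = np.zeros(len(p_values), dtype=bool)
    kept = []
    for i in np.argsort(p_values, kind="stable"):
        if removed[i] or p_values[i] >= p_threshold:
            continue
        kept.append(int(i))
        # Drop every SNP linked to the kept one (the less significant partner).
        removed |= r2_matrix[i] >= r2_max
    return sorted(kept)

# Toy data (hypothetical): SNPs 0 and 1 are in strong LD, SNPs 2 and 3 are not.
p = [1e-8, 1e-6, 0.03, 0.5]
r2 = [[1.0, 0.8, 0.0, 0.0],
      [0.8, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.1],
      [0.0, 0.0, 0.1, 1.0]]
print(clump_and_threshold(p, r2, r2_max=0.2, p_threshold=0.05))
```

The score is then the sum over the kept indices only, and `p_threshold` plays the role of $P_T$, which the slides say must be tuned in validation samples.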
22 Pruning + Thresholding is widely used Purcell et al. 2009 Nature; Lango Allen et al. 2010 Nature; Ripke et al. 2011 Nat Genet; Stahl et al. 2012 Nat Genet; Deloukas et al. 2013 Nat Genet; Ripke et al. 2013 Nat Genet; Chatterjee et al. 2013 Nat Genet; Dudbridge 2013 PLoS Genet; PGC-SCZ 2014 Nature
23 Pruning + Thresholding is widely used, but does not attain maximum prediction accuracy [Figure: simulations at different proportions p of causal SNPs; plot labels: Non-infinitesimal, Infinitesimal, $h_g^2$] Vilhjalmsson et al. 2015 AJHG
24 Outline 1. A brief history of summary statistic genetics 2. Introduction to polygenic prediction using summary statistics 3. LDpred method for polygenic prediction using summary statistics 4. Application of LDpred to real data sets
25 LDpred computes posterior means under a point-normal prior, accounting for LD $\hat{\phi}_k = \sum_{i \in \text{all GWAS SNPs}} E(\beta_i \mid \hat{\beta})\, x_{ik}$, where $E(\beta_i \mid \hat{\beta})$ are posterior mean effect sizes, $\phi_k$ = phenotype for sample k, $\beta_i$ = effect size for SNP i, $x_{ik}$ = genotype for SNP i, sample k. Vilhjalmsson et al. 2015 AJHG
26 LDpred computes posterior means under a point-normal prior, accounting for LD $\hat{\phi}_k = \sum_{i \in \text{all GWAS SNPs}} E(\beta_i \mid \hat{\beta})\, x_{ik}$, where $\phi_k$ = phenotype for sample k, $\beta_i$ = effect size for SNP i, $x_{ik}$ = genotype for SNP i, sample k, and $E(\beta_i \mid \hat{\beta})$ are posterior mean effect sizes based on a point-normal prior with parameters: $h_g^2$ = heritability explained by SNPs (estimated from training data); p = proportion of causal SNPs (optimized in validation samples); LD from a reference panel. Use validation samples as LD reference (restrict to SNPs with validation data). Vilhjalmsson et al. 2015 AJHG
27 In the special case of no LD between SNPs, posterior means can be computed analytically $E(\beta_i \mid \hat{\beta}_i) = \frac{h_g^2}{h_g^2 + Mp/N}\, p_i\, \hat{\beta}_i$, where $h_g^2$ = heritability explained by SNPs, p = proportion of causal SNPs, M = number of (unlinked) SNPs, N = number of training samples, and $p_i = \frac{\frac{p}{\sqrt{h_g^2/(Mp) + 1/N}}\, e^{-\frac{1}{2} \hat{\beta}_i^2 / (h_g^2/(Mp) + 1/N)}}{\frac{p}{\sqrt{h_g^2/(Mp) + 1/N}}\, e^{-\frac{1}{2} \hat{\beta}_i^2 / (h_g^2/(Mp) + 1/N)} + \frac{1-p}{\sqrt{1/N}}\, e^{-\frac{1}{2} \hat{\beta}_i^2 / (1/N)}}$ is the posterior probability that $\beta_i \neq 0$, i.e. SNP i is causal (generalizes uniform shrink when p = 1: infinitesimal prior, no LD)
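The no-LD posterior mean on slide 27 translates directly into a few array operations. The sketch below follows the slide's formula term by term. The function name is illustrative, and the direct evaluation of the exponentials can underflow for very large $N \hat{\beta}_i^2$, which a production implementation would handle in log space.

```python
import numpy as np

def posterior_mean_no_ld(beta_hat, h2, m, n, p):
    """Posterior mean effects under a point-normal prior, assuming no LD.

    Returns (posterior_means, posterior_causal_probabilities).
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    var_causal = h2 / (m * p) + 1.0 / n   # variance of beta_hat if SNP is causal
    var_null = 1.0 / n                    # variance of beta_hat if SNP is null
    dens_causal = p / np.sqrt(var_causal) * np.exp(-0.5 * beta_hat**2 / var_causal)
    dens_null = (1.0 - p) / np.sqrt(var_null) * np.exp(-0.5 * beta_hat**2 / var_null)
    post_causal = dens_causal / (dens_causal + dens_null)
    shrink = h2 / (h2 + m * p / n)
    return shrink * post_causal * beta_hat, post_causal

# Toy values (hypothetical): small estimates are shrunk almost to zero,
# large ones are kept nearly at the causal-SNP shrink factor.
means, probs = posterior_mean_no_ld([0.0, 0.05, 0.2], h2=0.5, m=1000, n=1000, p=0.01)
print(probs)
print(means)
```

Setting `p=1.0` makes every posterior probability equal to 1 and recovers the uniform shrink of the infinitesimal case.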
28 In the special case of infinitesimal prior (with LD), posterior means can be computed analytically $E(\beta \mid \hat{\beta}) = \left(D + \frac{M}{N h_g^2} I\right)^{-1} \hat{\beta}$, where $h_g^2$ = heritability explained by SNPs, M = number of (unlinked) SNPs, N = number of training samples, and D is an LD matrix from a reference panel (generalizes uniform shrink when D = I: infinitesimal prior, no LD)
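The infinitesimal-prior formula on slide 28 amounts to one linear solve per LD block. A minimal sketch, assuming `ld` is the LD (correlation) matrix D for a single block of SNPs and `m` is the genome-wide SNP count M; the function name and toy values are illustrative:

```python
import numpy as np

def posterior_mean_inf_ld(beta_hat, ld, h2, m, n):
    """Solve (D + M / (N * h2) * I) x = beta_hat for the posterior means."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    ld = np.asarray(ld, dtype=float)
    system = ld + (m / (n * h2)) * np.eye(len(beta_hat))
    # Solving the system avoids forming an explicit matrix inverse.
    return np.linalg.solve(system, beta_hat)

# Toy block (hypothetical): two SNPs in strong LD share one signal.
d = np.array([[1.0, 0.9],
              [0.9, 1.0]])
post = posterior_mean_inf_ld([0.10, 0.09], d, h2=0.5, m=1000, n=1000)
print(post)
```

With `ld` set to the identity matrix, each estimate is multiplied by $h_g^2 / (h_g^2 + M/N)$, matching the uniform shrink from the no-LD infinitesimal case.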
29 General case of non-infinitesimal prior with LD: posterior means cannot be computed analytically
30 General case of non-infinitesimal prior with LD: posterior means cannot be computed analytically Possible solutions: Assume 1 causal variant per locus
31 General case of non-infinitesimal prior with LD: posterior means cannot be computed analytically Possible solutions: Assume 1 causal variant per locus Iterative approach
32 General case of non-infinitesimal prior with LD: posterior means cannot be computed analytically Possible solutions: Assume 1 causal variant per locus Iterative approach MCMC
33 General case of non-infinitesimal prior with LD: posterior means cannot be computed analytically Solution: use MCMC. Initialize $\beta_i = 0$. At each big iteration, for each SNP i, re-sample $\beta_i$ based on the point-normal prior on $\beta_i$ and the observed $\hat{\beta} \sim N(D\beta, D/N)$: $f(\beta_i \mid \hat{\beta}) \propto f(\beta_i)\, e^{-\frac{N}{2} (\hat{\beta} - D\beta)^T D^{-1} (\hat{\beta} - D\beta)}$, where $f(\beta_i)$ reflects the point-normal prior (based on $h_g^2$ and p)
34 General case of non-infinitesimal prior with LD: posterior means cannot be computed analytically Solution: use MCMC. Initialize $\beta_i = 0$. At each big iteration, for each SNP i, re-sample $\beta_i$ based on the point-normal prior on $\beta_i$ and the observed $\hat{\beta} \sim N(D\beta, D/N)$. 100 big iterations generally suffice for convergence. Rao-Blackwellization: average the posterior means sampled. Related MCMC methods for prediction from raw genotypes are described in Erbe et al. 2012 J Dairy Sci, Zhou et al. 2013 PLoS Genet, Moser et al. 2015 PLoS Genet
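The MCMC scheme on slides 33 and 34 can be sketched as a Gibbs sampler. This is an illustrative sketch, not the LDpred implementation. It assumes D has unit diagonal, in which case the likelihood for $\beta_i$ given the other effects depends only on the residualized estimate $\hat{\beta}_i - \sum_{j \neq i} D_{ij} \beta_j$, which behaves like a no-LD estimate with variance 1/N. The function name, the `burn_in` parameter, and the fixed seed are assumptions added for the example.

```python
import numpy as np

def gibbs_point_normal(beta_hat, ld, h2, m, n, p, n_iter=100, burn_in=10, seed=0):
    """Rao-Blackwellized Gibbs sampler for a point-normal prior with LD.

    Assumes ld has ones on its diagonal. Returns the average, over
    post-burn-in iterations, of the conditional posterior means.
    """
    rng = np.random.default_rng(seed)
    beta_hat = np.asarray(beta_hat, dtype=float)
    ld = np.asarray(ld, dtype=float)
    var_causal = h2 / (m * p) + 1.0 / n
    var_null = 1.0 / n
    shrink = h2 / (h2 + m * p / n)
    beta = np.zeros(len(beta_hat))          # initialize beta_i = 0
    mean_sum = np.zeros(len(beta_hat))
    for it in range(n_iter):                # "big" iterations
        for i in range(len(beta_hat)):
            # Marginal estimate with the current effects of other SNPs removed.
            resid = beta_hat[i] - ld[i] @ beta + ld[i, i] * beta[i]
            d_causal = p / np.sqrt(var_causal) * np.exp(-0.5 * resid**2 / var_causal)
            d_null = (1.0 - p) / np.sqrt(var_null) * np.exp(-0.5 * resid**2 / var_null)
            post_causal = d_causal / (d_causal + d_null)
            cond_mean = shrink * resid
            if it >= burn_in:
                mean_sum[i] += post_causal * cond_mean   # Rao-Blackwellization
            if rng.random() < post_causal:
                beta[i] = rng.normal(cond_mean, np.sqrt(shrink / n))
            else:
                beta[i] = 0.0
    return mean_sum / (n_iter - burn_in)

# Toy block (hypothetical values): two linked SNPs and one independent SNP.
d = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(gibbs_point_normal([0.20, 0.16, 0.01], d, h2=0.5, m=1000, n=1000, p=0.01))
```

When `ld` is the identity matrix the residual always equals $\hat{\beta}_i$, so the averaged output coincides with the analytic no-LD posterior mean, which gives a convenient sanity check.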
35 LDpred performs well in simulations Simulations with real genotypes, 1% of SNPs causal
36 Understanding polygenic prediction "Let's hide away and dance." -- Freddie K. "Let's hide away with data." -- Alkes
37 Outline 1. A brief history of summary statistic genetics 2. Introduction to polygenic prediction using summary statistics 3. LDpred method for polygenic prediction using summary statistics 4. Application of LDpred to real data sets
38 LDpred performs well on within-cohort prediction of WTCCC traits Data from WTCCC 2007 Nature. Results are similar to MCMC-based methods that require raw genotypes: Zhou et al. 2013 PLoS Genet, Moser et al. 2015 PLoS Genet
39 LDpred performs well on within-cohort prediction of WTCCC traits Data from WTCCC 2007 Nature. Results are similar to MCMC-based methods that require raw genotypes: Zhou et al. 2013 PLoS Genet, Moser et al. 2015 PLoS Genet. $R^2_{\text{nag}}$, $R^2_{\text{obs}}$, $R^2_{\text{liab}}$ (see Lee et al. 2012 Genet Epidemiol)
40 LDpred performs well on within-cohort prediction of WTCCC traits Data from WTCCC 2007 Nature. Results are similar to MCMC-based methods that require raw genotypes: Zhou et al. 2013 PLoS Genet, Moser et al. 2015 PLoS Genet. Dominated by HLA
41 LDpred performs well on within-cohort prediction of WTCCC traits Data from WTCCC 2007 Nature. Results are similar to MCMC-based methods that require raw genotypes: Zhou et al. 2013 PLoS Genet, Moser et al. 2015 PLoS Genet. Do not validate in new cohort
42 ... but within-cohort prediction accuracy may be too good to be true [Figure: $R^2_{\text{nag}}$ for CAD and T2D; Training: WTCCC, Validation: WTCCC versus Training: WTCCC, Validation: WGHS] Results presented for LDpred; similar relative results for other methods. Cryptic relatedness? Population structure? (Wray et al. 2013 Nat Rev Genet)
43 LDpred performs well on summary statistics with independent validation cohorts Training N=70K PGC-SCZ 2014 Nature; MGS replication sample
44 LDpred performs well on summary statistics with independent validation cohorts Training N=70K Training N=30K Training N=60K
45 LDpred performs well on summary statistics with independent validation cohorts Training N=70K Training N=30K Training N=60K Training N=70K Training N=90K
46 LDpred performs well on summary statistics with independent validation cohorts Height: complexities due to population stratification. Including PCs can improve prediction accuracy. (Chen et al. 2015 Genet Epidemiol) Training N=130K (Lango Allen et al. 2010 Nature)
47 Conclusions Explicitly modeling both LD and non-infinitesimal architectures improves polygenic prediction from summary statistics. Polygenic prediction should be evaluated using independent validation cohorts. Although polygenic predictions are not yet clinically useful, prediction accuracies will increase as sample sizes increase (bounded by heritability explained by SNPs, $h_g^2$).
48 ... and Future directions Polygenic prediction in non-European samples is challenging. How to combine training data from Europeans (large sample size) with training data from target population (small sample size)? (cross-population genetic correlation; Poster 1477 Brown) Enrichment of heritability in functional annotation classes could potentially be used to improve polygenic prediction (Poster 1357 Chatterjee). Methods for large raw genotype data sets (e.g. UK Biobank) should be developed in parallel with summary statistic methods (Platform talk 38 Loh; Platform talk 170 Young)
49 Acknowledgements Bjarni Vilhjalmsson + Vilhjalmsson et al. 2015 AJHG co-authors. Everyone in alkesgrp. Please check out our other ASHG 2015 talks: Platform talk 11 Gusev: Large-scale transcriptome-wide association study; Platform talk 38 Loh: Contrasting regional architectures of schizophrenia; Platform talk 196 Bhatia: Haplotypes of common SNPs explain a large ...; Platform talk 35 Galinsky: Population differentiation analysis of 54,734 ...; Platform talk 346 Hayeck: Mixed model association with family-biased ...; Platform talk 354 Palamara: Leveraging distant relatedness to quantify ...
More informationGenetics of Rheumatoid Arthritis Markey Lecture Series
Genetics of Rheumatoid Arthritis Markey Lecture Series Al Kim akim@dom.wustl.edu 2012.09.06 Overview of Rheumatoid Arthritis Rheumatoid Arthritis (RA) Autoimmune disease primarily targeting the synovium
More informationBIG DATA: CONVENTIONAL METHODS MEET UNCONVENTIONAL DATA
BIG DATA: CONVENTIONAL METHODS MEET UNCONVENTIONAL DATA Harvard Medical School & Harvard School of Public Health sharon@hcp.med.harvard.edu October 14, 2014 1 / 7 THE SETTING Unprecedented advances in
More informationStatistical Analysis for Genetic Epidemiology (S.A.G.E.) Version 6.2 Graphical User Interface (GUI) Manual
Statistical Analysis for Genetic Epidemiology (S.A.G.E.) Version 6.2 Graphical User Interface (GUI) Manual Department of Epidemiology and Biostatistics Wolstein Research Building 2103 Cornell Rd Case Western
More informationMissing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University
Missing Data in Longitudinal Studies: To Impute or not to Impute? Robert Platt, PhD McGill University 1 Outline Missing data definitions Longitudinal data specific issues Methods Simple methods Multiple
More informationLABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
More informationGENETIC STUDIES OF AUTOIMMUNE DISEASES. Benedicte Alexandre Lie Institute of Immunology Rikshospitalet University Hospital
GENETIC STUDIES OF AUTOIMMUNE DISEASES Benedicte Alexandre Lie Institute of Immunology Rikshospitalet University Hospital Autoimmune diseases Affects approximately 5 % of the population Results from an
More informationMethods for big data in medical genomics
Methods for big data in medical genomics Parallel Hidden Markov Models in Population Genetics Chris Holmes, (Peter Kecskemethy & Chris Gamble) Department of Statistics and, Nuffield Department of Medicine
More informationAssociation analysis for quantitative traits by data mining: QHPM
Ann. Hum. Genet. (2002), 66, 419 429 University College London DOI: 10.1017 S0003480002001318 Printed in the United Kingdom 419 Association analysis for quantitative traits by data mining: QHPM P. ONKAMO,,
More informationBioinformatics for cancer immunology and immunotherapy
Bioinformatics for cancer immunology and immunotherapy Zlatko Trajanoski Biocenter, Division for Bioinformatics Innsbruck Medical University Innrain 80, 6020 Innsbruck, Austria Email: zlatko.trajanoski@i-med.ac.at
More informationDigital Health: Catapulting Personalised Medicine Forward STRATIFIED MEDICINE
Digital Health: Catapulting Personalised Medicine Forward STRATIFIED MEDICINE CRUK Stratified Medicine Initiative Somatic mutation testing for prediction of treatment response in patients with solid tumours:
More informationGenomics and Health Data Standards: Lessons from the Past and Present for a Genome-enabled Future
Genomics and Health Data Standards: Lessons from the Past and Present for a Genome-enabled Future Daniel Masys, MD Professor and Chair Department of Biomedical Informatics Professor of Medicine Vanderbilt
More informationChildhood intelligence is heritable, highly polygenic
Molecular Psychiatry (2013), 1 6 & 2013 Macmillan Publishers Limited All rights reserved 1359-4184/13 www.nature.com/mp ORIGINAL ARTICLE Childhood intelligence is heritable, highly polygenic and associated
More informationAdmixture 1.23 Software Manual. David H. Alexander John Novembre Kenneth Lange
Admixture 1.23 Software Manual David H. Alexander John Novembre Kenneth Lange August 22, 2013 Contents 1 Quick start 1 2 Reference 3 2.1 How do I choose the correct value for K?................... 3 2.1.1
More informationGENETICS OF ALCOHOL USE AND LIVER ENZYMES:
GENETICS OF ALCOHOL USE AND LIVER ENZYMES: SUMMARY AND GENERAL DISCUSSION The studies described in this thesis aimed to unravel the genetic architecture of variation in alcohol use and blood levels of
More informationLitteratur. Lärandemål för undervisningstillfälle. Lecture Overview. Basic principles The twin design The adoption design
Litteratur Behavioral Genetics Twin and Adoptions studies Twin and adoption methods (Kapitel 5; sid 70-91) Henrik Larsson MEB Lärandemål för undervisningstillfälle - Studenten ska kunna redogöra för kvantitativa-genetiska
More informationPRINCIPLES OF POPULATION GENETICS
PRINCIPLES OF POPULATION GENETICS FOURTH EDITION Daniel L. Hartl Harvard University Andrew G. Clark Cornell University UniversitSts- und Landesbibliothek Darmstadt Bibliothek Biologie Sinauer Associates,
More informationTOWARD BIG DATA ANALYSIS WORKSHOP
TOWARD BIG DATA ANALYSIS WORKSHOP 邁 向 巨 量 資 料 分 析 研 討 會 摘 要 集 2015.06.05-06 巨 量 資 料 之 矩 陣 視 覺 化 陳 君 厚 中 央 研 究 院 統 計 科 學 研 究 所 摘 要 視 覺 化 (Visualization) 與 探 索 式 資 料 分 析 (Exploratory Data Analysis, EDA)
More informationSamuel Zuvekas. Agency for Healthcare Research and Quality Working Paper No. 09003. August 2009
Validity of Household Reports of Medicare-covered Home Health Agency Use Samuel Zuvekas Agency for Healthcare Research and Quality Working Paper No. 09003 August 2009 Suggested citation: Zuvekas S. Validity
More informationINTRODUCTION TO GENETIC EPIDEMIOLOGY (EPID0754) Prof. Dr. Dr. K. Van Steen
INTRODUCTION TO GENETIC EPIDEMIOLOGY (EPID0754) Prof. Dr. Dr. K. Van Steen Introduction to Genetic Epidemiology DIFFERENT FACES OF GENETIC EPIDEMIOLOGY 1 Basic epidemiology 1.a Aims of epidemiology 1.b
More informationUNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS
UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS Dwijesh C. Mishra I.A.S.R.I., Library Avenue, New Delhi-110 012 dcmishra@iasri.res.in What is Learning? "Learning denotes changes in a system that enable
More informationMAGIC design. and other topics. Karl Broman. Biostatistics & Medical Informatics University of Wisconsin Madison
MAGIC design and other topics Karl Broman Biostatistics & Medical Informatics University of Wisconsin Madison biostat.wisc.edu/ kbroman github.com/kbroman kbroman.wordpress.com @kwbroman CC founders compgen.unc.edu
More informationMissing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
More informationNovel Rheumatoid Arthritis Susceptibility Locus at 22q12 Identified in an Extended UK Genome-Wide Association Study
ARTHRITIS & RHEUMATOLOGY Vol. 66, No. 1, January 2014, pp 24 30 DOI 10.1002/art.38196 2014 The Authors. Arthritis & Rheumatology is published by Wiley Periodicals, Inc. on behalf of the American College
More informationSingle-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation
PN 100-9879 A1 TECHNICAL NOTE Single-Cell Whole Genome Sequencing on the C1 System: a Performance Evaluation Introduction Cancer is a dynamic evolutionary process of which intratumor genetic and phenotypic
More informationRedwood Building, Room T204, Stanford University School of Medicine, Stanford, CA 94305-5405.
W hittemoretxt050806.tex A Bayesian False Discovery Rate for Multiple Testing Alice S. Whittemore Department of Health Research and Policy Stanford University School of Medicine Correspondence Address:
More informationMISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
More informationHaplotype analysis of case-control data
Haplotype analysis of case-control data Yulia Marchenko Senior Statistician StataCorp LP 2010 UK Stata Users Group Meeting Yulia Marchenko (StataCorp) Haplotype analysis of case-control data September
More informationSupplementary Material: Covariate-adjusted matrix visualization via correlation decomposition
Supplementary Material: Covariate-adjusted matrix visualization via correlation decomposition Han-Ming Wu 1, Yin-Jing Tien 2, Meng-Ru Ho 3,4,5, Hai-Gwo Hwu 6, Wen-chang Lin 5, Mi-Hua Tao 5, and Chun-Houh
More informationAre differences in methylation in cord blood DNA associated with prenatal exposure to alcohol?
Are differences in methylation in cord blood DNA associated with prenatal exposure to alcohol? Luisa Zuccolo l.zuccolo@bristol.ac.uk MRC IEU, School of Social and Community Medicine Outline Background
More information