A Weighted SNP Correlation Network Analysis for the Estimation of Polygenic Risk Scores

Similar documents
Investigating the genetic basis for intelligence

Online Supplement to Polygenic Influence on Educational Attainment. Genotyping was conducted with the Illumina HumanOmni1-Quad v1 platform using

Methods for network visualization and gene enrichment analysis July 17, Jeremy Miller Scientist I jeremym@alleninstitute.org

Tutorial for the WGCNA package for R I. Network analysis of liver expression data in female mice

Tutorial for the WGCNA package for R: I. Network analysis of liver expression data in female mice

Statistical Applications in Genetics and Molecular Biology

A General Framework for Weighted Gene Co-expression Network Analysis

Logistic Regression (1/24/13)

Bioinformatics: Network Analysis

GAW 15 Problem 3: Simulated Rheumatoid Arthritis Data Full Model and Simulation Parameters

Epigenetic variation and complex disease risk

Globally, about 9.7% of cancers in men are prostate cancers, and the risk of developing the

Consistent Assay Performance Across Universal Arrays and Scanners

CCR Biology - Chapter 7 Practice Test - Summer 2012

UNSUPERVISED MACHINE LEARNING TECHNIQUES IN GENOMICS

Marker-Assisted Backcrossing. Marker-Assisted Selection. 1. Select donor alleles at markers flanking target gene. Losing the target allele

Healthcare Analytics. Aryya Gangopadhyay UMBC

Chapter 9 Patterns of Inheritance

Protein Protein Interaction Networks

How To Cluster

MAGIC design. and other topics. Karl Broman. Biostatistics & Medical Informatics University of Wisconsin Madison

VISUAL INTEGRATION OF RESULTS FROM A LARGE DNA BIOBANK (BIOVU) USING SYNTHESIS-VIEW *

Heritability: Twin Studies. Twin studies are often used to assess genetic effects on variation in a trait

DNA Determines Your Appearance!

Distances, Clustering, and Classification. Heatmaps

A Multi-locus Genetic Risk Score for Abdominal Aortic Aneurysm

Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon

Summary and conclusions. Chapter 8. Summary and conclusions

Combining Data from Different Genotyping Platforms. Gonçalo Abecasis Center for Statistical Genetics University of Michigan

NGS and complex genetics

Genomic Selection in. Applied Training Workshop, Sterling. Hans Daetwyler, The Roslin Institute and R(D)SVS

Presentation by: Ahmad Alsahaf. Research collaborator at the Hydroinformatics lab - Politecnico di Milano MSc in Automation and Control Engineering

Heredity. Sarah crosses a homozygous white flower and a homozygous purple flower. The cross results in all purple flowers.

1. Introduction Gene regulation Genomics and genome analyses Hidden markov model (HMM)

A 6-day Course on Statistical Genetics

A trait is a variation of a particular character (e.g. color, height). Traits are passed from parents to offspring through genes.

Chromosomes, Mapping, and the Meiosis Inheritance Connection

Name: 4. A typical phenotypic ratio for a dihybrid cross is a) 9:1 b) 3:4 c) 9:3:3:1 d) 1:2:1:2:1 e) 6:3:3:6

DISCOVERY TOOL FOR GENOME-WIDE ASSOCIATION STUDIES

Journal of Statistical Software

Chapter 4. Quantitative genetics: measuring heritability

GOBII. Genomic & Open-source Breeding Informatics Initiative

LIST OF TABLES. 4.3 The frequency distribution of employee s opinion about training functions emphasizes the development of managerial competencies

The Functional but not Nonfunctional LILRA3 Contributes to Sex Bias in Susceptibility and Severity of ACPA-Positive Rheumatoid Arthritis

Statistical Databases and Registers with some datamining

Exploratory Data Analysis with Categorical Variables: An Improved Rank-by-Feature Framework and a Case Study

Population Genetics and Multifactorial Inheritance 2002

How To Understand The Network Of A Network

Analysing Questionnaires using Minitab (for SPSS queries contact -)

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Data Mining - Evaluation of Classifiers

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

GENOMIC SELECTION: THE FUTURE OF MARKER ASSISTED SELECTION AND ANIMAL BREEDING

TECHNOLOGIES, PRODUCTS & SERVICES for MOLECULAR DIAGNOSTICS, MDx ABA 298

Tests in a case control design including relatives

Univariate Regression

Terms: The following terms are presented in this lesson (shown in bold italics and on PowerPoint Slides 2 and 3):

From Disease Association to Risk Assessment: An Optimistic View from Genome-Wide Association Studies on Type 1 Diabetes

Core Facility Genomics

Genetics 1. Defective enzyme that does not make melanin. Very pale skin and hair color (albino)

DATA MINING TECHNIQUES AND APPLICATIONS

The Human Genome. Genetics and Personality. The Human Genome. The Human Genome 2/19/2009. Chapter 6. Controversy About Genes and Personality

Human Genome Organization: An Update. Genome Organization: An Update

Mendelian and Non-Mendelian Heredity Grade Ten

I. Genes found on the same chromosome = linked genes

Okami Study Guide: Chapter 3 1

AP Biology Essential Knowledge Student Diagnostic

C-Reactive Protein and Diabetes: proving a negative, for a change?

EHRs and large scale comparative effectiveness research

GWASrap User Manual v1.1

7.36/7.91/20.390/20.490/6.802/6.874 PROBLEM SET 5. Network Statistics, Chromatin Structure, Heritability, Association Testing (24 Points)

GEMMA User Manual. Xiang Zhou. May 18, 2016

Hacking Brain Disease for a Cure

Genetics Lecture Notes Lectures 1 2

Tutorial for the WGCNA package for R: I. Network analysis of liver expression data in female mice

EARLY VS. LATE ENROLLERS: DOES ENROLLMENT PROCRASTINATION AFFECT ACADEMIC SUCCESS?

Basics of Marker Assisted Selection

Evolution by Natural Selection 1

Confounding, interaction, and mediation in multivariable/multivariate regression modeling

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

CHROMOSOMES AND INHERITANCE

Chapter 7 Factor Analysis SPSS

Data Mining: An Overview. David Madigan

BIOINF 525 Winter 2016 Foundations of Bioinformatics and Systems Biology

INTRODUCTION TO GENETIC EPIDEMIOLOGY (EPID0754) Prof. Dr. Dr. K. Van Steen

SeattleSNPs Interactive Tutorial: Web Tools for Site Selection, Linkage Disequilibrium and Haplotype Analysis

Variations on a Human Face Lab

Linear Models in STATA and ANOVA

A Demonstration of Hierarchical Clustering

Multiple regression - Matrices

Genetics Module B, Anchor 3

Lesson Plan: GENOTYPE AND PHENOTYPE

Elementary Statistics

Building risk prediction models - with a focus on Genome-Wide Association Studies. Charles Kooperberg

The Human Genome Project

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

School of Nursing. Presented by Yvette Conley, PhD

The correct answer is c A. Answer a is incorrect. The white-eye gene must be recessive since heterozygous females have red eyes.

Hierarchical Cluster Analysis Some Basics and Algorithms

Transcription:

A Weighted SNP Correlation Network Analysis for the Estimation of Polygenic Risk Scores Morgan Levine Department of Human Genetics, UCLA

PERSONALIZED MEDICINE Genetic association studies were expected to revolutionize the diagnosis, prevention and treatment of human diseases. Unfortunately, identifying predictive genetic markers has proven to be more difficult than anticipated. Many results fail to replicate or only explain a very small proportion of the variance in a given trait.

GENETIC ASSOCIATION In GWAS, the association between the trait of interest and m SNPs (often in the millions) is assessed using m regression models, where for each model, the trait is regressed on a single SNP. However most complex traits are thought to be polygenic influenced by multiple loci.

POLYGENIC SCORES Polygenic methods have the ability to aid in genetic association studies by: Increasing statistical power to detect true effects via dimension reduction; Providing biological insight (e.g. pathways, mechanisms) In 2007, Wray et al. proposed a method for examining the additive influence of multiple genetic markers. Biology is complex! It is likely that genes operate in a non-linear manner to influence complex traits.

RESEARCH AIM Develop a network-based method for creating polygenic scores. Networks and systems biology approaches have been successfully used to associate gene expression and methylation to various traits. Use human height for proof of concept. Highly heritable (around 80%) Easy to measure Used in lots of genetic studies.

WSCNA Methodology In weighted SNP correlation network analysis (WSCNA), a network can be represented by an adjacency matrix A=[a kj ] that encodes a connection between a pair of nodes (SNPs k,j). x 1,1 x k,1 x 1,j x k,j Use genotype data or published GWAS results from multiple studies/cohorts to build the matrix. Genotype data from 10,466 persons of European ancestry who were participants in the Health and Retirement Study (HRS).

WSCNA Methodology 1. We randomly divided our sample into a training set (70%) and a test set (30%). 2. A GWAS for human height was run using the training data in order to prune SNPs based on linkage disequilibrium (LD). Most significant SNP per block was selected 3. Generated 60 subsamples of 500 participants each (with replacement) from training data. GWAS for human height was run for each of these subsamples using only SNPs selected from step 2. 4. Beta coefficients from GWAS used to populate n m matrices n refers to the number of examined SNPs m refers to the number of GWAS from which results have been gathered. We generated a 32,284 60 matrix.

Module Detection SNP Dendrogram n m matrix is used to create adjacency matrix from biweight midcorrelations raised to a power (b). Topological Overlap Matrix (based on shared SNP neighbors) calculated from adjacencies, defines dissimilarities for hierarchical clustering. Modules (colors) represent clusters (or networks) of densely interconnected SNPs.

Results 1. Fifty-five modules were identified. 2. Calculated eigen-nodes (network-specific polygenic scores). 3. Tested whether they were associated with human height in the validation samples. 4. Seven modules were found to be associated with height. Significant Modules Identified using WSCNA Modules SNPs SNPs in Genes b (P-value) Light Green 193 100 0.305 (3.84E-6) Salmon 235 90-0.236 (0.003) Ivory 109 48 0.187 (0.015) Navajo White2 63 37 0.162 (0.021) Violet 131 66-0.150 (0.028) Thistle2 85 38 0.145 (0.030) Dark orange2 93 49-0.137 (0.043) Beta Coefficients and P-values come from a single model with the residual of height (adjusting for age, sex and PC1-4) as the dependent variable and all 55 modules identified in WSCNA as the independent variables.

Results 1. Created traditional polygenic scores to compare against. 2. Examined the variance in height as explained by traditional vs. network polygenic scores. Variance in Human Height Explained by Models Containing Different Polygenic Scores Independent Variable/s in Each Model R 2 Adjusted R 2 PRS 0.05 (n=32,284) 0.0096 0.0093 PRS 0.005 (n=4,318) 0.0084 0.0082 PRS 0.0005 (n=507) 0.0083 0.0079 Light Green Module (n=193) 0.007 0.0066 The seven significant WSCNA modules (n=909) 0.0163 0.0139 n refers to the number of SNPs used to generate the polygenic score/s in each model.

Hub SNPs Calculate intra-modular connectivity to identify hubs in each network. Do hubs have higher significance in the GWAS? Yes (light green network) Are there any hubs previously associated with height? HHIP gene, which has a reported association with height in GWAS and microarray (gene expression)

Conclusions Incorporation of network structure in the analysis of large-scale genetic association data can be used to estimate genetic scores for specific traits, identify hub SNPs/genes, and lead to biological insight into the pathways involved. Scores generated from WSCNA better relate to phenotypes of interest in validation analysis compared to traditional polygenic risk scores. Next Steps: Use this methodology to examine genetic network by environment effects (GxE), and examine genetic pleiotropy. Create aging scores (e.g. networks identified by comparing SNP contributions across multiple aging-related conditions).

Acknowledgements Collaborators Steve Horvath, UCLA Peter Langfelder, UCLA NIH Funding NINDS T32NS048004 NIA 5R01AG042511-02