Statistical analysis of genome-wide association (GWAS) data
|
|
|
- Debra Norman
- 10 years ago
- Views:
Transcription
1 Statistical analysis of genome-wide association (GWAS) data Jim Stankovich Menzies Research Institute University of Tasmania
2 Outline Introduction Confounding variables and linkage disequilibrium Statistical methods to test for association in case-control GWA studies Allele counting chi-square test Logistic regression Multiple testing and power Example: GWAS for multiple sclerosis (MS) Data cleaning / quality control Results
3 GWA studies have been very successful since 2007 Prior to the advent of GWA studies, there was very little success in identifying genetic risk factors for complex multifactorial diseases GWA studies have identified over 200 separate associations with various complex diseases in the past two years Human Genetic Variation hailed as Breakthrough of the Year by Science magazine in Human genome project The SNP consortium SNP genotyping arrays The International HapMap Project GWA studies
4 This talk: case-control GWA studies Obtain DNA from people with disease of interest (cases) and unaffected controls Run each DNA sample on a SNP chip to measure genotypes at 300,000-1,000,000 SNPs in cases and controls Identify SNPs where one allele is significantly more common in cases than controls The SNP is associated with disease SNP: rs stroke
5 This talk: case-control GWA studies Obtain DNA from people with disease of interest (cases) and unaffected controls Run each DNA sample on a SNP chip to measure genotypes at 300,000-1,000,000 SNPs in cases and controls Identify SNPs where one allele is significantly more common in cases than controls The SNP is associated with disease Alternative strategy (Peter Visscherʼs talk): test for association between SNPs and a quantitative trait that underlies the disease (endophenotype) SNP: rs blood pressure stroke
6 Association does not imply causation Suppose that genotypes at a particular SNP are significantly associated with disease This may be because the SNP is associated with some other factor (a confounder), which is associated with disease but is not in the same causal pathway SNP near lactase gene multiple sclerosis (MS)
7 Association does not imply causation Suppose that genotypes at a particular SNP are significantly associated with disease This may be because the SNP is associated with some other factor (a confounder), which is associated with disease but is not in the same causal pathway Northern European ancestry SNP near lactase gene multiple sclerosis (MS)
8 Association does not imply causation Suppose that genotypes at a particular SNP are significantly associated with disease This may be because the SNP is associated with some other factor (a confounder), which is associated with disease but is not in the same causal pathway Possible confounders of genetic associations: Ethnic ancestry Genotyping batch, genotyping centre DNA quality Environmental exposures in the same causal pathway Nicotine receptors --> smoking --> lung cancer Hung et al, Nature 452: 633 (2008) + other articles in same issue Alcohol dehydrogenase genes --> alcohol consumption --> throat cancer Hashibe et al, Nature Genetics 40: 707 (2008)
9 Helpful confounding: linkage disequilibrium Linkage disequilibrium (LD) is the non-independence of alleles at nearby markers in a population because of a lack of recombinations between the markers 50,000 years ago: Today: ~50kb Haplotype block
10 Direct and indirect association testing Hirschhorn and Daly: Nature Reviews Genetics 6: 95 (2005) Functional SNP is genotyped and an association is found Functional SNP (blue) is not genotyped, but a number of other SNPs (red), in LD with the functional SNP, are genotyped, and an association is found for these SNPs
11 LD is helpful, because not all SNPs have to be genotyped Peʼer et al: Nature Genetics 38: 663 (2006)
12 Allele counting to test for association between SNP genotype and case / control status GG GT TT Total Cases r 0 r 1 r 2 R Controls s 0 s 1 s 2 S Total n 0 n 1 n 2 N Observed allele counts G T Total Cases 2r 0 +r 1 r 1 +2r 2 2R Controls 2s 0 +s 1 s 1 +2s 2 2S Total 2n 0 +n 1 n 1 +2n 2 2N
13 Allele counting to test for association between SNP genotype and case / control status GG GT TT Total Cases r 0 r 1 r 2 R Controls s 0 s 1 s 2 S Total n 0 n 1 n 2 N Observed allele counts G T Total Cases 2r 0 +r 1 r 1 +2r 2 2R Controls 2s 0 +s 1 s 1 +2s 2 2S Total 2n 0 +n 1 n 1 +2n 2 2N Expected allele counts G T 2R(2n 0 +n 1 )/(2N) 2R(n 1 +2n 2 )/(2N) 2S(2n 0 +n 1 )/(2N) 2S(n 1 +2n 2 )/(2N)
14 Allele counting to test for association between SNP genotype and case / control status GG GT TT Total Cases r 0 r 1 r 2 R Controls s 0 s 1 s 2 S Total n 0 n 1 n 2 N Observed allele counts G T Total Cases 2r 0 +r 1 r 1 +2r 2 2R Controls 2s 0 +s 1 s 1 +2s 2 2S Total 2n 0 +n 1 n 1 +2n 2 2N Expected allele counts G T 2R(2n 0 +n 1 )/(2N) 2R(n 1 +2n 2 )/(2N) 2S(2n 0 +n 1 )/(2N) 2S(n 1 +2n 2 )/(2N) Chi-square test for independence of rows and columns (null hypothesis): (Obs Exp) 2 Exp ~ χ 2 with 1 df PLINK --assoc option Other options (e.g. dominant/recessive models) --model
15 The odds ratio: a measure of effect size Odds of an event occurring = Pr(event occurs) / Pr(event doesnʼt occur) = Pr(event occurs) / [1 - Pr(event occurs)] Allele counts G T Cases a b Controls c d Consider all the G alleles in the sample, and pick one at random. The odds that the G allele occurs in a case: a/c Consider all the T alleles in the sample, and pick one at random. The odds that a T allele occurs in a case: b/d odds ratio = odds that G allele occurs in a case = a/c = a d odds that T allele occurs in a case b/d b c
16 Interpretation of the odds ratio G T Cases a b Controls c d odds ratio (OR) = odds that G allele occurs in a case = a d odds that T allele occurs in a case b c OR = increase in odds of being a case for each additional G allele OR = 1: no association between genotype and disease OR > 1: G allele increases risk of disease OR < 1: T allele increases risk of disease If the disease is rare (e.g. ~0.1% for MS), the odds ratio is roughly equal to the genotype relative risk (GRR): the increase in risk of disease conferred by each additional G allele e.g. if OR = 1.2, Pr(MS TT) = 0.1% Pr(MS GT) = 0.12% Pr(MS GG) = 0.144%
17 Logistic regression: more flexible analysis for GWA studies Similar to linear regression, used for binary outcomes instead of continuous outcomes Let Y i be the phenotype for individual i Y i = 0 for controls Y i = 1 for cases Let X i be the genotype of individual i at a particular SNP TT X i = 0 GT X i = 1 GG X i = 2
18 Logistic regression: more flexible analysis for GWA studies Similar to linear regression, used for binary outcomes instead of continuous outcomes Let Y i be the phenotype for individual i Y i = 0 for controls Y i = 1 for cases Let X i be the genotype of individual i at a particular SNP TT X i = 0 GT X i = 1 GG X i = 2 Basic logistic regression model Let p i = E(Y i X i ), expected value of pheno given geno Define logit(p i ) = log e [p i /(1- p i ) ]
19 Logistic regression: more flexible analysis for GWA studies Similar to linear regression, used for binary outcomes instead of continuous outcomes Let Y i be the phenotype for individual i Y i = 0 for controls Y i = 1 for cases Let X i be the genotype of individual i at a particular SNP TT X i = 0 GT X i = 1 GG X i = 2 Basic logistic regression model Let p i = E(Y i X i ), expected value of pheno given geno Define logit(p i ) = log e [p i /(1- p i ) ] logit(p i ) ~ β 0 + β 1 X i
20 Logistic regression: more flexible analysis for GWA studies Similar to linear regression, used for binary outcomes instead of continuous outcomes Let Y i be the phenotype for individual i Y i = 0 for controls Y i = 1 for cases Let X i be the genotype of individual i at a particular SNP TT X i = 0 GT X i = 1 GG X i = 2 Basic logistic regression model Let p i = E(Y i X i ), expected value of pheno given geno Define logit(p i ) = log e [p i /(1- p i ) ] logit(p i ) ~ β 0 + β 1 X i Test whether β 1 differs significantly from zero: roughly equivalent to allele counting chi-square test Estimate of odds ratio: exp(β 1 )
21 Logistic regression: more flexible analysis for GWA studies Similar to linear regression, used for binary outcomes instead of continuous outcomes Let Y i be the phenotype for individual i Y i = 0 for controls Y i = 1 for cases Let X i be the genotype of individual i at a particular SNP TT X i = 0 GT X i = 1 GG X i = 2 Add extra terms to adjust for potential confounders: e.g. ethnicity, genotyping batch, genotypes at other SNPs Let p i = E(Y i X i,c i, D i, ) logit(p i ) ~ β 0 + β 1 X i + β 2 C i + β 3 D i + PLINK --logistic
22 Multiple testing Suppose you test 500,000 SNPs for association with disease Expect around 500,000 x 0.05 = 25,000 to have p-value less than 0.05 More appropriate significance threshold p = 0.05 / 500,000 = 10-7 genome-wide significance In our MS GWAS we considered SNPs for follow-up if they had p- values less than To detect a smaller p-value need a larger study
23 The power to detect an association Suppose the G allele of a SNP has frequency 0.2. If each additional G allele increases odds of disease by 1.2, and 1618 cases and 3413 controls are genotyped, what is the power (chance) of detecting an association with significance p<0.001? Null hypothesis: true OR=1 Observed OR from this distribution p= Odds ratio (OR)
24 The power to detect an association Suppose the G allele of a SNP has frequency 0.2. If each additional G allele increases odds of disease by 1.2, and 1618 cases and 3413 controls are genotyped, what is the power (chance) of detecting an association with significance p<0.001? Null hypothesis: true OR=1 Observed OR from this distribution Alternative hypothesis: true OR=1.2 Observed OR from this distribution 1 p= Odds ratio (OR)
25 The power to detect an association Suppose the G allele of a SNP has frequency 0.2. If each additional G allele increases odds of disease by 1.2, and 1618 cases and 3413 controls are genotyped, what is the power (chance) of detecting an association with significance p<0.001? Null hypothesis: true OR=1 Observed OR from this distribution Alternative hypothesis: true OR=1.2 Observed OR from this distribution Power = blue shaded area = 59% 1 p= Odds ratio (OR)
26 Effect of increasing sample size Observed OR tends to be closer to true OR (narrower distributions) Null and alternative distributions become more separate Power increases Null hypothesis: true OR=1 Observed OR from this distribution Alternative hypothesis: true OR=1.2 Observed OR from this distribution Power is now greater p= Odds ratio (OR)
27 Multiple sclerosis - degradation of myelin sheath around nerve fibres demyelination.html
28 Multiple sclerosis neurodegenerative disease autoimmune attack on myelin sheaths around nerve cells more females affected than males (3:1) average age-at-onset ~30 ~16,000 people with MS in Australia ($2 billion p.a.) no cure
29 Epstein-Barr virus Risk factors Exposure to infant siblings (Ponsonby et al, JAMA, 2005) Latitude gradient, childhood sun exposure (van der Mei et al, Lancet, 2003) Only genetic risk factor known before 2007 (first GWAS): HLA-DRB1*1501 discovered in 1972 (60% MS and 30% controls) IL7R IL2RA CLEC16A CD226 KIF1B TYK2 CD58 EVI5/RPL Human genome project The SNP consortium SNP genotyping arrays The International HapMap Project GWA studies
30 Australian and New Zealand MS GWAS Assemble collection of DNA samples (all states + NZ) Genotype 1952 MS cases from around Australia and New Zealand with Illumina 370CNV BeadChips (Patrick Danoy, Matt Brown, Diamantina Institute, UQ) Analyse GWAS data Quality control (Devindri Perera, Menzies) Impute genotypes at millions of other SNPs (Sharon Browning, Univ of Auckland) Compare case genotypes with >3500 controls from the UK and US (publicly available data) Replication genotyping (Justin Rubioʼs lab, Howard Florey Institute, Univ of Melbourne)
31 Quality control - MS samples (PLINK) Start with 1952 samples Exclusions Samples with >2% of SNPs not called 70 --mind
32 Genotype call rate No call rate Oragene saliva DNA Samples
33 Quality control - MS samples (PLINK) Start with 1952 samples Exclusions Samples with >2% of SNPs not called 70 Suspect batch of samples 128 Uncertain phenotype 10 Duplicates / relatives 88 Ancestry outliers 35 --mind --genome
34 Quality control - ethnicity N European African Principal components analysis: EIGENSTRAT Price et al (2006). Nat Genet 38: 904 Japanese Chinese Use an independent set of ~77,000 SNPs --indep-pairwise 178 outliers removed: - 35 MS controls
35 Quality control - MS samples Start with 1952 samples Exclusions Samples with >2% of SNPs not called 70 Suspect batch of samples 128 Uncertain phenotype 10 Duplicates / relatives 88 Ancestry outliers 35 Sex discrepancies 3 Leaves 1618 samples --mind --genome --check-sex Quality control - SNPs Start with 310,504 SNPs in both case and control datasets Exclude SNPs Not called in >5% of samples In Hardy-Weinberg disequilibrium Where one allele has frequency < 1% Leaves 302,098 SNPs --geno --hwe --maf
36 Choice of 5% no-call threshold We originally planned to use a 10% threshold, but lots of SNPs with no call rate 5-10% showed deviations from Hardy-Weinberg equilibrium Closer look at SNPs with call rates between 5% and 10% suggested that they were unreliable
37 GWAS - results Total sample = 1618 MS cases controls HLA P=7 x P=10-7 # 50 # 500 P=0.001
38 Extra QC for associated SNPs: cluster plots UK controls ANZ cases both
39 The replication phase Selected 100 SNPs for replication genotyping 2,256 ANZ MS cases + 2,310 ANZ controls Two chromosome regions on chr 12 and chr 20 showed (almost) genome-wide significant (p<5 x 10-7 ) association with MS after combining GWAS and replication data SNPs in 13/53 other regions with replication p-values < 0.1: more than expected by chance (p=0.002)
40 Chromosome 12 association: the downside of LD P=4.1 x 10-6 P= P= rs GWAS P = 4.1 x 10-6 replication P = 1.4 x 10-6 GWAS + rep P = 5.4 x Allele frequency 0.33 Odds ratio 0.81 (protective)
41 Chromosome 12 association: the downside of LD P=4.1 x 10-6 P= P= KIF5A SNP associated with rheumatoid arthritis and type 1 diabetes
42 Chromosome 12 association: the downside of LD P=4.1 x 10-6 P= P= Logistic regression with both SNPs in the same model: rs p = rs p = 0.08
43 Chromosome 12 association: the downside of LD P=4.1 x 10-6 P= P= CYP27B1: most likely candidate??
44 CYP27B1 Cytochrome p450 gene family (drug metabolizing) Encodes 25-hydroxyvitamin D-1 alpha hydroxylase (1α OHase) Converts 25(OH)D 3 to bioactive 1,25(OH) 2 D 3 1,25(OH) 2 D 3 regulates calcium metabolism and the immune system via vitamin D receptor (VDR) 25(OH)D 3 Liver 7 dehydrocholesterol UVB Vit D Diet HLA-DRB1*1501 Adorini and Penna (2008) Nat Clin Prac Rheum 4:
45 The chromosome 20 association P= rs GWAS P = 2.5 x 10-5 replication P = 4.6 x 10-4 GWAS + rep P = 1.3 x 10-7 Allele frequency 0.25 Odds ratio 1.20 (increased risk)
46 CD40 Member of TNF receptor superfamily: regulates many cell- and antibody-mediated immune responses SNPs in CD40 are associated with risk of rheumatoid arthritis and Gravesʼ disease Functional SNP rs c>t, 1 base pair upstream of the ATG translation initiation codon Allelic heterogeneity T allele CD40 protein MS -1 ATG C allele CD40 protein RA GD
47 Another use of logistic regression: test for gene-gene interaction MS risk alleles Chr 12 = rs703842a Chr 20 = rs g Odds ratio Chr20 risk Chr12 risk Modest evidence that each risk allele has a bigger effect in the presence of the other risk allele (p = 0.03)
48 Summary Case-control GWA studies have been very successful in the past couple of years Linkage disequilibrium means that most, but not all, common human genetic variation is captured by genotyping a few hundred thousand SNPs Small effect sizes (e.g. OR 1.2) mean that GWA studies need to be large, with thousands of cases and controls --> big collaborations Methods of statistical analysis are fairly straightforward, but care is required to clean data The ultimate test of any association: replication in an independent population
49 Acknowledgments - MS GWAS Hobart: Melbourne: Newcastle: Devindri Perera Bruce Taylor Karen Drysdale Preethi Guru Brendan McMorran Simon Foote Justin Rubio Melanie Bahlo Helmut Butzkueven Vicky Perreau Laura Johnson Judith Field Cathy Jensen Ella Wilkins Caron Chapman Mark Marriott Niall Tubridy Trevor Kilpatrick Jeanette Lechner-Scott Rodney Scott Pablo Moscato Mathew Cox Sydney: Gold Coast: Brisbane: Perth: Graeme Stewart David Booth Robert Heard Jim Wiley Simon Broadley Lyn Griffiths Lotfi Tajouri Michael Pender Matthew Brown Patrick Danoy Johanna Hadler Karen Pryce Peter Csurshes Judith Greer Bill Carroll Alan Kermode Adelaide: New Zealand: Mark Slee Sharon Browning Brian Browning Deborah Mason Ernie Willoughby Glynnis Clarke Ruth McCallum Marilyn Merriman Tony Merriman The Australian and NZ MS Genetics Consortium (2009). Nat Genet 41: 824 Funding: MS Research Australia, John T Reid Charitable Trusts, Trish MS Research Foundation, Australian Research Council
Globally, about 9.7% of cancers in men are prostate cancers, and the risk of developing the
Chapter 5 Analysis of Prostate Cancer Association Study Data 5.1 Risk factors for Prostate Cancer Globally, about 9.7% of cancers in men are prostate cancers, and the risk of developing the disease has
The Functional but not Nonfunctional LILRA3 Contributes to Sex Bias in Susceptibility and Severity of ACPA-Positive Rheumatoid Arthritis
The Functional but not Nonfunctional LILRA3 Contributes to Sex Bias in Susceptibility and Severity of ACPA-Positive Rheumatoid Arthritis Yan Du Peking University People s Hospital 100044 Beijing CHINA
GAW 15 Problem 3: Simulated Rheumatoid Arthritis Data Full Model and Simulation Parameters
GAW 15 Problem 3: Simulated Rheumatoid Arthritis Data Full Model and Simulation Parameters Michael B Miller , Michael Li , Gregg Lind , Soon-Young
SNPbrowser Software v3.5
Product Bulletin SNP Genotyping SNPbrowser Software v3.5 A Free Software Tool for the Knowledge-Driven Selection of SNP Genotyping Assays Easily visualize SNPs integrated with a physical map, linkage disequilibrium
NGS and complex genetics
NGS and complex genetics Robert Kraaij Genetic Laboratory Department of Internal Medicine [email protected] Gene Hunting Rotterdam Study and GWAS Next Generation Sequencing Gene Hunting Mendelian gene
Factors for success in big data science
Factors for success in big data science Damjan Vukcevic Data Science Murdoch Childrens Research Institute 16 October 2014 Big Data Reading Group (Department of Mathematics & Statistics, University of Melbourne)
Combining Data from Different Genotyping Platforms. Gonçalo Abecasis Center for Statistical Genetics University of Michigan
Combining Data from Different Genotyping Platforms Gonçalo Abecasis Center for Statistical Genetics University of Michigan The Challenge Detecting small effects requires very large sample sizes Combined
Multiple Sclerosis: a NZ Perspective. Dr Deborah Mason Neurologist Christchurch Hospital
Multiple Sclerosis: a NZ Perspective Dr Deborah Mason Neurologist Christchurch Hospital Multiple Sclerosis in NZ What is MS Epidemiology of MS in NZ Impact of the disease on PwMS Genetic and Environmental
Heritability: Twin Studies. Twin studies are often used to assess genetic effects on variation in a trait
TWINS AND GENETICS TWINS Heritability: Twin Studies Twin studies are often used to assess genetic effects on variation in a trait Comparing MZ/DZ twins can give evidence for genetic and/or environmental
Genetics of Rheumatoid Arthritis Markey Lecture Series
Genetics of Rheumatoid Arthritis Markey Lecture Series Al Kim [email protected] 2012.09.06 Overview of Rheumatoid Arthritis Rheumatoid Arthritis (RA) Autoimmune disease primarily targeting the synovium
Tutorial on gplink. http://pngu.mgh.harvard.edu/~purcell/plink/gplink.shtml. PLINK tutorial, December 2006; Shaun Purcell, [email protected].
Tutorial on gplink http://pngu.mgh.harvard.edu/~purcell/plink/gplink.shtml Basic gplink analyses Data management Summary statistics Association analysis Population stratification IBD-based analysis gplink
Online Supplement to Polygenic Influence on Educational Attainment. Genotyping was conducted with the Illumina HumanOmni1-Quad v1 platform using
Online Supplement to Polygenic Influence on Educational Attainment Construction of Polygenic Score for Educational Attainment Genotyping was conducted with the Illumina HumanOmni1-Quad v1 platform using
Risk Factors for Alcoholism among Taiwanese Aborigines
Risk Factors for Alcoholism among Taiwanese Aborigines Introduction Like most mental disorders, Alcoholism is a complex disease involving naturenurture interplay (1). The influence from the bio-psycho-social
Logistic Regression (1/24/13)
STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used
Differential privacy in health care analytics and medical research An interactive tutorial
Differential privacy in health care analytics and medical research An interactive tutorial Speaker: Moritz Hardt Theory Group, IBM Almaden February 21, 2012 Overview 1. Releasing medical data: What could
Genotyping and quality control of UK Biobank, a large- scale, extensively phenotyped prospective resource
Genotyping and quality control of UK Biobank, a large- scale, extensively phenotyped prospective resource Information for researchers Interim Data Release, 2015 1 Introduction... 3 1.1 UK Biobank... 3
Validation and Replication
Validation and Replication Overview Definitions of validation and replication Difficulties and limitations Working examples from our group and others Why? False positive results still occur. even after
A Multi-locus Genetic Risk Score for Abdominal Aortic Aneurysm
A Multi-locus Genetic Risk Score for Abdominal Aortic Aneurysm Zi Ye, 1 MD, Erin Austin, 1,2 PhD, Daniel J Schaid, 2 PhD, Iftikhar J. Kullo, 1 MD Affiliations: 1 Division of Cardiovascular Diseases and
School of Nursing. Presented by Yvette Conley, PhD
Presented by Yvette Conley, PhD What we will cover during this webcast: Briefly discuss the approaches introduced in the paper: Genome Sequencing Genome Wide Association Studies Epigenomics Gene Expression
SeattleSNPs Interactive Tutorial: Web Tools for Site Selection, Linkage Disequilibrium and Haplotype Analysis
SeattleSNPs Interactive Tutorial: Web Tools for Site Selection, Linkage Disequilibrium and Haplotype Analysis Goal: This tutorial introduces several websites and tools useful for determining linkage disequilibrium
Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects
Report on the Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects Background and Goals of the Workshop June 5 6, 2012 The use of genome sequencing in human research is growing
Journal of Statistical Software
JSS Journal of Statistical Software October 2006, Volume 16, Code Snippet 3. http://www.jstatsoft.org/ LDheatmap: An R Function for Graphical Display of Pairwise Linkage Disequilibria between Single Nucleotide
Epigenetic variation and complex disease risk
Epigenetic variation and complex disease risk Caroline Relton Institute of Human Genetics Newcastle University ALSPAC Research Symposium 2 & 3 March 2009 Missing heritability Even when dozens of genes
Big Data for Population Health and Personalised Medicine through EMR Linkages
Big Data for Population Health and Personalised Medicine through EMR Linkages Zheng-Ming CHEN Professor of Epidemiology Nuffield Dept. of Population Health, University of Oxford Big Data for Health Policy
B I O I N F O R M A T I C S
B I O I N F O R M A T I C S Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg [email protected] CHAPTER 4: GENOME-WIDE ASSOCIATION STUDIES 1 Setting
SNP Essentials The same SNP story
HOW SNPS HELP RESEARCHERS FIND THE GENETIC CAUSES OF DISEASE SNP Essentials One of the findings of the Human Genome Project is that the DNA of any two people, all 3.1 billion molecules of it, is more than
Biomedical Big Data and Precision Medicine
Biomedical Big Data and Precision Medicine Jie Yang Department of Mathematics, Statistics, and Computer Science University of Illinois at Chicago October 8, 2015 1 Explosion of Biomedical Data 2 Types
GWAS Data Cleaning. GENEVA Coordinating Center Department of Biostatistics University of Washington. January 13, 2016.
GWAS Data Cleaning GENEVA Coordinating Center Department of Biostatistics University of Washington January 13, 2016 Contents 1 Overview 2 2 Preparing Data 3 2.1 Data formats used in GWASTools............................
SAP HANA Enabling Genome Analysis
SAP HANA Enabling Genome Analysis Joanna L. Kelley, PhD Postdoctoral Scholar, Stanford University Enakshi Singh, MSc HANA Product Management, SAP Labs LLC Outline Use cases Genomics review Challenges in
SNP Data Integration and Analysis for Drug- Response Biomarker Discovery
B. Comp Dissertation SNP Data Integration and Analysis for Drug- Response Biomarker Discovery By Chen Jieqi Pauline Department of Computer Science School of Computing National University of Singapore 2008/2009
STATISTICS 8, FINAL EXAM. Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4
STATISTICS 8, FINAL EXAM NAME: KEY Seat Number: Last six digits of Student ID#: Circle your Discussion Section: 1 2 3 4 Make sure you have 8 pages. You will be provided with a table as well, as a separate
Haplotype analysis of case-control data
Haplotype analysis of case-control data Yulia Marchenko Senior Statistician StataCorp LP 2010 UK Stata Users Group Meeting Yulia Marchenko (StataCorp) Haplotype analysis of case-control data September
Technical Issues in Aggregating and Analyzing Data from Heterogeneous EHR Systems
Technical Issues in Aggregating and Analyzing Data from Heterogeneous EHR Systems Josh Denny, MD, MS [email protected] Vanderbilt University, Nashville, Tennessee, USA 2/12/2015 EHR data are dense
The Developing Person Through the Life Span 8e by Kathleen Stassen Berger
The Developing Person Through the Life Span 8e by Kathleen Stassen Berger Chapter 3 Heredity and Environment PowerPoint Slides developed by Martin Wolfger and Michael James Ivy Tech Community College-Bloomington
Hardy-Weinberg Equilibrium Problems
Hardy-Weinberg Equilibrium Problems 1. The frequency of two alleles in a gene pool is 0.19 (A) and 0.81(a). Assume that the population is in Hardy-Weinberg equilibrium. (a) Calculate the percentage of
A Genetic Analysis of Rheumatoid Arthritis
A Genetic Analysis of Rheumatoid Arthritis Introduction to Rheumatoid Arthritis: Classification and Diagnosis Rheumatoid arthritis is a chronic inflammatory disorder that affects mainly synovial joints.
UKB_WCSGAX: UK Biobank 500K Samples Genotyping Data Generation by the Affymetrix Research Services Laboratory. April, 2015
UKB_WCSGAX: UK Biobank 500K Samples Genotyping Data Generation by the Affymetrix Research Services Laboratory April, 2015 1 Contents Overview... 3 Rare Variants... 3 Observation... 3 Approach... 3 ApoE
AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics
Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,
Introduction to Statistics and Quantitative Research Methods
Introduction to Statistics and Quantitative Research Methods Purpose of Presentation To aid in the understanding of basic statistics, including terminology, common terms, and common statistical methods.
A trait is a variation of a particular character (e.g. color, height). Traits are passed from parents to offspring through genes.
1 Biology Chapter 10 Study Guide Trait A trait is a variation of a particular character (e.g. color, height). Traits are passed from parents to offspring through genes. Genes Genes are located on chromosomes
Autoimmunity and immunemediated. FOCiS. Lecture outline
1 Autoimmunity and immunemediated inflammatory diseases Abul K. Abbas, MD UCSF FOCiS 2 Lecture outline Pathogenesis of autoimmunity: why selftolerance fails Genetics of autoimmune diseases Therapeutic
Population Genetics and Multifactorial Inheritance 2002
Population Genetics and Multifactorial Inheritance 2002 Consanguinity Genetic drift Founder effect Selection Mutation rate Polymorphism Balanced polymorphism Hardy-Weinberg Equilibrium Hardy-Weinberg Equilibrium
A Primer of Genome Science THIRD
A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:
Guide to Biostatistics
MedPage Tools Guide to Biostatistics Study Designs Here is a compilation of important epidemiologic and common biostatistical terms used in medical research. You can use it as a reference guide when reading
Figure 14.2 Overview of Innate and Adaptive Immunity
I M M U N I T Y Innate (inborn) Immunity does not distinguish one pathogen from another Figure 14.2 Overview of Innate and Adaptive Immunity Our first line of defense includes physical and chemical barriers
LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics
Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,
Mutations: 2 general ways to alter DNA. Mutations. What is a mutation? Mutations are rare. Changes in a single DNA base. Change a single DNA base
Mutations Mutations: 2 general ways to alter DNA Change a single DNA base Or entire sections of DNA can move from one place to another What is a mutation? Any change in the nucleotide sequence of DNA Here
Incorporating Research Into Sight (IRIS) Essentia Rural Health Institute Marshfield Clinic Penn State University
Incorporating Research Into Sight (IRIS) Essentia Rural Health Institute Marshfield Clinic Penn State University Aim 1. Develop and validate electronic algorithms for ophthalmic conditions and efficacy
HLA data analysis in anthropology: basic theory and practice
HLA data analysis in anthropology: basic theory and practice Alicia Sanchez-Mazas and José Manuel Nunes Laboratory of Anthropology, Genetics and Peopling history (AGP), Department of Anthropology and Ecology,
Using Family History to Improve Your Health Web Quest Abstract
Web Quest Abstract Students explore the Using Family History to Improve Your Health module on the Genetic Science Learning Center website to complete a web quest. Learning Objectives Chronic diseases such
Use of the Chi-Square Statistic. Marie Diener-West, PhD Johns Hopkins University
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
CASSI: Genome-Wide Interaction Analysis Software
CASSI: Genome-Wide Interaction Analysis Software 1 Contents 1 Introduction 3 2 Installation 3 3 Using CASSI 3 3.1 Input Files................................... 4 3.2 Options....................................
Genomes and SNPs in Malaria and Sickle Cell Anemia
Genomes and SNPs in Malaria and Sickle Cell Anemia Introduction to Genome Browsing with Ensembl Ensembl The vast amount of information in biological databases today demands a way of organising and accessing
GENETIC STUDIES OF AUTOIMMUNE DISEASES. Benedicte Alexandre Lie Institute of Immunology Rikshospitalet University Hospital
GENETIC STUDIES OF AUTOIMMUNE DISEASES Benedicte Alexandre Lie Institute of Immunology Rikshospitalet University Hospital Autoimmune diseases Affects approximately 5 % of the population Results from an
ASSIsT: An Automatic SNP ScorIng Tool for in and out-breeding species Reference Manual
ASSIsT: An Automatic SNP ScorIng Tool for in and out-breeding species Reference Manual Di Guardo M, Micheletti D, Bianco L, Koehorst-van Putten HJJ, Longhi S, Costa F, Aranzana MJ, Velasco R, Arús P, Troggio
Insulin Receptor Substrate 1 (IRS1) Gene Variation Modifies Insulin Resistance Response to Weight-loss Diets in A Two-year Randomized Trial
Nutrition, Physical Activity and Metabolism Conference 2011 Insulin Receptor Substrate 1 (IRS1) Gene Variation Modifies Insulin Resistance Response to Weight-loss Diets in A Two-year Randomized Trial Qibin
DNA-Analytik III. Genetische Variabilität
DNA-Analytik III Genetische Variabilität Genetische Variabilität Lexikon Scherer et al. Nat Genet Suppl 39:s7 (2007) Genetische Variabilität Sequenzvariation Mutationen (Mikro~) Basensubstitution Insertion
GENETIC DATA ANALYSIS
GENETIC DATA ANALYSIS 1 Genetic Data: Future of Personalized Healthcare To achieve personalization in Healthcare, there is a need for more advancements in the field of Genomics. The human genome is made
Comparative genomic hybridization Because arrays are more than just a tool for expression analysis
Microarray Data Analysis Workshop MedVetNet Workshop, DTU 2008 Comparative genomic hybridization Because arrays are more than just a tool for expression analysis Carsten Friis ( with several slides from
11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
Single Nucleotide Polymorphisms (SNPs)
Single Nucleotide Polymorphisms (SNPs) Additional Markers 13 core STR loci Obtain further information from additional markers: Y STRs Separating male samples Mitochondrial DNA Working with extremely degraded
DNA Copy Number and Loss of Heterozygosity Analysis Algorithms
DNA Copy Number and Loss of Heterozygosity Analysis Algorithms Detection of copy-number variants and chromosomal aberrations in GenomeStudio software. Introduction Illumina has developed several algorithms
Statistical Analysis for Genetic Epidemiology (S.A.G.E.) Version 6.2 Graphical User Interface (GUI) Manual
Statistical Analysis for Genetic Epidemiology (S.A.G.E.) Version 6.2 Graphical User Interface (GUI) Manual Department of Epidemiology and Biostatistics Wolstein Research Building 2103 Cornell Rd Case Western
Basics of Marker Assisted Selection
asics of Marker ssisted Selection Chapter 15 asics of Marker ssisted Selection Julius van der Werf, Department of nimal Science rian Kinghorn, Twynam Chair of nimal reeding Technologies University of New
Alpha-1 Antitrypsin Deficiency A future NBS candidate?
Alpha-1 Antitrypsin Deficiency A future NBS candidate? Robert A. Sandhaus, MD, PhD Alpha-1 Antitrypsin Deficiency Condition: Alpha-1 antitrypsin deficiency, alpha-1 proteinase inhibitor deficiency, Alpha-1,
Investigating the genetic basis for intelligence
Investigating the genetic basis for intelligence Steve Hsu University of Oregon and BGI www.cog-genomics.org Outline: a multidisciplinary subject 1. What is intelligence? Psychometrics 2. g and GWAS: a
Building risk prediction models - with a focus on Genome-Wide Association Studies. Charles Kooperberg
Building risk prediction models - with a focus on Genome-Wide Association Studies Risk prediction models Based on data: (D i, X i1,..., X ip ) i = 1,..., n we like to fit a model P(D = 1 X 1,..., X p )
Information leaflet. Centrum voor Medische Genetica. Version 1/20150504 Design by Ben Caljon, UZ Brussel. Universitair Ziekenhuis Brussel
Information on genome-wide genetic testing Array Comparative Genomic Hybridization (array CGH) Single Nucleotide Polymorphism array (SNP array) Massive Parallel Sequencing (MPS) Version 120150504 Design
Replacing TaqMan SNP Genotyping Assays that Fail Applied Biosystems Manufacturing Quality Control. Begin
User Bulletin TaqMan SNP Genotyping Assays May 2008 SUBJECT: Replacing TaqMan SNP Genotyping Assays that Fail Applied Biosystems Manufacturing Quality Control In This Bulletin Overview This user bulletin
Core Facility Genomics
Core Facility Genomics versatile genome or transcriptome analyses based on quantifiable highthroughput data ascertainment 1 Topics Collaboration with Harald Binder and Clemens Kreutz Project: Microarray
How To Find Rare Variants In The Human Genome
UNIVERSITÀ DEGLI STUDI DI SASSARI Scuola di Dottorato in Scienze Biomediche XXV CICLO DOTTORATO DI RICERCA IN SCIENZE BIOMEDICHE INDIRIZZO DI GENETICA MEDICA, MALATTIE METABOLICHE E NUTRIGENOMICA Direttore:
Heredity - Patterns of Inheritance
Heredity - Patterns of Inheritance Genes and Alleles A. Genes 1. A sequence of nucleotides that codes for a special functional product a. Transfer RNA b. Enzyme c. Structural protein d. Pigments 2. Genes
Molecular Genetics: Challenges for Statistical Practice. J.K. Lindsey
Molecular Genetics: Challenges for Statistical Practice J.K. Lindsey 1. What is a Microarray? 2. Design Questions 3. Modelling Questions 4. Longitudinal Data 5. Conclusions 1. What is a microarray? A microarray
SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE
AP Biology Date SICKLE CELL ANEMIA & THE HEMOGLOBIN GENE TEACHER S GUIDE LEARNING OBJECTIVES Students will gain an appreciation of the physical effects of sickle cell anemia, its prevalence in the population,
(1-p) 2. p(1-p) From the table, frequency of DpyUnc = ¼ (p^2) = #DpyUnc = p^2 = 0.0004 ¼(1-p)^2 + ½(1-p)p + ¼(p^2) #Dpy + #DpyUnc
Advanced genetics Kornfeld problem set_key 1A (5 points) Brenner employed 2-factor and 3-factor crosses with the mutants isolated from his screen, and visually assayed for recombination events between
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma [email protected] The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
Large Gene Interaction Analytics at University at Buffalo, SUNY
Large Gene Interaction nalytics at University at Buffalo, SUNY Giving researchers the ability to speed computations and increase data sets Overview The need Researchers required the ability to quickly
GEMMA User Manual. Xiang Zhou. May 18, 2016
GEMMA User Manual Xiang Zhou May 18, 2016 Contents 1 Introduction 4 1.1 What is GEMMA...................................... 4 1.2 How to Cite GEMMA................................... 4 1.3 Models............................................
Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013
Next Generation Sequencing: Adjusting to Big Data Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Outline Human Genome Project Next-Generation Sequencing Personalized Medicine
European Educational Programme in Epidemiology
European Educational Programme in Epidemiology 29 th RESIDENTIAL SUMMER COURSE FLORENCE, ITALY Pre-courses 13 17 JUNE 2016 1/13 European Educational Programme in Epidemiology Pre-Course: Introduction to
GENETICS AND INSURANCE: QUANTIFYING THE IMPACT OF GENETIC INFORMATION
GENETICS AND INSURANCE: QUANTIFYING THE IMPACT OF GENETIC INFORMATION Angus Macdonald Department of Actuarial Mathematics and Statistics and the Maxwell Institute for Mathematical Sciences Heriot-Watt
The Human Genome Project. From genome to health From human genome to other genomes and to gene function Structural Genomics initiative
The Human Genome Project From genome to health From human genome to other genomes and to gene function Structural Genomics initiative June 2000 What is the Human Genome Project? U.S. govt. project coordinated
Natalia Taborda Vanegas. Doc. Sci. Student Immunovirology Group Universidad de Antioquia
Pathogenesis of Dengue Natalia Taborda Vanegas Doc. Sci. Student Immunovirology Group Universidad de Antioquia Infection process Epidermis keratinocytes Dermis Archives of Medical Research 36 (2005) 425
Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs)
Lecture 6: Single nucleotide polymorphisms (SNPs) and Restriction Fragment Length Polymorphisms (RFLPs) Single nucleotide polymorphisms or SNPs (pronounced "snips") are DNA sequence variations that occur
Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP. Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study.
Appendix G STATISTICAL METHODS INFECTIOUS METHODS STATISTICAL ROADMAP Prepared in Support of: CDC/NCEH Cross Sectional Assessment Study Prepared by: Centers for Disease Control and Prevention National
CNV Univariate Analysis Tutorial
CNV Univariate Analysis Tutorial Release 8.1 Golden Helix, Inc. March 18, 2014 Contents 1. Overview 2 2. CNAM Optimal Segmenting 4 A. Performing CNAM Optimal Segmenting..................................
