UKB_WCSGAX: UK Biobank 500K Samples Genotyping Data Generation by the Affymetrix Research Services Laboratory. April, 2015




Contents

Overview
Rare Variants
    Observation
    Approach
ApoE Markers
SNP specific priors
    Observation
    Approach
Exclusion list
    Observation
    Approach
References

Overview

Genotyping analysis of the UK Biobank UKB_WCSGAX and UKBiLEVEAX datasets generally followed the Best Practices Workflow described in the Axiom Genotyping Solution Data Analysis Guide (see Reference 1). This document provides details on the advanced approaches used for analysis.

To generate the genotypes in these datasets, UK Biobank personnel extracted DNA from blood samples. The DNA was delivered to Affymetrix in barcoded 96-well microtiter plates. The samples were then processed in the approximate order received to produce genotype data using the Affymetrix Axiom platform with a custom-designed array described in the UK Biobank Axiom Array Content Summary (see Reference 2). A LIMS was used during processing to track instrumentation, Axiom consumables (arrays and reagents), and operators. The process is described in the Affymetrix UKB_WCSGAX Lab Processing document (see Reference 3).

Genotype data were generated in batches of approximately 4800 individuals each. Batch composition was generally determined by data quality and/or by whether reprocessing of a sample had already been attempted. Data for all samples from the same plate are not necessarily included in the same batch.

Rare Variants

Observation

As mentioned above, genotyping cluster batches for this project consisted of approximately 4800 individuals, the equivalent of 50 plates of 96 samples. Additional analytic measures were taken to ensure proper calling of SNPs with rare minor alleles in cases where the number of heterozygous individuals at the SNP site is relatively small (fewer than three in the 50-plate batch, or less than approximately 0.06%).

Approach

Affymetrix bioinformatics scientists determined empirically that genotyping samples grouped in a single 96-sample plate (single-plate genotyping) would improve rare variant detection.
Consequently, the process for calling rare variants employed for data delivery was as follows:

1) Identify all probesets with <6 minor allele calls (up to 5 heterozygotes) in any single batch of ~4,800 samples. This was done using data from batches 1-22.
2) Identify all individual sample plates with >= 80 samples.
3) Re-call the probesets from (1) on the plates from (2) in single-plate genotyping mode.
4) Adjust calls
   a) For each batch, adjust the calls for each probeset from (1) that originally had a minor allele count (MAC) < 6 and now has MAC < 11 in the single-plate results summed over the batch in question.
   b) If the 50-plate genotype call for a given locus x sample was no-call and the corresponding single-plate genotype call is heterozygote, the genotype call was changed from no-call to het. Otherwise, the original 50-plate genotype call was kept (i.e., the 50-plate call is kept for the vast majority of probeset/sample combinations).
5) Return data
   a) 50-plate genotype calls, confidence values, and posterior model files
   b) Single-plate genotyping adjusted genotype calls: for a given batch, return the mix of 50-plate and single-plate genotype calls for the probesets from (1)

ApoE Markers

The standard process for analyzing ApoE markers is outlined in the document Analysis workflow for UK Biobank Axiom Array (Reference 4). Genotypes generated as described have been provided for all UK Biobank genotype batches.

SNP specific priors

Observation

Using SNP specific priors (described below) derived from a sample set in which the probeset cluster is consistent with established quality thresholds improves genotyping consistency for a subset of the probesets genotyped across many batches.

Axiom genotyping is executed by AxiomGT1, a clustering algorithm that adapts pre-positioned genotype cluster locations, called priors, to the sample data in a Bayesian step and computes three posterior cluster locations. Priors can be generic, meaning the same pre-positioned location is provided for every SNP, or SNP specific, meaning different pre-positioned locations are provided on a SNP-by-SNP basis. The default workflow uses generic priors to determine cluster boundaries during genotyping. The use of SNP specific priors can improve genotype call consistency for a subset of markers across batches.

Approach

Affymetrix bioinformatics scientists developed a workflow to generate SNP specific priors for a subset of the variants analyzed. In some cases, performance was improved by selecting SNP specific priors from one of the first 16 batches that were processed. Genotypes generated using those priors are included in the results; however, if a probeset did not appear to provide sufficient cluster resolution and pass established QC thresholds, it was added to the exclusion list (discussed below).

Exclusion list

Observation

Since content on this array includes experimental probe designs, it was expected that some portion of them might not pass the strict QC metrics imposed. Results for all probesets not meeting QC requirements were investigated.

Approach

Probesets systematically below QC thresholds across batches were added to the exclusion list. Exclusions were primarily due to the experimental nature of specific probe designs and to multiallelic variants, which are currently not supported by Axiom analysis software.
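The call adjustment described in step 4 of the Rare Variants approach can be illustrated with a minimal Python sketch. It is not the Affymetrix implementation. The integer call codes, the function names, and the use of None for samples whose plate was not re-called are all assumptions made for illustration. The sketch also assumes that step 4a acts as an eligibility test for a probeset within a batch and that step 4b is the per-sample rule applied to eligible probesets.

```python
# Hypothetical integer call codes (assumed for this sketch only).
NO_CALL, AA, AB, BB = -1, 0, 1, 2


def minor_allele_count(calls):
    """Count copies of the less frequent allele among called genotypes.

    No-calls and missing (None) entries are ignored.
    """
    n_het = sum(1 for c in calls if c == AB)
    a_copies = 2 * sum(1 for c in calls if c == AA) + n_het
    b_copies = 2 * sum(1 for c in calls if c == BB) + n_het
    return min(a_copies, b_copies)


def adjust_probeset(batch_calls, plate_calls, batch_mac_limit=6, plate_mac_limit=11):
    """Return adjusted calls for one probeset in one batch.

    batch_calls: 50-plate calls, one per sample.
    plate_calls: single-plate calls for the same samples, or None where the
                 sample's plate was not re-called (fewer than 80 samples).
    """
    recalled = [c for c in plate_calls if c is not None]
    # Step 4a (assumed reading): the probeset is adjusted only if it was rare
    # in the 50-plate calls and is still rare in the summed single-plate calls.
    eligible = (
        minor_allele_count(batch_calls) < batch_mac_limit
        and minor_allele_count(recalled) < plate_mac_limit
    )
    if not eligible:
        return list(batch_calls)
    # Step 4b: only a no-call that became a het in single-plate mode changes;
    # every other 50-plate call is kept.
    return [
        AB if (b == NO_CALL and p == AB) else b
        for b, p in zip(batch_calls, plate_calls)
    ]


batch = [AA, AA, NO_CALL, AB, NO_CALL, AA]
plate = [AA, AA, AB, AB, AA, None]
print(adjust_probeset(batch, plate))  # [0, 0, 1, 1, -1, 0]
```

In this example only the third sample changes: its 50-plate no-call becomes a het. The fifth sample stays a no-call even though single-plate genotyping called it homozygous, which matches the rule that the 50-plate call is kept in all other cases.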

References

1. Axiom Genotyping Solution Data Analysis Guide
   http://media.affymetrix.com/support/downloads/manuals/axiom_genotyping_solution_analysis_guide.pdf
2. UK Biobank Axiom Array Content Summary (design information document)
   http://www.ukbiobank.ac.uk/wp-content/uploads/2014/04/uk-Biobank-Axiom-Array-Content-Summary-2014.pdf
3. Affymetrix UKB_WCSGAX Lab Processing document
   http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=155583
4. Analysis workflow for UK Biobank Axiom Array
   http://media.affymetrix.com/support/downloads/manuals/ukbiobankarray_analysis_note.pdf