Genome Informatics & Cloud Computing. Jeffrey Reid Festival of Genomics -- Nov 4, 2015





The internet isn't just for cat pictures anymore (but maybe it should be)!


Outline
- Cloud-based approaches to large-scale genome informatics challenges
- Production analysis at the 1600+ exomes per week rate
- Enabling integration of clinical and genomic data
- Cat pictures

What is the RGC?
- Launched in January 2014, including a partnership with the Geisinger Health System
- Goal: build a comprehensive genotype-phenotype resource combining genomic and EHR data from >250K people to aid drug development and enable genomic medicine

"Scientifically and medically, it's pretty exciting," said Dr. Leslie G. Biesecker, chief of the genetic disease research branch at the government's National Human Genome Research Institute, who is familiar with the project. "As far as I'm aware, it's the largest clinical sequencing undertaking in this country so far by a long shot." He added that the move of sequencing into general health care is going to change medicine.

RGC Collaborations and Projects
- Families
- General Population
- Founder Populations
- DRIFT Consortium

Regeneron Genetics Center - One of the Most Productive Exome Sequencing Facilities in the World
[Figure: exome production by month, Jul 2014 through Sep 2015; y-axis 0 to 9000]
- Initiated production on July 1, 2014
- Fully automated exome sample preparation on September 1
- Averaged 1,000 exomes per week through the last third of 2014
- Averaged >1200 exomes per week through 1st quarter of 2015
- 2nd & 3rd quarters of 2015 averaged >1750 exomes per week
- All data processing in the cloud

Innovative technologies & automation enable ultra high-throughput sequencing & analysis
- Automated Biobank (1.4M Samples)
- Library Prep Automation (>200K Samples/Yr)
- Illumina Fleet (>80K Exomes/Yr)
- Cloud Informatics (>100K Exomes/Yr)
- QC
- >85K exomes sequenced; on pace for ~100K individuals by end of year
- >1750 exomes/week
- First 100% cloud-based genome center with fully automated analysis pipelines & data sharing

100% Cloud-based
- Many efforts use the cloud for some things


Why so much cloud?
- Lack of legacy hardware
- Overburdened local IT resources
- Short-term need to ramp up to genome-center-scale production fast
- Long-term need for scalability
- Security (thanks to DNAnexus)
- Enables cloud-based EHR mining and integration with genomic data
- Data delivery for a long and growing list of partners
- Supports R&D and production analysis work, easing evolution of analytical tools from the bench to the pipeline

A new method for WES copy number variant calling
CLAMMS: Copy number estimation using Lattice-Aligned Mixture Models
- Normalize coverage data, accounting for local GC content
- For each sample, identify a reference panel of similar samples based on sequencing QC metrics
- Use the reference panel to model the expected coverage distribution at each exon, given different copy number states (Mixture Models)
- Identify CNVs from regions where the sample's coverage is likely non-diploid over contiguous exonic regions (HMM)
Packer JS, et al. CLAMMS: a scalable algorithm for calling common and rare copy number variants from exome sequencing data. Bioinformatics. 2015 Sep 17. pii: btv547.
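
The mixture-model step on this slide can be illustrated with a minimal sketch. It is not the CLAMMS implementation: it assumes Gaussian components centered at copy number / 2 of the normalized diploid coverage, and the function names, priors, and standard deviations are all hypothetical values chosen for illustration. The HMM step that joins contiguous exons is left out.

```python
import math

# Assumption: coverage is normalized so that 1.0 is the expected diploid level.
COPY_STATES = (0, 1, 2, 3)
# Hypothetical prior weights for each copy-number state at a single exon.
DEFAULT_PRIORS = {0: 0.005, 1: 0.02, 2: 0.95, 3: 0.025}


def log_gaussian(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def copy_number_posteriors(coverage, diploid_sd=0.1, priors=None):
    """Posterior probability of each copy-number state for one exon."""
    priors = priors or DEFAULT_PRIORS
    log_weights = {}
    for cn in COPY_STATES:
        mean = cn / 2.0
        # Assumed noise model: spread grows with sqrt(mean), with a floor for CN 0.
        sd = max(diploid_sd * math.sqrt(mean), 0.03)
        log_weights[cn] = math.log(priors[cn]) + log_gaussian(coverage, mean, sd)
    # Normalize in log space so extreme coverage values cannot underflow to 0/0.
    top = max(log_weights.values())
    unnormalized = {cn: math.exp(lw - top) for cn, lw in log_weights.items()}
    total = sum(unnormalized.values())
    return {cn: w / total for cn, w in unnormalized.items()}


def most_likely_copy_number(coverage):
    """Copy-number state with the highest posterior for one exon."""
    posteriors = copy_number_posteriors(coverage)
    return max(posteriors, key=posteriors.get)
```

In this sketch an exon with normalized coverage near 0.5 is called as a single-copy deletion and one near 1.5 as a duplication; a real caller would fit the component parameters from the reference panel rather than fix them.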

[Figure: mixture model fit to the normalized read coverage distribution for GSTT1 and its neighboring exons]

De novo pedigree reconstruction via PRIMUS
PRIMUS uses pairwise comparisons to construct a pedigree from genetic data
- Estimate identity by descent (IBD) between individuals
- Predict familial relationship
- Fit them together like a Sudoku puzzle
[Figure: relationship graph for samples A through G and the pedigree PRIMUS reconstructs from it; a pedigree symbol marks individuals with no genetic data. Edge legend: red = parent/child, gold = full-sibling, blue = 2nd degree, green = 3rd degree]
Staples et al. (2014) AJHG
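
The "estimate IBD, then predict relationship" steps can be sketched as a nearest-expectation lookup. This is a simplification and not PRIMUS's own classifier: it assumes each pair already has estimated proportions of the genome shared IBD0, IBD1, and IBD2, and it assigns the relationship whose theoretical expectation is closest. The function name and the distance measure are hypothetical.

```python
# Theoretical expected (IBD0, IBD1, IBD2) sharing for common relationship classes.
EXPECTED_IBD = {
    "MZ twins": (0.0, 0.0, 1.0),
    "parent/child": (0.0, 1.0, 0.0),
    "full-sibling": (0.25, 0.5, 0.25),
    "2nd degree": (0.5, 0.5, 0.0),
    "3rd degree": (0.75, 0.25, 0.0),
    "unrelated": (1.0, 0.0, 0.0),
}


def classify_relationship(ibd0, ibd1, ibd2):
    """Return the relationship class whose expected IBD sharing is nearest."""
    observed = (ibd0, ibd1, ibd2)

    def squared_distance(name):
        expected = EXPECTED_IBD[name]
        return sum((o - e) ** 2 for o, e in zip(observed, expected))

    return min(EXPECTED_IBD, key=squared_distance)


if __name__ == "__main__":
    # Made-up pairwise estimates, for illustration only.
    print(classify_relationship(0.01, 0.98, 0.01))
    print(classify_relationship(0.24, 0.51, 0.25))
```

Real estimates are noisy, and more distant relationships overlap heavily in IBD space, so a production tool would use a probabilistic model rather than a plain nearest-point rule.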

GHSf40k Families
- 42880 samples produced: 19962 1st-2nd degree relationships
- 19489 samples involved
- 8939 1st degree relationships produced: 5062 family networks, largest = size 21
[Figure: IBD0 vs. IBD1 plot of pairwise relationships (both axes 0.0 to 1.0), with clusters labeled 5104 parent-child, 3835 full-sib, ~11023 2nd degree, ~12933 3rd degree, and 15 MZ twins]
[Figure: histogram of # of families (log scale) by # of samples in each family (2 to 21); one bar is labeled 3482]

PRIMUS reconstructs a 21-person pedigree using 1st degree relationships
- 21 samples connected by 1st degree relationships
- Includes ages of samples underneath patient IDs
- This is the only pedigree that fits the genetic data
[Figure: pedigree with patient IDs and ages: A (83), B (79), C (73), D (75), E (69), F (74), G (52), H (56), I (63), J (64), K (60), L (54), M (41), N (48), O (58), P (43), Q (49), R (49), S (54), T (40), U (24), V (21); a pedigree symbol marks individuals with no genetic data]

Loss of function (LOF) variant in CCR5 is transmitted through the family
- CCR5: cell surface receptor that HIV uses to enter & infect host cells
- LOF in CCR5 gives HIV and smallpox resistance
- Read stacks confirm a frameshift LOF; allele frequency = 9.7%
[Figure: the 21-person pedigree (individuals A through V, ages 21 to 83) annotated by genotype: het or hom for the frameshift LOF in CCR5; a pedigree symbol marks individuals with no genetic data]
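
As a rough worked example of what a 9.7% allele frequency implies, the sketch below computes expected genotype frequencies under Hardy-Weinberg equilibrium. That equilibrium is an assumption added here, not something the slide states, and a cohort with this much relatedness will not follow it exactly. The function name is hypothetical.

```python
def hardy_weinberg_genotype_frequencies(alt_allele_freq):
    """Expected genotype frequencies for a biallelic site under Hardy-Weinberg."""
    q = alt_allele_freq
    p = 1.0 - q
    return {
        "hom_ref": p * p,      # no copies of the LOF allele
        "het": 2.0 * p * q,    # one copy
        "hom_alt": q * q,      # two copies
    }


if __name__ == "__main__":
    freqs = hardy_weinberg_genotype_frequencies(0.097)
    for genotype, freq in freqs.items():
        print(f"{genotype}: {freq:.4f}")
```

With q = 0.097 this gives roughly 17.5% expected heterozygotes and about 0.94% expected homozygotes for the LOF allele.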

Visualize the CCR5 pedigree as an undirected graph
[Figure: pedigree graph. Edge legend: red = parent/child, gold = full-sibling, blue = 2nd degree]

Connect to other pedigrees with 2nd degree relationships
[Figure: pedigree graph. Edge legend: red = parent/child, gold = full-sibling, blue = 2nd degree]

We are starting to leverage all family networks, including a 1107-person 2nd degree family network
[Figure: network graph with individual 1st degree pedigrees and the 21-person CCR5 LOF pedigree highlighted. Edge legend: red = parent/child, gold = full-sibling, blue = 2nd degree]
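
Treating pedigrees as an undirected graph means a family network is a connected component over the chosen relationship edges. The sketch below uses a small union-find to group samples; the edge format, relationship labels, and function name are assumptions for illustration, not the RGC pipeline.

```python
from collections import defaultdict

FIRST_DEGREE = {"parent/child", "full-sibling"}


def family_networks(edges, allowed=FIRST_DEGREE):
    """Group samples into networks connected by the allowed relationship types.

    edges: iterable of (sample_a, sample_b, relationship) tuples.
    Returns a list of sample sets, largest first.
    """
    parent = {}

    def find(sample):
        parent.setdefault(sample, sample)
        while parent[sample] != sample:
            parent[sample] = parent[parent[sample]]  # path halving
            sample = parent[sample]
        return sample

    for a, b, relationship in edges:
        if relationship not in allowed:
            continue
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for sample in list(parent):
        groups[find(sample)].add(sample)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    # Made-up edges, for illustration only.
    edges = [
        ("A", "B", "parent/child"),
        ("B", "C", "full-sibling"),
        ("C", "D", "2nd degree"),
        ("D", "E", "parent/child"),
    ]
    print(family_networks(edges))
    print(family_networks(edges, allowed=FIRST_DEGREE | {"2nd degree"}))
```

In the made-up example, 1st degree edges alone give two separate pedigrees, and allowing 2nd degree edges merges them into one network, which mirrors how the slides join individual pedigrees into larger family networks.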

Cat-clusions
- RGC is up and running as a leader in exome sequencing
  - Producing ~7000 exomes a month across a variety of projects
  - Mendelian/Familial disease projects ranging from single families to hundreds of trios with shared phenotypes (CUMC, TSK, UU, UC, etc.)
  - Founder populations, including studies in the Amish with UM and CSC
  - Population sequencing with EHR data (DiscovEHR collaboration with GHS)
- First 100% Cloud-based Genome Center
  - Essential to scaling production efforts, secure data analysis & delivery
  - Also transformative for easy transition from bench to pipeline
- Part of an exciting moment as medicine and healthcare embrace the future

Acknowledgements
Regeneron Genetics Center: Aris Baras (co-head), Alan Shuldiner (co-head), Michael Norsen, Lyndon Mitnaul, Alejandra King, Chi Onyewu, Charlene Carlino, Talita Silva, John Overton, Alex Lopez, Caitlin Forsythe, Erin Fuller, Karina Toledo, Mathew Smith, Michael Lattari, Maria Sotiropoulos-Padilla, Sarah Wolf, Thomas Schleicher, Jeffrey Reid, Chris Sprangel, Rick Ulloa, Martin Paradesi, Kia Manoochehri, Miro Georgiev, Young Hahn, Scott Jones, John Penn, Sheldon Bai, Ke Huang, Alicia Hawes, Lukas Habegger, Jeffrey Staples, Evan Maxwell, Ingrid Borecki, Colm O'Dushlaine, Cristopher Van Hout, Semanti Mukherjee, Alex Li, Omri Gottesman, Brian Cajes, Nilanjana Banerjee, Rick Dewey, Shannon Bruse, Jonathan Chung, Claudia Gonzaga-Jauregui, Kavita Praveen, Suganthi Balasubramanian, Jan Freudenberg, Julie Horowitz, Aris Economides, Ge Zhou, Liz Misir, Nehal Gosalia, Kiran Nistala, John Dronzek, Angelo Pefanis, Darshi Persaud
RGC Steering Committee: George Yancopoulos, Scott Mellis, Andrew Murphy, Robert Phillips, Neil Stahl, Aris Economides
RGC Collaborators: The Whole GHS Team, David Ledbetter, David Carey
Internet: 4 All Catz & LOLz
