Global Alliance. Ewan Birney Associate Director EMBL-EBI





Our world is changing

Research to Medical
Research
- English as language
- Lightweight legal
- Identical/similar systems
- Open data
- Publications
- Grant-funding
Practicing Medicine
- National language
- Heavy legal framework
- Very different systems
- Closed data
- Not published
- Contract-funding

Health Care systems
NHS
- Single payer
- Single organisation outside of GPs (e.g., NICE payment rules)
- Commissioning moving to primary health care
- Standards by NICE
Gesetzliche und private Krankenversicherungen (statutory and private health insurance)
- Multiple payer
- Hospitals and independent GPs and consultants (Facharzt)
- Commissioning by insurance companies
- Standards by IQWiG

Research as a secondary use
- Disease associations
- Molecular biology resource
- Cohort of patients
- Data from EHRs (EHR as phenotype)
- Feedback to patients
- Actionable variants
- Iceland, Denmark, Faroe, Finland, Vanderbilt, Dundee; UK Biobank; Kaiser Permanente, VA, Estonia (many others)

Human Heart study
- 947 genotyped individuals
- 9.4 million SNPs
- Digital Heart Project: 1,530 healthy volunteers, high-dimensional cardiac phenotypes
- SNP genotypes
  - SNP array: Illumina HumanOmniExpress
  - SNP calling: Gencall
  - QC: per-individual/marker, population stratification
  - Imputation: Shapeit/impute2 with UK10K and 1000 Genomes reference
- Genome-wide association study
[Figure: example table of SNP genotypes (rsID, alleles, per-individual values)]
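The per-marker QC step named on the Human Heart study slide can be illustrated with a minimal sketch. The slide does not say which filters or thresholds were used, so the call-rate and minor-allele-frequency cut-offs below, the function names, and the coding of genotypes as 0/1/2 alternate-allele counts (with `None` for a missing call) are all assumptions for illustration only.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # 0, 1 or 2 alternate alleles; None = no call (assumed coding)


def call_rate(genotypes: Sequence[Genotype]) -> float:
    """Fraction of individuals with a non-missing call at this marker."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes: Sequence[Genotype]) -> float:
    """Frequency of the rarer allele among called genotypes (diploid)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_marker_qc(
    genotypes: Sequence[Genotype],
    min_call_rate: float = 0.95,  # hypothetical threshold
    min_maf: float = 0.01,        # hypothetical threshold
) -> bool:
    """Keep a marker only if it is well called and not too rare."""
    return (
        call_rate(genotypes) >= min_call_rate
        and minor_allele_frequency(genotypes) >= min_maf
    )
```

A real pipeline would also apply per-individual filters and checks for population stratification, as the slide lists; this sketch covers only the per-marker part.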

Distinct Local Structures within Cardiac Morphology for PEER Factors
[Figure: panels for Factor 1, Factor 2, Factor 3 and Factor 4; scale: z-score of weight]

PEER Factor Contribution Reflected in Raw Wall Thickness Data
[Figure: linear model; colour scheme legend not recovered]

Do we need to federate?

Sample size is king
- Rare disease (Matchmaker)
  - 1:10,000 to 1:500,000 incidence
  - But 100s to 1000s of alleles per gene
  - Modifiers elsewhere
  - Finding a single second match of the same allele is transformative
- Common disease
  - Modifiers and epistasis
- Cancer
  - Somatic x Germline x Environment effects with follow-up
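A rough back-of-the-envelope sketch shows why sample size matters so much for the rare-disease case. It assumes, purely for illustration, that every causal allele in a gene is equally common and that individuals are independent; neither assumption comes from the slides, and the function name is hypothetical.

```python
def prob_second_match(cohort_size: int, incidence: float, alleles_per_gene: int) -> float:
    """Chance that a cohort contains at least one other patient with the same allele.

    Assumes (for illustration only) that the disease incidence is split evenly
    across `alleles_per_gene` causal alleles and that individuals are independent.
    """
    per_person = incidence / alleles_per_gene  # chance one person carries this allele
    return 1.0 - (1.0 - per_person) ** cohort_size


if __name__ == "__main__":
    # Incidence 1:10,000 with 100 alleles per gene, for growing cohort sizes.
    for n in (10_000, 100_000, 1_000_000, 10_000_000):
        print(f"{n:>10,d} individuals: {prob_second_match(n, 1 / 10_000, 100):.3f}")
```

Under these assumptions the chance of a second match stays small until the cohort reaches a size that no single institution is likely to hold, which is the case for federating.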

Global Reach is also critical
- Genetic drift means different alleles have moved to different frequencies in different locales (can rule out some penetrant pathogenic calls)
- People move (!)
  - Cosmopolitan populations demand a cosmopolitan approach
- Environment is different
  - Different penetrance of Gene x Environment effects
- Infectious disease
  - Viruses and bacteria do not respect borders!

EMBL-EBI's engagement with GA4GH

Stephen Keenan

European Genome-Phenome Archive (EGA)
- Secure archive of controlled-access human data consented for research use
- Jointly hosted by EMBL-EBI and CRG
- Organised by study and dataset
- Access to data controlled by individual Data Access Committees (DAC)
  - EGA does not grant / deny / revoke dataset access
  - Access granted on a per-dataset level
  - Heterogeneous access policies
- Previously data access solely by file download
[Diagram: EBI and CRG EGA archives, study, DAC]

EGA Beacon
- EGA Beacon currently at https://ega.crg.eu/beacon/
- Compliant with GA4GH V0.2 Beacon API
- Beacon implements 3-tier access: Public, Registered, Controlled
  - User login to access registered or controlled level data
- Heterogeneity in data returned
  - Currently allele existence at all levels
  - Extend to frequencies, genotypes?
- 2 datasets have a fully public beacon
- 3 registered datasets
- Expressions of support from 4 further DACs
- Working towards a common rule for registered access
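The three-tier idea can be sketched as a toy in-memory beacon that answers only "does this allele exist?" and consults only the datasets a user's tier allows. This is not the EGA implementation or the GA4GH Beacon API: the class names, the `beacon_query` function and the data layout are hypothetical, and real controlled access is granted per dataset by a DAC rather than by a single global tier.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Iterable, Set, Tuple

Allele = Tuple[str, int, str]  # (chromosome, position, allele); assumed layout


class Tier(IntEnum):
    PUBLIC = 0
    REGISTERED = 1
    CONTROLLED = 2


@dataclass
class Dataset:
    name: str
    tier: Tier  # minimum tier a user needs to query this dataset
    alleles: Set[Allele] = field(default_factory=set)


def beacon_query(datasets: Iterable[Dataset], user_tier: Tier,
                 chrom: str, pos: int, allele: str) -> bool:
    """Return True if any dataset visible at `user_tier` contains the allele."""
    visible = (d for d in datasets if d.tier <= user_tier)
    return any((chrom, pos, allele) in d.alleles for d in visible)


if __name__ == "__main__":
    data = [
        Dataset("open_cohort", Tier.PUBLIC, {("1", 1000, "A")}),
        Dataset("dac_cohort", Tier.CONTROLLED, {("2", 2000, "T")}),
    ]
    print(beacon_query(data, Tier.PUBLIC, "2", 2000, "T"))      # anonymous user
    print(beacon_query(data, Tier.CONTROLLED, "2", 2000, "T"))  # approved user
```

Returning only a yes/no answer keeps the response shape identical at every tier, which matches the slide's "allele existence at all levels"; extending to frequencies or genotypes would need a richer return type.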

Ensembl
- GA4GH Variant endpoint on Ensembl REST server
- Will implement the additional GA4GH APIs
- Co-designed Sequence Annotation API
- Engaging with the graph references when they become available/stable

CRAM
- CRAM 2 released June 2013
- CRAM 3 released May 2015
- CRAM 3 features
  - Significantly faster and better compression
  - Options for better but slower compression
  - Block level checksums
  - Efficient storage of unmapped data
  - Content digests
  - Lossless representation of conflicting data
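The value of block-level checksums can be shown with a small sketch: each compressed block carries its own checksum, so corruption is detected when that block is read rather than after decoding a whole file. The framing below (a 4-byte length, a zlib-compressed payload, then a CRC32) is a made-up toy layout for illustration and is not the CRAM on-disk format; `write_block` and `read_block` are hypothetical names.

```python
import struct
import zlib


def write_block(payload: bytes) -> bytes:
    """Frame a payload as: length (4 bytes) + compressed data + CRC32 (4 bytes)."""
    compressed = zlib.compress(payload)
    header = struct.pack("<I", len(compressed))
    checksum = zlib.crc32(header + compressed)
    return header + compressed + struct.pack("<I", checksum)


def read_block(block: bytes) -> bytes:
    """Verify the block checksum, then decompress and return the payload."""
    (size,) = struct.unpack_from("<I", block, 0)
    body_end = 4 + size
    (stored,) = struct.unpack_from("<I", block, body_end)
    if zlib.crc32(block[:body_end]) != stored:
        raise ValueError("block checksum mismatch")
    return zlib.decompress(block[4:body_end])


if __name__ == "__main__":
    block = write_block(b"ACGTACGTACGT")
    print(read_block(block))
    damaged = block[:5] + bytes([block[5] ^ 0xFF]) + block[6:]
    try:
        read_block(damaged)
    except ValueError as err:
        print("detected:", err)
```

Checking each block independently also means a reader fetching only one region of a large file can still verify exactly the data it touched.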

Variant Annotation Task Team
- Mission: develop common standards for reporting variant annotation
  - Includes format of results, ontologies and vocabularies for different classes of annotation
- Why: consistent reporting is critical for benchmarking and evaluation
- Progress: pull request for the initial proposal for variant annotation support:
  - VariantAnnotation and AlleleAnnotation for annotation derived by comparing a Variant or Allele to a set of reference data
  - AnnotationSets group VariantAnnotation/AlleleAnnotation records and hold full details of all software and reference data sets used
- Two methods protocols are:
  - alleleannotationmethods.avdl supports the mining of pre-

Reflections on GA4GH ~one year in

The good
- GA4GH has a model that works between Research and Healthcare
  - Many others proposed do not!
- GA4GH has created a new space to discuss and agree
  - Mundane but important now: FileFormat group
  - Complex but important in the future: Reference Graphs
- GA4GH scope is manageable
  - Ethics through technical
  - Genomics scope can be expanded slowly

The bad
- GA4GH is an excellent forum and convening mechanism
  - This is not yet an implementation forum
  - We have to shift to implementation groups as well
  - Implementation requires engineers => funding
- Practicing Medicine is large and diverse
  - Far, far larger than research, or genetics
  - We need engagement and outreach in a long-lasting way
- We have to be in it for the long game
  - Urgency is good for motivation, but lasting change is what we need to aim for

The (somewhat) ugly
- GA4GH is Anglo
  - Acknowledge and internalise that diversity is good
  - Diversity runs very deep in some thinking
- We need to balance clarity and models (Avro) with stuff-that-works (SAM/BAM/CRAM)
- I worry we're not creating performant I/O solutions
  - I/O bottlenecks for analysis
  - Secure data with distributed authentication is the single biggest technical problem I see in current clinical genomics

Opportunity for GA4GH
Opportunity for Europe

In a perfect world, Global Alliance would like
- You would want a collaborative group of healthcare systems
  - Embedded in a collaborative group of countries
  - With a diverse set of systems
  - With a strong biomolecular research community
  - With well worked out ethics for access of healthcare data for research
  - With strong electronic health records

In a perfect world, European healthcare would like
- An open forum to discuss and share state of the art in genomics
- Transparency and collective ownership
- Access to the worldwide brain trust on genomics
- Ability to provide external justification and validation
  - to governments
  - to review boards
  - to regulators

- Diverse set of healthcare systems
  - Single Payer, Multi Payer, Insurance based, State based
- Ethical access to data
  - Denmark, Scotland; many other European countries
- Strong biomolecular community
  - Max Planck, Karolinska, Wellcome Trust, CNRS
  - ELIXIR (basic research informatics infrastructure)
- Electronic health records
  - Estonia, Denmark

but let's never lose the American can-do

nor forget that this is one world

Celebrate and leverage our diversity
- In systems
- In technical details
- In ethical/social positioning
- In talented individuals present worldwide
To achieve a healthier world for all its inhabitants

Thank you!
Follow me on Twitter: @ewanbirney
I blog regularly (Google Ewan Birney)
6/15/2015