Global Alliance
Ewan Birney, Associate Director, EMBL-EBI
Our world is changing
Research to Medical
- Research: English as language; lightweight legal; identical/similar systems; open data; publications; grant-funding
- Practicing Medicine: national language; heavy legal framework; very different systems; closed data; not published; contract-funding
Health Care systems
- NHS: single payer; single organisation outside of GPs (e.g., NICE payment rules); commissioning moving to primary health care; standards by NICE
- Statutory and private health insurance (gesetzliche und private Krankenversicherungen): multiple payer; hospitals and independent GPs and consultants (Facharzt); commissioning by insurance companies; standards by IQWiG
Research as a secondary use
- Disease associations
- Molecular biology resource
- Cohort of patients
- Data from EHRs (EHR as phenotype)
- Feedback to patients
- Actionable variants
- Examples: Iceland, Denmark, Faroe, Finland, Vanderbilt, Dundee; UK Biobank; Kaiser Permanente, VA, Estonia (many others)
Human Heart study (Digital Heart Project)
- 1,530 healthy volunteers; high-dimensional cardiac phenotypes
- 947 genotyped individuals; 9.4 million SNPs
- SNP genotypes
  - SNP array: Illumina HumanOmniExpress
  - SNP calling: Gencall
  - QC: per-individual/marker; population stratification
  - Imputation: Shapeit/impute2 with UK10K and 1000 Genomes reference
- Genome-wide association study
[Figure: example SNP genotype table; one row per SNP (e.g. rs1000, A, G) followed by per-individual genotype values]
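The QC step above filters per individual and per marker. As a minimal sketch of what such a filter might look like, assuming a small genotype matrix with `None` for missing calls and hypothetical call-rate thresholds (the slide gives no thresholds, and this is not the project's actual pipeline):

```python
from typing import Optional

Genotype = Optional[float]  # allele count or imputed value; None = missing call


def call_rate(values: list[Genotype]) -> float:
    """Fraction of non-missing genotype calls in a row or column."""
    if not values:
        return 0.0
    return sum(v is not None for v in values) / len(values)


def qc_filter(
    matrix: dict[str, list[Genotype]],
    min_marker_call_rate: float = 0.75,      # hypothetical threshold
    min_individual_call_rate: float = 0.75,  # hypothetical threshold
) -> tuple[list[str], list[int]]:
    """Return (markers kept, individual column indices kept)."""
    # Per-marker filter: drop markers with too many missing calls.
    markers = [m for m, row in matrix.items()
               if call_rate(row) >= min_marker_call_rate]
    # Per-individual filter, computed over the surviving markers only.
    n_individuals = len(next(iter(matrix.values()), []))
    individuals = [
        i for i in range(n_individuals)
        if call_rate([matrix[m][i] for m in markers]) >= min_individual_call_rate
    ]
    return markers, individuals


# Toy data, invented for illustration only (marker -> one value per individual).
toy = {
    "marker_a": [1, 0, 1, None],
    "marker_b": [None, None, 2, None],
    "marker_c": [0, 1, 2, None],
}
kept_markers, kept_individuals = qc_filter(toy)
```

Here the poorly called marker and the individual with no calls are dropped; real pipelines typically add further checks (the slide mentions population stratification) that this sketch leaves out.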
Distinct Local Structures within Cardiac Morphology for PEER Factors
[Figure: four panels, Factor 1 to Factor 4; colour scale: Z-score of weight]
PEER Factor Contribution Reflected in Raw Wall Thickness Data
[Figure: linear model; colour scheme legend not recoverable]
Do we need to federate?
Sample size is king
- Rare disease - Matchmaker
  - 1:10,000 to 1:500,000 incidence
  - But 100s to 1000s of alleles per gene
  - Modifiers elsewhere
  - Finding a single second match of the same allele is transformative
- Common disease
  - Modifiers and epistasis
- Cancer
  - Somatic x Germline x Environment effects with follow-up
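The rare-disease numbers above can be turned into a rough back-of-envelope calculation. The sketch below assumes, purely for illustration, that cases are spread evenly across the alleles of a gene and that incidence can be read as a simple fraction of the population; the 65 million cohort size is a hypothetical figure, not one from the talk.

```python
def expected_cases(population: int, incidence_one_in: int) -> float:
    """Expected number of affected people for an incidence of 1 in `incidence_one_in`."""
    return population / incidence_one_in


def population_for_same_allele_matches(
    incidence_one_in: int, alleles_per_gene: int, matches: int = 2
) -> int:
    """Population at which `matches` carriers of one specific allele are expected.

    Simplifying assumption: cases are spread evenly across all alleles.
    """
    return matches * alleles_per_gene * incidence_one_in


# Hypothetical national cohort of 65 million, rarest incidence on the slide.
cases = expected_cases(65_000_000, 500_000)                   # 130.0 expected cases
per_allele = cases / 1_000                                    # about 0.13 per allele
needed = population_for_same_allele_matches(500_000, 1_000)   # 1_000_000_000
```

Under these assumptions a single country expects far fewer than one carrier per allele, and a second match of the same allele only becomes likely at a population on the order of a billion.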
Global Reach is also critical
- Genetic drift means different alleles have moved to different frequencies in different locales (can rule out some penetrant pathogenic calls)
- People move (!)
  - Cosmopolitan populations demand a cosmopolitan approach
- Environment is different
  - Different penetrance of Gene x Environment effects
- Infectious disease
  - Viruses and bacteria do not respect borders!
EMBL-EBI's engagement with GA4GH
Stephen Keenan
European Genome-Phenome Archive (EGA)
- Secure archive of controlled-access human data consented for research use
- Jointly hosted by EMBL-EBI and CRG
- Organised by study and dataset
- Access to data controlled by individual Data Access Committees (DACs)
  - EGA does not grant / deny / revoke dataset access
  - Access granted on a per-dataset level
  - Heterogeneous access policies
- Previously data access solely by file download
EGA Beacon
- EGA Beacon currently at https://ega.crg.eu/beacon/
- Compliant with GA4GH V0.2 Beacon API
- Beacon implements 3-tier access: Public, Registered, Controlled
  - User login to access registered or controlled level data
- Heterogeneity in data returned
  - Currently allele existence at all levels
  - Extend to frequencies, genotypes?
- 2 datasets have a fully public beacon
- 3 registered datasets
- Expressions of support from 4 further DACs
- Working towards a common rule for registered access
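The three-tier idea can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Tier`, `Dataset`, and `allele_exists` names are hypothetical, the datasets are invented, and nothing here reproduces the GA4GH Beacon API or the EGA implementation. In particular, real controlled access is granted per dataset by its DAC, whereas the sketch collapses that into a single tier level.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    REGISTERED = 1
    CONTROLLED = 2


@dataclass
class Dataset:
    name: str
    tier: Tier  # minimum tier needed to query this dataset
    alleles: set[tuple[str, int, str]] = field(default_factory=set)  # (chrom, pos, alt)


def allele_exists(datasets: list[Dataset], user_tier: Tier,
                  chrom: str, pos: int, alt: str) -> bool:
    """Yes/no answer only: True if any dataset visible at `user_tier` has the allele."""
    return any(
        d.tier <= user_tier and (chrom, pos, alt) in d.alleles
        for d in datasets
    )


# Invented datasets and alleles, for illustration only.
datasets = [
    Dataset("public_ds", Tier.PUBLIC, {("1", 1000, "A")}),
    Dataset("registered_ds", Tier.REGISTERED, {("2", 2000, "T")}),
    Dataset("controlled_ds", Tier.CONTROLLED, {("3", 3000, "G")}),
]
```

An anonymous user querying at `Tier.PUBLIC` would get a "yes" only for alleles in the public dataset; logging in raises the tier and widens the set of datasets consulted, while the response stays a bare existence flag.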
Ensembl
- GA4GH Variant endpoint on Ensembl REST server
- Will implement the additional GA4GH APIs
- Co-designed Sequence Annotation API
- Engaging with the graph references when they become available/stable
CRAM
CRAM 2 released June 2013
CRAM 3 released May 2015
CRAM 3 features
- Significantly faster and better compression
- Options for better but slower compression
- Block level checksums
- Efficient storage of unmapped data
- Content digests
- Lossless representation of conflicting data
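To illustrate why block-level checksums are useful, the sketch below splits a byte string into fixed-size blocks and stores a CRC32 alongside each, so that damage can be localised to a single block rather than invalidating the whole file. The block size, the function names, and the flat list layout are assumptions made for the example; this is not the CRAM container format.

```python
import zlib


def split_with_checksums(data: bytes, block_size: int) -> list[tuple[bytes, int]]:
    """Split data into fixed-size blocks, pairing each with its CRC32."""
    return [
        (data[i:i + block_size], zlib.crc32(data[i:i + block_size]))
        for i in range(0, len(data), block_size)
    ]


def corrupted_blocks(blocks: list[tuple[bytes, int]]) -> list[int]:
    """Indices of blocks whose payload no longer matches the stored CRC32."""
    return [i for i, (payload, crc) in enumerate(blocks)
            if zlib.crc32(payload) != crc]


blocks = split_with_checksums(b"ACGTACGTTTGACCA", block_size=4)
# Simulate damage to the second block only (payload changed, stored CRC kept).
blocks[1] = (b"ACGA", blocks[1][1])
```

Calling `corrupted_blocks(blocks)` then reports only the damaged block's index, leaving the remaining blocks usable.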
Variant Annotation Task Team
- Mission: develop common standards for reporting variant annotation
  - Includes format of results, ontologies and vocabularies for different classes of annotation
- Why: consistent reporting is critical for benchmarking and evaluation
- Progress: pull request for the initial proposal for variant annotation support
  - VariantAnnotation and AlleleAnnotation for annotation derived by comparing a Variant or Allele to a set of reference data
  - AnnotationSets group VariantAnnotation/AlleleAnnotation records and hold full details of all software and reference data sets used
  - Two methods protocols are: alleleannotationmethods.avdl supports the mining of pre-
Reflections on GA4GH ~one year in
The good
- GA4GH has a model that works between Research and Healthcare
  - Many others proposed do not!
- GA4GH has created a new space to discuss and agree
  - Mundane but important now: FileFormat group
  - Complex but important in the future: Reference Graphs
- GA4GH scope is manageable
  - Ethics through technical
  - Genomics scope can be expanded slowly
The bad
- GA4GH is an excellent forum and convening mechanism
  - This is not yet an implementation forum
  - We have to shift to implementation groups as well
  - Implementation requires engineers => funding
- Practicing Medicine is large and diverse
  - Far, far larger than research, or genetics
  - We need engagement and outreach in a long-lasting way
- We have to be in it for the long game
  - Urgency is good for motivation, but lasting change is what we need to aim for
The (somewhat) ugly
- GA4GH is Anglo
  - Acknowledge and internalise that diversity is good
  - Diversity runs very deep in some thinking
- We need to balance clarity and models (Avro) with stuff-that-works (SAM/BAM/CRAM)
- I worry we're not creating performant I/O solutions
  - I/O bottlenecks for analysis
- Secure data with distributed authentication is the single biggest technical problem I see in current clinical genomics
Opportunity for GA4GH
Opportunity for Europe
In a perfect world the Global Alliance would like
- A collaborative group of healthcare systems
- Embedded in a collaborative group of countries
- With a diverse set of systems
- With a strong biomolecular research community
- With well worked out ethics for access of healthcare data for research
- With strong electronic health records
In a perfect world European healthcare would like
- An open forum to discuss and share state of the art in genomics
- Transparency and collective ownership
- Access to the worldwide brain trust on genomics
- Ability to provide external justification and validation
  - to governments
  - to review boards
  - to regulators
- Diverse set of healthcare systems: Single Payer, Multi Payer, Insurance based, State based
- Ethical access to data: Denmark, Scotland; many other European countries
- Strong biomolecular community: Max Planck, Karolinska, Wellcome Trust, CNRS; ELIXIR (basic research informatics infrastructure)
- Electronic health records: Estonia, Denmark
but let's never lose the American "can do"
nor forget that this is one world
Celebrate and leverage our diversity
- In systems
- In technical details
- In ethical/social positioning
- In talented individuals present worldwide
To achieve a healthier world for all its inhabitants
Thank you!
Follow me on Twitter: @ewanbirney
I blog regularly (Google Ewan Birney)
6/15/2015