How Real-time Analysis turns Big Medical Data into Precision Medicine?

Similar documents
Towards Integrating the Detection of Genetic Variants into an In-Memory Database

SAP HANA Enabling Genome Analysis

A leader in the development and application of information technology to prevent and treat disease.

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik

Big Data Analytics- Innovations at the Edge

Big Data jako součást našeho života. Zdenek Panec: June, 2015

Healthcare Industry Improvement with Business Intelligence

ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013

Delivering the power of the world s most successful genomics platform

SAP Healthcare Analytics Solutions Provide physicians and researchers access to patient data from various systems in realtime

IMPLEMENTING BIG DATA IN TODAY S HEALTH CARE PRAXIS: A CONUNDRUM TO PATIENTS, CAREGIVERS AND OTHER STAKEHOLDERS - WHAT IS THE VALUE AND WHO PAYS

Big Data Challenges. technology basics for data scientists. Spring Jordi Torres, UPC - BSC

Vision for the Cohort and the Precision Medicine Initiative Francis S. Collins, M.D., Ph.D. Director, National Institutes of Health Precision

Big Data and the Data Lake. February 2015

Personalized Medicine and IT

Big Data Trends A Basis for Personalized Medicine

In-Memory Data Management for Enterprise Applications

Preparing the scenario for the use of patient s genome sequences in clinic. Joaquín Dopazo

Find the signal in the noise

Groundbreaking Collaborative Clinical Trial Launched

Big Data Analytics in Health Care

ORACLE HEALTH SCIENCES INFORM ADVANCED MOLECULAR ANALYTICS

Enabling the Health Continuum with Informatics. Jeroen Tas Healthcare Informatics.Solutions.Services

SAP HANA. Markus Fath, SAP HANA Product Management June 2013

Integration of genomic data into electronic health records

New Clinical Research & Care Opportunities Through Big Data Informatics

Uncovering Value in Healthcare Data with Cognitive Analytics. Christine Livingston, Perficient Ken Dugan, IBM

Integrating Genetic Data into Clinical Workflow with Clinical Decision Support Apps

Personalized medicine in China s healthcare system

> Semantic Web Use Cases and Case Studies

Personalized Medicine: Humanity s Ultimate Big Data Challenge. Rob Fassett, MD Chief Medical Informatics Officer Oracle Health Sciences

How Can Institutions Foster OMICS Research While Protecting Patients?

Cancer Genomics: What Does It Mean for You?

Cloud-Based Big Data Analytics in Bioinformatics

Vad är bioinformatik och varför behöver vi det i vården? a bioinformatician's perspectives

Electronic Medical Records and Genomics: Possibilities, Realities, Ethical Issues to Consider

Big Data Challenges in Bioinformatics

#Aim2Innovate. Share session insights and questions socially. Genomics, Precision Medicine and the EHR 6/17/2015

SQL Server 2012 Performance White Paper

COACH Clinician Forum 2015

Names Michael Willumsen Eficode Jesper Karup Eficode Mek Nielsen Agfa HealthCare

SAP HANA SAP s In-Memory Database. Dr. Martin Kittel, SAP HANA Development January 16, 2013

Data Mining and Machine Learning in Bioinformatics

The Future of Personalized Medicine: Powered by Real World Data. Kris Joshi, PhD Global Vice President

ebook Utilizing MapReduce to address Big Data Enterprise Needs Leveraging Big Data to shorten drug development cycles in Pharmaceutical industry.

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

SAP Business Suite powered by SAP HANA

Genomic Medicine The Future of Cancer Care. Shayma Master Kazmi, M.D. Medical Oncology/Hematology Cancer Treatment Centers of America

EMC DOCUMENTUM CONTENT ENABLED EMR Enhance the value of your EMR investment by accessing the complete patient record.

DeCyder Extended Data Analysis (EDA) Software

Balancing Big Data for Security, Collaboration and Performance

Integration of Genetic and Familial Data into. Electronic Medical Records and Healthcare Processes

SAP HANA PLATFORM Top Ten Questions for Choosing In-Memory Databases. Start Here

European Genome-phenome Archive database of human data consented for use in biomedical research at the European Bioinformatics Institute

Big Data An Opportunity or a Distraction? Signal or Noise?

User Needs and Requirements Analysis for Big Data Healthcare Applications

Individualizing Your Lung Cancer Care: Informing Decisions Through Biomarker Testing

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Visual Mining for Big Data

MediSapiens Ltd. Bio-IT solutions for improving cancer patient care. Because data is not knowledge. 19th of March 2015

Cisco Data Preparation

PHARMACEUTICAL BIGDATA ANALYTICS

How To Change Medicine

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Saving healthcare costs by implementing new genetic risk tests for early detection of cancer and prevention of cardiovascular diseases

Big Data in Medicine

ASCO s CancerLinQ aims to rapidly improve the overall quality of cancer care, and is the only major cancer data initiative being developed and led by

ALCHEMIST (Adjuvant Lung Cancer Enrichment Marker Identification and Sequencing Trials)

KNOWLEDGENT WHITE PAPER. Big Data Enabling Better Pharmacovigilance

Foundations of Business Intelligence: Databases and Information Management

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Big Data Mining Services and Knowledge Discovery Applications on Clouds

Find your future in the history

The Future of the Electronic Health Record. Gerry Higgins, Ph.D., Johns Hopkins

Big Data Analytics for Healthcare

The National Institute of Genomic Medicine (INMEGEN) was

The 100,000 genomes project

ViewPoint 6. Clarify your View

HOW WILL BIG DATA AFFECT RADIOLOGY (RESEARCH / ANALYTICS)? Ronald Arenson, MD

Demonstration of SAP Predictive Analysis 1.0, consumption from SAP BI clients and best practices

SOLUTION BRIEF. IMAT Enhances Clinical Trial Cohort Identification. imatsolutions.com

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

Lofan Abrams Data Services for Big Data Session # 2987

Smarter Research. Joseph M. Jasinski, Ph.D. Distinguished Engineer IBM Research

Integrating Bioinformatics, Medical Sciences and Drug Discovery

Embedded Systems in Healthcare. Pierre America Healthcare Systems Architecture Philips Research, Eindhoven, the Netherlands November 12, 2008

High Performance Computing Initiatives

Transcription:

Medical Data into Dr. Matthieu-P. Schapranow GLOBAL HEALTH, Rome, Italy August 27, 2014

Important things first: Where to find additional information? Online: Visit http://we.analyzegenomes.com for latest research results, tools, and news Offline: Read more about it, e.g. High-Performance In-Memory Genome Data Analysis: How In-Memory Database Technology Accelerates Personalized Medicine, In-Memory Data Management Research, Springer, ISBN: 978-3-319-03034-0, 2014 In Person: Join us for the International Workshop on Big Data in Bioinformatics and Healthcare Informatics (BBH14) in Washington D.C. on Oct 27, 2014 (http://bbh14.analyzegenomes.com) 2

Who you are dealing with? Software Engineer by training (B.Sc., M.Sc., PhD) 2006-2014 with Since 2007 at Chair of Prof. Hasso Plattner, HPI Since 2009 focusing on Life Sciences / E-Health 3

Where do I work? Hasso Plattner Institute, Potsdam, Germany 4

Hasso Plattner Institute Key Facts Founded as a public-private partnership in 1998 in Potsdam near Berlin, Germany Institute belongs to the University of Potsdam Ranked 1 st in CHE since 2009 500 B.Sc. and M.Sc. students 10 professors, 150 PhD students Course of study: IT Systems Engineering 5

Hasso Plattner Institute Programs Full university curriculum Bachelor (6 semesters) Master (4 semesters) Orthogonal Activities: E-Health Consortium School of Design Thinking Research School 6

Hasso Plattner Institute Enterprise Platform and Integration Concepts Group Chair of Prof. Dr. h.c. Hasso Plattner Research focuses on the technical aspects of enterprise software and design of complex applications In-Memory Data Management for Enterprise Applications Enterprise Application Programming Model Scientific Data Management Human-Centered Software Design and Engineering Industry cooperations, e.g. SAP, Siemens, Audi, and EADS Research cooperations, e.g. Stanford, MIT, and Berkeley Partner of Stanford Center for Design Research Partner at UC Berkeley RAD / AMP Lab Partner of MIT in Supply Chain Innovation and CSAIL Partner of SAP AG 7

The Setting Actors in Oncology Patients Individual anamnesis, family history, and background Require fast access to individualized therapy Clinicians Researchers Identify root and extent of disease using laboratory tests Evaluate therapy alternatives, adapt existing therapy Conduct laboratory work, e.g. analyze patient samples Create new research findings and come-up with treatment alternatives 8

Our Motivation Make Precision Medicine Come Routine in Real Life Motivation: Can we enable patients to: Understand and monitor their diseases to document the impact on their lives, Receive latest information about their (chronic) diseases, Cooperatively exchange with physicians and patients to improve quality of living 9

Our Motivation Make Precision Medicine Come Routine in Real Life (cont d) Motivation: Can we enable clinicians to take their therapy decisions: Incorporating all available specifics about each individual patient, Referencing latest lab results and worldwide medical knowledge, and Interactively during their ward round? 10

IT Challenge: How to Integrate Distributed and Heterogeneous Sources of Big Medical Data Human genome/biological data 600GB per full genome 15PB+ in databases of leading institutes Human proteome 160M data points (2.4GB) per sample >3TB raw proteome data in ProteomicsDB Hospital information systems Often more than 50GB PubMed database >23M articles Medical sensor data Scan of a single organ in 1s Cancer patient records creates 10GB of raw data >160k records at NCT Prescription data 1.5B records from 10,000 doctors and 10M Patients (100 GB) Clinical trials Currently more than 30k recruiting on ClinicalTrials.gov 11

Our Methodology Design Thinking Methodology 12

Our Technology: In-Memory Database Technology Enabling Real-time Data Analysis + Combined column and row store Map/Reduce Single and multi-tenancy Insert only for time travel Real-time replication Working on integers Active/passive data store Minimal projections Group key Dynamic multithreading Bulk load of data Objectrelational mapping Lightweight compression SQL + + + P v + + + No aggregate tables Data partitioning Any attribute as index On-the-fly extensibility Analytics on historical data Multi-core/ parallelization t SQL interface on columns and rows x x + Reduction of software layers T disk Text retrieval and extraction engine No disk 13

Our Approach: Analyze Genomes A Cloud Platform Enabling Real-time Analysis of Big Medical Data Oncolyzer Clinical Trial Assessment Pathway Topology Analysis Drug Response Analysis Cohort Analysis Medical Knowledge Cockpit... Real-time Analysis Access Control, Data Protection Data Exchange, App Store Statistical Tools Fair Use Extensions for Life Sciences App-spanning User Profiles Genome Data Resarch Publications Genome Metadata Pipeline and Analysis Models Cellular Pathways Combined and Linked Data Drugs and Interactions In-Memory Database 14

Real-time Processing of Event Data from Medical Sensors Processing of sensor data, e.g. from Intensive Care Units (ICUs) or wearable sensor devices (quantify self) Multi-modal real-time analysis to detect indicators for severe events, such as heart attacks or strokes t Comparison of waveform data with history of similar patients Incorporates machine-learning algorithms to detect severe events and to inform clinical personnel in time Successfully tested with 100 Hz event rate, i.e. sufficient for ICU use 15

Drug Safety Statistical Analysis of Drug Side Effects Data Combines confirmed side effect data from different data sources Interactive statistical analysis, e.g. apriori rules, to discover still unknown interactions Integrates personal prescription data and directly report side effects Unified access to international side effect data Work together with your doctor to prevent interaction with already prescribed drugs + + + On-the-fly extension of database schema to add side effect databases 16

Medical Knowledge Cockpit Search for affected genes in distributed and heterogeneous data sources Immediate exploration of relevant information, such as Gene descriptions, Molecular impact and related pathways, Scientific publications, and Unified access to structured and un-structured data sources Automatic clinical trial matching build on text analysis features Suitable clinical trials. No manual searching for hours or days: In-memory technology translates searching into interactive finding! 17

Medical Knowledge Cockpit Publications In-place preview of relevant data, such as publications and publication meta data Incorporating individual filter settings, e.g. additional search terms 18

Medical Knowledge Cockpit Latest Clinical Trials Personalized clinical trials, e.g. by incorporating patient specifics Classification of internal/external trials based on treating institute 19

Medical Knowledge Seamless Integration of Patient Specifics Google-like user interface for searching data Seamless integration of individual EMR data Search various sources for biomarkers, literature, and diseases 20

Medical Knowledge Cockpit Publications Interactively explore relevant publications, e.g. PDFs Improved ease of exploration, e.g. by highlighted medical terms and relevant concepts 21

Medical Knowledge Cockpit Pathway Topology Analysis Search in pathways is limited to is a certain element contained today Integrated >1,5k pathways from international sources, e.g. KEGG, HumanCyc, and WikiPathways, into HANA Implemented graph-based topology exploration and ranking based on patient specifics Unified access to multiple formerly disjoint data sources Pathway analysis of genetic variants with graph engine Enables interactive identification of possible dysfunctions affecting the course of a therapy before its start 22

Medical Knowledge Cockpit Search in Structured and Unstructured Medical Data Extended text analysis feature by medical terminology Genes (122,975 + 186,771 synonyms) Medical terms and categories (98,886 diseases, 47 categories) Pharmaceutical ingredients (7,099) T Unified access to structured and unstructured data sources Clinical trial matching using text analysis features Indexed clinicaltrials.gov database (145k trials/ 30,138 recruiting) Extracted, e.g., 320k genes, 161k ingredients, 30k periods Select studies based on multiple filters in less than 500ms 23

Cloud-based Services for Processing of DNA Data Control center for processing of raw DNA data, such as FASTQ, SAM, and VCF Personal user profile guarantees privacy of uploaded and processed data Supports reproducible research process by storing all relevant process parameters Standardized Modeling and runtime environment for analysis pipelines Implements prioritized data processing and fair use, e.g. per department or per institute Supports additional service, such as data annotations, billing, and sharing for all Analyze Genomes services Honored by the 2014 European Life Science Award 24

Interactive Genome Browser Genome Browser enables interactive comparison of multiple genomes Combined knowledge by integrating latest international annotations and literature, e.g. from NCBI, dbsnp, and UCSC Detailed exploration of genome locations and existing associations Unified access to multiple formerly disjoint data sources Matching of genetic variants and relevant annotations Ranked variants, e.g. accordingly to known diseases Links always back to primary data sources to guarantee validity of discovered findings 25

Analysis of Patient Cohorts In a patient cohort, a subset does not respond to therapy why? Clustering using various statistical algorithms, such as k-means or hierarchical clustering Calculation of all locus combinations in which at least 5% of all TCGA participants have mutations: 200ms for top 20 combinations Fast clustering directly performed within the inmemory database Individual clusters are calculated in parallel directly within the database K-means algorithm: 50ms (PAL) vs. 500ms (R) 26

Oncolyzer t Unified access to formerly disjoint oncological data sources Flexible analysis on patient s longitudinal data Research initiative for exchanging relevant tumor data to improve personalized treatment Real-time analysis of tumor data in seconds instead of hours Information available at your fingertips: Inmemory technology on mobile devices, e.g. ipad Interdisciplinary cooperation between clinicians, clinical researchers, and software engineers Honored with the 2012 Innovation Award of the German Capitol Region 27

Oncolyzer Patient Details Screen Combines patient s longitudinal time series data with individual analysis results Real-time analysis across hospital-wide data using always latest data when details screen is accessed http://epic.hpi.unipotsdam.de/home/ HanaOncolyzer 28

Oncolyzer Patient Analysis Screen Allows real-time analysis on complete patient cohort Supports identification of clinical trial participants based on their individual anamnesis Flexible filters and various chart types allow graphical exploration of data on mobile devices 29

Drug Response Analysis Data Sources and Matching Collect Patient Data Sequence Tumor Conduct Xenograft Experiments Metadata e.g. smoking status, tumor classification and age 1MB-100MB Genome Data e.g. raw DNA data and genetic variants 100MB-500GB Perform Manual Data Exploration And Analysis Experiment Results e.g. medication effectivity obtained from wet laboratory 10MB-1GB 30

Drug Response Analysis Interactive Data Exploration Drug response depends on individual genetic variants of tumors Challenge: Identification of relevant genetic variants and their impact on drug response is a ongoing research activity, e.g. Xenograft models Exploration of experiment results is timeconsuming and Excel-driven Interactive analysis of correlations between drugs and genetic variants In-memory technology enables interactive exploration of experiment data to leverage new scientific insights 31

Interactive Clinical Trial Recruitment Switch from trial-centric to patient-centric clinical trials Real-time matching and clustering of patients and clinical trial inclusion/exclusion criteria No manual pre-screening of patients for months: In-memory technology enables interactive prescreening process Assessment of patients preconditions for clinical trials Reassessment of already screened or already participating patient reduces recruitment costs 32

Join us for upcoming projects! Analysis of waveform data from intensive care stations (MIT) Design and discovery of clinical trials (Cytolon) Discovery of new drugs (Bayer) Detect cardiovascular diseases and evaluate treatment options (BMBF) Use health insurance data to improve health care (BMWi) Processing of big medical data (Bachelor s project) Interdisciplinary Design Thinking Teams You? 33

What to take home? Test-drive it yourself: http://we.analyzegenomes.com For patients Identify relevant clinical trials and medical experts Start most appropriate therapy as early as possible For clinicians Preventive diagnostics to identify risk patients early Indicate pharmacokinetic correlations For researchers Scan for similar patient cases, e.g. to evaluate therapy Enable real-time analysis of medical data and its assessment, e.g. assess pathways to identify impact of detected variants Combined free-text search in publications, diagnosis, and EMR data, i.e. structured and unstructured data 34

Keep in contact with us after returning from Rome Dr. Matthieu-P. Schapranow schapranow@hpi.de http://we.analyzegenomes.com/ Hasso Plattner Institute Enterprise Platform & Integration Concepts (EPIC) Program Manager E-Health Dr. Matthieu-P. Schapranow August-Bebel-Str. 88 14482 Potsdam, Germany 35