Clinical Genomics at Scale: Synthesizing and Analyzing Big Data From Thousands of Patients

Brady Bernard, PhD
Senior Research Scientist
Institute for Systems Biology, Seattle, WA

Dr. Bernard's research interests are in cancer drug discovery and clinical genomics. He is currently a part of the ISB Genome Data Analysis Center (GDAC) within The Cancer Genome Atlas (TCGA) network. During this time, he has developed novel computational methods and analyses in support of TCGA network research and publications, and has provided scientific guidance for the data exploration tools and algorithms developed by the team. Dr. Bernard has led the group's research efforts and contributions to several TCGA Analysis Working Groups, particularly in the area of heterogeneous data integration and graph analysis. In collaboration with experts in functional genomics, he has integrated TCGA and RNAi screening data to prioritize novel targets and tumor types for drug discovery and repurposing. His research in the area of cancer genomics has resulted in several proffered presentations at TCGA symposia and AACR meetings on distinct topics, a First Prize in the YarcData Graph Analytics Challenge, and a Life Science Discovery Fund grant to further the development of the group's cancer genomics web portals.

In the area of clinical genomics, Dr. Bernard co-leads a collaboration with Inova Translational Medicine Institute (ITMI) to provide analytic support and develop scalable infrastructure for the integration of clinical data with whole genome sequences and molecular data from thousands of patients. Related to this effort, Dr. Bernard has worked with the PRE-EMPT Global Pregnancy Collaboration (CoLab) as well as the Crohn's and Colitis Foundation of America (CCFA) to advise on the study design and infrastructure of large-scale clinical genomics programs.

Annual Quality Congress Breakout Session, Sunday, October 4, 2015
Objective: Define systems biology and relate this concept to the NICU context.

Disclosure
Clinical Genomics: Scalable Analysis Across Thousands of Patients
Sr. Research Scientist Brady Bernard does not have any financial arrangement or affiliation with a commercial entity. Brady Bernard will not be discussing the unlabeled use of a commercial product in this presentation.

Example Big Science Projects
- The Cancer Genome Atlas (TCGA): genomic and molecular characterization of 30 cancers across thousands of primary tumor samples
- Inova Translational Medicine Institute (ITMI): analysis of thousands of whole genome sequences integrated with clinical data

TCGA Research Network
- Clinicians
- Researchers
- Software engineers
- Bioinformaticians
[Figure: TCGA data and biospecimen flow]

Inova Translational Medicine Institute (ITMI)
ITMI aims to assemble one of the world's largest collections of whole genome sequences in a single database to enable personalized healthcare and spur biomedical research.
Example projects:
- Families with full-term and preterm births
- Longitudinal study: first 1000 days of life
- Congenital anomalies

High-level clinical genomics workflow
- Clinical: EMR, survey, phenotypes, data cleansing, feature merging
- Genomic: sequencing, QC/QA, data management, annotation
- Functional cores
- Highly collaborative:
  - Study design
  - Phenotype prioritization
  - Patient/family selection
  - Data generation protocols
  - Focused subgroup meetings
  - Scalable models (dataset creation, analysis result exploration, ...)
  - Validation (predictors, variants, ...)
  - Publication and visualization

Discussion topics
- Clinical
- Genomic & Molecular
- Computational / Informatics / Analysis

Analysis
- Characterization (especially clinical)
- Genotype/phenotype associations
- Clinically relevant prediction
- Data integration
- Interactive exploration (web portals)

Clinical considerations
- EMR and survey formulation
- Consistency (formalism) in the data
- Organization, LIMS, and metadata
- Aggregation of common data elements
- Precise phenotyping and sample size
- Prediction frameworks

EMR and survey formulation
- Timing of events with respect to some reference point
  - Easy to overlook
  - Important for analysis and prediction
- Does a lack of response mean "no", "don't know", or "didn't want to answer"?

Consistency (formalism) in the data
Structured data dictionary for:
- Consistency across clinical or research sites
- More seamless automation
- Feature name matching (e.g., data dictionary and column name in eCRF data)
- Misspellings and synonyms (e.g., drugs)
- Mixed delimiters
- Data provenance and versioning
- Excel files?

Organization, LIMS, and metadata
- Excel files are tempting but will not work for a large consortium
- A LIMS can capture metadata in a structured and queryable form
- Metadata examples:
  - source tissue
  - known variations across batches (software changes, etc.)
  - mapped sample IDs across data types
  - date sample was taken
  - date sample was processed

Aggregation of common data elements
Different sources of evidence for premature rupture of membranes:
- Antenatan_Steroids_Indication:pprom
- Antenatan_Steroids_Indication:prom
- Delivery_Result_of_Other_Reason:pprom
- Delivery_Result_of_Other_Reason:prom
- Other_Medical_Conditions:pprom
- Other_Medication_Indication:pprom
- Prom
- Reason_for_C-Section_mc:pprom
- Reason_for_C-Section_mc:prom
- Tocolytic_Therapy_Indication:pprom
- Tocolytic_Therapy_Indication:prom
- Was_the_Delivery_a_Result_of:prom
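The aggregation of PROM evidence listed on this page can be sketched in a few lines of Python. This is only an illustration: the feature names are copied from the slide, but the record layout (a dict of free-text answers), the delimiter set, the affirmative values, and the function name `prom_evidence` are all assumptions. The three-way return value (True, False, None) is one possible way to keep "no response" distinct from "no".

```python
import re

# Feature[:value] evidence sources, copied from the slide as written.
EVIDENCE_SOURCES = [
    "Antenatan_Steroids_Indication:pprom",
    "Antenatan_Steroids_Indication:prom",
    "Delivery_Result_of_Other_Reason:pprom",
    "Delivery_Result_of_Other_Reason:prom",
    "Other_Medical_Conditions:pprom",
    "Other_Medication_Indication:pprom",
    "Prom",
    "Reason_for_C-Section_mc:pprom",
    "Reason_for_C-Section_mc:prom",
    "Tocolytic_Therapy_Indication:pprom",
    "Tocolytic_Therapy_Indication:prom",
    "Was_the_Delivery_a_Result_of:prom",
]

# Assumed encodings of "yes" for a stand-alone flag such as "Prom".
AFFIRMATIVE = {"yes", "y", "true", "1"}


def prom_evidence(record):
    """Return (flag, hits) for one patient record (hypothetical dict layout).

    flag is True if any source supports PROM/PPROM, False if at least one
    source was answered but none supports it, and None if every source is
    missing, so that a lack of response is not silently read as "no".
    """
    hits = []
    answered = False
    for source in EVIDENCE_SOURCES:
        feature, _, expected = source.partition(":")
        raw = record.get(feature)
        value = "" if raw is None else str(raw).strip().lower()
        if not value:
            continue  # no response for this feature
        answered = True
        if expected:
            # Multi-valued fields may use mixed delimiters.
            tokens = [t.strip() for t in re.split(r"[;,|/]", value)]
            matched = expected in tokens
        else:
            matched = value in AFFIRMATIVE
        if matched:
            hits.append(source)
    if hits:
        return True, hits
    return (False if answered else None), hits
```

Returning the list of matching sources alongside the flag keeps some provenance for each aggregated phenotype call.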

Precise phenotyping and sample size

Prediction frameworks
- Goal: predict phenotypes or outcomes given clinical, genomic, and molecular data
- The data are non-linear; classifiers should account for this
- Many clinical data elements are highly correlated or irrelevant for prediction and should be blacklisted
- Cross-validation and independent data sets should be considered in advance

Genomic & Molecular Data
- Annotations
- Confounding factors
  - Batch effects [will happen]
  - Ancestry [is very important]
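Two of the prediction-framework points above, blacklisting redundant clinical features and planning cross-validation up front, can be illustrated with a small stdlib-only Python sketch. The correlation threshold, the greedy keep-first rule, and the function names are assumptions made for illustration, not the method used in the talk; non-linear classifiers are out of scope here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant feature has no defined correlation
    return sxy / math.sqrt(sxx * syy)


def blacklist_correlated(features, threshold=0.95):
    """Greedy filter over a dict of feature name -> list of values.

    A feature is blacklisted if it is constant (uninformative) or if its
    absolute correlation with an already kept feature reaches the
    threshold (an assumed cut-off).
    """
    kept, blacklisted = [], []
    for name, values in features.items():
        constant = len(set(values)) == 1
        redundant = any(
            abs(pearson(values, features[k])) >= threshold for k in kept
        )
        (blacklisted if constant or redundant else kept).append(name)
    return kept, blacklisted


def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

In a family-based study, folds would more plausibly be assigned per family than per sample so that relatives never straddle the train/test split; that refinement is left out of this sketch.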

Annotations
Current, common, and updated:
- Reference genome builds
- Gene definitions
- Software and annotation versions

Batch effects:
- Methylation plate position
- Mendelian inheritance errors
- Extremely heterozygous variants
- Missing calls
- Commonly mutated segments

Batch effects: example variant associated with PTB
[Figure: samples ordered by date, with tracks for variant, preterm status, and admixture. The variant is present in 23% (45) of PTB samples (n=198) versus 5% (19) of FTB samples (n=401), p = 2e-11. The same pattern appears in ISB in-house CGI genomes.]
[Figure: graphic summary relating ancestry and population stratification, batch, and PTB]
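The carrier counts in the PTB example can be checked with a quick association test. The sketch below reads the figure residue as 45 carriers among 198 PTB samples and 19 carriers among 401 FTB samples, and assumes a Pearson chi-square test without continuity correction; the slide does not say which test produced the reported p-value, so both the reading and the test choice are assumptions.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom, no continuity correction).

    Table layout:      carrier   non-carrier
        group 1           a           b
        group 2           c           d
    Returns (chi2, p), with the upper-tail p-value for 1 df computed
    from the complementary error function.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts as read from the slide (assumed pairing of counts and group sizes).
ptb_carriers, ptb_total = 45, 198
ftb_carriers, ftb_total = 19, 401

chi2, p = chi_square_2x2(
    ptb_carriers, ptb_total - ptb_carriers,
    ftb_carriers, ftb_total - ftb_carriers,
)
print(f"chi2 = {chi2:.2f}, p = {p:.1e}")
```

A p-value this small says nothing about why carriers cluster in one group. As the slide stresses, ordering samples by processing date exposed the association as a batch artifact, so a check of this kind is a starting point, not evidence of a causal variant.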

Ancestry and population stratification
- Population-associated variants and class imbalance are likely to produce false positives

Family-based genomic study design
Transmission disequilibrium
- Identify variants that are transmitted to affected offspring more frequently than expected by chance
Advantage
- Accounts for population stratification
- Larger pedigrees can be helpful, though phenotypic and genomic data may not be available

Family genomics: phasing and candidate genes
Roach et al. (2010). Science

Mendelian Inheritance Errors (MIEs)
- Can be real de novo mutations
- Most likely explanation is sequencing error
- MIEs, while infrequent, are observed orders of magnitude more often than the expected de novo mutation rate
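The transmission disequilibrium idea above can be made concrete with the standard TDT statistic, (T - U)^2 / (T + U), where T and U count how often heterozygous parents did or did not transmit the variant allele to an affected child. The sketch below is a minimal illustration that assumes biallelic autosomal sites, genotypes coded as alternate-allele counts (0, 1, 2), and a hypothetical tuple layout for trios; it is not the pipeline used in the talk.

```python
import math


def count_transmissions(trios):
    """Count alt-allele transmissions from heterozygous parents.

    trios: iterable of (father, mother, affected_child) genotypes, each an
    alt-allele count 0/1/2 (assumed encoding). Trios with no heterozygous
    parent are uninformative; Mendelian-inconsistent trios are skipped.
    """
    transmitted = untransmitted = 0
    for father, mother, child in trios:
        het_parents = (father == 1) + (mother == 1)
        if het_parents == 0:
            continue
        # A homozygous-alt parent always contributes exactly one alt allele.
        from_het = child - (father == 2) - (mother == 2)
        if not 0 <= from_het <= het_parents:
            continue  # Mendelian inheritance error
        transmitted += from_het
        untransmitted += het_parents - from_het
    return transmitted, untransmitted


def tdt(transmitted, untransmitted):
    """Return (chi2, p) for the TDT statistic with 1 degree of freedom."""
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p
```

Because each heterozygous parent serves as its own control, the statistic is not inflated by ancestry differences between cases and unrelated controls, which is the advantage the slide points to.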

Accuracy and variant call quality
[Figure: percent accuracy (100 - %MIE) versus variant call quality score]
- Less than 0.05% of all calls are MIEs
- Less than 0.002% are MIEs above a quality score of 80
- With family trios, sequencing errors can be identified and spurious associations (type 1 errors) mitigated, enabling the use of whole genome sequences in the clinical setting

Genomic & Molecular Data Recommendations
- Clinical analysis can inform genomic study design
  - Phenotype definition and prioritization is critical
- Matched and balanced cases and controls
  - minimize batch effects
  - improve statistical strength
  - mitigate confounding factors
- Larger segregating pedigrees
  - reduce number of candidate genes
  - potentially hybrid approaches
- Run nuclear families (and multigenerational, if possible) together on the same batch

Quality Control
- Re-run controls across batches
- Maintain highly detailed annotations on dates, reagents, sequencing runs, software versions, tissues, ...
- External data sets from data provider to mitigate batch effects
- SNP arrays for sample identifiability and case/control matching

Additional data considerations
- Staging and production environments: as data is being generated, upload it to the staging environment, then QC and structure it for analysis/consumption
- Data freezes: create common data sets, annotation pipelines, and files for collaborative analysis
- SNP arrays for sample identifiability and case/control matching, especially as the number of data types and source sites increases

Computational / Informatics / Analysis
- De-identification, HIPAA, and universally unique identifiers (UUIDs)
- Computing and collaborative projects
  - NCI cancer cloud pilot
  - Cloud computing
    - Consolidate data to a centralized source
    - Scalable computing
    - Data backup
    - Minimize IT
  - Workflow management systems
  - Web portals
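The trio-based accuracy measure on this page (100 minus the percentage of Mendelian inheritance errors, tracked against call quality) can be sketched as follows. The genotype encoding (alt-allele counts at biallelic autosomal sites), the tuple layout of `calls`, and the function names are assumptions; only the idea of thresholding on a quality score comes from the slide.

```python
def is_mendelian_consistent(father, mother, child):
    """True if the child genotype can arise from the parental genotypes.

    Genotypes are alt-allele counts 0/1/2 at a biallelic autosomal site.
    """
    gametes = {0: {0}, 1: {0, 1}, 2: {1}}  # alleles each parent can pass on
    return any(
        a + b == child for a in gametes[father] for b in gametes[mother]
    )


def mie_rate(calls, min_quality=0):
    """Fraction of trio calls at or above min_quality that are MIEs.

    calls: iterable of (father, mother, child, quality) tuples, where
    quality is a per-site score (hypothetical layout). Returns None if no
    call passes the threshold.
    """
    passed = [c for c in calls if c[3] >= min_quality]
    if not passed:
        return None
    errors = sum(1 for f, m, c, _ in passed if not is_mendelian_consistent(f, m, c))
    return errors / len(passed)


def percent_accuracy(calls, min_quality=0):
    """Percent accuracy as defined on the slide: 100 - %MIE."""
    rate = mie_rate(calls, min_quality)
    return None if rate is None else 100.0 * (1.0 - rate)
```

Sweeping `min_quality` over a range of values would reproduce the shape of the accuracy-versus-quality curve described in the figure, given real trio calls.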

NCI cancer cloud pilot

Scalable Genomics Technologies and Architecture
- 7,000 genomes
- Storage: staging file server; Google Standard; archival storage; Google Nearline
- Data volume: reads (BAM) 100 GB/subject; variants (VCF) 2 GB/subject (*petabyte-scale data)
- Bioinformatic pipelines: quality control, re-process, batch assessment, data normalization, organize/structure
- Parallel compute/analysis: Google Compute Engine, in-house cluster
- Distributed databases: billions of unique variants; terabytes of data (high access); Google Genomics / BigQuery
- Annotations:
  - Public (significant ETL and data modeling)
  - ISB proprietary (aggregated over thousands of control sequences)

Open source workflow management systems
- Assist with provenance, data access, analysis, complex workflows, reproducible science

Web portals
- Project summaries, reports, auditing
- Data access
- Research & dissemination
- Interactive exploration & dynamic analysis
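The per-subject sizes above translate into storage totals with simple arithmetic. The sketch below uses only the figures on the slide (7,000 genomes, 100 GB of reads and 2 GB of variants per subject) and assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB) and a single copy of each file; replicas, reprocessed files, and intermediate outputs are not counted.

```python
GENOMES = 7_000          # from the slide
BAM_GB_PER_SUBJECT = 100 # reads (BAM), from the slide
VCF_GB_PER_SUBJECT = 2   # variants (VCF), from the slide


def total_tb(n_subjects, gb_per_subject):
    """Total size in terabytes, assuming 1 TB = 1,000 GB."""
    return n_subjects * gb_per_subject / 1_000


bam_tb = total_tb(GENOMES, BAM_GB_PER_SUBJECT)  # archival-scale read data
vcf_tb = total_tb(GENOMES, VCF_GB_PER_SUBJECT)  # high-access variant data
total_pb = (bam_tb + vcf_tb) / 1_000

print(f"BAM: {bam_tb:.0f} TB, VCF: {vcf_tb:.0f} TB, total: {total_pb:.3f} PB")
```

The 50-fold gap between read and variant data is consistent with the split on the slide between archival storage for reads and high-access databases for variants.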

Additional computational considerations
- Security/authentication/access controls
- Wikis/project pages (e.g., Confluence) and subteams (e.g., analysis working groups)
- Issue and project tracking (e.g., JIRA)
- Listservs and management
- Code repositories (e.g., GitHub)

Concluding thoughts
- There are many potential [and avoidable] pitfalls
- The infrastructure required to establish and support Big Science consortium projects is significant and easy to underestimate
  - Roles and responsibilities, maintenance, costs, support, data generation, QC, QA, ...
- With TCGA and ITMI, the questions to be addressed with the data far exceed the bandwidth of direct participants
  - Significant value to the community in curated clinical, genomic, and molecular data
  - What will consortium (and IRB?) guidelines be for access control and use by the broader community?

brady.bernard@systemsbiology.org
Protein structures shown: 2bnh (alpha-beta horseshoe), 1hv9 (left-handed beta helix), 1m30 (SH3-like barrel)

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik

Leading Genomics. Diagnostic. Discove. Collab. harma. Shanghai Cambridge, MA Reykjavik Leading Genomics Diagnostic harma Discove Collab Shanghai Cambridge, MA Reykjavik Global leadership for using the genome to create better medicine WuXi NextCODE provides a uniquely proven and integrated

More information

Delivering the power of the world s most successful genomics platform

Delivering the power of the world s most successful genomics platform Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE

More information

A leader in the development and application of information technology to prevent and treat disease.

A leader in the development and application of information technology to prevent and treat disease. A leader in the development and application of information technology to prevent and treat disease. About MOLECULAR HEALTH Molecular Health was founded in 2004 with the vision of changing healthcare. Today

More information

Balancing Big Data for Security, Collaboration and Performance

Balancing Big Data for Security, Collaboration and Performance Balancing Big Data for Security, Collaboration and Performance Sai Balu Lineberger Cancer Center UNC Chapel Hill Oct 14, 2014 About UNC Oldest Public University -1793 Top 5 Public University. 46th World

More information

Computational Requirements

Computational Requirements Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects Computational Requirements Steve Sherry, Lisa Brooks, Paul Flicek, Anton Nekrutenko, Kenna Shaw, Heidi Sofia High-density

More information

Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects

Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects Report on the Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects Background and Goals of the Workshop June 5 6, 2012 The use of genome sequencing in human research is growing

More information

ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013

ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013 ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE October 2013 Introduction As sequencing technologies continue to evolve and genomic data makes its way into clinical use and

More information

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons

NIH Commons Overview, Framework & Pilots - Version 1. The NIH Commons The NIH Commons Summary The Commons is a shared virtual space where scientists can work with the digital objects of biomedical research, i.e. it is a system that will allow investigators to find, manage,

More information

G E N OM I C S S E RV I C ES

G E N OM I C S S E RV I C ES GENOMICS SERVICES THE NEW YORK GENOME CENTER NYGC is an independent non-profit implementing advanced genomic research to improve diagnosis and treatment of serious diseases. capabilities. N E X T- G E

More information

MediSapiens Ltd. Bio-IT solutions for improving cancer patient care. Because data is not knowledge. 19th of March 2015

MediSapiens Ltd. Bio-IT solutions for improving cancer patient care. Because data is not knowledge. 19th of March 2015 19th of March 2015 MediSapiens Ltd Because data is not knowledge Bio-IT solutions for improving cancer patient care Sami Kilpinen, Ph.D Co-founder, CEO MediSapiens Ltd Copyright 2015 MediSapiens Ltd. All

More information

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova

Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova Using the Grid for the interactive workflow management in biomedicine Andrea Schenone BIOLAB DIST University of Genova overview background requirements solution case study results background A multilevel

More information

Enhancing Functionality of EHRs for Genomic Research, Including E- Phenotying, Integrating Genomic Data, Transportable CDS, Privacy Threats

Enhancing Functionality of EHRs for Genomic Research, Including E- Phenotying, Integrating Genomic Data, Transportable CDS, Privacy Threats Enhancing Functionality of EHRs for Genomic Research, Including E- Phenotying, Integrating Genomic Data, Transportable CDS, Privacy Threats Genomic Medicine 8 meeting Alexa McCray Christopher G Chute Rex

More information

School of Nursing. Presented by Yvette Conley, PhD

School of Nursing. Presented by Yvette Conley, PhD Presented by Yvette Conley, PhD What we will cover during this webcast: Briefly discuss the approaches introduced in the paper: Genome Sequencing Genome Wide Association Studies Epigenomics Gene Expression

More information

TRANSLATIONAL BIOINFORMATICS 101

TRANSLATIONAL BIOINFORMATICS 101 TRANSLATIONAL BIOINFORMATICS 101 JESSICA D. TENENBAUM Department of Bioinformatics and Biostatistics, Duke University Durham, NC 27715 USA Jessie.Tenenbaum@duke.edu SUBHA MADHAVAN Innovation Center for

More information

Enabling the Big Data Commons through indexing of data and their interactions

Enabling the Big Data Commons through indexing of data and their interactions biomedical and healthcare Data Discovery Index Ecosystem Enabling the Big Data Commons through indexing of and their interactions 2 nd BD2K all-hands meeting Bethesda 11/12/15 Aims 1. Help users find accessible

More information

Practical Solutions for Big Data Analytics

Practical Solutions for Big Data Analytics Practical Solutions for Big Data Analytics Ravi Madduri Computation Institute (madduri@anl.gov) Paul Dave (pdave@uchicago.edu) Dinanath Sulakhe (sulakhe@uchicago.edu) Alex Rodriguez (arodri7@uchicago.edu)

More information

ORACLE HEALTH SCIENCES INFORM ADVANCED MOLECULAR ANALYTICS

ORACLE HEALTH SCIENCES INFORM ADVANCED MOLECULAR ANALYTICS ORACLE HEALTH SCIENCES INFORM ADVANCED MOLECULAR ANALYTICS INCORPORATE GENOMIC DATA INTO CLINICAL R&D KEY BENEFITS Enable more targeted, biomarker-driven clinical trials Improves efficiencies, compressing

More information

Large-scale Research Data Management and Analysis Using Globus Services. Ravi Madduri Argonne National Lab University of Chicago @madduri

Large-scale Research Data Management and Analysis Using Globus Services. Ravi Madduri Argonne National Lab University of Chicago @madduri Large-scale Research Data Management and Analysis Using Globus Services Ravi Madduri Argonne National Lab University of Chicago @madduri Outline Who we are Challenges in Big Data Management and Analysis

More information

European Genome-phenome Archive database of human data consented for use in biomedical research at the European Bioinformatics Institute

European Genome-phenome Archive database of human data consented for use in biomedical research at the European Bioinformatics Institute European Genome-phenome Archive database of human data consented for use in biomedical research at the European Bioinformatics Institute Justin Paschall Team Leader Genetic Variation / EGA ! European Genome-phenome

More information

Big Data Visualization for Genomics. Luca Vezzadini Kairos3D

Big Data Visualization for Genomics. Luca Vezzadini Kairos3D Big Data Visualization for Genomics Luca Vezzadini Kairos3D Why GenomeCruzer? The amount of data for DNA sequencing is growing Modern hardware produces billions of values per sample Scientists need to

More information

Introduction to Arvados. A Curoverse White Paper

Introduction to Arvados. A Curoverse White Paper Introduction to Arvados A Curoverse White Paper Contents Arvados in a Nutshell... 4 Why Teams Choose Arvados... 4 The Technical Architecture... 6 System Capabilities... 7 Commitment to Open Source... 12

More information

#jenkinsconf. Jenkins as a Scientific Data and Image Processing Platform. Jenkins User Conference Boston #jenkinsconf

#jenkinsconf. Jenkins as a Scientific Data and Image Processing Platform. Jenkins User Conference Boston #jenkinsconf Jenkins as a Scientific Data and Image Processing Platform Ioannis K. Moutsatsos, Ph.D., M.SE. Novartis Institutes for Biomedical Research www.novartis.com June 18, 2014 #jenkinsconf Life Sciences are

More information

Preparing the scenario for the use of patient s genome sequences in clinic. Joaquín Dopazo

Preparing the scenario for the use of patient s genome sequences in clinic. Joaquín Dopazo Preparing the scenario for the use of patient s genome sequences in clinic Joaquín Dopazo Computational Medicine Institute, Centro de Investigación Príncipe Felipe (CIPF), Functional Genomics Node, (INB),

More information

i2b2 Clinical Research Chart

i2b2 Clinical Research Chart i2b2 Clinical Research Chart Shawn Murphy MD, Ph.D. Griffin Weber MD, Ph.D. Michael Mendis Vivian Gainer MS Lori Phillips MS Rajesh Kuttan Wensong Pan MS Henry Chueh MD Susanne Churchill Ph.D. John Glaser

More information

Focusing on results not data comprehensive data analysis for targeted next generation sequencing

Focusing on results not data comprehensive data analysis for targeted next generation sequencing Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes

More information

HOW WILL BIG DATA AFFECT RADIOLOGY (RESEARCH / ANALYTICS)? Ronald Arenson, MD

HOW WILL BIG DATA AFFECT RADIOLOGY (RESEARCH / ANALYTICS)? Ronald Arenson, MD HOW WILL BIG DATA AFFECT RADIOLOGY (RESEARCH / ANALYTICS)? Ronald Arenson, MD DEFINITION OF BIG DATA Big data is a broad term for data sets so large or complex that traditional data processing applications

More information

Accelerate genomic breakthroughs in microbiology. Gain deeper insights with powerful bioinformatic tools.

Accelerate genomic breakthroughs in microbiology. Gain deeper insights with powerful bioinformatic tools. Accelerate genomic breakthroughs in microbiology. Gain deeper insights with powerful bioinformatic tools. Empowering microbial genomics. Extensive methods. Expansive possibilities. In microbiome studies

More information

NIH s Genomic Data Sharing Policy

NIH s Genomic Data Sharing Policy NIH s Genomic Data Sharing Policy 2 Benefits of Data Sharing Enables data generated from one study to be used to explore a wide range of additional research questions Increases statistical power and scientific

More information

Testimony of. Paul Misener Vice President for Global Public Policy, Amazon.com. Before the

Testimony of. Paul Misener Vice President for Global Public Policy, Amazon.com. Before the Testimony of Paul Misener Vice President for Global Public Policy, Before the United States House of Representatives Committee on Energy and Commerce Subcommittee on Communications and Technology Subcommittee

More information

Big Data Challenges. technology basics for data scientists. Spring - 2014. Jordi Torres, UPC - BSC www.jorditorres.

Big Data Challenges. technology basics for data scientists. Spring - 2014. Jordi Torres, UPC - BSC www.jorditorres. Big Data Challenges technology basics for data scientists Spring - 2014 Jordi Torres, UPC - BSC www.jorditorres.eu @JordiTorresBCN Data Deluge: Due to the changes in big data generation Example: Biomedicine

More information

How Real-time Analysis turns Big Medical Data into Precision Medicine?

How Real-time Analysis turns Big Medical Data into Precision Medicine? Medical Data into Dr. Matthieu-P. Schapranow GLOBAL HEALTH, Rome, Italy August 27, 2014 Important things first: Where to find additional information? Online: Visit http://we.analyzegenomes.com for latest

More information

Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends

Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends Promises and Pitfalls of Big-Data-Predictive Analytics: Best Practices and Trends Spring 2015 Thomas Hill, Ph.D. VP Analytic Solutions Dell Statistica Overview and Agenda Dell Software overview Dell in

More information

HETEROGENEOUS DATA INTEGRATION FOR CLINICAL DECISION SUPPORT SYSTEM. Aniket Bochare - aniketb1@umbc.edu. CMSC 601 - Presentation

HETEROGENEOUS DATA INTEGRATION FOR CLINICAL DECISION SUPPORT SYSTEM. Aniket Bochare - aniketb1@umbc.edu. CMSC 601 - Presentation HETEROGENEOUS DATA INTEGRATION FOR CLINICAL DECISION SUPPORT SYSTEM Aniket Bochare - aniketb1@umbc.edu CMSC 601 - Presentation Date-04/25/2011 AGENDA Introduction and Background Framework Heterogeneous

More information

Big Data Trends A Basis for Personalized Medicine

Big Data Trends A Basis for Personalized Medicine Big Data Trends A Basis for Personalized Medicine Dr. Hellmuth Broda, Principal Technology Architect emedikation: Verordnung, Support Prozesse & Logistik 5. Juni, 2013, Inselspital Bern Over 150,000 Employees

More information

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS

More information

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences WP11 Data Storage and Analysis Task 11.1 Coordination Deliverable 11.2 Community Needs of

More information

Data Management Tools: practical approaches and lessons learned when scaling up a computing and data environment to keep up with the pace of data

Data Management Tools: practical approaches and lessons learned when scaling up a computing and data environment to keep up with the pace of data Data Management Tools: practical approaches and lessons learned when scaling up a computing and data environment to keep up with the pace of data intensive research Declaration of Potential Conflicts-of-Interest,

More information

Oracle Health Sciences Translational Research Center: A Translational Medicine Platform to Address the Big Data Challenge

Oracle Health Sciences Translational Research Center: A Translational Medicine Platform to Address the Big Data Challenge An Oracle White Paper June 2012 Oracle Health Sciences Translational Research Center: A Translational Medicine Platform to Address the Big Data Challenge Disclaimer The following is intended to outline

More information

SAP HANA Enabling Genome Analysis

SAP HANA Enabling Genome Analysis SAP HANA Enabling Genome Analysis Joanna L. Kelley, PhD Postdoctoral Scholar, Stanford University Enakshi Singh, MSc HANA Product Management, SAP Labs LLC Outline Use cases Genomics review Challenges in

More information

> Semantic Web Use Cases and Case Studies

> Semantic Web Use Cases and Case Studies > Semantic Web Use Cases and Case Studies Case Study: Applied Semantic Knowledgebase for Detection of Patients at Risk of Organ Failure through Immune Rejection Robert Stanley 1, Bruce McManus 2, Raymond

More information

IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper

IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper IEEE International Conference on Computing, Analytics and Security Trends CAST-2016 (19 21 December, 2016) Call for Paper CAST-2015 provides an opportunity for researchers, academicians, scientists and

More information

OpenCB a next generation big data analytics and visualisation platform for the Omics revolution

OpenCB a next generation big data analytics and visualisation platform for the Omics revolution OpenCB a next generation big data analytics and visualisation platform for the Omics revolution Development at the University of Cambridge - Closing the Omics / Moore s law gap with Dell & Intel Ignacio

More information

High Performance Computing Initiatives

High Performance Computing Initiatives High Performance Computing Initiatives Eric Stahlberg September 1, 2015 DEPARTMENT OF HEALTH AND HUMAN SERVICES National Institutes of Health National Cancer Institute Frederick National Laboratory is

More information

BlueFuse Multi Analysis Software for Molecular Cytogenetics

BlueFuse Multi Analysis Software for Molecular Cytogenetics BlueFuse Multi Analysis Software for Molecular Cytogenetics A powerful software package designed to detect and display areas of potential chromosomal abnormality within the genome. Highlights Seamless

More information

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation

More information

Bench to Bedside Clinical Decision Support:

Bench to Bedside Clinical Decision Support: Bench to Bedside Clinical Decision Support: The Role of Semantic Web Technologies in Clinical and Translational Medicine Tonya Hongsermeier, MD, MBA Corporate Manager, Clinical Knowledge Management and

More information

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya Chapter 6 Basics of Data Integration Fundamentals of Business Analytics Learning Objectives and Learning Outcomes Learning Objectives 1. Concepts of data integration 2. Needs and advantages of using data

More information

BIOINFORMATICS Supporting competencies for the pharma industry
