Clinical Genomics at Scale: Synthesizing and Analyzing Big Data From Thousands of Patients
- Dinah Carter
- 8 years ago
Transcription
Brady Bernard, PhD
Senior Research Scientist, Institute for Systems Biology, Seattle, WA

Dr. Bernard's research interests are in cancer drug discovery and clinical genomics. He is currently part of the ISB Genome Data Analysis Center (GDAC) within The Cancer Genome Atlas (TCGA) network. During this time, he has developed novel computational methods and analyses in support of TCGA network research and publications, and has provided scientific guidance for the data exploration tools and algorithms developed by the team. Dr. Bernard has led the group's research efforts and contributions to several TCGA Analysis Working Groups, particularly in the area of heterogeneous data integration and graph analysis. In collaboration with experts in functional genomics, he has integrated TCGA and RNAi screening data to prioritize novel targets and tumor types for drug discovery and repurposing. His research in cancer genomics has resulted in several proffered presentations at TCGA symposia and AACR meetings on distinct topics, a First Prize in the YarcData Graph Analytics Challenge, and a Life Science Discovery Fund grant to further the development of the group's cancer genomics web portals.

In the area of clinical genomics, Dr. Bernard co-leads a collaboration with Inova Translational Medicine Institute (ITMI) to provide analytic support and develop scalable infrastructure for the integration of clinical data with whole genome sequences and molecular data from thousands of patients. Related to this effort, Dr. Bernard has worked with the PRE-EMPT Global Pregnancy Collaboration (CoLab) as well as the Crohn's and Colitis Foundation of America (CCFA) to advise on the study design and infrastructure of large-scale clinical genomics programs.

Annual Quality Congress Breakout Session, Sunday, October 4, 2015

Objective: Define systems biology and relate this concept to the NICU context.

Disclosure
Clinical Genomics: Scalable Analysis Across Thousands of Patients
- Sr. Research Scientist Brady Bernard does not have any financial arrangement or affiliations with a commercial entity.
- Brady Bernard will not be discussing the unlabeled use of a commercial product in his presentation.

Example Big Science Projects
- The Cancer Genome Atlas (TCGA): genomic and molecular characterization of 30 cancers across thousands of primary tumor samples
  - TCGA Research Network: clinicians, researchers, software engineers, bioinformaticians
- Inova Translational Medicine Institute (ITMI): analysis of thousands of whole genome sequences integrated with clinical data

TCGA data and biospecimen flow
[Figure not transcribed]

Inova Translational Medicine Institute (ITMI)
- ITMI aims to assemble one of the world's largest collections of whole genome sequences in a single database to enable personalized healthcare and spur biomedical research
- Example projects:
  - Families with full-term and preterm births
  - Longitudinal study of the first 1,000 days of life
  - Congenital anomalies

High-level clinical genomics workflow

Clinical
- EMR
- Survey
- Phenotypes
- Data cleansing
- Feature merging

Functional cores

Highly collaborative
- Study design
- Phenotype prioritization
- Patient/family selection
- Data generation protocols
- Focused subgroup meetings
- Scalable models (dataset creation, analysis result exploration, ...)
- Validation (predictors, variants, ...)
- Publication and visualization

Genomic
- Sequencing
- QC/QA
- Data management
- Annotation

Analysis
- Characterization (especially clinical)
- Genotype/phenotype associations
- Clinically relevant prediction
- Data integration
- Interactive exploration (web portals)

Discussion topics
- Clinical
- Genomic & Molecular
- Computational / Informatics / Analysis

Clinical considerations
- EMR and survey formulation
- Consistency (formalism) in the data
- Organization, LIMS, and metadata
- Aggregation of common data elements
- Precise phenotyping and sample size
- Prediction frameworks

EMR and survey formulation
- Timing of events with respect to some reference point
  - Easy to overlook
  - Important for analysis and prediction
- Does lack of response mean "no", "don't know", or "didn't want to answer"?

Consistency (formalism) in the data
Structured data dictionary for:
- Consistency across clinical or research sites
- More seamless automation
- Feature name matching (e.g., data dictionary and column name in eCRF data)
- Misspellings and synonyms (e.g., drugs)
- Mixed delimiters
- Data provenance and versioning
- Excel files?

Organization, LIMS, and metadata
- Excel files are tempting but will not work for a large consortium
- LIMS systems can capture metadata in a structured and queryable form
- Metadata examples:
  - source tissue
  - known variations across batches (software changes, etc.)
  - mapped sample IDs across data types
  - date sample was taken
  - date sample was processed

Aggregation of common data elements

Precise phenotyping and sample size
Different sources of evidence for premature rupture of membranes:
- Antenatan_Steroids_Indication:pprom
- Antenatan_Steroids_Indication:prom
- Delivery_Result_of_Other_Reason:pprom
- Delivery_Result_of_Other_Reason:prom
- Other_Medical_Conditions:pprom
- Other_Medication_Indication:pprom
- Prom
- Reason_for_C-Section_mc:pprom
- Reason_for_C-Section_mc:prom
- Tocolytic_Therapy_Indication:pprom
- Tocolytic_Therapy_Indication:prom
- Was_the_Delivery_a_Result_of:prom

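The list of evidence sources for premature rupture of membranes suggests one way to aggregate common data elements: collapse the scattered features into a single per-patient summary. The sketch below is only an illustration under assumptions that are not in the slides: that each listed name is a binary feature, that a patient record is the set of features that are positive, and that the suffix after the colon distinguishes `prom` from `pprom`. The function name `membrane_rupture_evidence` is hypothetical.

```python
# Feature names copied verbatim from the slide (including the original spelling).
PROM_FEATURES = [
    "Antenatan_Steroids_Indication:pprom",
    "Antenatan_Steroids_Indication:prom",
    "Delivery_Result_of_Other_Reason:pprom",
    "Delivery_Result_of_Other_Reason:prom",
    "Other_Medical_Conditions:pprom",
    "Other_Medication_Indication:pprom",
    "Prom",
    "Reason_for_C-Section_mc:pprom",
    "Reason_for_C-Section_mc:prom",
    "Tocolytic_Therapy_Indication:pprom",
    "Tocolytic_Therapy_Indication:prom",
    "Was_the_Delivery_a_Result_of:prom",
]


def membrane_rupture_evidence(positive_features):
    """Group a patient's positive features by the phenotype they support.

    positive_features: set of feature names that are positive for one patient
    (an assumed representation, not described in the slides).
    Returns a dict with the supporting feature names for "prom" and "pprom".
    """
    evidence = {"prom": [], "pprom": []}
    for name in PROM_FEATURES:
        if name in positive_features:
            # The text after the last colon names the phenotype; the bare
            # feature "Prom" has no colon and maps to "prom".
            label = name.split(":")[-1].lower()
            evidence[label].append(name)
    return evidence


# Hypothetical patient: two relevant features and one unrelated feature.
patient = {"Tocolytic_Therapy_Indication:pprom", "Prom", "Unrelated_Feature:yes"}
summary = membrane_rupture_evidence(patient)
print(summary)
```

Keeping the supporting feature names, rather than a bare yes/no flag, preserves the provenance of each call, which is useful when the sources disagree.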
Precise phenotyping and sample size

Prediction frameworks
- Goal: predict phenotypes or outcomes given clinical, genomic, and molecular data
- The data are non-linear; classifiers should account for this
- Many clinical data elements are highly correlated or irrelevant for prediction and should be black-listed
- Cross-validation and independent data sets should be considered in advance

Discussion topics
- Clinical
- Genomic & Molecular
- Computational / Informatics / Analysis

Genomic & Molecular Data
- Annotations
- Confounding factors
  - Batch effects [will happen]
  - Ancestry [is very important]

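Two of the prediction-framework points, black-listing highly correlated clinical features and planning cross-validation in advance, can be sketched in a few lines. This is a minimal illustration, not the presenter's method: the 0.95 correlation threshold, the greedy keep-first strategy, the feature names, and the values are all assumptions made up for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant feature has no defined correlation
    return cov / (sd_x * sd_y)


def blacklist_correlated(features, threshold=0.95):
    """Greedily keep features; black-list any feature whose absolute
    correlation with an already kept feature reaches the threshold."""
    kept, blacklisted = [], []
    for name, values in features.items():
        if any(abs(pearson(values, features[k])) >= threshold for k in kept):
            blacklisted.append(name)
        else:
            kept.append(name)
    return kept, blacklisted


def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


# Hypothetical clinical features for five patients.
features = {
    "gestational_age_weeks": [24, 30, 36, 40, 28],
    "gestational_age_days": [168, 210, 252, 280, 196],  # redundant with weeks
    "maternal_age": [31, 22, 35, 27, 29],
}
kept, blacklisted = blacklist_correlated(features)
print("kept:", kept)
print("black-listed:", blacklisted)
for train, test in kfold_indices(5, 2):
    print("train", train, "test", test)
```

In practice the black-list would be derived inside each training fold only, so that the held-out samples do not influence feature selection.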
Annotations
Current, common, and updated:
- Reference genome builds
- Gene definitions
- Software and annotation versions

Batch effects
- Methylation plate position
- Mendelian inheritance errors
- Extremely heterozygous variants
- Missing calls
- Commonly mutated segments

Batch effects: Example variant associated with PTB
[Figure: samples ordered by date, with tracks for variant, preterm, and admixture. Figure labels: 23% (45) and 5% (19); FTB n=401, PTB n=198; p = 2e-11. Note: same pattern in ISB in-house CGI genomes.]

Graphic summary
[Figure labels: BATCH, PTB]

Ancestry and population stratification

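The numbers on the batch-effect example can be checked with a simple 2x2 test. The sketch below assumes, since the figure itself did not survive transcription, that the labels mean 45 of 198 PTB samples (23%) and 19 of 401 FTB samples (5%) carry the variant; that reading is an inference, not something the slides state. Under that assumption a Pearson chi-square test without continuity correction gives a p-value close to the quoted 2e-11. The function names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def p_value_1df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Assumed reading of the figure labels (see lead-in).
ptb_carriers, ptb_total = 45, 198
ftb_carriers, ftb_total = 19, 401

chi2 = chi_square_2x2(
    ptb_carriers, ptb_total - ptb_carriers,
    ftb_carriers, ftb_total - ftb_carriers,
)
p = p_value_1df(chi2)
print(f"chi-square = {chi2:.2f}, p = {p:.1e}")
```

The slide presents this variant as a batch-effect example, so a very small p-value here does not by itself indicate biology: the same test run against processing batch or date, rather than PTB status, is the comparison that would expose the confounding.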
Ancestry and population stratification
Family-based genomic study design
- Transmission disequilibrium: identify variants that are transmitted to affected offspring more frequently than expected by chance
- Population-associated variants and class imbalance are likely to lead to false positives
- Advantage: accounts for population stratification
- Larger pedigrees can be helpful, though phenotypic and genomic data may not be available

Family genomics: phasing and candidate genes
Roach et al. (2010). Science

Mendelian Inheritance Errors (MIEs)
- Can be real de novo mutations
- Most likely explanation is sequencing error
- MIEs, while infrequent, are observed orders of magnitude more often than the expected de novo mutation rate

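Both family-based ideas on this slide, flagging Mendelian inheritance errors in a trio and testing for transmission disequilibrium, reduce to short calculations. The sketch below is a textbook-style illustration rather than the pipeline used in the talk. It assumes autosomal, diploid genotypes written as allele pairs, and the transmission counts in the example are invented.

```python
import math


def is_mendelian_consistent(child, father, mother):
    """True if the child's two alleles can be explained by one allele from
    each parent (autosomal, diploid genotypes given as 2-tuples)."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def tdt_statistic(transmitted, untransmitted):
    """Transmission disequilibrium test.

    transmitted / untransmitted: how often heterozygous parents passed the
    allele of interest, or the other allele, to affected offspring.
    Returns the McNemar-type chi-square statistic and its 1-df p-value.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Trio examples: the second child carries a G that neither parent has,
# so the site is an MIE (a sequencing error or a real de novo mutation).
print(is_mendelian_consistent(("A", "G"), ("A", "A"), ("G", "G")))  # True
print(is_mendelian_consistent(("A", "G"), ("A", "A"), ("A", "A")))  # False

# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p = tdt_statistic(60, 40)
print(f"TDT chi-square = {chi2:.1f}, p = {p:.3f}")
```

Because the test compares transmitted with untransmitted alleles within families, it is not distorted by allele-frequency differences between populations, which is the advantage the slide attributes to this design.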
Accuracy and variant call quality
- Percent accuracy = 100 - %MIE
- Less than 0.05% of all calls are MIEs
- Less than 0.002% are MIEs above a quality score of 80
- With family trios, sequencing errors can be identified and spurious associations (type 1 error) are mitigated, enabling the use of whole genome sequences in the clinical setting
[Figure axis: variant call quality score]

Genomic & Molecular Data Recommendations
- Clinical analysis can inform genomic study design
- Phenotype definition and prioritization is critical
- Matched and balanced cases and controls
  - minimize batch effects
  - improve statistical strength
  - mitigate confounding factors
- Larger segregating pedigrees
  - reduce number of candidate genes
  - potentially hybrid approaches
- Run nuclear families (and multigenerational, if possible) together on the same batch

Quality Control
- Re-run controls across batches
- Maintain highly detailed annotations on dates, reagents, sequencing runs, software versions, tissues, ...
- External data sets from data provider to mitigate batch effects
- SNP arrays for sample identifiability and case/control matching

Additional data considerations
- Staging and production environments: as data is being generated, upload to the staging environment, then QC and structure for analysis/consumption
- Data freezes: create common data sets, annotation pipelines, and files for collaborative analysis
- SNP arrays for sample identifiability and case/control matching, especially as the number of data types and source sites increases

Discussion topics
- Clinical
- Genomic & Molecular
- Computational / Informatics / Analysis

Computational / Informatics / Analysis
- De-identification, HIPAA, and universally unique identifiers (UUIDs)
- Computing and collaborative projects
  - NCI cancer cloud pilot
- Cloud computing
  - Consolidate data to a centralized source
  - Scalable computing
  - Data backup
  - Minimize IT
- Workflow management systems
- Web portals

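The accuracy measure on the variant call quality slide, read here as 100 minus the percentage of calls that are Mendelian inheritance errors, can be computed at different quality thresholds as sketched below. The record layout (`quality`, `is_mie`) and the four example calls are hypothetical and are only there to make the calculation concrete.

```python
def percent_accuracy(calls, min_quality=0):
    """Return 100 - %MIE among calls at or above a quality score threshold.

    calls: list of dicts with a numeric "quality" and a boolean "is_mie"
    (an assumed record layout, not taken from the slides).
    """
    kept = [c for c in calls if c["quality"] >= min_quality]
    if not kept:
        raise ValueError("no calls at or above the quality threshold")
    mie_count = sum(1 for c in kept if c["is_mie"])
    return 100.0 - 100.0 * mie_count / len(kept)


# Hypothetical trio-checked variant calls.
calls = [
    {"quality": 20, "is_mie": True},
    {"quality": 50, "is_mie": False},
    {"quality": 85, "is_mie": False},
    {"quality": 90, "is_mie": False},
]
print(percent_accuracy(calls))      # all calls
print(percent_accuracy(calls, 80))  # only calls with quality >= 80
```

Sweeping `min_quality` over a range of values gives the kind of accuracy-versus-quality curve that the slide's axis label suggests.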
NCI cancer cloud pilot

Scalable Genomics Technologies and Architecture
- 7,000 genomes
- Staging: file server, Google Standard
- Archival storage: Google Nearline (petabyte-scale data)
  - Reads (BAM): 100 GB/subject
  - Variants (VCF): 2 GB/subject
- Bioinformatic pipelines: quality control, re-process, batch assessment, data normalization, organize/structure
- Parallel compute/analysis: Google Compute Engine, in-house cluster
- Distributed databases: billions of unique variants, terabytes of data (high access), Google Genomics / BigQuery
- Annotations:
  - Public (significant ETL and data modeling)
  - ISB proprietary (aggregated over thousands of control sequences)

Open source workflow management systems
- Assist with provenance, data access, analysis, complex workflows, reproducible science

Web portals
- Project summaries, reports, auditing
- Data access
- Research & dissemination
- Interactive exploration & dynamic analysis

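The storage figures on the architecture slide can be turned into a quick capacity estimate. The arithmetic below takes the slide's numbers at face value (7,000 genomes, 100 GB of reads and 2 GB of variants per subject) and assumes decimal units and a single stored copy. Replication, reprocessing, and intermediate files are not included.

```python
GENOMES = 7_000
BAM_GB_PER_SUBJECT = 100  # reads (BAM), from the slide
VCF_GB_PER_SUBJECT = 2    # variants (VCF), from the slide
GB_PER_TB = 1_000         # decimal units assumed
TB_PER_PB = 1_000

bam_tb = GENOMES * BAM_GB_PER_SUBJECT / GB_PER_TB
vcf_tb = GENOMES * VCF_GB_PER_SUBJECT / GB_PER_TB
total_pb = (bam_tb + vcf_tb) / TB_PER_PB

print(f"Reads:    {bam_tb:.0f} TB")
print(f"Variants: {vcf_tb:.0f} TB")
print(f"Total:    {total_pb:.3f} PB")
```

A single copy therefore comes to roughly 0.7 PB, almost all of it reads, which helps explain why the slide places BAM files in archival storage and keeps the much smaller variant data in high-access databases.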
Additional computational considerations
- Security/authentication/access controls
- Wikis/project pages (e.g., Confluence) and subteams (e.g., analysis working groups)
- Issue and project tracking (e.g., JIRA)
- Listservs and management
- Code repositories (e.g., GitHub)

Discussion topics
- Clinical
- Genomic & Molecular
- Computational / Informatics / Analysis

Concluding thoughts
- There are many potential [and avoidable] pitfalls
- The infrastructure required to establish and support Big Science consortium projects is significant and easy to underestimate
  - Roles and responsibilities, maintenance, costs, support, data generation, QC, QA, ...
- With TCGA and ITMI, the questions to be addressed with the data far exceed the bandwidth of direct participants
  - Significant value to the community in curated clinical, genomic, and molecular data
  - What will consortium (and IRB?) guidelines be for access control and use by the broader community?

brady.bernard@systemsbiology.org

[Structure images: 2bnh (alpha beta horseshoe), 1hv9 (left handed beta helix), 1m30 (SH3 like barrel)]
Discover more, discover faster. High performance, flexible NLP-based text mining for life sciences It s not information overload, it s filter failure. Clay Shirky Life Sciences organizations face the challenge
More informationBig Data and Data Analysis for Personalized Medicine
Big Data and Data Analysis for Personalized Medicine Dr. Paul Terry Ambassador Agenda Information and Data The Technology The Promise Personalized Medicine 2 CEO/CTO of PHEMI Board of Life Sciences BC
More informationOverview. Overarching observations
Overview Genomics and Health Information Technology Systems: Exploring the Issues April 27-28, 2011, Bethesda, MD Brief Meeting Summary, prepared by Greg Feero, M.D., Ph.D. (planning committee chair) The
More informationData-driven Medicine in the Age of Genomics Overcoming the Challenge With Advanced Molecular Analytics
Data-driven Medicine in the Age of Genomics Overcoming the Challenge With Advanced Molecular Analytics David A Dworaczyk, PhD Life and Health Sciences Strategic Development 11 December, 2014 Safe Harbor
More informationCAREER TRACKS PHASE 1 UCSD Information Technology Family Function and Job Function Summary
UCSD Applications Programming Involved in the development of server / OS / desktop / mobile applications and services including researching, designing, developing specifications for designing, writing,
More informationebook Utilizing MapReduce to address Big Data Enterprise Needs Leveraging Big Data to shorten drug development cycles in Pharmaceutical industry.
Utilizing MapReduce to address Big Data Enterprise Needs Leveraging Big Data to shorten drug development cycles in Pharmaceutical industry. www.persistent.com 3 4 5 5 7 9 10 11 12 13 From the Vantage Point
More informationVisual Mining for Big Data
Visual Mining for Big Data Big Dive June 21st, 2013 Alessandro Piglia Kairos3D Where do we come from? Kairos3D comes from real-time 3D graphics Serious Games (virtual visits, training for industry operators,
More informationStorage Solutions for Bioinformatics
Storage Solutions for Bioinformatics Li Yan Director of FlexLab, Bioinformatics core technology laboratory liyan3@genomics.cn http://www.genomics.cn/flexlab/index.html Science and Technology Division,
More information1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India
1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India Call for Papers Colossal Data Analysis and Networking has emerged as a de facto
More informationKnowledgent White Paper Series. Developing an MDM Strategy WHITE PAPER. Key Components for Success
Developing an MDM Strategy Key Components for Success WHITE PAPER Table of Contents Introduction... 2 Process Considerations... 3 Architecture Considerations... 5 Conclusion... 9 About Knowledgent... 10
More informationBig Data and the Data Lake. February 2015
Big Data and the Data Lake February 2015 My Vision: Our Mission Data Intelligence is a broad term that describes the real, meaningful insights that can be extracted from your data truths that you can act
More informationThe Fusion of Supercomputing and Big Data. Peter Ungaro President & CEO
The Fusion of Supercomputing and Big Data Peter Ungaro President & CEO The Supercomputing Company Supercomputing Big Data Because some great things never change One other thing that hasn t changed. Cray
More informationGaining Ground in Translation Francis S. Collins, M.D., Ph.D. Director, National Institutes of Health
Gaining Ground in Translation Francis S. Collins, M.D., Ph.D. Director, National Institutes of Health Some Key Challenges in Biomedical Research Providing robust methods and tools for translation Conducting
More informationPrimetime for KNIME:
Primetime for KNIME: Towards an Integrated Analysis and Visualization Environment for RNAi Screening Data F. Oliver Gathmann, Ph. D. Director IT, Cenix BioScience Presentation for: KNIME User Group Meeting
More informationIntegrating a Big Data Platform into Government:
Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government
More informationWorldwide Collaborations in Molecular Profiling
Worldwide Collaborations in Molecular Profiling Lillian L. Siu, MD Director, Phase I Program and Cancer Genomics Program Princess Margaret Cancer Centre Lillian Siu, MD Contracted Research: Novartis, Pfizer,
More informationTwister4Azure: Data Analytics in the Cloud
Twister4Azure: Data Analytics in the Cloud Thilina Gunarathne, Xiaoming Gao and Judy Qiu, Indiana University Genome-scale data provided by next generation sequencing (NGS) has made it possible to identify
More informationThe Future of the Electronic Health Record. Gerry Higgins, Ph.D., Johns Hopkins
The Future of the Electronic Health Record Gerry Higgins, Ph.D., Johns Hopkins Topics to be covered Near Term Opportunities: Commercial, Usability, Unification of different applications. OMICS : The patient
More informationReport of the DTL focus meeting on Life Science Data Repositories
Report of the DTL focus meeting on Life Science Data Repositories Goal The goal of the meeting was to inform and discuss research data repositories for life sciences. The big data era adds to the complexity
More informationIntro to Bioinformatics
Intro to Bioinformatics Marylyn D Ritchie, PhD Professor, Biochemistry and Molecular Biology Director, Center for Systems Genomics The Pennsylvania State University Sarah A Pendergrass, PhD Research Associate
More informationA Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
More informationGENETIC DATA ANALYSIS
GENETIC DATA ANALYSIS 1 Genetic Data: Future of Personalized Healthcare To achieve personalization in Healthcare, there is a need for more advancements in the field of Genomics. The human genome is made
More informationOpen Platform. Clinical Portal. Provider Mobile. Orion Health. Rhapsody Integration Engine. RAD LAB PAYER Rx
Open Platform Provider Mobile Clinical Portal Engage Portal Allegro PRIVACY EMR Connect Amadeus Big Data Engine Data Processing Pipeline PAYER CLINICAL CONSUMER CUSTOM Open APIs EMPI TERMINOLOGY SERVICES
More informationAchilles a platform for exploring and visualizing clinical data summary statistics
Biomedical Informatics discovery and impact Achilles a platform for exploring and visualizing clinical data summary statistics Mark Velez, MA Ning "Sunny" Shang, PhD Department of Biomedical Informatics,
More informationAli Eghlima Ph.D Director of Bioinformatics. A Bioinformatics Research & Consulting Group
A Bioinformatics Research & Consulting Group Adding Omics Data to Electronic Health Record, A paradigm Shift in Big Data Modeling, Analytics and Storage management for Healthcare and Life Sciences Organizations
More information