Analyzing NGS data with clinical data: open source software for translational medicine


1 Analyzing NGS data with clinical data: open source software for translational medicine
BASEL LIFE SCIENCE WEEK, NGS FORUM, SEPTEMBER 24, 2015
Kees van Bochove, CEO, The Hyve

2 Agenda
1. Introduction
2. Open Source in Translational Medicine
3. cBioPortal
4. TranSMART
5. ADAM & Apache Spark

3 1. INTRODUCTION

4 The Hyve
Professional support for open source bioinformatics and translational research software, such as tranSMART, cBioPortal, i2b2, Galaxy, ADAM and OHDSI
Core values: Share, Reuse, Specialize
Office locations: Utrecht, Netherlands; Cambridge, MA, United States
Services: Software development, Data science services, Consultancy, Hosting / SLAs
Mission: Enable pre-competitive collaboration in life science R&D by leveraging open source software
Fast-growing: Started in people by now

5 Interdisciplinary team: software engineers, data scientists, project managers & staff; expertise in bioinformatics, medical informatics, software engineering, biostatistics etc.

6 2. OPEN SOURCE IN TRANSLATIONAL MEDICINE


9 The Open Source Definition
1. Free Redistribution
2. Availability of Source Code
3. Allow Derived Works
4. Integrity of The Author's Source Code
5. No Discrimination Against Persons or Groups
6. No Discrimination Against Fields of Endeavor
7. Distribution of License
8. License Must Not Be Specific to a Product
9. License Must Not Restrict Other Software
10. License Must Be Technology-Neutral

10 Open Source
Source code openly accessible and reusable for everyone
Enables pre-competitive collaboration: both academia and industry can use and enhance it, which grows a community
Transparency: verification (scientific as well as IT security) can be done by anyone, no black box

11 The software engineering process in an open source community is no different from that in a closed commercial setting. But the stakeholders, contributors, business models, engagement models etc. are!

12 Different Non-Functional Requirements for Software
Bioinformatician in academia: create a novel solution for a problem which has publication value
  Basic research: new frontiers
  Software should demonstrate working principle
Bioinformatician / IT Services in pharma/clinic: mainly applied research
  Software should be well tested, maintainable, extensible, scalable etc.
  Need for commercial support for open source software

13 Open Source in Translational Medicine
Categories: Study design; Clinical - eCRF & apps; Datawarehousing; Data visualisation; Imaging; Scientific compute; Biobanking; Workflow / NGS

14 3. CBIOPORTAL
See also:

15 cBioPortal study portal

16 Colon cancer study in TraIT

17 Event calls
Gene alteration events per sample: which genes are altered in each individual tumor sample?

Data type | Alteration event calls
Mutations | Non-synonymous somatic mutations
Copy number changes | Homozygous deletion or amplification
Methylation | Epigenetic silencing
mRNA and/or DNA | Gene fusions
mRNA expression changes | Over- or under-expression

Alteration types and thresholds can be customized for each gene
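The event-call idea on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not cBioPortal's implementation: the `Thresholds` class, the `call_events` function, the event labels, the default cut-offs and the set of mutation classes treated as non-synonymous are all hypothetical, and only three of the five data types are covered.

```python
from dataclasses import dataclass

# Mutation classes treated as non-synonymous in this sketch (an assumption).
NON_SYNONYMOUS = {"missense", "nonsense", "frameshift", "splice_site"}


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical cut-offs; the default values are illustrative only."""
    amp: int = 2          # discrete copy-number call counted as amplification
    homdel: int = -2      # discrete copy-number call counted as homozygous deletion
    z_up: float = 2.0     # mRNA z-score at or above this is over-expression
    z_down: float = -2.0  # mRNA z-score at or below this is under-expression


DEFAULT = Thresholds()


def call_events(gene, profile, per_gene=None):
    """Return the alteration events for one gene in one tumor sample.

    profile holds the raw values for that gene ("mutation", "cna", "mrna_z");
    per_gene optionally maps a gene symbol to its own Thresholds.
    """
    t = (per_gene or {}).get(gene, DEFAULT)
    events = []
    if profile.get("mutation") in NON_SYNONYMOUS:
        events.append("MUTATED")
    cna = profile.get("cna")
    if cna is not None:
        if cna >= t.amp:
            events.append("AMPLIFIED")
        elif cna <= t.homdel:
            events.append("HOMDEL")
    z = profile.get("mrna_z")
    if z is not None:
        if z >= t.z_up:
            events.append("UP")
        elif z <= t.z_down:
            events.append("DOWN")
    return events


tumor = {"mutation": "missense", "cna": -2, "mrna_z": -2.5}
strict = {"TP53": Thresholds(z_down=-3.0)}  # stricter expression cut-off for one gene
print(call_events("TP53", tumor))
print(call_events("TP53", tumor, strict))
```

Passing a per-gene `Thresholds` object is one way to model "thresholds can be customized for each gene"; with the stricter cut-off the same sample no longer counts as under-expressed.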

18 Visualization of events across genes and data types

19 Review cancer genomics events in clinical context

20 GenePrint Visualisation (from cBioPortal) in tranSMART

21 TM2CBIO
In collaboration with Netherlands Cancer Institute
ETL pipeline between tranSMART and cBioPortal
TranSMART used as data warehouse, and cBioPortal as a study-based analytics mart for cancer studies
Going from individual data points (e.g. mRNA intensity levels) in tranSMART to alteration events in cBioPortal
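The step from individual data points to alteration events can be illustrated with a small Python sketch. It is not the TM2CBIO code: the function names, the cut-off of 2.0 and the choice to standardize each gene against all samples in the study are assumptions made for the example.

```python
from statistics import mean, pstdev


def zscores(intensities):
    """Standardize one gene's intensities across samples (sample id -> z-score)."""
    values = list(intensities.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # No variation across samples: nothing can be called altered.
        return {sample: 0.0 for sample in intensities}
    return {sample: (value - mu) / sigma for sample, value in intensities.items()}


def expression_events(gene, intensities, cutoff=2.0):
    """Turn per-sample intensity levels into (sample, gene, event) rows."""
    rows = []
    for sample, z in zscores(intensities).items():
        if z >= cutoff:
            rows.append((sample, gene, "UP"))
        elif z <= -cutoff:
            rows.append((sample, gene, "DOWN"))
    return rows


# Hypothetical intensity levels for one gene in six samples.
erbb2 = {"S1": 10, "S2": 10, "S3": 10, "S4": 10, "S5": 10, "S6": 30}
print(expression_events("ERBB2", erbb2))
```

The warehouse side keeps every measurement, while the analytics mart only receives the few rows where a sample deviates strongly, which is the reduction the slide describes.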

22 4. TRANSMART

23 TranSMART as a product
Datawarehouse bringing together scientists from clinical sciences, preclinical research and discovery around the data
Combination of internal datasets and documents with public datasets and knowledge
Tailored to both biologists/clinicians and bioinformaticians
Dual nature: in use for translational research in both pharma and hospitals/clinic

24 In early 2014, over 50 tranSMART implementations
International Research Initiatives: IMI eTRIKS, EMIF, CTMM TraIT
Pharma & Biotech: Sanofi, Millennium, Pfizer, JNJ, Roche
Government Aligned Institutions: FDA
Non-Profits: 1Mind4Research, Orion Bionetworks, Critical Path Institute
Hospitals / Academics: U Michigan, Harvard / Boston Children's Hospital, HEGP, Johns Hopkins, St. Jude
Service Providers: ConvergeHEALTH, thehyve, Rancho Biosciences, BTGS, Thomson Reuters, Saama Tech, Cognizant

Start | Organization | Type | Stage
2008 | Johnson & Johnson | Pharma | Production
2008 | Recombinant by Deloitte | Services | Multiple
2010 | Sage Bionetworks | Non Profit | Production
2010 | Thomson Reuters | Services | Support
2010 | U-BIOPRED | Consortium | Production
2011 | SAFE-T | Consortium | Pilot
2011 | University of Michigan, Comprehensive Cancer Center | Academic | Production
2012 | APHP-HEGP Paris France | Academic | Production
2012 | BT Cure | Consortium | Pilot
2012 | CTMM/TraIT | Consortium | Dev
2012 | FDA | Government | Dev
2012 | IMI/eTRIKS | Consortium | Dev
2012 | Merck | Pharma | Pilot
2012 | Millennium Pharmaceuticals | Pharma | Production
2012 | One Mind for Research (1M4R) | Non Profit | Production
2012 | Pfizer | Pharma | Production
2012 | Roche | Pharma | Evaluation
2012 | Sanofi-Aventis | Pharma | Dev
2012 | St. Jude | Non Profit | Dev
2012 | U Michigan, Computational Medicine & Bioinformatics | Academic | Multiple
2013 | Agios | Biotech | Evaluation
2013 | CARPEM Cancer personalized medicine | Academic | Dev
2013 | Harvard University / Boston Children's Hospital | Academic | Autism Pilot
2013 | Boehringer Ingelheim | Pharma | Pilot
2013 | Bristol Myers Squibb | Pharma | Evaluation
2013 | BT Global Services | Services | Pilot
2013 | Accelerated Cure Project for MS | Non Profit | Dev
2014 | Personalized medicine and colorectal cancers (France) | Academic | Dev
2014 | PCORI PRRN Phelan-McDermid Syndrome Data Network | Academic | Dev

25 TranSMART Open Source History
February 2012: J&J releases tranSMART as open source on GitHub under GPL v3
December 2012: CTMM TraIT project decides to use tranSMART as core infrastructure component
January 2013: IMI eTRIKS starts, uses tranSMART as core infrastructure component
February 2013: kickoff of tranSMART Foundation, U. Michigan publishes PostgreSQL port
March 2014: IMI EMIF kickoff, tranSMART is used as data integration component

26 Center for Translational Molecular Medicine (CTMM)
Public-private consortium
Dedicated to the development of Molecular Diagnostics and Molecular Imaging technologies
Focusing on the translational aspects of molecular medicine
120 partners: universities, academic medical centers, medical technology enterprises and chemical and pharmaceutical companies
Budget 300 M
22 projects / research consortia
TraIT is the Translational Research IT project supporting these projects with a joint IT infrastructure

27 TraIT Consortium
Growing TraIT project team

28 TraIT data workflow
Hospital (IT): HIS, PACS, LIS
Samples (IT): BIMS
Pseudonymization
Data domains: clinical data (OpenClinica); imaging data (NBIA + AIM); biobanking (CBM-NL); experimental data (e.g. Galaxy, Chipster; e.g. PhenotypeDB, Annai Systems)
Translational Research (IT): integrated data (tranSMART/i2b2 datawarehouse); translational analytics workbench (tranSMART/cohort explorer, R, Galaxy)
Public Data

29 Recombinant / Deloitte CDISC Thomson Reuters Pfizer Astra Zeneca VUmc The Hyve 70 Sanofi Johnson & Johnson University of Michigan Philips University of Luxembourg
Amsterdam, June 2013: tranSMART Workshop
Attendees from 10 Pharma companies, 11 University Medical Centers and 12 IT companies

30 130 Ann Arbor, Michigan, October 2014: Annual Meeting

31 TranSMART wins all the prizes: Best Show Award, Best Practices Award, Best Poster Award
Bio IT World, Boston, April

32 Contributors

33 The Hyve tranSMART 1.3 Contributions
Improvements for handling GWAS data & cohort selection on SNP data
Build a number of interactive advanced analytics workflows & correct statistical assumptions
Imaging workflow: ETL for imaging metadata and results
Prototype of a tranSMART 2.0 interface: new look & feel, user experience
Under discussion: improved GUI for ETL?

34 TranSMART 2.0 User Interface - alpha

35 5. USING APACHE SPARK FOR NGS DATA ANALYSIS

36 NGS data storage & analysis
Don't import BAM, CRAM, VCF and BCF to a database! They are the databases!
  Indexed
  Compressed
  Highly specialized & optimized storage formats
The whole ecosystem is built around this concept. All tools read and write these through a rich API:
  HTS-JDK
  GATK MapReduce engine
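Why an indexed, compressed file can behave like a database is easier to see in a toy example. The Python sketch below is not BGZF, tabix or any real HTS format: it compresses position-sorted records in small independent blocks and keeps a tiny index, so a region query only decompresses the blocks that can overlap it. The function names, record layout and block size are assumptions for illustration.

```python
import zlib


def build(records, block_size=2):
    """Compress position-sorted (position, text) records block by block.

    Returns the concatenated compressed bytes plus an index with one
    (first_pos, last_pos, offset, length) entry per block.
    """
    blob = bytearray()
    index = []
    for i in range(0, len(records), block_size):
        block = records[i:i + block_size]
        payload = "\n".join(f"{pos}\t{text}" for pos, text in block).encode()
        compressed = zlib.compress(payload)
        index.append((block[0][0], block[-1][0], len(blob), len(compressed)))
        blob += compressed
    return bytes(blob), index


def query(blob, index, start, end):
    """Return records with start <= position <= end and the number of blocks read."""
    hits = []
    blocks_read = 0
    for first, last, offset, length in index:
        if last < start or first > end:
            continue  # the index says this block cannot overlap the region
        blocks_read += 1
        text = zlib.decompress(blob[offset:offset + length]).decode()
        for line in text.split("\n"):
            pos, value = line.split("\t", 1)
            if start <= int(pos) <= end:
                hits.append((int(pos), value))
    return hits, blocks_read


variants = [(100, "A>G"), (250, "C>T"), (900, "G>A"), (1500, "T>C"), (4000, "A>T")]
blob, index = build(variants)
hits, blocks_read = query(blob, index, 800, 1600)
print(hits, blocks_read)
```

Only one of the three blocks is decompressed for this query, which is the property that lets tools work directly on the files instead of loading them into a relational database.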

37 Genome Analysis ToolKit (GATK)
MapReduce framework for processing BAM and VCF files / databases
Provides walkers that provide an access pattern as a stream through BAM and VCF files
On top of these walkers there are analysis tools:
  Indel realignment
  Base Quality recalibration
  Unified Genotyper (= old variant caller)
  Haplotype Caller (= new variant caller)
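The walker pattern can be sketched as follows. This is a Python analogy of the map/reduce access pattern, not GATK's Java API: the class, the `traverse` driver and the record fields are hypothetical. The engine streams records and the walker only says what to compute per record (map) and how to combine results (reduce).

```python
class CountVariantSitesWalker:
    """Hypothetical walker: counts records that carry an alternate allele."""

    def reduce_init(self):
        # Starting value of the running result.
        return 0

    def map(self, record):
        # Per-record computation: 1 for a variant site, 0 otherwise.
        return 1 if record["alt"] != "." else 0

    def reduce(self, value, total):
        # Fold one mapped value into the running result.
        return total + value


def traverse(walker, records):
    """Minimal stand-in for the engine: stream records through a walker."""
    total = walker.reduce_init()
    for record in records:
        total = walker.reduce(walker.map(record), total)
    return total


# Hypothetical VCF-like records; "." means no alternate allele at that site.
records = [
    {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G"},
    {"chrom": "chr1", "pos": 250, "ref": "C", "alt": "."},
    {"chrom": "chr2", "pos": 40, "ref": "T", "alt": "C"},
]
print(traverse(CountVariantSitesWalker(), records))
```

Because the walker never touches file handling, the same analysis code can run over any stream the engine provides, including shards processed in parallel.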

38 ADAM
Genomics processing engine & specialized file format built with:
  Apache Avro (uniform data format definition)
  Apache Spark (memory-based cluster execution)
  Apache Parquet (Hadoop based columnar storage format)
Resulting data can be accessed by Hadoop Map-Reduce, Spark, Shark, Impala, Pig, Hive etc.
Support for conversion to and from BAM and VCF. MAF conversion and somatic variant calling unclear.
Open source project driven mainly by UC Berkeley
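What a columnar storage format buys can be shown without Spark or Parquet. The Python sketch below is only an illustration of the layout idea: the read fields are hypothetical and do not reproduce ADAM's Avro schema, and the helper names are made up for the example.

```python
def to_columns(rows):
    """Transpose row records into one list per field (assumes identical keys)."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


def mean_of(columns, field):
    """Aggregate a single field; the other columns are never touched."""
    values = columns[field]
    return sum(values) / len(values)


# Hypothetical aligned-read records in a row-oriented layout.
reads = [
    {"name": "r1", "contig": "chr1", "start": 100, "mapq": 60},
    {"name": "r2", "contig": "chr1", "start": 180, "mapq": 0},
    {"name": "r3", "contig": "chr2", "start": 40, "mapq": 30},
]
columns = to_columns(reads)
print(mean_of(columns, "mapq"))
```

A query such as the mean mapping quality reads one column instead of every full record, and values of the same type stored together tend to compress well, which is the motivation for a columnar format on the slide.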

39 ADAM Example setup in tranSMART

40 Translational Research Infrastructure
User Interfaces: R, Spotfire etc.; Galaxy GUI; TranSMART GUI
TranSMART RESTful API
Galaxy; Sun Grid Engine (SGE cluster)
Clinical API; cTAKES; Clinical Data/i2b2
Mapping between patients in clinical db and samples in omics data
Genome Analysis ToolKit (GATK); HTS-JDK; BAM, CRAM, VCF, BCF; ADAM/Spark
Transcriptome Analysis ToolKit (TATK) (R / Bioconductor); Transcriptome files / DB
Proteome Analysis ToolKit (PATK); MzML, MzIdent
Imaging Data Repository; XNAT
Isilon High Performance Storage; Hadoop HDFS / Apache Parquet (ADAM); Archiving Object Storage (e.g. Glacier)
Public Archives (Download / ETL): Sequence Read Archive (SRA); Gene Expression Omnibus (GEO); PRoteomics IDEntifications Database (PRIDE)
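The mapping between patients in the clinical database and samples in the omics data is the piece that ties this architecture together. A minimal Python sketch of that join follows; the identifiers, field names and the `join_clinical_omics` function are hypothetical and not taken from tranSMART or i2b2.

```python
def join_clinical_omics(clinical, sample_to_patient, omics):
    """Attach clinical attributes to each omics sample via the mapping table.

    Samples without a mapping, or mapped to an unknown patient, are skipped.
    """
    joined = []
    for sample_id, measurement in omics.items():
        patient_id = sample_to_patient.get(sample_id)
        if patient_id is None or patient_id not in clinical:
            continue
        joined.append({
            "patient": patient_id,
            "sample": sample_id,
            **clinical[patient_id],
            **measurement,
        })
    return joined


# Hypothetical data: two samples from P1, one from an unknown patient, one unmapped.
clinical = {"P1": {"age": 61, "diagnosis": "colon cancer"}}
sample_to_patient = {"S1": "P1", "S2": "P1", "S3": "P9"}
omics = {
    "S1": {"gene": "KRAS", "event": "MUTATED"},
    "S2": {"gene": "KRAS", "event": "NONE"},
    "S3": {"gene": "KRAS", "event": "MUTATED"},
    "S4": {"gene": "KRAS", "event": "NONE"},
}
for row in join_clinical_omics(clinical, sample_to_patient, omics):
    print(row)
```

Keeping the mapping as its own table lets the NGS files stay in their native formats while cohort selection still happens on clinical attributes.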



More information

3 rd Symposium Will Big Data and Bigger Cuts Cripple Bioinformatics? Thursday 21 st March 2013

3 rd Symposium Will Big Data and Bigger Cuts Cripple Bioinformatics? Thursday 21 st March 2013 symposium 2013 3 rd Symposium Will Big Data and Bigger Cuts Cripple Bioinformatics? Thursday 21 st March 2013 Delegate booklet www.eaglegenomics.com Will Big Data and Bigger Cuts Cripple Bioinformatics?

More information

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences

Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences Euro-BioImaging European Research Infrastructure for Imaging Technologies in Biological and Biomedical Sciences WP11 Data Storage and Analysis Task 11.1 Coordination Deliverable 11.2 Community Needs of

More information

Oracle Big Data Discovery Unlock Potential in Big Data Reservoir

Oracle Big Data Discovery Unlock Potential in Big Data Reservoir Oracle Big Data Discovery Unlock Potential in Big Data Reservoir Gokula Mishra Premjith Balakrishnan Business Analytics Product Group September 29, 2014 Copyright 2014, Oracle and/or its affiliates. All

More information

In 2014, the Research Data group @ Purdue University

In 2014, the Research Data group @ Purdue University EDITOR S SUMMARY At the 2015 ASIS&T Research Data Access and Preservation (RDAP) Summit, panelists from Research Data @ Purdue University Libraries discussed the organizational structure intended to promote

More information

Join our scientific talent community

Join our scientific talent community Join our scientific talent community There has never been a better time to be a part of Janssen Research & Development. We are at the forefront of healthcare leading, evolving and transforming it into

More information

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015 Data Governance in the Hadoop Data Lake Kiran Kamreddy May 2015 One Data Lake: Many Definitions A centralized repository of raw data into which many data-producing streams flow and from which downstream

More information

Towards Integrating the Detection of Genetic Variants into an In-Memory Database

Towards Integrating the Detection of Genetic Variants into an In-Memory Database Towards Integrating the Detection of Genetic Variants into an 2nd International Workshop on Big Data in Bioinformatics and Healthcare Oct 27, 2014 Motivation Genome Data Analysis Process DNA Sample Base

More information

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.

More information

SAP Healthcare Analytics Solutions Provide physicians and researchers access to patient data from various systems in realtime

SAP Healthcare Analytics Solutions Provide physicians and researchers access to patient data from various systems in realtime SAP Healthcare Analytics Solutions Provide physicians and researchers access to patient data from various systems in realtime Stephan Schindewolf, SAP SE, July 13, 2015 Facts per Decision Need Decision

More information

Large-scale Research Data Management and Analysis Using Globus Services. Ravi Madduri Argonne National Lab University of Chicago @madduri

Large-scale Research Data Management and Analysis Using Globus Services. Ravi Madduri Argonne National Lab University of Chicago @madduri Large-scale Research Data Management and Analysis Using Globus Services Ravi Madduri Argonne National Lab University of Chicago @madduri Outline Who we are Challenges in Big Data Management and Analysis

More information

CSE-E5430 Scalable Cloud Computing. Lecture 4

CSE-E5430 Scalable Cloud Computing. Lecture 4 Lecture 4 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 5.10-2015 1/23 Hadoop - Linux of Big Data Hadoop = Open Source Distributed Operating System

More information

Hadoop-BAM and SeqPig

Hadoop-BAM and SeqPig Hadoop-BAM and SeqPig Keijo Heljanko 1, André Schumacher 1,2, Ridvan Döngelci 1, Luca Pireddu 3, Matti Niemenmaa 1, Aleksi Kallio 4, Eija Korpelainen 4, and Gianluigi Zanetti 3 1 Department of Computer

More information

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management Big Data and New Paradigms in Information Management Vladimir Videnovic Institute for Information Management 2 "I am certainly not an advocate for frequent and untried changes laws and institutions must

More information

Big Data and Data Science. The globally recognised training program

Big Data and Data Science. The globally recognised training program Big Data and Data Science The globally recognised training program Certificate in Big Data Analytics Duration 5 days Big Data and Data Science enables value creation from data, through the use of calculative

More information

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies

Semantic Data Management. Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies Semantic Data Management Xavier Lopez, Ph.D., Director, Spatial & Semantic Technologies 1 Enterprise Information Challenge Source: Oracle customer 2 Vision of Semantically Linked Data The Network of Collaborative

More information

Hortonworks Architecting the Future of Big Data

Hortonworks Architecting the Future of Big Data Hortonworks Architecting the Future of Big Data Eric Baldeschwieler CEO twitter: @jeric14 (@hortonworks) Formerly VP Hadoop Engineering @Yahoo! 8 Years at Yahoo! Hortonworks Inc. 2011 June 29, 2011 About

More information

Preparing the scenario for the use of patient s genome sequences in clinic. Joaquín Dopazo

Preparing the scenario for the use of patient s genome sequences in clinic. Joaquín Dopazo Preparing the scenario for the use of patient s genome sequences in clinic Joaquín Dopazo Computational Medicine Institute, Centro de Investigación Príncipe Felipe (CIPF), Functional Genomics Node, (INB),

More information

An EVIDENCE-ENHANCED HEALTHCARE ECOSYSTEM for Cancer: I/T perspectives

An EVIDENCE-ENHANCED HEALTHCARE ECOSYSTEM for Cancer: I/T perspectives An EVIDENCE-ENHANCED HEALTHCARE ECOSYSTEM for Cancer: I/T perspectives Chalapathy Neti, Ph.D. Associate Director, Healthcare Transformation, Shahram Ebadollahi, Ph.D. Research Staff Memeber IBM Research,

More information

HADOOP IN THE LIFE SCIENCES:

HADOOP IN THE LIFE SCIENCES: White Paper HADOOP IN THE LIFE SCIENCES: An Introduction Abstract This introductory white paper reviews the Apache Hadoop TM technology, its components MapReduce and Hadoop Distributed File System (HDFS)

More information

DRIVING INNOVATION THROUGH DATA ACCELERATING BIG DATA APPLICATION DEVELOPMENT WITH CASCADING

DRIVING INNOVATION THROUGH DATA ACCELERATING BIG DATA APPLICATION DEVELOPMENT WITH CASCADING DRIVING INNOVATION THROUGH DATA ACCELERATING BIG DATA APPLICATION DEVELOPMENT WITH CASCADING Supreet Oberoi VP Field Engineering, Concurrent Inc GET TO KNOW CONCURRENT Leader in Application Infrastructure

More information

Biotechnology and Life Science Marketing Services Mailing List and Data Card Order Form

Biotechnology and Life Science Marketing Services Mailing List and Data Card Order Form C H I Cambridge Healthtech Institute s Biotechnology and Life Science Marketing Services Mailing List and Data Card Order Form Over 800,000 names segmented by scientific interest Featuring U.S and International

More information

School of Nursing. Presented by Yvette Conley, PhD

School of Nursing. Presented by Yvette Conley, PhD Presented by Yvette Conley, PhD What we will cover during this webcast: Briefly discuss the approaches introduced in the paper: Genome Sequencing Genome Wide Association Studies Epigenomics Gene Expression

More information

Work Package 13.5: Authors: Paul Flicek and Ilkka Lappalainen. 1. Introduction

Work Package 13.5: Authors: Paul Flicek and Ilkka Lappalainen. 1. Introduction Work Package 13.5: Report summarising the technical feasibility of the European Genotype Archive to collect, store, and use genotype data stored in European biobanks in a manner that complies with all

More information

Cisco IT Hadoop Journey

Cisco IT Hadoop Journey Cisco IT Hadoop Journey Alex Garbarini, IT Engineer, Cisco 2015 MapR Technologies 1 Agenda Hadoop Platform Timeline Key Decisions / Lessons Learnt Data Lake Hadoop s place in IT Data Platforms Use Cases

More information

Hadoopizer : a cloud environment for bioinformatics data analysis

Hadoopizer : a cloud environment for bioinformatics data analysis Hadoopizer : a cloud environment for bioinformatics data analysis Anthony Bretaudeau (1), Olivier Sallou (2), Olivier Collin (3) (1) anthony.bretaudeau@irisa.fr, INRIA/Irisa, Campus de Beaulieu, 35042,

More information

Analytics on Spark & Shark @Yahoo

Analytics on Spark & Shark @Yahoo Analytics on Spark & Shark @Yahoo PRESENTED BY Tim Tully December 3, 2013 Overview Legacy / Current Hadoop Architecture Reflection / Pain Points Why the movement towards Spark / Shark New Hybrid Environment

More information

UTILIZING CDISC STANDARDS TO DRIVE EFFICIENCIES WITH OPENCLINICA Mark Wheeldon CEO, Formedix Boston June 21, 2013

UTILIZING CDISC STANDARDS TO DRIVE EFFICIENCIES WITH OPENCLINICA Mark Wheeldon CEO, Formedix Boston June 21, 2013 UTILIZING CDISC STANDARDS TO DRIVE EFFICIENCIES WITH OPENCLINICA Mark Wheeldon CEO, Formedix Boston June 21, 2013 AGENDA Introduction Real World Uses : Saving Time & Money. Your Clinical Trials Automated.

More information

Focusing on results not data comprehensive data analysis for targeted next generation sequencing

Focusing on results not data comprehensive data analysis for targeted next generation sequencing Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes

More information

A leader in the development and application of information technology to prevent and treat disease.

A leader in the development and application of information technology to prevent and treat disease. A leader in the development and application of information technology to prevent and treat disease. About MOLECULAR HEALTH Molecular Health was founded in 2004 with the vision of changing healthcare. Today

More information

Balancing Big Data for Security, Collaboration and Performance

Balancing Big Data for Security, Collaboration and Performance Balancing Big Data for Security, Collaboration and Performance Sai Balu Lineberger Cancer Center UNC Chapel Hill Oct 14, 2014 About UNC Oldest Public University -1793 Top 5 Public University. 46th World

More information

dixa a data infrastructure for chemical safety Jos Kleinjans Dept of Toxicogenomics Maastricht University

dixa a data infrastructure for chemical safety Jos Kleinjans Dept of Toxicogenomics Maastricht University dixa a data infrastructure for chemical safety Jos Kleinjans Dept of Toxicogenomics Maastricht University Current protocol for chemical safety testing Short Term Tests for Genetic Toxicity Bacterial Reverse

More information

Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?

More information