Analyzing NGS data with clinical data: open source software for translational medicine
1 Analyzing NGS data with clinical data: open source software for translational medicine. BASEL LIFE SCIENCE WEEK, NGS FORUM, SEPTEMBER 24, 2015. Kees van Bochove, CEO The Hyve
2 Agenda: 1. Introduction 2. Open Source in Translational Medicine 3. cBioPortal 4. TranSMART 5. ADAM & Apache Spark
3 1. INTRODUCTION
4 The Hyve: Professional support for open source software for bioinformatics and translational research, such as tranSMART, cBioPortal, i2b2, Galaxy, ADAM and OHDSI. Core values: Share, Reuse, Specialize. Office locations: Utrecht, Netherlands; Cambridge, MA, United States. Services: Software development, Data science services, Consultancy, Hosting / SLAs. Mission: Enable pre-competitive collaboration in life science R&D by leveraging open source software. Fast-growing: Started in people by now
5 Interdisciplinary team: software engineers, data scientists, project managers & staff; expertise in bioinformatics, medical informatics, software engineering, biostatistics etc.
6 2. OPEN SOURCE IN TRANSLATIONAL MEDICINE
7
8 8
9 The Open Source Definition: 1. Free Redistribution 2. Availability of Source Code 3. Allow Derived Works 4. Integrity of The Author's Source Code 5. No Discrimination Against Persons or Groups 6. No Discrimination Against Fields of Endeavor 7. Distribution of License 8. License Must Not Be Specific to a Product 9. License Must Not Restrict Other Software 10. License Must Be Technology-Neutral
10 Open Source: Source code openly accessible and reusable for everyone. Enables pre-competitive collaboration: both academics and industry can use and enhance it, which grows a community. Transparency: verification (scientific as well as IT security) can be done by anyone, no black box.
11 The software engineering process in an open source community is not different from a closed commercial setting. But the stakeholders, contributors, business models, engagement models etc. are!
12 Different Non-Functional Requirements for Software. Bioinformatician in academia: create a novel solution for a problem which has publication value. Basic research: new frontiers. Software should demonstrate a working principle. Bioinformatician / IT Services in pharma/clinic: mainly applied research. Software should be well tested, maintainable, extensible, scalable etc. Need for commercial support for open source software.
13 Open Source in Translational Medicine: Study design; Clinical eCRF & apps; Data warehousing; Data visualisation; Imaging; Scientific compute; Biobanking; Workflow / NGS
14 3. CBIOPORTAL See also:
15 cBioPortal study portal
16 Colon cancer study in TraIT
17 Event calls: gene alteration events per sample. Which genes are altered in each individual tumor sample?
Data type | Alteration event calls
Mutations | Non-synonymous somatic mutations
Copy number changes | Homozygous deletion or amplification
Methylation | Epigenetic silencing
mRNA and/or DNA | Gene fusions
mRNA expression changes | Over- or under-expression
Alteration types and thresholds can be customized for each gene.
18 Visualization of events across genes and data types
19 Review cancer genomics events in clinical context
20 GenePrint visualisation (from cBioPortal) in tranSMART
21 TM2CBIO: In collaboration with Netherlands Cancer Institute. ETL pipeline between tranSMART and cBioPortal. TranSMART used as data warehouse, and cBioPortal as a study-based analytics mart for cancer studies. Going from individual data points (e.g. mRNA intensity levels) in tranSMART to alteration events in cBioPortal.
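The step from individual data points to alteration events can be illustrated with a minimal sketch. It is not the TM2CBIO pipeline itself: the z-score approach, the function name `call_expression_events`, and the threshold values are assumptions chosen for illustration, using only the Python standard library.

```python
from statistics import mean, pstdev

# Hypothetical per-gene z-score thresholds; genes not listed use the default.
DEFAULT_THRESHOLD = 2.0
GENE_THRESHOLDS = {"ERBB2": 1.5}


def call_expression_events(intensities, gene):
    """Turn per-sample mRNA intensities for one gene into alteration events.

    intensities: dict mapping sample id -> intensity value.
    Returns a dict mapping sample id -> "UP", "DOWN" or None (no event).
    """
    values = list(intensities.values())
    mu = mean(values)
    sigma = pstdev(values)
    threshold = GENE_THRESHOLDS.get(gene, DEFAULT_THRESHOLD)
    events = {}
    for sample, value in intensities.items():
        # A cohort without any variation yields no events at all.
        z = 0.0 if sigma == 0 else (value - mu) / sigma
        if z >= threshold:
            events[sample] = "UP"
        elif z <= -threshold:
            events[sample] = "DOWN"
        else:
            events[sample] = None
    return events


if __name__ == "__main__":
    cohort = {"S1": 10, "S2": 10, "S3": 10, "S4": 10, "S5": 30}
    print(call_expression_events(cohort, "ERBB2"))
```

The per-gene threshold table mirrors the idea that alteration types and thresholds can be customized for each gene; a real pipeline would also need to handle the other data types (mutations, copy number, methylation).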
22 4. TRANSMART
23 TranSMART as a product: Data warehouse bringing together scientists from clinical sciences, preclinical research and discovery around the data. Combination of internal datasets and documents with public datasets and knowledge. Tailored to both biologists/clinicians and bioinformaticians. Dual nature: in use for translational research in both pharma and hospitals/clinic.
24 In early 2014, over 50 tranSMART implementations
International Research Initiatives: IMI eTRIKS, EMIF, CTMM TraIT
Pharma & Biotech: Sanofi, Millennium, Pfizer, JNJ, Roche
Government Aligned Institutions: FDA
Non-Profits: 1Mind4Research, Orion Bionetworks, Critical Path Institute
Hospitals / Academics: U Michigan, Harvard / Boston Children's Hospital, HEGP, Johns Hopkins, St. Jude
Service Providers: ConvergeHEALTH, thehyve, Rancho Biosciences, BTGS, Thomson Reuters, Saama Tech, Cognizant
Start | Organization | Type | Stage
2008 | Johnson & Johnson | Pharma | Production
2008 | Recombinant by Deloitte | Services | Multiple
2010 | Sage Bionetworks | Non Profit | Production
2010 | Thomson Reuters | Services | Support
2010 | U-BIOPRED | Consortium | Production
2011 | SAFE-T | Consortium | Pilot
2011 | University of Michigan, Comprehensive Cancer Center | Academic | Production
2012 | APHP-HEGP Paris France | Academic | Production
2012 | BT Cure | Consortium | Pilot
2012 | CTMM/TraIT | Consortium | Dev
2012 | FDA | Government | Dev
2012 | IMI/eTRIKS | Consortium | Dev
2012 | Merck | Pharma | Pilot
2012 | Millennium Pharmaceuticals | Pharma | Production
2012 | One Mind for Research (1M4R) | Non Profit | Production
2012 | Pfizer | Pharma | Production
2012 | Roche | Pharma | Evaluation
2012 | Sanofi-Aventis | Pharma | Dev
2012 | St. Jude | Non Profit | Dev
2012 | U Michigan, Computational Medicine & Bioinformatics | Academic | Multiple
2013 | Agios | Biotech | Evaluation
2013 | CARPEM Cancer personalized medicine | Academic | Dev
2013 | Harvard University / Boston Children's Hospital | Academic | Autism Pilot
2013 | Boehringer Ingelheim | Pharma | Pilot
2013 | Bristol Myers Squibb | Pharma | Evaluation
2013 | BT Global Services | Services | Pilot
2013 | Accelerated Cure Project for MS | Non Profit | Dev
2014 | Personalized medicine and colorectal cancers (France) | Academic | Dev
2014 | PCORI PRRN Phelan-McDermid Syndrome Data Network | Academic | Dev
25 TranSMART Open Source History. February 2012: J&J releases tranSMART as open source on GitHub under GPL v3. December 2012: CTMM TraIT project decides to use tranSMART as core infrastructure component. January 2013: IMI eTRIKS starts, uses tranSMART as core infrastructure component. February 2013: kickoff of tranSMART Foundation, U. Michigan publishes PostgreSQL port. March 2014: IMI EMIF kickoff, tranSMART is used as data integration component.
26 Center for Translational Molecular Medicine (CTMM): Public-private consortium dedicated to the development of Molecular Diagnostics and Molecular Imaging technologies, focusing on the translational aspects of molecular medicine. 120 partners: universities, academic medical centers, medical technology enterprises and chemical and pharmaceutical companies. Budget 300 M. 22 projects / research consortia. TraIT is the Translational Research IT project supporting these projects with a joint IT infrastructure.
27 TraIT Consortium: Growing TraIT project team
28 TraIT data workflow: Hospital (IT) HIS PACS LIS; Samples (IT) BIMS; Pseudonymization; data domains: clinical data OpenClinica, imaging data NBIA + AIM, biobanking CBM-NL; Translational Research (IT): integrated data tranSMART/i2b2 data warehouse, translational analytics workbench tranSMART / cohort explorer, R; Public Data; experimental data e.g. Galaxy, Chipster, e.g. PhenotypeDB, Annai Systems; Galaxy
29 Recombinant / Deloitte, CDISC, Thomson Reuters, Pfizer, Astra Zeneca, VUmc, The Hyve, Sanofi, Johnson & Johnson, University of Michigan, Philips, University of Luxembourg. 70. Amsterdam, June 2013: tranSMART Workshop. Attendees from 10 Pharma companies, 11 University Medical Centers and 12 IT companies
30 130 Ann Arbor, Michigan, October 2014: Annual Meeting
31 TranSMART wins all the prizes: Best Show Award, Best Practices Award, Best Poster Award. Bio IT World, Boston, April
32 Contributors
33 The Hyve tranSMART 1.3 Contributions: Improvements for handling GWAS data & cohort selection on SNP data. Built a number of interactive advanced analytics workflows & corrected statistical assumptions. Imaging workflow: ETL for imaging metadata and results. Prototype of a tranSMART 2.0 interface: new look & feel, user experience. Under discussion: improved GUI for ETL?
34 TranSMART 2.0 User Interface - alpha
35 5. USING APACHE SPARK FOR NGS DATA ANALYSIS
36 NGS data storage & analysis: Don't import BAM, CRAM, VCF and BCF to a database! They are the databases! Indexed, compressed, highly specialized & optimized storage formats. Whole ecosystem is built around this concept. All tools read and write these through a rich API: HTS-JDK, GATK MapReduce engine
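Why an indexed, coordinate-sorted file can act as its own database can be shown with a toy sketch. This is only an illustration of the lookup idea, assuming records held in memory; real BAM and VCF indexes are considerably more elaborate, and the class name `PositionIndex` is hypothetical.

```python
import bisect


class PositionIndex:
    """Toy stand-in for an index over coordinate-sorted genomic records."""

    def __init__(self, records):
        # records: iterable of (chrom, pos, payload); sorted once at build time.
        self._by_chrom = {}
        for chrom, pos, payload in sorted(records, key=lambda r: (r[0], r[1])):
            positions, payloads = self._by_chrom.setdefault(chrom, ([], []))
            positions.append(pos)
            payloads.append(payload)

    def fetch(self, chrom, start, end):
        """Return payloads with start <= pos <= end without scanning everything."""
        if chrom not in self._by_chrom:
            return []
        positions, payloads = self._by_chrom[chrom]
        lo = bisect.bisect_left(positions, start)
        hi = bisect.bisect_right(positions, end)
        return payloads[lo:hi]


if __name__ == "__main__":
    index = PositionIndex([
        ("chr1", 300, "c"),
        ("chr1", 100, "a"),
        ("chr2", 50, "x"),
        ("chr1", 200, "b"),
    ])
    print(index.fetch("chr1", 150, 300))
```

A region query costs two binary searches instead of a pass over every record, which is the property that makes importing the same data into a general-purpose database unnecessary for this access pattern.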
37 Genome Analysis ToolKit (GATK): MapReduce framework for processing BAM and VCF files / databases. Provides walkers that provide an access pattern as a stream through BAM and VCF files. On top of these walkers there are analysis tools: Indel realignment, Base Quality recalibration, Unified Genotyper (= old variant caller), Haplotype Caller (= new variant caller)
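The walker pattern, a stream over records with a map step and a reduce step, can be sketched in a few lines. This is not the GATK API (GATK is a Java toolkit); the function names and the inline VCF-like text are assumptions made for illustration.

```python
from functools import reduce

# Minimal VCF-like input, embedded so the sketch is self-contained.
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t100\t.\tA\tG\t50\tPASS\t.
chr1\t200\t.\tAT\tA\t40\tPASS\t.
chr2\t300\t.\tC\tT\t60\tPASS\t.
"""


def walk_variants(lines):
    """Stream variant records one at a time, skipping header and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        yield {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}


def map_variant(record):
    """Map step: classify a single record as SNV or INDEL."""
    is_snv = len(record["ref"]) == 1 and len(record["alt"]) == 1
    return {"SNV" if is_snv else "INDEL": 1}


def reduce_counts(total, partial):
    """Reduce step: merge one partial count into the running total."""
    for key, count in partial.items():
        total[key] = total.get(key, 0) + count
    return total


def count_variant_types(lines):
    return reduce(reduce_counts, map(map_variant, walk_variants(lines)), {})


if __name__ == "__main__":
    print(count_variant_types(VCF_TEXT.splitlines()))
```

Because the map step only ever sees one record and the reduce step only merges partial results, the same analysis can in principle be split over regions of a file and run in parallel.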
38 ADAM: Genomics processing engine & specialized file format built with: Apache Avro (uniform data format definition), Apache Spark (memory-based cluster execution), Apache Parquet (Hadoop-based columnar storage format). Resulting data can be accessed by Hadoop MapReduce, Spark, Shark, Impala, Pig, Hive etc. Support for conversion to and from BAM and VCF. MAF conversion and somatic variant calling unclear. Open source project driven mainly by UC Berkeley
39 ADAM Example setup in tranSMART
40 Translational Research Infrastructure: User Interfaces: R, Spotfire etc., Galaxy GUI, TranSMART GUI; TranSMART RESTful API; Galaxy; Sun Grid Engine (SGE cluster); Clinical API; cTAKES; Clinical Data/i2b2; Mapping between patients in clinical db and samples in omics data; Genome Analysis ToolKit (GATK), HTS-JDK, BAM, CRAM, VCF, BCF; ADAM / Spark; Transcriptome Analysis ToolKit (TATK) (R / Bioconductor), Transcriptome files / DB; Isilon High Performance Storage; Proteome Analysis ToolKit (PATK), MzML, MzIdent; XNAT; Hadoop HDFS / Apache Parquet (ADAM); Archiving Object Storage (e.g. Glacier); Imaging Data Repository; Public Archives: Sequence Read Archive (SRA), Gene Expression Omnibus (GEO), PRoteomics IDEntifications Database (PRIDE); Download / ETL
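The mapping between patients in the clinical database and samples in the omics data is the piece that ties this architecture together. A minimal sketch follows, assuming plain dictionaries as stand-ins for the clinical store and the file catalogue; the function name, identifiers and file paths are all hypothetical.

```python
def link_samples_to_patients(clinical, sample_to_patient, omics_files):
    """Join omics files to clinical records through a sample-to-patient map.

    clinical: dict mapping patient id -> dict of clinical attributes.
    sample_to_patient: dict mapping sample id -> patient id.
    omics_files: dict mapping sample id -> path of a BAM/VCF file.
    Returns (linked rows, sample ids that could not be mapped).
    """
    linked, unmapped = [], []
    for sample_id, path in sorted(omics_files.items()):
        patient_id = sample_to_patient.get(sample_id)
        if patient_id is None or patient_id not in clinical:
            # Keep track of orphaned samples instead of dropping them silently.
            unmapped.append(sample_id)
            continue
        row = {"patient_id": patient_id, "sample_id": sample_id, "file": path}
        row.update(clinical[patient_id])
        linked.append(row)
    return linked, unmapped


if __name__ == "__main__":
    clinical = {"P1": {"diagnosis": "colon cancer"}}
    sample_to_patient = {"S1": "P1", "S2": "P9"}
    omics_files = {"S1": "s1.bam", "S2": "s2.bam", "S3": "s3.vcf"}
    print(link_samples_to_patients(clinical, sample_to_patient, omics_files))
```

Keeping the genomic files in their native formats and storing only this mapping alongside the clinical data is one way to connect the two worlds without importing reads or variants into the warehouse.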
41
Preparing the scenario for the use of patient s genome sequences in clinic Joaquín Dopazo Computational Medicine Institute, Centro de Investigación Príncipe Felipe (CIPF), Functional Genomics Node, (INB),
More informationAn EVIDENCE-ENHANCED HEALTHCARE ECOSYSTEM for Cancer: I/T perspectives
An EVIDENCE-ENHANCED HEALTHCARE ECOSYSTEM for Cancer: I/T perspectives Chalapathy Neti, Ph.D. Associate Director, Healthcare Transformation, Shahram Ebadollahi, Ph.D. Research Staff Memeber IBM Research,
More informationHADOOP IN THE LIFE SCIENCES:
White Paper HADOOP IN THE LIFE SCIENCES: An Introduction Abstract This introductory white paper reviews the Apache Hadoop TM technology, its components MapReduce and Hadoop Distributed File System (HDFS)
More informationDRIVING INNOVATION THROUGH DATA ACCELERATING BIG DATA APPLICATION DEVELOPMENT WITH CASCADING
DRIVING INNOVATION THROUGH DATA ACCELERATING BIG DATA APPLICATION DEVELOPMENT WITH CASCADING Supreet Oberoi VP Field Engineering, Concurrent Inc GET TO KNOW CONCURRENT Leader in Application Infrastructure
More informationBiotechnology and Life Science Marketing Services Mailing List and Data Card Order Form
C H I Cambridge Healthtech Institute s Biotechnology and Life Science Marketing Services Mailing List and Data Card Order Form Over 800,000 names segmented by scientific interest Featuring U.S and International
More informationSchool of Nursing. Presented by Yvette Conley, PhD
Presented by Yvette Conley, PhD What we will cover during this webcast: Briefly discuss the approaches introduced in the paper: Genome Sequencing Genome Wide Association Studies Epigenomics Gene Expression
More informationWork Package 13.5: Authors: Paul Flicek and Ilkka Lappalainen. 1. Introduction
Work Package 13.5: Report summarising the technical feasibility of the European Genotype Archive to collect, store, and use genotype data stored in European biobanks in a manner that complies with all
More informationCisco IT Hadoop Journey
Cisco IT Hadoop Journey Alex Garbarini, IT Engineer, Cisco 2015 MapR Technologies 1 Agenda Hadoop Platform Timeline Key Decisions / Lessons Learnt Data Lake Hadoop s place in IT Data Platforms Use Cases
More informationHadoopizer : a cloud environment for bioinformatics data analysis
Hadoopizer : a cloud environment for bioinformatics data analysis Anthony Bretaudeau (1), Olivier Sallou (2), Olivier Collin (3) (1) anthony.bretaudeau@irisa.fr, INRIA/Irisa, Campus de Beaulieu, 35042,
More informationAnalytics on Spark & Shark @Yahoo
Analytics on Spark & Shark @Yahoo PRESENTED BY Tim Tully December 3, 2013 Overview Legacy / Current Hadoop Architecture Reflection / Pain Points Why the movement towards Spark / Shark New Hybrid Environment
More informationUTILIZING CDISC STANDARDS TO DRIVE EFFICIENCIES WITH OPENCLINICA Mark Wheeldon CEO, Formedix Boston June 21, 2013
UTILIZING CDISC STANDARDS TO DRIVE EFFICIENCIES WITH OPENCLINICA Mark Wheeldon CEO, Formedix Boston June 21, 2013 AGENDA Introduction Real World Uses : Saving Time & Money. Your Clinical Trials Automated.
More informationFocusing on results not data comprehensive data analysis for targeted next generation sequencing
Focusing on results not data comprehensive data analysis for targeted next generation sequencing Daniel Swan, Jolyon Holdstock, Angela Matchan, Richard Stark, John Shovelton, Duarte Mohla and Simon Hughes
More informationA leader in the development and application of information technology to prevent and treat disease.
A leader in the development and application of information technology to prevent and treat disease. About MOLECULAR HEALTH Molecular Health was founded in 2004 with the vision of changing healthcare. Today
More informationBalancing Big Data for Security, Collaboration and Performance
Balancing Big Data for Security, Collaboration and Performance Sai Balu Lineberger Cancer Center UNC Chapel Hill Oct 14, 2014 About UNC Oldest Public University -1793 Top 5 Public University. 46th World
More informationdixa a data infrastructure for chemical safety Jos Kleinjans Dept of Toxicogenomics Maastricht University
dixa a data infrastructure for chemical safety Jos Kleinjans Dept of Toxicogenomics Maastricht University Current protocol for chemical safety testing Short Term Tests for Genetic Toxicity Bacterial Reverse
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More information