Routine processing of large scale human whole genome sequencing data.

1 Routine processing of large scale human whole genome sequencing data
Ies Nijman, UMCU, CPCT, Hartwig Medical Foundation
Compute Resources for Life Science Research

2 Center for Personalized Cancer Treatment
Bottom-up initiative founded in 2010
- UMCU, EUR, NKI
- UMC Groningen
- AMC Amsterdam
- VUmc Amsterdam
- LUMC Leiden
- Meander
- Radboud Nijmegen
- MUMC Maastricht

3 Personalized treatment

4 Center for Personalized Cancer Treatment (workflow diagram labels)
- Two weeks
- Patient with Metastatic Disease
- 2-4 Biopsies (fresh frozen)
- Pathological Analysis (ng)
- DNA Isolation (ng)
- Patient Stratification
- Research
- IonTorrent PGM or MiSeq
- Illumina HiSeq X Ten
- Actionable Mutations & Amplification, >50 genes
- Biomarker Discovery
- Profiling Cancer Pathways and Processes
- Start Targeted Therapy
- Allocation Phase 1 Clinical Trial
- Systems Biology
- Whole Genome Sequencing
- Response monitoring
- Bioinformatic analysis
- Resistance / Progression
- Remission / Cure
- Databanking: Mutations, INDELs, Copy Number Variations
- in vitro / in vivo Modeling of Hypotheses
- Response monitoring in eCRF

5 National scaling: Hartwig Medical Foundation
Value chain
- Including patients
- Collecting clinical data
- Taking biopsies
- Preprocessing biopsies
- Sequencing DNA of biopsies
- Managing sequence and clinical data database, and reporting
- Analyzing database for new treatment and non-treatment options
- Conducting clinical trials
- Storage of tissue (biobank)
Centralized Facility in a Foundation setup
- Made possible through philanthropy (2 to 3 years)
- Whole genome sequencing using Illumina Xten setup
- Integrate clinical and genetic data
- Provide input for individual patient reporting
- Provide access to cohort information for research to benefit future patient care
Location: Matrix VI, Amsterdam Science Park
Start: April 2015
Operational: Summer 2015

6 HUB organoids
- Test specific drugs on tumor organoids to confirm sensitivity
- Treat patient with selected drug(s) until disease progression
- Bioinformatics and Systems Biology to identify pathways
- Obtain patient biopsy

7 Targets
- CPCT: 2500 patients/yr; reference 30x, tumor 90x (= 4 genome eq)
- Clinical genetics labs: 7500 samples/yr
- Total/max ~ genomes/yr

8 Compute
- Xten generates on avg 50 genomes/day (16 genomes/machine/3 days)
- Processing target: 16 genomes or 4 T-N pairs in 3 days
Storage
- Raw data (BCL.gz > Fastq.gz): ~100 GB/sample = ~5 TB/day
- Processed data (BAM, gVCF): ~100 GB/sample = ~5 TB/day
- Temp data: 2-3 fold increase during processing
- Store what for how long?
Datasharing
- Central storage / archive?
- How to get data to customers?
- How to visualize/use centrally stored data?
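The storage figures on this slide follow from simple multiplication. A minimal Python sketch of that arithmetic, assuming decimal units (1 TB = 1000 GB) and assuming the "2-3 fold" temporary increase applies to the processed-data volume (the slide does not say which volume it refers to):

```python
# Storage estimate from the slide's figures.
GENOMES_PER_DAY = 50            # average Xten output (from the slide)
RAW_GB_PER_SAMPLE = 100         # BCL.gz > Fastq.gz (from the slide)
PROCESSED_GB_PER_SAMPLE = 100   # BAM, gVCF (from the slide)
TEMP_FOLD_RANGE = (2, 3)        # "2-3 fold increase during processing"
GB_PER_TB = 1000                # assumption: decimal units


def daily_tb(gb_per_sample, genomes_per_day=GENOMES_PER_DAY):
    """Daily data volume in TB for a given per-sample size."""
    return gb_per_sample * genomes_per_day / GB_PER_TB


raw_tb = daily_tb(RAW_GB_PER_SAMPLE)
processed_tb = daily_tb(PROCESSED_GB_PER_SAMPLE)
persistent_tb = raw_tb + processed_tb

# Assumption: the temporary footprint scales with the processed data.
temp_low_tb, temp_high_tb = (fold * processed_tb for fold in TEMP_FOLD_RANGE)

print(f"raw: {raw_tb} TB/day, processed: {processed_tb} TB/day")
print(f"kept if both are stored: {persistent_tb} TB/day")
print(f"temporary space per day of input: {temp_low_tb}-{temp_high_tb} TB")
```

The retention question on the slide ("Store what for how long?") can be explored by multiplying `persistent_tb` by a chosen number of days.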

9 Bioinformatic NGS data processing
- Perl wrapper, logging & control backbone
- Submits to grid engine
- Runs with standardized .ini files to configure each module
Pipeline
- BCLs > bcl2fastq > FASTQs > BWA-MEM > Deduplication > IndelRealign > BaseRecalibration? > BAM
- Reports: conversion report, read QC report, BAM QC report
- GATK HaplotypeCaller > gVCF > GenotypeGVCFs > Filter & Annotate > VCF
- VarScan, Strelka, Freebayes, Mutect > Somatic VCF > Filter & Annotate > Somatic VCF
- Contra (exome only), FreeC > CNVs > Filter & Annotate > CNVs
- Delly (DEL, DUP, INV, TRA) > Filter & Annotate > SVs
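The slide describes a wrapper that reads .ini settings per module and submits jobs to a grid engine. The original backbone is written in Perl and is not shown; the following is only an illustrative Python sketch of the idea. The section names, keys, script names and the parallel environment name `threaded` are hypothetical, and the commands are built but not executed.

```python
import configparser

# Hypothetical per-module settings; names and values are illustrative only.
EXAMPLE_INI = """
[mapping]
threads = 8
queue = all.q
hold_for =

[dedup]
threads = 4
queue = all.q
hold_for = mapping
"""


def build_qsub_command(module, settings, script_path):
    """Build a Grid Engine qsub command line for one pipeline module."""
    cmd = [
        "qsub",
        "-N", module,                          # job name
        "-q", settings["queue"],               # target queue
        "-pe", "threaded", settings["threads"],  # PE name is site-specific
    ]
    if settings.get("hold_for"):
        # Wait until the named upstream job has finished.
        cmd += ["-hold_jid", settings["hold_for"]]
    cmd.append(script_path)
    return cmd


config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)

commands = [
    build_qsub_command(name, config[name], f"{name}.sh")
    for name in config.sections()
]
for command in commands:
    print(" ".join(command))
```

Keeping the settings in a file rather than in code is what lets the same wrapper run with different resources per module or per site.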

10
- 3 sequencers active
- Each linked to processing server (36 cores, 256 GB RAM, 10 TB SSD)
- Overflow/extra runs in UMCU HPC
- Processing still limiting factor!
- Merge runs / redos etc. killing!
- Not the intention to expand local hardware
- Optimizations in hardware & pipeline
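The server size on this slide, together with the processing target of 16 genomes (or 4 T-N pairs) in 3 days, implies a core-hour budget. A rough Python sketch, assuming one sequencer's output is processed on one such server and that all 36 cores could be kept busy for the full 3 days (an idealisation that real pipelines do not reach):

```python
CORES_PER_SERVER = 36    # from the slide
TURNAROUND_DAYS = 3      # processing target from the deck
GENOMES_PER_RUN = 16     # per sequencer per 3 days, from the deck
PAIRS_PER_RUN = 4        # 16 genomes or 4 T-N pairs, from the deck

# Upper bound on core hours available within one turnaround window.
budget_core_hours = CORES_PER_SERVER * TURNAROUND_DAYS * 24

per_genome = budget_core_hours / GENOMES_PER_RUN
per_pair = budget_core_hours / PAIRS_PER_RUN

print(f"budget per server per {TURNAROUND_DAYS} days: {budget_core_hours} core hours")
print(f"per genome: {per_genome} core hours")
print(f"per T-N pair: {per_pair} core hours")
```

Any step that cannot use all cores, or that waits on I/O, reduces the budget that is actually usable, which is consistent with the slide's remark that processing is the limiting factor.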

11 Single sample WGS 60x
Real time (hrs): total 58
Core time (hrs): total 308
Per step, real time / core time (hrs):
- Prestats: 6.70 / 6.70
- Mapping: 18.56 / 186.89
- Sorting: 1.63 / 16.65
- Merge: 3.72 / 24.91
- Indelrealign: 10.15 / 3.48
- Variant_caller: 11.05 / 60.00
- Flagstat: 0.28 / 2.28
- Poststats: 6.00 / 6.00

12 7 sample set, 30x WGS
Real time (hrs): total 212
Core time (hrs): total 1058
Per step, real time / core time (hrs):
- Prestats: 24.59 / 24.57
- Mapping: 52.66 / 522.38
- Sorting: 4.98 / 52.05
- Merge: 11.81 / 86.70
- Indelrealignment: 60.64 / 84.00
- Variantcaller: 42.70 / 247.05
- CNV_freec: 7.69 / 11.15
- Flagstat: 0.96 / 7.76
- Poststats: 4.00 / 16.00

13 Tumor 120x - Normal 30x pair
Real time (hrs): total 1083
Core time (hrs): total 2245
Per step, real time / core time (hrs):
- PreStats: 55.23 / 60.61
- Mapping: 154.68 / 581.72
- Sorting: 17.59 / 55.17
- Merge&dedup: 22.21 / 65.78
- Indexing: 1.66 / 4.45
- Indelrealignment: 57.42 / 86.14
- VariantCalling: 45.04 / 272.04
- Somatic: 705.37 / 1077.28
- CNV: 18.15 / 25.56
- Flagstat: 6.48 / 17.04
- Poststats: 0.00 / 0.00
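Dividing core time by real time gives the average number of cores that were busy during a run. A small Python sketch using only the totals stated on the three timing slides; relating the result to a 36-core server is an assumption, since the deck does not say which hardware produced these numbers:

```python
# (real hours, core hours) totals as stated on the timing slides.
RUN_TOTALS = {
    "single sample WGS 60x": (58, 308),
    "7 sample set, 30x WGS": (212, 1058),
    "tumor 120x / normal 30x pair": (1083, 2245),
}
SERVER_CORES = 36  # assumption: runs used a 36-core processing server


def average_busy_cores(real_hours, core_hours):
    """Average number of cores in use over the whole run."""
    return core_hours / real_hours


for name, (real_hours, core_hours) in RUN_TOTALS.items():
    busy = average_busy_cores(real_hours, core_hours)
    utilisation = busy / SERVER_CORES
    print(f"{name}: {busy:.2f} cores busy on average "
          f"({utilisation:.0%} of {SERVER_CORES} cores)")
```

A low ratio points at serial or waiting steps rather than at a lack of cores, so adding hardware alone would not shorten such a run much.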

14 Pilots with various partners with central goal:
Each sample runs with predictable and constant turnaround time (<= 3 days) and for the same price.
- Q: scalability; min and max sample flow
- Q: price curve days?

15 Partners
- Curoverse (Arvados) on Azure Cloud
- BlueBee (IBM/TU Delft) on local Power hardware
- Schuberg Philis on local, private cloud
- Genalice on simple hardware with proprietary software
- SURFsara
- Vancis

16 Genalice: insanely fast (mapping, variant calling): ~ 45
- Results differ; still to figure out FP/FN
- Still need for additional hardware to complete CNV, somatic and other analyses
SURFsara / Vancis: pilots not actively started yet

17 Pipeline runs; small optimizations to match infrastructure
Test set is Tumor 90x - Normal 30x pair
Wall time: 53 hrs
CPU hrs: total 1128
- Mapping, sorting, dedup: 608
- Realignment & Variant calling: 312
- Somatics: 192
- QC stats: 16
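The CPU-hour breakdown on this slide can be turned into shares per stage, which shows where optimisation effort would pay off most. A short Python sketch using only the numbers from the slide:

```python
WALL_HOURS = 53
CPU_HOURS = {
    "Mapping, sorting, dedup": 608,
    "Realignment & Variant calling": 312,
    "Somatics": 192,
    "QC stats": 16,
}

total_cpu_hours = sum(CPU_HOURS.values())

# Share of the total CPU time spent in each stage, in percent.
shares = {
    stage: 100 * hours / total_cpu_hours
    for stage, hours in CPU_HOURS.items()
}

for stage, share in shares.items():
    print(f"{stage}: {share:.1f}% of CPU time")

# Average number of busy cores over the 53 hours of wall time.
print(f"average busy cores: {total_cpu_hours / WALL_HOURS:.1f}")
```

The stage totals add up to the 1128 CPU hours stated on the slide, with mapping, sorting and deduplication taking more than half.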

18 Tumor 90x - Normal 30x pair
Real time (hrs): total 143
Core time (hrs): total
Per step, real time / core time (hrs):
- Mapping&markdup: 21.57 / 1380.27
- Merge: 12.06 / 772.00
- Indelrealignment: 29.72 / 2074.14
- VariantCalling: 9.73 / 1997.60
- VariantAnnotation: 3.95 / 63.20
- Somatic: 67.23 / 4350.89
- Flagstat: 7.37 / 117.92

19 The type of data and analyses are difficult to optimize
- A large number of parallel chunks suffers from slow, I/O-bound merge steps
- Some tools are just slow
- Until now we have no numbers on scalability from partners and/or concrete price levels
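The effect of slow merge steps on chunked processing can be illustrated with Amdahl's law, which is brought in here as a generic model and is not named in the slides. In the Python sketch below the parallel fraction of 0.9 is a hypothetical value chosen for illustration, not a measurement from these pipelines:

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup over a single worker when only part of the work scales.

    parallel_fraction: share of the runtime that splits across workers.
    The remainder (for example a merge step) stays serial.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


PARALLEL_FRACTION = 0.9  # hypothetical: 10% of the runtime is a serial merge

for workers in (1, 4, 16, 36, 256):
    speedup = amdahl_speedup(PARALLEL_FRACTION, workers)
    print(f"{workers:>3} workers: speedup {speedup:.2f}x")
```

With these assumed numbers the speedup can never exceed 1 / 0.1 = 10x, however many chunks run in parallel, which is why a faster merge step matters more than extra cores once the chunk count is high.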


More information

E. coli plasmid and gene profiling using Next Generation Sequencing

E. coli plasmid and gene profiling using Next Generation Sequencing E. coli plasmid and gene profiling using Next Generation Sequencing Jeroen F. J. Laros Leiden Genome Technology Center Department of Human Genetics Center for Human and Clinical Genetics Introduction General

More information

Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix

Alternative Deployment Models for Cloud Computing in HPC Applications. Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix Alternative Deployment Models for Cloud Computing in HPC Applications Society of HPC Professionals November 9, 2011 Steve Hebert, Nimbix The case for Cloud in HPC Build it in house Assemble in the cloud?

More information

Attacking the Biobank Bottleneck

Attacking the Biobank Bottleneck Attacking the Biobank Bottleneck Professor Jan-Eric Litton BBMRI-ERIC BBMRI-ERIC Big Data meets research biobanking Big data is high-volume, high-velocity and highvariety information assets that demand

More information

A leader in the development and application of information technology to prevent and treat disease.

A leader in the development and application of information technology to prevent and treat disease. A leader in the development and application of information technology to prevent and treat disease. About MOLECULAR HEALTH Molecular Health was founded in 2004 with the vision of changing healthcare. Today

More information

ALCHEMIST (Adjuvant Lung Cancer Enrichment Marker Identification and Sequencing Trials)

ALCHEMIST (Adjuvant Lung Cancer Enrichment Marker Identification and Sequencing Trials) ALCHEMIST (Adjuvant Lung Cancer Enrichment Marker Identification and Sequencing Trials) 3 Integrated Trials Testing Targeted Therapy in Early Stage Lung Cancer Part of NCI s Precision Medicine Effort in

More information

BRCA1 / 2 testing by massive sequencing highlights, shadows or pitfalls?

BRCA1 / 2 testing by massive sequencing highlights, shadows or pitfalls? BRCA1 / 2 testing by massive sequencing highlights, shadows or pitfalls? Giovanni Luca Scaglione, PhD ------------------------ Laboratory of Clinical Molecular Diagnostics and Personalized Medicine, Institute

More information

16.40 Tumor - A New Generation of Cancer Biologics

16.40 Tumor - A New Generation of Cancer Biologics Tumor Cell Biology Meeting Program Wednesday, November 9, 2011 Hotel de Werelt Lunteren 09.30 Registration and coffee 10.30 Welcome by Anne-Marie Cleton, Pathology, LUMC, Leiden Session I : Invasion and

More information

Analyzing NGS data with clinical data: open source software for translational medicine

Analyzing NGS data with clinical data: open source software for translational medicine Analyzing NGS data with clinical data: open source software for translational medicine BASEL LIFE SCIENCE WEEK NGS FORUM SEPTEMBER 24, 2015 Kees van Bochove, CEO The Hyve Agenda 1. Introduction 2. Open

More information
