Accelerating Life Science Discovery using a High-Performance Analytics Platform in a Collaborative Environment Overview


1 Accelerating Life Science Discovery using a High-Performance Analytics Platform in a Collaborative Environment: Overview
October 7, 2015
Kathy Tzeng, PhD, Worldwide Technical Lead, Healthcare & Life Sciences, IBM Systems Group

2 Genomic Solution Enablement Team
Mission: porting and optimizing genomics and translational applications on IBM solutions; developing solutions with partners; making IBM software and hardware available to software developers.
Members: Independent Software Vendor (ISV) team, Toronto Compiler Lab, Boeblingen Development Lab, Tokyo Research Lab, Austin Research Lab.

3 GENOMIC MEDICINE from Sequencing to Personalized Healthcare
NHGRI, a branch of NIH, has defined 5 steps for genomic medicine (source: E. Green et al., Nature 470).
Next Generation Sequencing (or other ingestion): the focus is on very large data generation, mainly from $1000 whole genome sequencing, and on data processing and reduction. It includes human, plant, animal, and microbiome genomics.
Translational Research/Early Discovery: the focus is on data integration, including genomic data, and on the analytics required to identify biomarkers, understand disease mechanisms, and identify new medical treatments.
Personalized Healthcare/Clinical Genomics: the focus is on delivering genomic medicine to patients to improve outcomes by associating patients with known genomic-specific treatments.

4 A Computationally Challenging Problem
Breakthroughs in genomic medicine require quantifying associations between known population traits, environmental factors, and biological responses.
F(t), Known Traits or Environmental Features: quantities describing population traits or environmental factors at time t.
W(t), Predictive Response Function: model of associations between features and responses as a function of time t.
R(t), Measured Biological Response: quantities describing response events for an organism at time t.
Computational challenges: feature combinatorics, large file sizes, large population sizes, unstructured data types.

5 Workload Challenge #1: Big Data Analytics
Variant information requires a computationally intensive analysis of raw sequence data across thousands of genomic samples.
Processing time per genome: 1 to 100 hours* on 1 compute node.
* Duration depends on the selection of analytical tools and hardware.
Pipeline stages: High-Throughput Sequencing, Assembly & Alignment, Variant Calling, Variant Annotations.
Raw Reads
De Novo Assembly: SOAPdenovo, Velvet
Reference-Based Mapping: BWA, Bowtie, SOAP
Reference Genomes: TCGA, GEO, dbSNP
Variant Calling: Picard, GATK, SAMtools, SOAPsnp
Annotation Tools: ANNOVAR, Gene Ontology
File Format: FastQ, BAM, VCF
Sample: intergenic SNP in IL23R associated with Crohn's disease
Data sizes shown for a whole human genome (3 billion DNA base pairs) at 30x coverage: ~150 GB (compressed), ~150 GB, 100 to 200 MB, 500 MB.
Each human genome can have a few million variants.
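As a rough check on the scale quoted on this slide, the number of sequenced bases follows from genome length times coverage. The sketch below is illustrative only: the function name and the one-byte-per-base figure are assumptions, not values from the slides, and real FASTQ files also carry quality scores and read headers.

```python
def sequenced_bases(genome_length_bp: int, coverage: int) -> int:
    """Total bases read when a genome is sequenced at the given coverage."""
    return genome_length_bp * coverage


GENOME_LENGTH_BP = 3_000_000_000  # 3 billion base pairs, from the slide
COVERAGE = 30                     # 30x coverage, from the slide

total_bases = sequenced_bases(GENOME_LENGTH_BP, COVERAGE)

# Assumption: one byte per base call, ignoring quality scores and read headers.
raw_gigabytes = total_bases / 1e9

print(f"{total_bases:,} bases, about {raw_gigabytes:.0f} GB of base calls alone")
```

Under that assumption the base calls alone account for tens of gigabytes per genome, which is the same order of magnitude as the file sizes listed on the slide.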

6 Workload Challenge #2: Unstructured Information
Scientific data must be extracted from very large volumes of natural language content, biomedical images, and other unstructured data, and transformed into a structured format for analysis.
Omics Data (Variant Databases): "exonic NOD2 16 a frameshift SNP"; "exonic GJB2 13 associated with hearing loss"; "exonic CRYL1,GJB6 13 a 342kb deletion"
Phenotypic Data (e.g., Clinical Histories, Medical Images): "... was in good health until 2-3 months ago when she gradually developed fatigue and intermittent epigastric pain, ..."
Scientific Literature: Peer-Reviewed Articles, Clinical Guidelines, Textbooks, Patents
Information must be transformed into normalized structured data for statistical analysis and relationship visualization.

7 Workload Challenge #3: Big Data Integration
Discovery of genotype-phenotype associations requires an analysis of complex data types that must be integrated within a common analytical environment.
1 Omics Data: Variant Calls & Annotations (VCF excerpt: ##FORMAT=<ID=DP, ##FORMAT=<ID=HQ, #CHROM POS ID REF ALT, rs G A)
2 Phenotypic Data: Clinical Features, Environmental Factors, Biological Responses
3 Knowledge Base: Electronic Text & Web Sites
Big Data Warehouse Environment (RDBMS and/or NoSQL) with a Patient-Centric Logical Data Model.
Data model diagram labels: Genotypic Data, Variant List (VCF), Patient ID, Observed Traits & Responses, Phenotypic Data, Variant ID, Phenotype ID, Detail on a Single Variant, Patient Population Observation Detail, Knowledge Base.
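To make the patient-centric model on this slide concrete, here is a minimal sketch of joining a variant list to phenotype observations on a shared patient ID. All records and field names (`patient_id`, `variant_id`, `phenotype_id`) are hypothetical; the slides do not specify a schema, and a real deployment would use the RDBMS or NoSQL warehouse rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical records; field names are illustrative, not taken from the slides.
variant_calls = [
    {"patient_id": "P1", "variant_id": "V1"},
    {"patient_id": "P1", "variant_id": "V2"},
    {"patient_id": "P2", "variant_id": "V1"},
]
phenotypes = [
    {"patient_id": "P1", "phenotype_id": "PH1"},
    {"patient_id": "P2", "phenotype_id": "PH1"},
    {"patient_id": "P2", "phenotype_id": "PH2"},
]


def variant_phenotype_pairs(variant_calls, phenotypes):
    """Count how many patients share each (variant, phenotype) pair.

    Assumes each (patient, variant) pair appears at most once in variant_calls.
    """
    phenos_by_patient = defaultdict(set)
    for row in phenotypes:
        phenos_by_patient[row["patient_id"]].add(row["phenotype_id"])

    counts = defaultdict(int)
    for call in variant_calls:
        # Join on patient ID: every phenotype of this patient pairs with the variant.
        for phenotype_id in phenos_by_patient.get(call["patient_id"], ()):
            counts[(call["variant_id"], phenotype_id)] += 1
    return dict(counts)


pair_counts = variant_phenotype_pairs(variant_calls, phenotypes)
for pair, n in sorted(pair_counts.items()):
    print(pair, n)
```

Co-occurrence counts like these are only a starting point; a real genotype-phenotype analysis would add statistical testing and the knowledge-base annotations described on the slide.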

8 Key Capabilities
Leading biomedical research organizations are asking for technology capabilities that will give them a low-cost solution to accelerate scientific discovery in genomic medicine:
Flexible, scalable, and low-cost high-performance compute and storage solutions capable of efficiently processing rapidly growing quantities of genomic and other types of complex life science data.
Seamless integration of complex life science data types on a common analytical platform.
Rapid extraction and analysis of unstructured language content from very large volumes of clinical and scientific documents.
Metadata collection capabilities providing detailed audit trails as source data are transformed into analytical results.
Tools for scientific collaboration that enable data and workload sharing across organizations and geographic boundaries in a secure environment that ensures data privacy.

9 A Foundation for Computational Science
IBM's Reference Architecture for Genomic Medicine supports big data computational research on a foundation of HPC compute, storage, and workload management capabilities.
Research Applications (Computational Modeling, Genomic Analysis Pipelines, Text Analytics/NLP with Apache UIMA and IBM System T, LAN, Image Analysis): performance optimization for open source and commercial analytics applications.
Text Analytics: conversion of natural language concepts into structured data entities (IBM Research, IBM Watson, IBM Business Partners).
Big Data Warehouse: RDBMS or NoSQL database environments enabling rapid processing of large volumes of complex high-dimensional data structures in a data warehouse (IBM BigInsights, IBM Business Partners).
Big Data Foundation:
Workload Orchestration with Metadata Capture: intelligent resource allocation, sharing, and monitoring across parallel HPC workloads (IBM Platform Computing, IBM Business Partners).
Data Management, File System & Storage / ILM: low-cost, low-latency, easy-access storage and archiving of data and metadata across heterogeneous environments (IBM Spectrum Scale / Elastic Storage Server).

10 IBM Systems Facilitate Scientific Collaboration
Data management and analytics tools can be accessed and shared across heterogeneous systems in on-premise and cloud environments.
The Big Data foundation enables data access, data management, and HPC workload orchestration across heterogeneous on-premise, private cloud, public cloud, and hybrid cloud environments.
Diagram labels: Local Data Center, On-Premise Users, External Collaborators (Heterogeneous Environments), Private Cloud Users, Public Cloud Users, 1/10 GbE, Applications, Big Data Warehouse, HPC Network, Workload Orchestration with Metadata Capture, Data Management: File System / Storage ILM, Workload Burst, WAN, 10GbE or InfiniBand, On-Premise Cluster, Virtual Private Clouds, Encrypted VPN.

11 Workload Orchestration
Platforms: Genomics, Translational, Personalized Healthcare
Access, AppCenter (PAC, Galaxy, DataBiology, Lab7): Application & Workflow, File & Database, Visualization, System & Log
Compute, Orchestrator (ASC/EGO, LSF, Symphony, PPM): HPC Cluster, Big Data, Spark Cluster, OpenStack, Docker
Storage, Datahub (Spectrum Scale, Zato, Nirvana): SSD/Flash, FC/IB Attached, Low-cost Storage, HA/DR Storage, Cloud Storage

12 A Framework for NGS and HPC Systems Architecture
Diagram labels: Users, HPC Platform Management Software Stack Suite, Scale-out cluster, Scale-up SMP, Spectrum Scale, ESS, Active Archive, TSM/LTFS/HPSS, Devices.

13 IBM Genomics Reference Architecture
The IBM Reference Architecture is an ecosystem of data management and analytics tools developed by IBM and industry-leading commercial and open source software providers.
Partner shown: Edico Genome

14 BioBuilds: Open Source Bioinformatics
Open source bioinformatics tools for research, commercial, and regulated environments.
Turn-key: pre-built binaries and complete build scripts enable easy deployment.
Optimized: POWER8 binaries provide the best performance for your hardware.
Ready for the clinic: a single source for tools, streamlining verification and audit.
Long-term support: community sponsorship and support contracts ensure ongoing support for tools.

15 Open Source Application Portfolio in BioBuilds
ALLPATHS-LG, Bedtools, Bfast, BLAST (NCBI), Bowtie, Bowtie2, BWA, Cufflinks, FastQC, Numpy, PICARD, PLINK, Python, SAMTools, SOAP3-DP, SOAPDenovo, SQLite, R, Bioconductor, FASTA, Trinity, SHRiMP
Updated tools: HMMER (LE), OpenSSL, IGV, iRODS, RNAStar, ISAAC, TMAP, SOAPaligner/soap2
Updated tools: Bowtie2, HMMER, Tabix, BWA, HTSeq, Mothur, TopHat, Velvet/Oases, OpenSSL

16 Optimization of GATK from Broad Institute
IBM works with genomics leaders to improve performance of analytical workflows like GATK on IBM Power 8 Systems.

17 Optimization of Broad's Best Practice Pipeline
~65x whole human genome analysis done within a day.
~150x whole exome analysis done in 3.45 hours.
Runtime table (columns: Steps, Intel Runtime*, IBM Runtime). Steps: BWA, Samtools, MarkDuplicates, RealignTargets, IndelRealigner, BaseRecalibrator, PrintReads+Index, Preprocessing Total, HaplotypeCaller (2.03), Total.
Input dataset: G15512.HCC1954.1, coverage: 65x
Both IBM and Intel solutions: number of machines = 1, cores per machine = 24
IBM solution: GHz Power8 with GPFS
Note*:

18 Performance of L3 Bioinformatics BALSA on Power 8 with GPU
Power GHz, 2x K40 GPU and GPFS

19 I/O Cache Library to Optimize Performance of Genomics Applications
IBM uses a File Cache Library to improve I/O performance and reduce workflow runtimes.
Application: Illumina's CASAVA v1.8 (BCL to FASTQ)
Data set: 8 lanes of HiSeq data
Without cache library: elapsed time = 1730 min
With cache library: elapsed time = 107 min
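The two elapsed times on this slide imply a speedup that is easy to work out. The short sketch below only does that arithmetic; the function and variable names are illustrative, and the result says nothing about how the cache library itself works.

```python
def speedup(baseline_minutes: float, improved_minutes: float) -> float:
    """Ratio of the baseline runtime to the improved runtime."""
    return baseline_minutes / improved_minutes


WITHOUT_CACHE_MIN = 1730  # elapsed time without the cache library, from the slide
WITH_CACHE_MIN = 107      # elapsed time with the cache library, from the slide

factor = speedup(WITHOUT_CACHE_MIN, WITH_CACHE_MIN)
reduction_pct = 100 * (1 - WITH_CACHE_MIN / WITHOUT_CACHE_MIN)

print(f"{factor:.1f}x faster, {reduction_pct:.1f}% less elapsed time")
```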

20 Accelerating Genomics Applications using GPFS
IBM and BIOVIA's Pipeline Pilot scale genomic analysis from the desktop to the enterprise using IBM GPFS.
Speed of the file system matters.
Chart: Bowtie2 NGS benchmarks on 2.6 GHz iDataPlex with GPFS and NFS; elapsed time in minutes, lower is better.

21 Genomic Workflow Optimization
Typical genomic sequencing workflow, command line:
bwa aln -t 12 -l 40 -n 3 -k 2
bwa sampe -a 700 -P -o 1000
samtools view -bt
samtools sort
Picard: java -Xmx8g -Djava.io.tmpdir MarkDuplicates.jar METRICS_FILE=metrics CREATE_INDEX=true VALIDATION_STRINGENCY=LENIENT REMOVE_DUPLICATES=true ASSUME_SORTED=true TMP_DIR
Picard: java -Xmx8g -Djava.io.tmpdir AddOrReplaceReadGroups.jar SORT_ORDER=coordinate RGID=sample_lane RGLB=sample RGPL=illumina RGPU=lane RGSM=sample RGCN=center_name CREATE_INDEX=True VALIDATION_STRINGENCY=LENIENT TMP_DIR
GATK Lite: java -Xmx8g -Djava.io.tmpdir -T RealignerTargetCreator -nt 1
GATK Lite: java -Xmx8g -Djava.io.tmpdir -T IndelRealigner -targetintervals -known 1000G_biallelic.indels.hg19.vcf
Picard: java -Xmx8g -Djava.io.tmpdir FixMateInformation.jar SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true TMP_DIR
GATK Lite: java -Xmx#{JAVA_REQMEM}g -Djava.io.tmpdir -T CountCovariates -recalfile -knownsites:dbsnp,vcf /gpfs/gpfs1/genome/snp_indel_vcf/dbsnp_137.hg19.vcf -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate
GATK Lite: java -Xmx8g -Djava.io.tmpdir -T TableRecalibration -recalfile -smode SET_Q_ZERO -solid_nocall_strategy THROW_EXCEPTION -nback 7 --baq RECALCULATE
GATK Lite: java -Xmx4g -jar $GATK_BIN/GenomeAnalysisTK.jar -glm BOTH -R $REFERENCE -T UnifiedGenotyper -I recalibrated.bam

22 Genomic Workflow Optimization
IBM Platform Process Manager facilitates genomic workflow execution.

23 Genomic Workflow Optimization
The IBM Platform LSF workload scheduler is linked to the Process Manager and maximizes the utilization of HPC resources to improve workflow runtimes.
Data set: 37x coverage of whole human genomes
Workflow input: 74 fastq.gz files
Workflow output: recalibrated BAM file
Dependency steps: using the LSF bsub -w option
Runtime table (columns: Runs, 1st Set, 2nd Set, 3rd Set, 4th Set, Total Sets):
1 set on 8 nodes: hrs, hrs
4 sets on 8 nodes: hrs, 20.9 hrs, hrs, hrs, hrs
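The dependency chaining this slide refers to (the LSF bsub dependency option) can be sketched as a small generator of submission lines, where each job waits for the previous one to finish successfully. This is a minimal sketch, not the presenter's workflow definition: the job names and the wrapper scripts (`./align.sh` and so on) are hypothetical, and the function only builds command strings without submitting anything. It relies on two bsub options, `-J` to name a job and `-w` with a `done(job_name)` dependency expression.

```python
import shlex


def chained_bsub_commands(steps):
    """Build bsub command lines where each job depends on the previous one.

    steps: list of (job_name, command) tuples in execution order.
    """
    lines = []
    previous = None
    for job_name, command in steps:
        parts = ["bsub", "-J", job_name]
        if previous is not None:
            # -w takes a dependency expression; done(name) waits for success.
            parts += ["-w", f"done({previous})"]
        parts.append(command)
        # Quote each part so the parentheses survive the shell.
        lines.append(" ".join(shlex.quote(p) for p in parts))
        previous = job_name
    return lines


# Hypothetical wrapper scripts standing in for the workflow steps.
steps = [
    ("align", "./align.sh"),
    ("sort", "./sort.sh"),
    ("markdup", "./markdup.sh"),
]
commands = chained_bsub_commands(steps)
for line in commands:
    print(line)
```

A linear chain is the simplest case; running several sample sets on the same nodes, as in the table on this slide, would mean one such chain per set, with the scheduler interleaving them.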

24 Data Compression Appliance
Compression algorithms: gzip on Power 8 with FPGA board (available now); CRAM.
Compression ratio (lossless): on average 1:3 for FASTQ files; 1:2 to 1:4 with respect to BAM files, depending on the sequencing depth and other factors (the ratio from FASTQ to compressed BAM is 16x).
Speed/throughput: 2.5 GB/s on average (a 200 GB FASTQ file can be compressed in 80 seconds); more than 10 times speedup achieved using 12 cores (approximately 0.5 GB/min); FPGA acceleration is ongoing.
James Bonfield of the Sanger Institute won the Pistoia compression contest with a 1:9 compression ratio at 0.1 GB/min.
CRAM was released by EBI in late 2012 to compress BAM files and has been accepted by the Global Alliance for Genomics and Health.
IBM is collaborating with the Sanger Institute and EBI on improving compression for genomics data.
Tools: Samtools, Picard, CRAM
Source: Baker M., Nature Methods 7 (2010)
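The throughput figures on this slide can be cross-checked with simple arithmetic. The sketch below uses only numbers quoted on the slide; the function and constant names are illustrative, and it assumes throughput is constant over the whole file.

```python
def time_to_compress(size_gb: float, throughput_gb_per_unit: float) -> float:
    """Time needed to compress size_gb at a constant throughput (same time unit)."""
    return size_gb / throughput_gb_per_unit


FASTQ_SIZE_GB = 200          # example file size, from the slide
FPGA_GB_PER_SECOND = 2.5     # average throughput quoted for the FPGA path
TWELVE_CORE_GB_PER_MIN = 0.5 # approximate figure quoted for 12 cores
CONTEST_GB_PER_MIN = 0.1     # figure quoted for the contest-winning entry

fpga_seconds = time_to_compress(FASTQ_SIZE_GB, FPGA_GB_PER_SECOND)
twelve_core_minutes = time_to_compress(FASTQ_SIZE_GB, TWELVE_CORE_GB_PER_MIN)
contest_minutes = time_to_compress(FASTQ_SIZE_GB, CONTEST_GB_PER_MIN)

print(f"FPGA path: {fpga_seconds:.0f} s")
print(f"12 cores: {twelve_core_minutes:.0f} min")
print(f"Contest entry: {contest_minutes:.0f} min")
```

The first result reproduces the 80 seconds stated on the slide; the other two show how the same 200 GB file would fare at the software throughputs quoted.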

25 Enterprise Data Management
IBM works with Lab7 to deliver data provenance with performance, reliability and security.
Workflow stages: Experimental Design, Sample Prep, Sequencing, Mapping, Analysis, Reporting, Meta Analysis.
Diagram labels: Sample LIMS, User Experience, Workflow Engine, Federated Data Engine, Pipeline Engine, Sample Data, Reference, Attribute Sheet, Pipeline, Visualization/EDA.
Lab7 ESP:
Comprehensive software platform: combines LIMS and informatics functionalities.
Data provenance: maintains continuous data provenance by tracking the history of samples, analyses, and results, and by providing detailed audit trails.
Sequencing platform flexibility: manages data generated from any sequencing platform.
IBM Power System Solution with GPFS and Platform LSF delivers:
Superior compute infrastructure: superior performance, scalability, and maximum throughput.
Outstanding enterprise-grade reliability and security: Reliability, Availability and Serviceability (RAS) features help avoid unplanned downtime; IBM Power Security and Compliance (PowerSC) enables security compliance automation and includes reporting for compliance measurement and audit (HIPAA).
Total cost of ownership: very affordable compared to like-sized x86 systems.

26 Data Provenance with Performance, Reliability and Security
Databiology for Enterprise: Functional Architecture
Layers: Interface, Information Management, Orchestration.
Interface: 3 C's (Configure, Command, Collaborate) Portal, API, Custom Web Apps via API.
Scientific: Samples, Annotation, Ontologies, Shopping Basket, Social Comments + Attachments + WF Integration.
Project Management: Roles + Access, Lifecycle Management, Meta Information, Financial + Resource Mgmt, Task Management.
Transport: DBE Download Manager, DBE Multiprot (S3, SCP, RSync, SFTP, FTP, HTTP).
Applications: Import, Analysis, Visualization, Configuration.
Infrastructure: Compute, Storage, Network, Identity Management, Instruments.
Compute and Storage: Softlayer, LSF, GPFS.
Logic: everything as an app (Scripts, Binaries, Pipelines, Workflow Management, Virtual Machines); Version Control + Reproducible Data Provenance.
Benefits:
SaaS + customer-specific instances
Central hub to manage all omics data and to orchestrate all activities
Functionally rich and oriented on key steps in the R&D life cycle
Insight to Instrument with best-in-class applications
Easy integration with existing environments
Automatic data provenance and reporting
Cost-neutral deployment
Gradual roll-out / low risk

27 tranSMART: Optimized on Power8 and Spectrum Scale
tranSMART associates genotypic and phenotypic data for complex analytics.
Watson Explorer extracts insight from scientific literature and data records and provides enrichment to tranSMART's analysis.

28 tranSMART Power8 Deployment Architecture
Diagram labels: Users, Application Browser, HTTP, Web Server (Apache2), Watson Analytics Server, i2b2 Application Server, tranSMART, Solr Full Text Index, Watson Analytics, JDBC, Application Server (Tomcat 7), PLINK, GPFS, PostgreSQL tranSMART DB, Quartz Job, Call R, Analytics Tools, Gene Patterns, Power8.

29 Accelerate tranSMART ETL by Power8/Spectrum Scale
Dataset: TCGA_OV, Simulation, GSE32583, GSE13168, GSE1456, GSE15258
No. Records: 5,789,632 40,774, ,724 1,203,282 3,600,555 4,702,050

30 Zato's Scalable Data Federation Solution for Healthcare and Genomics
Data spanning data centers in parallel, with a single pane of glass for clinical and research applications, on Power 8 and GPFS.
Federated sources (connected over LAN, VPN, or Internet): Imaging Data, Lab Results, Electronic Health Record Data, Nursing Home Records, Microbiology Reports, Claims Data, Radiology Reports, Genomic Data, Accepted Medical Knowledge, NIH Data, CDC Data, NLM Data.

31 Thank You


33 Noblis BioVelocity is Developed and Optimized on IBM's Power 8


More information

Hadoop s Rise in Life Sciences

Hadoop s Rise in Life Sciences Exploring EMC Isilon scale-out storage solutions Hadoop s Rise in Life Sciences By John Russell, Contributing Editor, Bio IT World Produced by Cambridge Healthtech Media Group By now the Big Data challenge

More information

NVIDIA GPUs in the Cloud

NVIDIA GPUs in the Cloud NVIDIA GPUs in the Cloud 4 EVOLVING CLOUD REQUIREMENTS On premises Off premises Hybrid Cloud Connecting clouds New workloads Components to disrupt 5 GLOBAL CLOUD PLATFORM Unified architecture enabled by

More information

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA WHITE PAPER April 2014 Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA Executive Summary...1 Background...2 File Systems Architecture...2 Network Architecture...3 IBM BigInsights...5

More information

Turbo-Charging Open Source Hadoop for Faster, more Meaningful Insights

Turbo-Charging Open Source Hadoop for Faster, more Meaningful Insights Turbo-Charging Open Source Hadoop for Faster, more Meaningful Insights Gord Sissons Senior Manager, Technical Marketing IM Platform Computing gsissons@ca.ibm.com Agenda Some Context IM Platform Computing

More information

<Insert Picture Here> Infrastructure as a Service (IaaS) Cloud Computing for Enterprises

<Insert Picture Here> Infrastructure as a Service (IaaS) Cloud Computing for Enterprises Infrastructure as a Service (IaaS) Cloud Computing for Enterprises Speaker Title The following is intended to outline our general product direction. It is intended for information

More information

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment

Tutorial for Windows and Macintosh. Preparing Your Data for NGS Alignment Tutorial for Windows and Macintosh Preparing Your Data for NGS Alignment 2015 Gene Codes Corporation Gene Codes Corporation 775 Technology Drive, Ann Arbor, MI 48108 USA 1.800.497.4939 (USA) 1.734.769.7249

More information

It s Not Public Versus Private Clouds - It s the Right Infrastructure at the Right Time With the IBM Systems and Storage Portfolio

It s Not Public Versus Private Clouds - It s the Right Infrastructure at the Right Time With the IBM Systems and Storage Portfolio White Paper - It s the Right Infrastructure at the Right Time With the IBM Systems and Storage Portfolio Contents Executive Summary....2 Introduction....3 Private clouds - Powerful tech, new solutions....3

More information

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System

Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance

More information

Accelerating Data-Intensive Genome Analysis in the Cloud

Accelerating Data-Intensive Genome Analysis in the Cloud Accelerating Data-Intensive Genome Analysis in the Cloud Nabeel M Mohamed Heshan Lin Wu-chun Feng Department of Computer Science Virginia Tech Blacksburg, VA 24060 {nabeel, hlin2, wfeng}@vt.edu Abstract

More information

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community

Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community Cloud BioLinux: Pre-configured and On-demand Bioinformatics Computing for the Genomics Community Ntinos Krampis Asst. Professor J. Craig Venter Institute kkrampis@jcvi.org http://www.jcvi.org/cms/about/bios/kkrampis/

More information

UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production

UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production Page 1 of 6 UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production February 05, 2010 Newsletter: BioInform BioInform - February 5, 2010 By Vivien Marx Scientists at the department

More information

Building Bioinformatics Capacity in Africa. Nicky Mulder CBIO Group, UCT

Building Bioinformatics Capacity in Africa. Nicky Mulder CBIO Group, UCT Building Bioinformatics Capacity in Africa Nicky Mulder CBIO Group, UCT Outline What is bioinformatics? Why do we need IT infrastructure? What e-infrastructure does it require? How we are developing this

More information

Addressing Open Source Big Data, Hadoop, and MapReduce limitations

Addressing Open Source Big Data, Hadoop, and MapReduce limitations Addressing Open Source Big Data, Hadoop, and MapReduce limitations 1 Agenda What is Big Data / Hadoop? Limitations of the existing hadoop distributions Going enterprise with Hadoop 2 How Big are Data?

More information

Information Architecture

Information Architecture The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to

More information

Windows HPC Server 2008 R2 Service Pack 3 (V3 SP3)

Windows HPC Server 2008 R2 Service Pack 3 (V3 SP3) Windows HPC Server 2008 R2 Service Pack 3 (V3 SP3) Greg Burgess, Principal Development Manager Windows Azure High Performance Computing Microsoft Corporation HPC Server Components Job Scheduler Distributed

More information

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Sean Lee Solution Architect, SDI, IBM Systems SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Agenda Converging Technology Forces New Generation Applications Data Management Challenges

More information

Cloud-based Analytics and Map Reduce

Cloud-based Analytics and Map Reduce 1 Cloud-based Analytics and Map Reduce Datasets Many technologies converging around Big Data theme Cloud Computing, NoSQL, Graph Analytics Biology is becoming increasingly data intensive Sequencing, imaging,

More information

Personalized Medicine and IT

Personalized Medicine and IT Personalized Medicine and IT Data-driven Medicine in the Age of Genomics www.intel.com/healthcare/bigdata Ketan Paranjape General Manager, Life Sciences Intel Corp. @Portlandketan 1 The Central Dogma of

More information

IBM Platform Computing : infrastructure management for HPC solutions on OpenPOWER Jing Li, Software Development Manager IBM

IBM Platform Computing : infrastructure management for HPC solutions on OpenPOWER Jing Li, Software Development Manager IBM IBM Platform Computing : infrastructure management for HPC solutions on OpenPOWER Jing Li, Software Development Manager IBM #OpenPOWERSummit Join the conversation at #OpenPOWERSummit 1 Scale-out and Cloud

More information

IBM Software Information Management Creating an Integrated, Optimized, and Secure Enterprise Data Platform:

IBM Software Information Management Creating an Integrated, Optimized, and Secure Enterprise Data Platform: Creating an Integrated, Optimized, and Secure Enterprise Data Platform: IBM PureData System for Transactions with SafeNet s ProtectDB and DataSecure Table of contents 1. Data, Data, Everywhere... 3 2.

More information

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer Automated Data Ingestion Bernhard Disselhoff Enterprise Sales Engineer Agenda Pentaho Overview Templated dynamic ETL workflows Pentaho Data Integration (PDI) Use Cases Pentaho Overview Overview What we

More information

Eoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille

Eoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille Eoulsan Analyse du séquençage à haut débit dans le cloud et sur la grille Journées SUCCES Stéphane Le Crom (UPMC IBENS) stephane.le_crom@upmc.fr Paris November 2013 The Sanger DNA sequencing method Sequencing

More information

Scaling LS-DYNA on Rescale HPC Cloud Simulation Platform

Scaling LS-DYNA on Rescale HPC Cloud Simulation Platform Scaling LS-DYNA on Rescale HPC Cloud Simulation Platform Joris Poort, President & CEO, Rescale, Inc. Ilea Graedel, Manager, Rescale, Inc. 1 Cloud HPC on the Rise 1.1 Background Engineering and science

More information

CSE-E5430 Scalable Cloud Computing. Lecture 4

CSE-E5430 Scalable Cloud Computing. Lecture 4 Lecture 4 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 5.10-2015 1/23 Hadoop - Linux of Big Data Hadoop = Open Source Distributed Operating System

More information

Richmond, VA. Richmond, VA. 2 Department of Microbiology and Immunology, Virginia Commonwealth University,

Richmond, VA. Richmond, VA. 2 Department of Microbiology and Immunology, Virginia Commonwealth University, Massive Multi-Omics Microbiome Database (M 3 DB): A Scalable Data Warehouse and Analytics Platform for Microbiome Datasets Shaun W. Norris 1 (norrissw@vcu.edu) Steven P. Bradley 2 (bradleysp@vcu.edu) Hardik

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Big Workflow: More than Just Intelligent Workload Management for Big Data

Big Workflow: More than Just Intelligent Workload Management for Big Data Big Workflow: More than Just Intelligent Workload Management for Big Data Michael Feldman White Paper February 2014 EXECUTIVE SUMMARY Big data applications represent a fast-growing category of high-value

More information

ASPERA HIGH-SPEED TRANSFER SOFTWARE. Moving the world s data at maximum speed

ASPERA HIGH-SPEED TRANSFER SOFTWARE. Moving the world s data at maximum speed ASPERA HIGH-SPEED TRANSFER SOFTWARE Moving the world s data at maximum speed PRESENTERS AND AGENDA PRESENTER John Heaton Aspera Director of Sales Engineering john@asperasoft.com AGENDA How Cloud is used

More information

IBM Smart Business Storage Cloud

IBM Smart Business Storage Cloud GTS Systems Services IBM Smart Business Storage Cloud Reduce costs and improve performance with a scalable storage virtualization solution SoNAS Gerardo Kató Cloud Computing Solutions 2010 IBM Corporation

More information

White Paper. Version 1.2 May 2015 RAID Incorporated

White Paper. Version 1.2 May 2015 RAID Incorporated White Paper Version 1.2 May 2015 RAID Incorporated Introduction The abundance of Big Data, structured, partially-structured and unstructured massive datasets, which are too large to be processed effectively

More information

WOS Cloud. ddn.com. Personal Storage for the Enterprise. DDN Solution Brief

WOS Cloud. ddn.com. Personal Storage for the Enterprise. DDN Solution Brief DDN Solution Brief Personal Storage for the Enterprise WOS Cloud Secure, Shared Drop-in File Access for Enterprise Users, Anytime and Anywhere 2011 DataDirect Networks. All Rights Reserved DDN WOS Cloud

More information

Deep Sequencing Data Analysis

Deep Sequencing Data Analysis Deep Sequencing Data Analysis Ross Whetten Professor Forestry & Environmental Resources Background Who am I, and why am I teaching this topic? I am not an expert in bioinformatics I started as a biologist

More information

Overcoming Storage Barriers in Life Sciences Research with IBM s Next Generation Sequencing Solutions. Executive Summary

Overcoming Storage Barriers in Life Sciences Research with IBM s Next Generation Sequencing Solutions. Executive Summary Overcoming Storage Barriers in Life Sciences Research with IBM s Next Generation Sequencing Solutions Sponsored by IBM Srini Chari, Ph.D., MBA October 2011 Cabot Partners Group, Inc. 100 Woodcrest Lane,

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Big Data and the Data Lake. February 2015

Big Data and the Data Lake. February 2015 Big Data and the Data Lake February 2015 My Vision: Our Mission Data Intelligence is a broad term that describes the real, meaningful insights that can be extracted from your data truths that you can act

More information

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed

More information

<Insert Picture Here> The Evolution Of Clinical Data Warehousing

<Insert Picture Here> The Evolution Of Clinical Data Warehousing The Evolution Of Clinical Data Warehousing Srinivas Karri Principal Consultant Agenda Value of Clinical Data Clinical Data warehousing & The Big Data Challenge

More information

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe

Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe Go where the biology takes you. Genome Analyzer IIx Genome Analyzer IIe Go where the biology takes you. To published results faster With proven scalability To the forefront of discovery To limitless applications

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

The deployment of OHMS TM. in private cloud

The deployment of OHMS TM. in private cloud Healthcare activities from anywhere anytime The deployment of OHMS TM in private cloud 1.0 Overview:.OHMS TM is software as a service (SaaS) platform that enables the multiple users to login from anywhere

More information

RED HAT: UNLOCKING THE VALUE OF THE CLOUD

RED HAT: UNLOCKING THE VALUE OF THE CLOUD RED HAT: UNLOCKING THE VALUE OF THE CLOUD Chad Tindel September 2010 1 RED HAT'S APPROACH TO THE CLOUD IS BETTER Build better clouds with Red Hat 1. The most comprehensive solutions for clouds both private

More information

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,

More information

Enabling the Big Data Commons through indexing of data and their interactions

Enabling the Big Data Commons through indexing of data and their interactions biomedical and healthcare Data Discovery Index Ecosystem Enabling the Big Data Commons through indexing of and their interactions 2 nd BD2K all-hands meeting Bethesda 11/12/15 Aims 1. Help users find accessible

More information

High Performance Compu2ng Facility

High Performance Compu2ng Facility High Performance Compu2ng Facility Center for Health Informa2cs and Bioinforma2cs Accelera2ng Scien2fic Discovery and Innova2on in Biomedical Research at NYULMC through Advanced Compu2ng Efstra'os Efstathiadis,

More information

EMC ATMOS. Managing big data in the cloud A PROVEN WAY TO INCORPORATE CLOUD BENEFITS INTO YOUR BUSINESS ATMOS FEATURES

EMC ATMOS. Managing big data in the cloud A PROVEN WAY TO INCORPORATE CLOUD BENEFITS INTO YOUR BUSINESS ATMOS FEATURES EMC ATMOS Managing big data in the cloud Essentials Purpose-built cloud storage platform designed for unlimited global scale Intelligently automates management of content through highly flexible policies

More information

CloudCenter Full Lifecycle Management. An application-defined approach to deploying and managing applications in any datacenter or cloud environment

CloudCenter Full Lifecycle Management. An application-defined approach to deploying and managing applications in any datacenter or cloud environment CloudCenter Full Lifecycle Management An application-defined approach to deploying and managing applications in any datacenter or cloud environment CloudCenter Full Lifecycle Management Page 2 Table of

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based

More information

