Data management challenges in todays Healthcare and Life Sciences ecosystems



Similar documents
Seagate Lustre Update. Peter Bojanic

Seagate HPC /Big Data Business Tech Talk. December 2014

OpenStack Benelux. Dan Chester. Seagate Cloud Systems & Solutions

Red Hat Storage Server

Object-based Storage in Big Data and Analytics. Ashish Nadkarni Research Director Storage IDC

Solid State Storage in the Evolution of the Data Center

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing

Data Storage At the Heart of any Information System. Ken Claffey, VP/GM - June 2015

SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS

Computational infrastructure for NGS data analysis. José Carbonell Caballero Pablo Escobar

Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA

DDN updates object storage platform as it aims to break out of HPC niche

Performance, Reliability, and Operational Issues for High Performance NAS Storage on Cray Platforms. Cray User Group Meeting June 2007

MaxDeploy Ready. Hyper- Converged Virtualization Solution. With SanDisk Fusion iomemory products

Seagate Cloud Systems & Solutions

Why EMC for SAP HANA. EMC is the #1 Storage Vendor for SAP (IDC Storage User Demand Study, Fall 2011)

SGI Solutions for RDSI/CAUDIT 2013 SGI

ECMWF HPC Workshop: Accelerating Data Management

Inside Lustre HSM. An introduction to the newly HSM-enabled Lustre 2.5.x parallel file system. Torben Kling Petersen, PhD.

Enabling High performance Big Data platform with RDMA

IBM ELASTIC STORAGE SEAN LEE

February, 2015 Bill Loewe

IBM Global Technology Services September NAS systems scale out to meet growing storage demand.

Big + Fast + Safe + Simple = Lowest Technical Risk

A Cloud WHERE PHYSICAL ARE TOGETHER AT LAST

WOS OBJECT STORAGE PRODUCT BROCHURE DDN.COM Full Spectrum Object Storage

Big Data Challenges in Bioinformatics

Data Center Op+miza+on

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

(Scale Out NAS System)

Boas Betzler. Planet. Globally Distributed IaaS Platform Examples AWS and SoftLayer. November 9, IBM Corporation

Cisco for SAP HANA Scale-Out Solution on Cisco UCS with NetApp Storage

Storage Architectures for Big Data in the Cloud

Software-defined Storage Architecture for Analytics Computing

Protecting Big Data Data Protection Solutions for the Business Data Lake

The path to the cloud training

SGI Big Data Ecosystem - Overview. Dave Raddatz, Big Data Solutions & Performance, SGI

巨 量 資 料 分 層 儲 存 解 決 方 案

RED HAT STORAGE SERVER TECHNICAL OVERVIEW

Big Data Performance Growth on the Rise

Easier - Faster - Better

Scality Conversations (Episode 3) Ever Evolving Data Center Hardware. Leo Leung, VP of Corporate Marketing

Product Brochure. Hedvig Distributed Storage Platform Modern Storage for Modern Business. Elastic. Accelerate data to value. Simple.

Flexible Scalable Hardware independent. Solutions for Long Term Archiving

HadoopTM Analytics DDN

Personalized Medicine and IT

Dell s SAP HANA Appliance

Integrated Grid Solutions. and Greenplum

IBM General Parallel File System (GPFS ) 3.5 File Placement Optimizer (FPO)

Mit Soft- & Hardware zum Erfolg. Giuseppe Paletta

It s Not Public Versus Private Clouds - It s the Right Infrastructure at the Right Time With the IBM Systems and Storage Portfolio

THE SUN STORAGE AND ARCHIVE SOLUTION FOR HPC

The safer, easier way to help you pass any IT exams. Exam : Storage Sales V2. Title : Version : Demo 1 / 5

Building a Scalable Storage with InfiniBand

High Performance Computing OpenStack Options. September 22, 2015

Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000

Appro Supercomputer Solutions Best Practices Appro 2012 Deployment Successes. Anthony Kenisky, VP of North America Sales

Netapp HPC Solution for Lustre. Rich Fenton UK Solutions Architect

The path to the cloud training

Hitachi NAS Platform and Hitachi Content Platform with ESRI Image

Backup and Recovery Redesign with Deduplication

MaxDeploy Hyper- Converged Reference Architecture Solution Brief

ebay Storage, From Good to Great

Cray s Storage History and Outlook Lustre+ Jason Goodman, Cray LUG Denver

RED HAT STORAGE PORTFOLIO OVERVIEW

Optimizing Storage for Better TCO in Oracle Environments. Part 1: Management INFOSTOR. Executive Brief

Nexenta looks to expand TAM with scale-out object storage play, IPO on horizon

IBM Infrastructure for Long Term Digital Archiving

EMC BACKUP MEETS BIG DATA

IBM Storage Technical Strategy and Trends

Clodoaldo Barrera Chief Technical Strategist IBM System Storage. Making a successful transition to Software Defined Storage

NEXT GENERATION EMC: LEAD YOUR STORAGE TRANSFORMATION. Copyright 2013 EMC Corporation. All rights reserved.

Einsatzfelder von IBM PureData Systems und Ihre Vorteile.

New Hitachi Virtual Storage Platform Family. Name Date

New Storage System Solutions

HADOOP ON ORACLE ZFS STORAGE A TECHNICAL OVERVIEW

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

SMB Direct for SQL Server and Private Cloud

Please give me your feedback

THE FUTURE OF STORAGE IS SOFTWARE DEFINED. Jasper Geraerts Business Manager Storage Benelux/Red Hat

Big data management with IBM General Parallel File System

VMware Software-Defined Storage & Virtual SAN 5.5.1

Hur hanterar vi utmaningar inom området - Big Data. Jan Östling Enterprise Technologies Intel Corporation, NER

Quick Reference Selling Guide for Intel Lustre Solutions Overview

HP BladeSystem Advantage over Cisco s UCS

Sun Constellation System: The Open Petascale Computing Architecture

WHITEPAPER. A Technical Perspective on the Talena Data Availability Management Solution

Software defined Storage The next generation of Storage virtualization

Transcription:

Data management challenges in todays Healthcare and Life Sciences ecosystems Jose L. Alvarez Principal Engineer, WW Director Life Sciences jose.alvarez@seagate.com

Evolution of Data Sets in Healthcare RIS Example (1995!)

Data Aggregation from multiple clinical disciplines

There are integration engines but data management issues are still present

The New paradigm of the connected individual 5

Why Genomics data sets are important and disruptive $4.27b by 2017!

Seagate Cloud Systems and Solutions Portfolio HPC Information OEM Cloud Services HPC Infrastructure Scale-Out Systems OEM Cloud Services COMPANY COMPANY COMPANY Integrated Engineered to optimize high-performance capacity and performance computing 40% fewer racks solutions require 1TB/s + file system performance Scalable, modular Engineered solutions for components object storage Validated and and and solutions architectures for open source and software-defined storage Private cloud appliances for backup and recovery Modular, scalable components for DIY customers COMPANY COMPANY COMPANY Custom systems Cloud Cloud Cloud backup and and and restore, 2+ million enclosures Backup a service 17+ Exabytes shipped disaster Disaster recovery recovery, as Drive Variety (HDD, and and and long-term a Service storage solutions SAS, SATA, SSD, Archive as a service hybrid) Endpoint backup Enclosures, controllers Managed services Customer-driven partnership Services: Logistics, fulfillment, warranty, design, supply chain 7

Our Technology Investment Roadmap Lowest Cost/ TB Storage Increasing Total Performance Flash Controllers Lowering System Power and Cooling Costs Lowering System Volume Self-Healing Components and Systems Lowest Total System Lifecycle Cost (Deployment, Operation, Disposal) 8

Seagate HCLS Mission/Vision Seagate brings an open approach to Intelligent Information Infrastructure accelerating bioinformatics pipelines and helping manage next-generation workloads with scale, performance, security and cost aligned to today's challenges. Our multi-tiered data storage solutions enable high-throughput, scalable geo-distributed storage, while meeting the complex compliance and data management challenges of high performance computing in bioinformatics. 9

Simplifying Scientific Workflows Using new memory and storage tiers In-Transit Analysis Memory Tiers: 3D stacked, NVRAM,PCM, etc. Close if not tied to the chip Science is not completed in totally separate stages anymore. But is rather a flowing cycle. Work and reorganization is done between physical media. Data on physical media is analyzed, reduced, and manipulated. Interconnect on chip In-Situ analysis Data moves back and forth between tiers...... Cold Storage: Tape or Disk 10

Intelligent Information Infrastructure - Powered by Seagate Delivering with PARTNERS components through complete systems and into the cloud Intelligent Information Infrastructure Storage Network Storage Computing Servers Cloud Storage Racks Drives/Flash Storage Security Network Services File Systems Management Software Controllers 11

Building a Secure High Performance Data Fabric Acquire Process / Analyze Intelligent Information Infrastructure Store Archive Distribute Accelerating Customer Workflows Acquire, Analyze, Store, Distribute and Archive.

The Big Data Transition in Genomics 2010 2015 Objective: Research Clinical Time to sequence 1 week 10 human genomes per day Cost to sequence $100K $1000 Applications: POSIX Casava Casava, GATK (Broad), SOAPdenovo (BGI) Analytics None SAP Hana Drug discovery Hadoop, Others. Cloud None Instrument to Illumina BaseSpace, Amazon, Google Leaders Instruments Life Tech, illumina, 454 illumina, Life Tech, PacBio Leaders Compute Dell, HP and IBM BGI (1PF), Google (600K cores), Amazon Leaders Storage CIFS, NFS, GPFS, Lustre NFS,CIFS, GPFS, Lustre, Object Stores 13

Life Science Solutions : At Scale! Until the arrival of petascale supercomputers, no one could piece together the entire HIV capsid an assemblage of more than 1,300 identical proteins in atomic-level detail. The simulations that added the missing pieces to the puzzle were conducted during testing of Blue Waters, a new supercomputer at the National Center for Supercomputing Applications at the University of Illinois. Mature HIV-1 capsid structure by cryo-electron microscopy and all-atom molecular dynamics Nature 497, 643 646 (30 May 2013) doi:10.1038/nature12162 Received 02 November 2012 Accepted 05 April 2013 Published online 29 May 2013 14

The converged Infrastructure Solution HPC Data CS9000 (Scale out Parallel file system) Clinical Object Storage (Scality for Unstructured data CIFS, NFS and Archive) Object Servers (S3) Cloud Backup/DR and Long term Archive HPC Compute and Analy3cs farm HSM policy engine (e.g. irods, TSM, DMF) Sequencers and other Lab Instruments

The converged Infrastructure Solution HPC Data CS9000 (Scale out Parallel file system) Clinical Object Storage (Scality for Unstructured data CIFS, NFS and Archive) Object Servers (S3) Cloud Backup/DR and Long term Archive HPC Compute and Analy3cs farm HSM policy engine (e.g. irods, TSM, DMF) Sequencers and other Lab Instruments

Integration across Parallel File System and Grid Data services irods Data Grid irods Resource irods Resource irods Resource ClusterStor File System Seagate StrataStor Cleversafe, Scality, SwiTstack Parallel Applica3on Scratch and Medium term Storage Virtual Machine Environment VM s Grid Services Object Store

Secure Access to Data on the Clinical Side Research Systems Sequence Data irods-enabled samtools 4) 5) irods 3) NCGenes Clinician Data Sets 1) Clinician request for sequence reads on patient X 2) Patient id lookup to obtain subject id 3) Subject id lookup in irods 4) Data sets packaged in zip file and retrieved 5) Data unzipped and displayed within secure workspace 2) 1) Portal Researcher Secure Medical Workspace EMR Clinical Studies Clinical Systems

ClusterStor HSM Partners irods, Cray TAS and SGI DMF ClusterStor v2.0 or greater w/lustre 2.5 is HSM Ready Requires HSM application and support from a partner HSM partner options SGI DMF Cray TAS irods 20

The converged Infrastructure Solution HPC Data CS9000 (Scale out Parallel file system) Clinical Object Storage (Scality for Unstructured data CIFS, NFS and Archive) Object Servers (S3) Cloud Backup/DR and Long term Archive HPC Compute and Analy3cs farm HSM policy engine (e.g. irods, TSM, DMF) Sequencers and other Lab Instruments

New predictive data Analytics architectures 22

Hadoop Workflow Accelerator on ClusterStor Overview Hadoop compute uses the ClusterStor storage system for primary input and output operations. MapReduce temporary files are saved using minimal local storage or can be configured to save to ClusterStor eliminating all local storage. The Hadoop Workflow Accelerator eliminates the reliance on direct attached storage, allowing for truly diskless compute nodes and independent scaling of compute and storage Hadoop Workflow Accelerator optimizes Hadoop performance for you workloads and applications while improving TCO. ClusterStor Lustre Hadoop-on-ClusterStor 40 Gbs Ethernet or Infiniband Between Racks Parallel FS Hadoop CPU CPU CPU HDFS Hadoop CPU CPU CPU Linux Lustre Shared Disk HDFS Local Disks 23

The converged Infrastructure Solution HPC Data CS9000 (Scale out Parallel file system) Clinical Object Storage (Scality for Unstructured data CIFS, NFS and Archive) Object Servers (S3) Cloud Backup/DR and Long term Archive HPC Compute and Analy3cs farm HSM policy engine (e.g. irods, TSM, DMF) Sequencers and other Lab Instruments

Software Defined CSS Validated Architectures Scality is Validated & Supported By Seagate For Key Customer Use Cases Validated Architecture Use Cases Tailored Hardware Configurations Shipped with SW installed Seagate options for Billing, etc Tested solutions Benchmarks End to end support Managed Services option 25

Software Defined CSS Validated Architecture Family SAN & NAS Object Storage Cleversafe Scality Nexenta OpenStack Swift Basho Ceph Terabytes 500 TBs Petabytes 26

Single Pane of Glass for Infrastructure and data Management Fully Integrated Solution Visibility and Management Low level diagnostics, embedded monitoring, proactive alerts Easy to Manage Real Time Monitoring 27

Thank you!