ACCELERATING COLLABORATION IN LIFE SCIENCE WITH DDN STORAGE
- Kristina Houston
WHITEPAPER

INTRODUCTION

Imagine a world where the genome of each and every citizen is sequenced at birth and several more times throughout their life. Now imagine, as a CIO, having to define, source and implement a datacenter infrastructure to host, share and archive that data, compliant with HIPAA and CLIA, for 20 years or more. These are just some of the big data challenges that healthcare, pharmaceutical, analytics and SEQuencing as a Service (SEQaaS) companies will need to address in the very near future. With the cost and turnaround of sequencing rapidly approaching $1,000 per exome [1] in less than a week, sequencing a patient's DNA will be as commonplace as a CAT scan or MRI.

Today, the art of sequencing a full human genome is still in its infancy compared to what is just on the horizon. The most advanced second generation sequencing instruments available today (NGS2), like the Life Technologies Post-Light Ion Semiconductor Sequencing platform Proton and Illumina's HiSeq2200, will soon be surpassed by the emerging and highly anticipated third generation (NGS3) single molecule, PCR free, micro-fluid Nanopore solutions. These pending advancements, combined with emerging improvements in data analytics, will soon enable physicians to access a fully aligned, map-reduced genome in less than 24 hours. In short, the data deluge we have been hearing so much about is just about to start.

In this paper, we will examine how a state-of-the-art facility like Baylor or Yale Medical Center processes a human genome through analysis and diagnosis. We will then contrast that against what an outsourcing paradigm might look like, along with its benefits. The outsourcing of high-value tasks is a natural evolution in business. It virtually guarantees that a product or service will persist and evolve to its highest level of value.

For sequencing, we see this evolution being led by companies like PerkinElmer, Illumina, and Beckman Coulter. In parallel, we also see genome interpretation for profit emerging from providers who specialize in analytics. The outsourcing discussion will focus on the massive amounts of data that have to pass between the various points of the process. We will then show how a well thought out private cloud solution from DataDirect Networks can provide a fast, flexible, secure and cost-effective means of automating the movement of big data while improving quality of service.

[1] Exome: an efficient way to selectively sequence the coding regions of the genome of one individual.
TABLE OF CONTENTS

INTRODUCTION
TABLE OF CONTENTS
CLINICAL GENETIC SEQUENCING
    Sequencing Today
    Sequencing
    Outsourcing Services
HOUSTON WE HAVE A PROBLEM
WEB OBJECT SCALER (WOS)
HOW WOS CAN ACCELERATE THE DISCOVERY PROCESS
A CASE STUDY
CONCLUSION
[Figure 1 - NGS2 vs. pending NGS3 Nanopore Technology]

CLINICAL GENETIC SEQUENCING

As it stands today, none of the sequencer manufacturers' instruments are FDA approved. So, technically, you can't use a commercial sequencer from Life Technologies, Illumina, Roche 454 or others for full chromosome or exome sequencing and/or diagnosis. So how are FDA-approved genetic screenings done today? It turns out that only very specific genetic tests are FDA approved. A well-known example is testing for BRCA1 (BReast CAncer gene one) and BRCA2 (BReast CAncer gene two), where targeted tests via arrays or kits are used. By today's standards, a full genetic screening for BRCA1/BRCA2, or for the hundreds of other known genetic disorders, is possible but not sanctioned by the FDA.

SEQUENCING TODAY

For our discussion, let's assume there is an FDA-approved sequencer that is accurate enough for human diagnosis (both of these assumptions will come true in the next 12 to 18 months). How would a research hospital like Yale or Baylor perform these genetic screenings to make an early and informed diagnosis, and do so for the vast number of genetic disorders that carry through the population? There are two possible paths: one is preemptive and the other is reactive. Preemptive screening implies sequencing a child's exome in utero or shortly after birth and looking for genetic code variations that could lead to future disorders. Reactive screening means sequencing when an undiagnosed genetic symptom is encountered. In either case, the process, performed for each requested sequence, would look something like this:

- The physician takes a sample (blood or cheek swab) from the patient and sends it to the sequencing center (via mail, internal mail or courier). Both Baylor and Yale have well-staffed centers local to the hospital.
- The sample is queued, scheduled, then prepared, sequenced and aligned.
- The resulting aligned genome or exome files (it's just data now) are sent to a bioinformatician, who runs a number of searches against known genetic disorders.
- The bioinformatics center sends its results to a geneticist, who creates a report for the requesting physician.
- The results are sent back to the physician, who may consult with a geneticist before meeting with the patient.
- The genetic data is then archived per HIPAA and CLIA for up to 20 years.

It should be noted that, as of today, genetic tests are not always definitive; they are just one tool used in the art of diagnosis.
SEQUENCING

What we are seeing today is the refinement and subsequent consolidation of the SEQaaS market to four or five players who will ultimately service 80% of the market. These companies focus on the art of human reference sequencing for early stage diagnosis, cancer research, drug efficacy, full de novo sequencing and even the Combined DNA Index System (CODIS) for law enforcement. These sequencing as a service (SEQaaS) providers are highly efficient. They are evolving their business models to address the unique and varying needs of the clinical and research (Pharma) community, for example CLIA and HIPAA compliant facilities, law enforcement and long term archiving. There are even centers today that provide cloud compute and storage services, allowing Pharma and clinicians to run their own alignment programs within a private, secure cloud environment.

How are SEQaaS providers any different from the centers at Yale or Baylor? The answer is twofold: the first difference is the combination of cost, quality and time to results; the second is flexibility, or their willingness to modify and adapt their process flow to meet the regulatory and business needs of their customers.

OUTSOURCING SERVICES

Beyond price, there are several other metrics that customers of SEQaaS providers look for. Attributes like reputation, brand awareness and the quality or accuracy of the results (signified by a metric called coverage) are all very important. For clinical customers, the primary concern is accuracy, down to one error in roughly three billion (the approximate number of base pairs in one of the two copies of the human genome we carry in our cells). This level of accuracy (higher than what is achieved in mitosis) is necessary because a single letter error in just the right location can indicate a false positive for a number of genetic disorders.

And then there's time to results. Today, a full human genome takes about two to four weeks to sequence, align and analyze. That time will drop further with the introduction of single molecule (nanopore) sequencers, along with other data centric advancements discussed in this paper.

SEQaaS Providers are Flexible

As noted earlier, a SEQaaS business is financially motivated to listen, respond quickly and adapt its business model to meet customer needs. This behavior benefits both legacy and future customers while accelerating the discovery process: a win-win for all. Internal sequencing centers associated with large clinical institutions like Yale are focused primarily on research. Doing ad hoc clinical runs is time-consuming and expensive, and it disrupts lab flow and efficiency. If such institutions were to finance, staff and equip a clinical sequencing center, they would quickly discover that SEQaaS providers can deliver the needed results faster, more accurately and at a price point they could never match, all due to economies of scale.

Analytics is an Emerging Market

The second emerging trend is the birth of bio-analytics companies. These are for-profit businesses that have developed a suite of innovative data-mining services to address the needs of the clinical, research and law enforcement communities. There are companies that provide:

- Human genome interpretation for disease: looking for anomalies such as SNP calls, deletions, insertions, extensions, palindromes, etc., correlating the results to known genetic disorders and noting the relationship to other known genes.
- Full genome interpretation for drug efficacy: pharmacogenetics or chemotherapy.
- Genome interpretation for law enforcement: creating high-level profiles of perpetrators based on genetic markers such as gender, ethnicity, eye color, hair color, blood type and even age.
These analytics firms have a suite of automated software tools that are very sophisticated and can generate detailed target identification reports in minutes. These custom, compute-intensive programs are constantly evolving and being updated with new bio-markers weekly, if not daily. Their workflow involves ingesting terabytes of genetic data from the SEQaaS companies and delivering their results to physicians, geneticists, researchers and law enforcement agencies. The deliverable is an easily understandable, natural language report that summarizes any and all relevant genetic data. For the research and Pharma community, the reports map where anomalies are and their genetic impact, along with their relationship to related markers. For clinical customers, they can even deliver the results in electronic medical records (EMR) format. As new bio-markers are discovered, archived data can be rescreened and supplemental reports generated and forwarded, based on class of service.

Future Data Intensive Services

As genetics research and development advance, you can be assured that the research and clinical community will continue to expand and include new institutions, which will further strain the process of collaborative research. These include institutions that:

- Provide gene synthesis services for genetically engineered organisms or gene therapy, e.g., DNA2.0, IDT, GenScript, etc.;
- Breed mice or primates with transgenic and gene knockout modifications for drug or genetic marker research, e.g., Jackson Labs, DeltaOne, PolyGene, etc.;
- And, in the not so distant future, xenotransplantation firms that grow genetically modified animals, e.g., pigs or primates, designed from inception to grow human organs which are harvested and delivered to surgeons for transplant.

HOUSTON WE HAVE A PROBLEM

As we have stated, the outsourcing of sequencing, analytics, gene synthesis, transgenic and gene knockout breeding, and the promise of xenotransplantation all involve the movement of gigabytes of information between multiple researchers and clinicians. So the question becomes: how do you move this data between multiple users quickly, securely and cost effectively? Today most genetic data is moved via a cloud service or via data transfer programs like Dropbox or sftp.
For very large data sets [2], hard disk drives are shipped back and forth between the end customers, the SEQaaS provider and the data analytics company. If the genetic data is non-clinical (research), then you can use public clouds, which are costly and slow. At 10Mb/sec (that's a very high rate), it takes ~15 minutes to move a single gigabyte of data into the cloud and then something less than 15 minutes to download it. Again, if you're staging base-calls, then your data sets are large (on the order of 100GB). If you attempt to push that much data into a public cloud, via Dropbox or sftp, to multiple end points, it might take days. In addition to the size and time constraints, there are other issues:

- Security, i.e., access rights, credentials, encryption and so forth;
- HIPAA, CLIA and CODIS compliance;
- Archiving and protecting data, in some cases for >20 years;
- The logistics of sharing data between multiple sites behind a firewall; and
- How to manage the storage pool and allocate the bandwidth for these moves.

[2] Such large data sets are typically associated with base-calls: the interpretation of single-strand DNA sequence data.
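The transfer times quoted in this section follow from simple arithmetic. The following is a minimal Python sketch of that arithmetic, assuming decimal units (1 GB = 8,000 megabits), an ideal link with no protocol overhead, and a hypothetical three end points served one after another; the function names are illustrative only.

```python
def transfer_seconds(size_gb: float, rate_mbps: float) -> float:
    """Ideal time, in seconds, to move size_gb gigabytes at rate_mbps megabits/sec."""
    megabits = size_gb * 8 * 1000  # assumption: decimal units, 1 GB = 8,000 Mb
    return megabits / rate_mbps


def describe(seconds: float) -> str:
    """Render a duration in the most readable unit."""
    if seconds < 3600:
        return f"{seconds / 60:.1f} minutes"
    if seconds < 86400:
        return f"{seconds / 3600:.1f} hours"
    return f"{seconds / 86400:.1f} days"


if __name__ == "__main__":
    rate = 10       # Mb/sec, the upload rate used in the text
    end_points = 3  # hypothetical number of destinations, uploaded in turn
    print("1 GB:", describe(transfer_seconds(1, rate)))
    print("100 GB:", describe(transfer_seconds(100, rate)))
    print("100 GB to each end point:",
          describe(end_points * transfer_seconds(100, rate)))
```

Under these assumptions a single gigabyte takes about 13 minutes, in line with the ~15 minutes quoted once real-world overhead is included, and a 100GB data set pushed to three end points in turn takes close to three days.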
The answer, as it turns out, is technology (no surprise here). Over the years, we have all benefited from advancements in Internet bandwidth, the increased density and speed of storage platforms and the migration of computer-driven intelligence into every facet of our lives. When these advancements are applied to the above problem, what emerges is an elegant and very cost-effective storage appliance from DataDirect Networks called Web Object Scaler, or WOS. To the casual observer, WOS looks like a NAS server, and in some ways it is. However, we will show how a WOS-based private cloud can fundamentally change how information is shared, managed and processed securely, all while lowering the cost of research and accelerating time to discovery.

WEB OBJECT SCALER (WOS)

With the rapid adoption of outsourcing, physicians, pharmaceutical companies and researchers will all have access to a competing group of service providers who are highly motivated, efficient and customer focused. However, this emerging paradigm will need a feature-rich infrastructure for moving gigabytes of data between multiple end points. This is where a private cloud can not only address the stated needs but also significantly improve the rate of discovery, all while lowering cost. The WOS private cloud solution is well suited to the growing and complex needs of a global Life Science research community.

WOS is an object-based, federated storage system for the global distribution of immutable objects. It addresses the collaboration problem by providing users a unique name space within their network (a directory or drive name space) where they can read and/or write files which automatically appear in a directory structure somewhere else in the world. The destination is nothing more than a collection of IP addresses associated with the WOS hardware.
Data is stored as a WOS object. You can think of an object as a file, a single tar.gz archive, a zip file and so on. Each WOS object has associated with it a unique object ID and a metadata header, or preamble. The object ID is created upon placement of the object within the WOS environment (PUT command). The header has a user-defined 65KB metadata preamble which can be filled with either text or binary data. This metadata can be used as an object identifier (file name, date, size, etc.) or can hold patient information, an Electronic Record image, LIMS data, a summary of analysis, or any combination thereof. WOS objects also contain a checksum, a signature (used in the creation of the object ID), and the object's policies, which determine:

- Where to PUT an object (destination: as many as 64 unique nodes);
- Who can GET or access a given object;
- Who can DELETE the object.

[Figure 2 - A Global Collaboration Platform for Pharma]

DDN WOS is a very simple file depository that supports various industry standard interfaces such as native RESTful, S3, NFS and CIFS, to name a few. From the user perspective, it behaves like Dropbox with half the latency (peer-to-peer transaction), and it is safely located behind the firewall. In Figure 4, we see an example of a WOS object containing a 2GB BAM file payload. If we assume a 10Mb/sec upload bandwidth, the metadata would appear on the server almost instantly and the BAM payload would be locally accessible in just under 30 minutes; just copy the object and WOS does the rest.

At its core, WOS is an intelligent software stack that allows a massively scalable content delivery platform to be created out of small building blocks, enabling the system to start small and easily grow to multi-petabyte scale (see Figure 3). WOS is a fully distributed system, meaning there are no single points of failure or bottlenecks. This allows the system to scale with each new cloud building block (called a node), linearly adding to the system's performance capabilities and storage capacity. Detailed technical information on WOS is available from DDN.

[Figure 3 - WOS 6000 storage appliance for Global Collaboration]

[Figure 4 - WOS Object: a metadata header (WOS object profile, 64KB user metadata, checksum, policy, signature) and a payload (labeled 150GB in the figure), illustrated with a 2GB BAM file divided into 4MB sub-objects]

HOW WOS CAN ACCELERATE THE DISCOVERY PROCESS

The WOS solution gives the Pharma researcher or clinician the ability to share research data as quickly and simply as an idea. Gone are the days of zipping up files and using sftp, Dropbox or the hundred other ways researchers and IT departments facilitate data sharing. It is all reduced to a simple file copy, followed by a phone call: "Did you see what I did? These are unexpected results."
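The WOS object described in the previous section (an object ID assigned on PUT, bounded user metadata, a checksum, and policies governing PUT, GET and DELETE) can be illustrated with a small model. This is a sketch under stated assumptions, not the DDN WOS API: every class and function name below is hypothetical, the object ID is a random UUID rather than one derived from the object's signature, SHA-256 stands in for whatever checksum WOS actually uses, and the metadata limit is taken as the 64KB shown in Figure 4.

```python
import hashlib
import uuid
from dataclasses import dataclass

MAX_METADATA_BYTES = 64 * 1024  # user metadata limit as labeled in Figure 4
MAX_PUT_NODES = 64              # "as many as 64 unique nodes"


@dataclass(frozen=True)
class Policy:
    """Hypothetical per-object policy: where it goes and who may touch it."""
    put_nodes: tuple
    get_users: frozenset
    delete_users: frozenset

    def __post_init__(self):
        if not 1 <= len(self.put_nodes) <= MAX_PUT_NODES:
            raise ValueError("policy needs between 1 and 64 destination nodes")


@dataclass(frozen=True)
class StoredObject:
    """Immutable object: ID, metadata header, checksum, policy and payload."""
    object_id: str
    metadata: bytes
    checksum: str
    policy: Policy
    payload: bytes


def put(payload: bytes, metadata: bytes, policy: Policy) -> StoredObject:
    """Create an object; the ID is assigned at PUT time, as in the text."""
    if len(metadata) > MAX_METADATA_BYTES:
        raise ValueError("metadata exceeds the user metadata limit")
    checksum = hashlib.sha256(payload).hexdigest()  # stand-in checksum
    return StoredObject(uuid.uuid4().hex, metadata, checksum, policy, payload)


def get(obj: StoredObject, user: str) -> bytes:
    """Return the payload if the policy allows it and the checksum matches."""
    if user not in obj.policy.get_users:
        raise PermissionError(f"{user} may not GET {obj.object_id}")
    if hashlib.sha256(obj.payload).hexdigest() != obj.checksum:
        raise IOError("payload failed its integrity check")
    return obj.payload


if __name__ == "__main__":
    policy = Policy(
        put_nodes=("node-a", "node-b"),      # hypothetical node names
        get_users=frozenset({"analyst"}),
        delete_users=frozenset({"admin"}),
    )
    obj = put(b"example BAM bytes", b"sample=demo;type=BAM", policy)
    print(obj.object_id, len(get(obj, "analyst")))
```

Freezing the dataclasses mirrors the immutability of WOS objects: once created, an object's payload, metadata and policy cannot be reassigned in this model.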
A CASE STUDY

In this scenario, we will study a sequencing and analysis effort by a large, globally dispersed pharmaceutical company. The company is doing research to understand the variability among patients in order to determine drug efficacy and safety for a new small-molecule drug. It is becoming increasingly expensive and inefficient for the company to build and maintain a sequencing infrastructure which is not directly associated with its core competence. Hence, it has decided to outsource full genome sequencing to a SEQaaS company in the Northeast and data analytics to a company in the EU. This effort involves three different research centers: Japan, California and Switzerland. The analytics company needs to send its results to all three sites with minimal delay, and the SEQaaS company needs to deliver aligned SAM files to the analytics company and FASTQ/SAM files to the Bio/Pharma site in California. The SEQaaS firm has to archive all data for 10 years in two separate locations.

The map in Figure 5 shows just how complex a global collaboration initiative can be. However, the WOS private cloud solution provides a secure namespace where files can be shared globally with a high level of flexibility. In fact, WOS is so flexible that it's possible for the analytics company in France to collaborate only with the Pharma location in Basel, Switzerland, and then leverage that site's GigE leased line to extend the collaboration to California and Japan. And this is all accomplished with policy-based automation. The ability to leverage WOS policies at an object level is extremely powerful. It greatly simplifies the global distribution of data, improves researcher effectiveness and promotes collaboration.
It has been shown that once this level of collaboration begins, it accelerates and evolves, fostering new applications and innovations.

[Figure 5 - Collaboration map: Pharma sites, SEQaaS, analytics and archive locations, connected by GigE leased line and the Internet (WWW)]

In this scenario, the SEQaaS company in New England has in its datacenter a petabyte of DDN SFA12K-E storage running the DDN GRIDScaler parallel file system (see Figure 7). The storage array interfaces via InfiniBand FDR to a datacenter server farm. Both the ProLiant servers and the Illumina HiSeq2200 sequencers mount via NFS directly to the DDN WOS array. Ingested data from the HiSeq2200 is RAID protected in place, based on pre-set WOS policies. The staged data is then copied to the SFA12K-E, where it is processed by the datacenter servers and run through the CASAVA pipeline and GATK for SNP calling. The resulting FASTQ files are then gap aligned with BWA, and the final aligned BAM file is pushed back to the WOS storage array along with the unaligned FASTQ files.

This particular setup is further automated with iRODS, which runs natively on the SFA12K-E. iRODS provides a customizable data management architecture that assigns all the appropriate policies to the objects being posted to the WOS storage array. It also automates the movement of data through the network and retires (deletes) HiSeq2200 staged data after the FASTQ and BAM files are staged back to the WOS array. Once the data is staged, the WOS policies initiate an asynchronous peer-to-peer copy of the data to the backup site in New Jersey, the Pharma facility in California and the data analytics company in France. Once the data is safely staged, iRODS can then send an email to a list of recipients informing them that the data is now local and accessible.

[Figure 6 - Data flow view of SEQaaS provider in New England: Illumina HiSeq2200 sequencers, Web Object Scaler and DDN SFA storage running iRODS, linked by 10 GigE and FDR InfiniBand, with a 10Mb/sec upload to France, California and New Jersey]

Figure 7 - Collaboration data flow view

HISEQ2200
- Data ingested to WOS array
- Each object is RAID protected in place

SFA12K-E
- As data is staged to WOS, it is copied to SFA12K-E
- Copy performed via iRODS rules engine

COMPUTER
- SFA12K-E staged data is run through CASAVA and GATK for SNP calling
- At end of run, SAM and FASTQ data is staged back to WOS
- Original HiSeq2200 data is deleted

WOS ARRAY
- Data staged via iRODS rules engine; WOS policies initiate collaboration rules
- WOS archived data is copied to data analytics firm in France
- FASTQ/SAM data is also copied to Pharma in CA and WOS data warehouse in NJ, USA

ANALYTICS
- Local data in the WOS array is processed per customer's requirements
- Results are staged back to WOS array and policies send results to Pharma in Basel
- WOS policies there send analytics data on to CA and Japan
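The policy-driven fan-out in this case study can be expressed as a small routing table. The Python sketch below is only an illustration of the idea, assuming hypothetical site names and a hypothetical table layout; it does not reproduce WOS policy syntax or iRODS rules. Each class of object has its own forwarding policy, and a breadth-first walk lists the peer-to-peer copies that result, including the relay from Basel to California and Japan.

```python
from collections import deque

# Hypothetical forwarding policies, keyed by object class.
# Each maps a site to the sites it copies objects on to.
POLICIES = {
    "sequence-data": {
        "seqaas-new-england": [
            "archive-new-jersey",
            "pharma-california",
            "analytics-france",
        ],
    },
    "analytics-results": {
        "analytics-france": ["pharma-basel"],
        # Basel relays over its leased line to the other Pharma sites.
        "pharma-basel": ["pharma-california", "pharma-japan"],
    },
}


def delivery_hops(object_class: str, origin: str) -> list:
    """List the (source, destination) copies triggered by one staged object."""
    forwarding = POLICIES[object_class]
    seen = {origin}
    hops = []
    queue = deque([origin])
    while queue:
        source = queue.popleft()
        for destination in forwarding.get(source, []):
            if destination not in seen:  # never copy to a site twice
                seen.add(destination)
                hops.append((source, destination))
                queue.append(destination)
    return hops


if __name__ == "__main__":
    for source, destination in delivery_hops("analytics-results", "analytics-france"):
        print(f"{source} -> {destination}")
```

Keeping the policy as data rather than code reflects the point made above: once the policies are defined, staging an object is all that is needed to trigger every downstream copy.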
CONCLUSION

What is so unique and remarkable about the outlined approach is how a very complex collaboration effort is reduced to a simple file read/write command (the Linux bash <cp> command or drag and drop in Windows). Data sets from around the world just appear on a local drive, with no need to download objects from an S3 cloud, tar and sftp files, or unbox a shipped drive and hope it works this time. In short, all the underlying complexity of sending data objects to up to 256 destinations while ensuring data integrity is handled by a set of pre-determined policies. Once those policies are defined for a given directory space, any transaction within that space will execute per those policies.

The DataDirect Networks WOS platform is a mature, hardened and highly scalable solution for sharing ideas in a global community. Our account managers, applications specialists and engineering teams are dedicated to delivering the highest value and performance for your investment. DDN storage solutions will grow to meet your needs and help consolidate and simplify your data center while accelerating global collaboration. Our high value storage solutions are backed by enterprise class services and support, and our subject matter experts will work with your team to ensure best in class performance and value for the life of the platform.

ABOUT DDN

DataDirect Networks (DDN) is the world's leading big data storage supplier to data-intensive, global organizations. For more than 15 years, DDN has designed, developed, deployed and optimized systems, software and solutions that enable enterprises, service providers, universities and government agencies to generate more value and to accelerate time to insight from their data and information, on premise and in the cloud.

Organizations leverage the power of DDN technology and the deep technical expertise of its team to capture, store, process, analyze, collaborate on and distribute data, information and content at the largest scale in the most efficient, reliable and cost-effective manner. DDN customers include many of the world's leading financial services firms and banks, healthcare and life science organizations, manufacturing and energy companies, government and research facilities, and web and cloud service providers. For more information, visit our website or email SALES@DDN.COM.

DataDirect Networks, Inc. All Rights Reserved. v2 (5/15)
XOR MEDIA CLOUD AQUA Big Data and Traditional Storage The era of big data imposes new challenges on the storage technology industry. As companies accumulate massive amounts of data from video, sound, database,
More informationHadoopTM Analytics DDN
DDN Solution Brief Accelerate> HadoopTM Analytics with the SFA Big Data Platform Organizations that need to extract value from all data can leverage the award winning SFA platform to really accelerate
More informationProviding Self-Service, Life-cycle Management for Databases with VMware vfabric Data Director
Providing Self-Service, Life-cycle Management for Databases with VMware vfabric Data Director Graeme Gordon Senior Systems Engineer, VMware 2013 VMware Inc. All rights reserved Traditional IT Application
More informationwww.repstor.com Maximise your Microsoft investment to provide Legal Matter Management
www.repstor.com Maximise your Microsoft investment to provide Legal Matter Management Maximise your Microsoft investment to provide Legal Matter Management custodian for legal extends the powerful document
More informationCarestream Information Management Solutions. Managing the explosion in patient information
Managing the explosion in patient information Carestream Information Management Solutions Carestream Information Management Solutions The right information in the right place at the right time from the
More informationUsing Predictive Analytics to Build a World Class Healthcare System
Using Predictive Analytics to Build a World Class Healthcare System Swati Abbott CEO, Blue Health Intelligence Doug Porter SVP and CIO, Blue Cross/Blue Shield Association Using Predictive Analytics to
More informationNetApp Big Content Solutions: Agile Infrastructure for Big Data
White Paper NetApp Big Content Solutions: Agile Infrastructure for Big Data Ingo Fuchs, NetApp April 2012 WP-7161 Executive Summary Enterprises are entering a new era of scale, in which the amount of data
More informationBrocade Network Monitoring Service (NMS) Helps Maximize Network Uptime and Efficiency
WHITE PAPER SERVICES Brocade Network Monitoring Service (NMS) Helps Maximize Network Uptime and Efficiency Brocade monitoring service delivers business intelligence to help IT organizations meet SLAs,
More informationSolving Rendering Bottlenecks in Computer Animation
Solving Rendering Bottlenecks in Computer Animation A technical overview of Violin NFS caching success in computer animation November 2010 2 Introduction Computer generated animation requires enormous
More informationAutomated DNA sequencing 20/12/2009. Next Generation Sequencing
DNA sequencing the beginnings Ghent University (Fiers et al) pioneers sequencing first complete gene (1972) first complete genome (1976) Next Generation Sequencing Fred Sanger develops dideoxy sequencing
More informationWHITE PAPER. Reinventing Large-Scale Digital Libraries With Object Storage Technology
WHITE PAPER Reinventing Large-Scale Digital Libraries With Object Storage Technology CONTENTS Introduction..........................................................................3 Hitting The Limits
More informationDISCOVERING AND SECURING SENSITIVE DATA IN HADOOP DATA STORES
DATAGUISE WHITE PAPER SECURING HADOOP: DISCOVERING AND SECURING SENSITIVE DATA IN HADOOP DATA STORES OVERVIEW: The rapid expansion of corporate data being transferred or collected and stored in Hadoop
More informationPerformance Optimization Guide
Performance Optimization Guide Publication Date: July 06, 2016 Copyright Metalogix International GmbH, 2001-2016. All Rights Reserved. This software is protected by copyright law and international treaties.
More informationENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE
ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE Hadoop Storage-as-a-Service ABSTRACT This White Paper illustrates how EMC Elastic Cloud Storage (ECS ) can be used to streamline the Hadoop data analytics
More informationCloud Computing And Pharma: A Prescription For Success. How and why this critical technology will change the industry. kellyservices.
Cloud Computing And Pharma: A Prescription For Success How and why this critical technology will change the industry kellyservices.us Contents Introduction Introduction / 3 01 Streamlining Operations and
More informationIntroduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
More informationSOLUTION BRIEF. IMAT Enhances Clinical Trial Cohort Identification. imatsolutions.com
SOLUTION BRIEF IMAT Enhances Clinical Trial Cohort Identification imatsolutions.com Introduction Timely access to data is always a top priority for mature organizations. Identifying and acting on the information
More informationIBM Software Information Management Creating an Integrated, Optimized, and Secure Enterprise Data Platform:
Creating an Integrated, Optimized, and Secure Enterprise Data Platform: IBM PureData System for Transactions with SafeNet s ProtectDB and DataSecure Table of contents 1. Data, Data, Everywhere... 3 2.
More informationSaving healthcare costs by implementing new genetic risk tests for early detection of cancer and prevention of cardiovascular diseases
Saving healthcare costs by implementing new genetic risk tests for early detection of cancer and prevention of cardiovascular diseases Jeff Gulcher, MD PhD Chief Scientific Officer and co-founder decode
More informationWOS. High Performance Object Storage
Datasheet WOS High Performance Object Storage The Big Data explosion brings both challenges and opportunities to businesses across all industry verticals. Providers of online services are building infrastructures
More informationRevitalising your Data Centre by Injecting Cloud Computing Attributes. Ricardo Lamas, Cloud Computing Consulting Architect IBM Australia
Revitalising your Data Centre by Injecting Attributes Ricardo Lamas, Consulting Architect IBM Australia Today s datacenters face enormous challenges: I need to consolidate to reduce sprawl and OPEX. I
More informationObject Storage A Dell Point of View
Object Storage A Dell Point of View Dell Product Group 1 THIS POINT OF VIEW PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED
More informationBig data management with IBM General Parallel File System
Big data management with IBM General Parallel File System Optimize storage management and boost your return on investment Highlights Handles the explosive growth of structured and unstructured data Offers
More informationThe Evolution of Microsoft SQL Server: The right time for Violin flash Memory Arrays
The Evolution of Microsoft SQL Server: The right time for Violin flash Memory Arrays Executive Summary Microsoft SQL has evolved beyond serving simple workgroups to a platform delivering sophisticated
More informationIncreased Security, Greater Agility, Lower Costs for AWS DELPHIX FOR AMAZON WEB SERVICES WHITE PAPER
Increased Security, Greater Agility, Lower Costs for AWS DELPHIX FOR AMAZON WEB SERVICES TABLE OF CONTENTS Introduction... 3 Overview: Delphix Virtual Data Platform... 4 Delphix for AWS... 5 Decrease the
More informationAutomated and Scalable Data Management System for Genome Sequencing Data
Automated and Scalable Data Management System for Genome Sequencing Data Michael Mueller NIHR Imperial BRC Informatics Facility Faculty of Medicine Hammersmith Hospital Campus Continuously falling costs
More informationObject Oriented Storage and the End of File-Level Restores
Object Oriented Storage and the End of File-Level Restores Stacy Schwarz-Gardner Spectra Logic Agenda Data Management Challenges Data Protection Data Recovery Data Archive Why Object Based Storage? The
More informationUNLEASH THE POWER OF YOUR DATA
BANKING 3.0 UNLEASH THE POWER OF YOUR DATA BUSINESS INTELLIGENCE ANALYTICS CDW FINANCIAL SERVICES 66% of banking and capital markets executives have changed the way they approach big decision-making as
More informationTIBCO Spotfire Guided Analytics. Transferring Best Practice Analytics from Experts to Everyone
TIBCO Spotfire Guided Analytics Transferring Best Practice Analytics from Experts to Everyone Introduction Business professionals need powerful and easy-to-use data analysis applications in order to make
More informationMake the Most of Big Data to Drive Innovation Through Reseach
White Paper Make the Most of Big Data to Drive Innovation Through Reseach Bob Burwell, NetApp November 2012 WP-7172 Abstract Monumental data growth is a fact of life in research universities. The ability
More informationHitachi Cloud Services for Private File Tiering. Low Risk Cloud at Your Own Pace. The Hitachi Vision on Cloud
S o l u t i o n P r o f i l e Hitachi Cloud Services for Private File Tiering Low Risk Cloud at Your Own Pace Hitachi Data Systems is a premier provider of cloud storage infrastructure, services and solutions
More informationBricata Next Generation Intrusion Prevention System A New, Evolved Breed of Threat Mitigation
Bricata Next Generation Intrusion Prevention System A New, Evolved Breed of Threat Mitigation Iain Davison Chief Technology Officer Bricata, LLC WWW.BRICATA.COM The Need for Multi-Threaded, Multi-Core
More informationGenetic diagnostics the gateway to personalized medicine
Micronova 20.11.2012 Genetic diagnostics the gateway to personalized medicine Kristiina Assoc. professor, Director of Genetic Department HUSLAB, Helsinki University Central Hospital The Human Genome Packed
More informationWatson to Gain Ability to See with Planned $1B Acquisition of Merge Healthcare Deal Brings Watson Technology Together with Leader in Medical Images
Watson to Gain Ability to See with Planned $1B Acquisition of Merge Healthcare Deal Brings Watson Technology Together with Leader in Medical Images Armonk, NY and CHICAGO -- [August 6, 2015]: IBM (NYSE:
More informationMaking a Case for Including WAN Optimization in your Global SharePoint Deployment
Making a Case for Including WAN Optimization in your Global SharePoint Deployment Written by: Mauro Cardarelli Mauro Cardarelli is co-author of "Essential SharePoint 2007 -Delivering High Impact Collaboration"
More informationA Best Practice Guide to Archiving Persistent Data: How archiving is a vital tool as part of a data center cost savings exercise
WHITE PAPER A Best Practice Guide to Archiving Persistent Data: How archiving is a vital tool as part of a data center cost savings exercise NOTICE This White Paper may contain proprietary information
More informationDDN updates object storage platform as it aims to break out of HPC niche
DDN updates object storage platform as it aims to break out of HPC niche Analyst: Simon Robinson 18 Oct, 2013 DataDirect Networks has refreshed its Web Object Scaler (WOS), the company's platform for efficiently
More informationA Virtual Filer for VMware s Virtual SAN A Maginatics and VMware Joint Partner Brief
A Virtual Filer for VMware s Virtual SAN A Maginatics and VMware Joint Partner Brief With the massive growth of unstructured data in today s enterprise environments, storage IT administrators are constantly
More informationLong-term data storage in the media and entertainment industry: StrongBox LTFS NAS archive delivers 84% reduction in TCO
Long-term data storage in the media and entertainment industry: StrongBox LTFS NAS archive delivers 84% reduction in TCO Lowering Long-term Archive Storage Costs with Crossroads Systems StrongBox, Brad
More informationIndustrial Cyber Security Risk Manager. Proactively Monitor, Measure and Manage Industrial Cyber Security Risk
Industrial Cyber Security Risk Manager Proactively Monitor, Measure and Manage Industrial Cyber Security Risk Industrial Attacks Continue to Increase in Frequency & Sophistication Today, industrial organizations
More informationM A N A G I N G D A T A G R O W T H W H I L E B E T T E R M O N E T I Z I N G I N F O R M A T I O N V A L U E
Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com B U Y E R U S E C A S E I n v e s t m e n t i n a C l o u d S t o r a g e S o l u t i o n H e l p
More informationCloud Cube Model: Selecting Cloud Formations for Secure Collaboration
Cloud Cube Model: Selecting Cloud Formations for Secure Collaboration Problem Cloud computing offers massive scalability - in virtual computing power, storage, and applications resources - all at almost
More informationBuilding Confidence in Big Data Innovations in Information Integration & Governance for Big Data
Building Confidence in Big Data Innovations in Information Integration & Governance for Big Data IBM Software Group Important Disclaimer THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL
More informationDelivering real Information Management solutions that result in better, more efficient healthcare.
Delivering real Information Management solutions that result in better, more efficient healthcare. For more than 30 years, we have helped companies overcome challenges and identify opportunities to achieve
More informationServices Professional Services for DNA
Services Professional Services for DNA Maximize the Value of Your Technology and Resource Investments with the Help of Professional Services Delivered by Industry Specialists Services Optimize the return
More informationIBM Smart Business Storage Cloud
GTS Systems Services IBM Smart Business Storage Cloud Reduce costs and improve performance with a scalable storage virtualization solution SoNAS Gerardo Kató Cloud Computing Solutions 2010 IBM Corporation
More informationDirect Scale-out Flash Storage: Data Path Evolution for the Flash Storage Era
Enterprise Strategy Group Getting to the bigger truth. White Paper Direct Scale-out Flash Storage: Data Path Evolution for the Flash Storage Era Apeiron introduces NVMe-based storage innovation designed
More informationLarge-scale Research Data Management and Analysis Using Globus Services. Ravi Madduri Argonne National Lab University of Chicago @madduri
Large-scale Research Data Management and Analysis Using Globus Services Ravi Madduri Argonne National Lab University of Chicago @madduri Outline Who we are Challenges in Big Data Management and Analysis
More informationHyperQ Remote Office White Paper
HyperQ Remote Office White Paper Parsec Labs, LLC. 7101 Northland Circle North, Suite 105 Brooklyn Park, MN 55428 USA 1-763-219-8811 www.parseclabs.com info@parseclabs.com sales@parseclabs.com Introduction
More informationFIVE TIPS FOR A SUCCESSFUL EMAIL ARCHIVE MIGRATION TO MICROSOFT OFFICE 365 WHITEPAPER
FIVE TIPS FOR A SUCCESSFUL EMAIL ARCHIVE MIGRATION TO MICROSOFT OFFICE 365 WHITEPAPER Introduction Microsoft Office 365 is a new powerful office productivity solution that replaces multiple on premise
More informationBuilding a Scalable Big Data Infrastructure for Dynamic Workflows
Building a Scalable Big Data Infrastructure for Dynamic Workflows INTRODUCTION Organizations of all types and sizes are looking to big data to help them make faster, more intelligent decisions. Many efforts
More informationPARALLELS CLOUD STORAGE
PARALLELS CLOUD STORAGE Performance Benchmark Results 1 Table of Contents Executive Summary... Error! Bookmark not defined. Architecture Overview... 3 Key Features... 5 No Special Hardware Requirements...
More informationHadoop-BAM and SeqPig
Hadoop-BAM and SeqPig Keijo Heljanko 1, André Schumacher 1,2, Ridvan Döngelci 1, Luca Pireddu 3, Matti Niemenmaa 1, Aleksi Kallio 4, Eija Korpelainen 4, and Gianluigi Zanetti 3 1 Department of Computer
More informationENZO UNIFIED SOLVES THE CHALLENGES OF REAL-TIME DATA INTEGRATION
ENZO UNIFIED SOLVES THE CHALLENGES OF REAL-TIME DATA INTEGRATION Enzo Unified Solves Real-Time Data Integration Challenges that Increase Business Agility and Reduce Operational Complexities CHALLENGES
More informationInnovate and Grow: SAP and Teradata
Partners Innovate and Grow: SAP and Teradata Lily Gulik, Teradata Director, SAP Center of Excellence Wayne Boyle, Chief Technology Officer Strategy, Teradata R&D Table of Contents Introduction: The Integrated
More informationSCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS
Sean Lee Solution Architect, SDI, IBM Systems SCALABLE FILE SHARING AND DATA MANAGEMENT FOR INTERNET OF THINGS Agenda Converging Technology Forces New Generation Applications Data Management Challenges
More informationConsiderations for Management of Laboratory Data
Considerations for Management of Laboratory Data 2003 Scientific Computing & Instrumentation LIMS Guide, November 2003 Michael H Elliott Drowning in a sea of data? Nervous about 21 CFR Part 11? Worried
More informationTable of Contents. Technical paper Open source comes of age for ERP customers
Technical paper Open source comes of age for ERP customers It s no secret that open source software costs less to buy the software is free, in fact. But until recently, many enterprise datacenter managers
More informationMobility. Mobility is a major force. It s changing human culture and business on a global scale. And it s nowhere near achieving its full potential.
Mobility arrow.com Mobility This year, the number of mobile devices is expected to exceed the world s population. Soon, smartphones will surpass PCs as the device of choice for Internet access. A startling
More informationIBM Data Security Services for endpoint data protection endpoint data loss prevention solution
Automating policy enforcement to prevent endpoint data loss IBM Data Security Services for endpoint data protection endpoint data loss prevention solution Highlights Facilitate policy-based expertise and
More informationHP Converged Cloud Cloud Platform Overview. Shane Pearson Vice President, Portfolio & Product Management
HP Converged Cloud Cloud Platform Overview Shane Pearson Vice President, Portfolio & Product Management Cloud is the biggest disruption since the Internet 1970-80s Mainframe 1990s Client/Server 2000s The
More informationIBM ELASTIC STORAGE SEAN LEE
IBM ELASTIC STORAGE SEAN LEE Solution Architect Platform Computing Division IBM Greater China Group Agenda Challenges in Data Management What is IBM Elastic Storage Key Features Elastic Storage Server
More informationThe functionality and advantages of a high-availability file server system
The functionality and advantages of a high-availability file server system This paper discusses the benefits of deploying a JMR SHARE High-Availability File Server System. Hardware and performance considerations
More informationStrategic Benefits of an Online Clinical Data Repository
Strategic Benefits of an Online Clinical Data Repository 5625 Dillard Drive Suite 205 Cary, NC 27518 www.pharsight.com Strategic Benefits of an Online Clinical Data Repository Contents Introduction 2 The
More informationScala Storage Scale-Out Clustered Storage White Paper
White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current
More information<Insert Picture Here> The Evolution Of Clinical Data Warehousing
The Evolution Of Clinical Data Warehousing Srinivas Karri Principal Consultant Agenda Value of Clinical Data Clinical Data warehousing & The Big Data Challenge
More informationMedia Workflows Nice Shoes operates a 24x7 highly collaborative environment and needed to enable users to work in real-time. ddn.com.
DDN Case Study Accelerating > Media Workflows Nice Shoes operates a 24x7 highly collaborative environment and needed to enable users to work in real-time. 2012 DataDirect Networks. All Rights Reserved.
More informationG E N OM I C S S E RV I C ES
GENOMICS SERVICES THE NEW YORK GENOME CENTER NYGC is an independent non-profit implementing advanced genomic research to improve diagnosis and treatment of serious diseases. capabilities. N E X T- G E
More information