ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013
Introduction

As sequencing technologies continue to evolve and genomic data makes its way into clinical use and medical practice, a momentous challenge arises: how to cope with the rapidly increasing volume of complex data. Issues such as data storage, access, transfer, sharing, security, and analysis must be resolved to enable the new era of genomic medicine. Annai Systems provides several tools to enable and enhance genomic data use: the Annai-GNOS data management platform, GeneTorrent and GTFuse for accelerated file transfer and file mining, the reQuest portal for collaboration and discovery, and the BioCompute Farm for analytical power. These tools can be deployed in concert or independently.

Annai Platform Components

Annai-GNOS provides a fast, scalable, and robust network solution for storing, moving, finding, and securing genomic sequence data and associated metadata. GNOS-enabled repositories can handle multiple petabytes of next-generation sequencing data for fast and flexible storage, search, and retrieval.

GeneTorrent is a data transfer protocol that allows high-speed transfer of data files into and out of a GNOS-enabled repository. The repository and file transfer capabilities are highly secure and meet government standards as defined by the Federal Information Security Management Act of 2002 (FISMA).

BioCompute Farm is a virtualized computation environment that provides on-demand compute power specifically optimized for the analysis of genomic data. Users enjoy high-throughput computing without having to build local high-performance compute platforms or transfer massive data files over the Internet.

reQuest is a web portal whose query and networking infrastructure enables researchers to search, find, and manage downloads from multiple GNOS-enabled data repositories. reQuest's intuitive user interface streamlines the process of exploring and searching genomic data.
GTFuse amplifies GeneTorrent's fast transfer speeds by allowing users to download selected portions of large genomic data files, such as those at CGHub. GTFuse lets researchers find and access sequence data files as swiftly as if they were on the local network. Its option to select and retrieve a designated subset or region of a BAM file dramatically reduces data transfer times and costs.

ANNAI SYSTEMS. ALL RIGHTS RESERVED.
There are a growing number of public and private repositories emerging as integral parts of the drug discovery and therapeutic treatment process. These data repositories vary greatly in data use, efficiency of data upload/download and access, regulatory compliance, and security configurations. Furthermore, genomic data comes in a wide variety of formats and from various sequencing platforms. As the integration of genomic data with clinical data becomes increasingly required, there is an urgent need for genomic data tools that provide flexible, scalable solutions for a wide diversity of uses.

The Cancer Genomics Hub (CGHub) is a vast repository of cancer genome data accessed freely by hundreds of researchers and clinicians in both academic and commercial environments. CGHub uses Annai-GNOS to provide highly scalable access to The Cancer Genome Atlas (TCGA) and other cancer genome data sets. CGHub was launched in 2012 at UC Santa Cruz and now holds over 55,000 cancer genome files totaling 675 terabytes. Hundreds of researchers from dozens of institutions rely on CGHub for access to cancer genome data from ten world-class sequencing centers, including the Broad Institute, Washington University, and Baylor College of Medicine. The repository is expected to grow to 5 petabytes in the next few years.

Annai supports both research and clinical settings by providing a powerful and flexible environment that enables users at all levels of IT skill to easily accomplish genomic data handling and analysis.

FIGURE 1. The Annai-GNOS environment and related peripheral data management tools (Annai BCF, the Annai reQuest research portal, the Annai-GNOS Genome Network Operating System with GNOS web services, Annai GTFuse, federated authentication, GNOS repositories of public and private genomic data, and GeneTorrent data transfer). The various components of the Annai platform can be deployed together as an integrated whole or independently.
When deployed in full, the Annai-GNOS system boosts productivity, reduces time-to-insight, and ensures data security while facilitating collaboration. Researchers or clinicians can quickly search and extract specific segments from thousands of genomes, work independently or collaborate with a team to analyze the data, and prepare their findings for publication or for use in the clinic to guide therapy. The Annai-GNOS platform is designed to accelerate genomic research. A closer examination of its components will provide insight into their collective synergy as a system with unique and comprehensive capabilities.
Annai-GNOS: A Platform for High-Performance Genomic Analysis and Data Management

Annai-GNOS is a unique integration of the data repository infrastructure and high-speed networking capabilities needed to accommodate large genomic data sets. These data sets are characterized by diverse file formats, extensive metadata, and large file sizes, with individual sequence datasets ranging from 10 gigabytes to more than 1 terabyte in size (depending on the depth of coverage).

Annai-GNOS allows the entire user community to see the state of data throughout the submission lifecycle, including data that has not yet been approved or released for download. Researchers can query the state of data as soon as it is submitted and quickly identify submissions that may require attention due to formatting or other issues before they become available to users of the repository. Flexible metadata searching greatly simplifies finding the right sequence file, and a highly fault-tolerant design ensures services remain available. The GNOS network functionality integrates secure, high-speed network protocols to mobilize petabyte-scale genomic data analysis. Annai-GNOS can also be integrated with federated authentication systems such as InCommon and the National Cancer Institute's authorization systems.

Technical Specifications

GNOS features the following capabilities:
- User-programmable metadata format validation engine
- Support for multiple metadata formats, including customer-defined formats and the Sequence Read Archive (SRA) schemas used by NCBI, EBI, and DDBJ
- Support for multiple sequence data file types
- Ability to store other file types, such as compressed sequences
- Accelerated file transfer using GeneTorrent and GTFuse
- InCommon (Shibboleth) based federated user authentication
- Project-based data authorization to control individual researcher access
- Support for commonly used file format standards and analysis tools, including the NCBI SRA metadata format; the TCGA v2 BAM and VCF file formats; and GATK, Bowtie, TopHat, Cufflinks, and additional tools

The GNOS platform streamlines all aspects of genomic data management and access for researchers and clinicians. Setting up a GNOS repository consists of two steps: 1) data ingestion (duration depends on the state of the data) and 2) data deployment as indexed metadata tags in the GNOS database. Sequence data are entered into the repository using Annai's proprietary GeneTorrent tool and metadata submission API.

Researchers can use the reQuest web portal to quickly and easily explore GNOS-enabled data. For example, a simple search for "ovarian cancer" in CGHub using reQuest can instantly report the number of ovarian cancer genome files contained in the database and how many are RNA-Seq, exome, or whole genome. The interface also enables the user to drill down quickly to the specific files of interest. The ability to quickly visualize the contents of a GNOS repository is based on searching metadata attributes that are extracted from sequencing files, catalogued, and indexed. Query parameters are unlimited, but typically include file type, disease, sample collection date, sequencing platform, date of sequencing, and mapping and alignment tools.

GNOS is suitable for public and/or private genomic databases of translational and basic research centers, pharmaceutical R&D labs, diagnostic companies, and similar organizations generating significant volumes of sequence data. GNOS provides tools to help catalogue, index, upload, and download files, and to make the data available for collaboration. GNOS can also be integrated with any data management and transfer method or protocol.
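The metadata-driven search described above can be illustrated with a small sketch. This is not Annai's API; the records, field names, and the `search` and `summarize` helpers below are hypothetical, standing in for the indexed metadata attributes (disease, library strategy, platform) that a reQuest-style query would filter and aggregate.

```python
from collections import Counter

# Hypothetical metadata index: each record mimics the kinds of attributes
# GNOS extracts from sequence files and catalogues for searching.
INDEX = [
    {"disease": "ovarian cancer", "strategy": "RNA-Seq",      "platform": "Illumina"},
    {"disease": "ovarian cancer", "strategy": "exome",        "platform": "Illumina"},
    {"disease": "ovarian cancer", "strategy": "whole genome", "platform": "Illumina"},
    {"disease": "ovarian cancer", "strategy": "whole genome", "platform": "SOLiD"},
    {"disease": "glioblastoma",   "strategy": "RNA-Seq",      "platform": "Illumina"},
]

def search(index, **filters):
    """Return records matching every filter, e.g. disease='ovarian cancer'."""
    return [r for r in index if all(r.get(k) == v for k, v in filters.items())]

def summarize(records, field):
    """Count matching files per value of a metadata field (a drill-down view)."""
    return Counter(r[field] for r in records)

hits = search(INDEX, disease="ovarian cancer")
print(len(hits))                    # total matching files
print(summarize(hits, "strategy"))  # breakdown by RNA-Seq / exome / whole genome
```

On this toy index, the query reports four ovarian cancer files and their breakdown by library strategy, mirroring the instant summary-then-drill-down workflow described for the portal.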
Use Case 1: CGHub Cancer Genome Repository

The University of California, Santa Cruz (UCSC) provides CGHub, the world's largest repository of cancer genome data. CGHub is built on GNOS and, after rigorous testing with active TCGA users, was established as the new secure repository for The Cancer Genome Atlas (TCGA) on April 30, 2012.

Use Case 2: Drug Development

Pharmaceutical companies have strict requirements for data protection and security. Corporate policies may mandate keeping data behind a firewall. In this case, an in-house GNOS repository is an optimal solution. After installation by Annai, this type of repository is managed by the company's local experts within its existing high-performance computing infrastructure.
GeneTorrent: Accelerated, Secure File Transport

Whole genome sequence data files range from several hundred gigabytes to over one terabyte in size. GeneTorrent enables accelerated transfers of terabyte-scale data. It employs a proprietary variant of the popular BitTorrent algorithm to securely transfer files at speeds limited only by the base network bandwidth.

Use Case: Translational and Clinical Research

Translational researchers and clinicians use GeneTorrent to push sequence data, either locally or from an external sequencing lab, into a GNOS repository installed in their facility or hosted by Annai in the BioCompute Farm. GNOS-enabled repositories can also be hosted on Amazon Web Services (AWS) or in similar cloud environments.

Technical Specifications

GeneTorrent's key functionality is as follows:
- High-fidelity parallel file transfer at up to multiple Gbits/sec (speeds as high as 200 Mbps are routinely achieved)
- Highly resilient to in-network and computing failures, with automatic recovery
- Highly secure, 256-bit encrypted file transfer

reQuest: One-Stop Portal for Data Access, Collaboration, and Management

One of the most difficult aspects of genomics research is finding specific data across multiple, growing, and often disparate data repositories. Individual files can be very large, and the metadata extensive and difficult to interpret. The reQuest portal addresses these challenges by providing a single point of access to the contents of all accessible GNOS-enabled repositories. Researchers can employ reQuest's data exploration capabilities to analyze data trends across available repositories. The portal's Access and Download capabilities allow researchers to drill down to find and download specific data sets.
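GeneTorrent's protocol details are proprietary, but the sketch below illustrates the general pattern it shares with BitTorrent-style transfer: split a file into chunks, fetch the chunks over parallel workers, verify each chunk against a known digest, and automatically re-request any chunk that arrives corrupted. Everything here (the `REMOTE` content, `fetch_chunk`, the simulated failure) is invented for the illustration, not Annai's implementation.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # tiny chunk size so the example stays readable

# Stand-in for a remote repository object: the file content plus the
# per-chunk digests a real protocol would publish in its manifest.
REMOTE = b"ACGTACGTTTGACCAGGATC"
CHUNKS = [REMOTE[i:i + CHUNK] for i in range(0, len(REMOTE), CHUNK)]
MANIFEST = [hashlib.sha256(c).hexdigest() for c in CHUNKS]

failures = {2}  # simulate one transient in-network failure on chunk 2

def fetch_chunk(i):
    """Pretend network fetch; garbles the flaky chunk on its first attempt."""
    if i in failures:
        failures.discard(i)
        return b"\x00" * len(CHUNKS[i])  # corrupted transfer
    return CHUNKS[i]

def fetch_verified(i, retries=3):
    """Fetch chunk i, re-requesting until its digest matches the manifest."""
    for _ in range(retries):
        data = fetch_chunk(i)
        if hashlib.sha256(data).hexdigest() == MANIFEST[i]:
            return data
    raise IOError(f"chunk {i} failed after {retries} attempts")

# Parallel workers pull chunks independently; map() restores their order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(fetch_verified, range(len(CHUNKS))))

assembled = b"".join(parts)
assert assembled == REMOTE
print("transferred", len(assembled), "bytes intact")
```

The per-chunk verification is what makes the automatic recovery cheap: only the corrupted chunk is re-fetched, not the whole terabyte-scale file.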
The Explore, Access, Download, and Collaboration capabilities of reQuest are available to the community through standard web browsers, enabling users to query, retrieve, and monitor download progress without having to install or master complex proprietary tools or query syntax.

Technical Specifications

The following describes reQuest's key functionality:
- Explore: a graphical interface to interrogate and analyze the contents of any Annai-GNOS enabled data repository using data statistics and metadata. This function enables searches based on organization, study, disease, and other key terms.
- Access: a powerful yet user-friendly metadata query-building capability that allows the researcher to find and select a set of individual sequence files for download. Downloads can be initiated from the Access area once the desired files are designated. reQuest offers conditional access, as some data repositories, such as the TCGA data hosted on CGHub, require access authorization credentials in order to download sequence files.
- Download: the status of current and past download requests can be reviewed from a single dashboard. Users can view the status of each file within their download requests, and a complete history of downloads is maintained to support experiment reproducibility.
- Collaborate: public and private collaboration sites for engaging with colleagues and sharing knowledge around common projects and frequently accessed datasets, broadening the community of academic and clinical researchers.
The collaborative capabilities of reQuest facilitate cross-team communication and allow for better distribution of tasks. For example, a team member responsible for defining the experimental parameters could select the appropriate data and pass it to a bioinformatician who performs the analysis.

FIGURE 2. The reQuest portal helps expedite research through a well-managed, user-friendly portal environment (system management, data explorer, data access, portal management database, data download, and metadata ingest, served by a communications broker atop the operating system).

GTFuse: Accelerated Data Queries

GTFuse enables researchers to directly access remote sequence data files as if they were on the local file system. GTFuse allows researchers to mount the desired data and immediately run existing tools such as SAMtools to inspect the header and begin accessing specific regions of the sequence data (for example, data from a particular chromosome, gene, or region).

Technical Specifications

The following describes GTFuse's key functionality:
- Mounts a remote file on the local file system
- Provides asynchronous access to files via the GeneTorrent protocol
- Transfers no data until the file is accessed by the user on the local file system

FIGURE 3. GTFuse provides the option to search and download only the specific genes or regions required instead of the entire file: a GTFuse client exposes a remote GNOS genomic file to HPC analysis clusters or local analysis tools, transferring only the relevant data within the file. It requires no tools integration and allows any analysis tool to access data files as if they were local.

Researchers often want to quickly examine specific regions of genomic data in remote repositories without retrieving the entire BAM file or analysis object. Alternatively, researchers may need to read entire files but lack the storage capacity to maintain local copies of large numbers of BAM files. Still other tasks are difficult simply because of the large size of sequence files.
For example, a researcher may spend hours downloading BAM files just to inspect their headers and determine whether there is sufficient coverage depth for the intended analysis. For all of these scenarios, GTFuse provides a speedy and economical solution, substantially shortening the time researchers spend preparing for the analysis that interests them and helping to conserve IT resources.

Use Case 1: Asynchronous BAM File Access

A researcher wants to use SAMtools to view specific genome data coordinates. The researcher uses GTFuse to open a BAM file and its corresponding BAI index file and performs seek operations to read small portions from the BAM file asynchronously.

Use Case 2: Process a Remote File Locally

A researcher avoids using large amounts of local disk storage by mounting a remote BAM file with GTFuse before building a BAI index file locally.
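GTFuse itself is a FUSE file system layered over GeneTorrent; the class below is only a toy model of its central idea, that no bytes move until a region is actually read, so seeking into a file costs nothing and reading a region costs only that region's bytes. The "remote file", the offsets, and the byte counter are all simulated (with a real BAM, the BAI index would supply the offsets for a genomic region).

```python
class LazyRemoteFile:
    """Toy model of a mounted remote file: bytes transfer only when read."""

    def __init__(self, remote_bytes):
        self._remote = remote_bytes   # stands in for the file in the repository
        self.pos = 0
        self.bytes_transferred = 0    # what a real mount would pull over the wire

    def seek(self, offset):
        self.pos = offset             # seeking is free: no data moves

    def read(self, n):
        data = self._remote[self.pos:self.pos + n]
        self.pos += len(data)
        self.bytes_transferred += len(data)
        return data

# A pretend 1 MB "BAM file"; a real one would be hundreds of gigabytes.
remote = bytes(1_000_000)
f = LazyRemoteFile(remote)

# Inspect a small "header", then jump straight to one "region".
header = f.read(1_000)
f.seek(750_000)
region = f.read(5_000)

print(f.bytes_transferred)   # 6000: far less than the 1,000,000-byte file
```

Scaled up, this is why checking coverage depth in a header, or pulling one chromosome from a 300 GB BAM, takes seconds over GTFuse rather than the hours a full download would require.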
BioCompute Farm: Enabling Simple, Streamlined Data Analysis

The BioCompute Farm is a private cloud designed specifically for genomic data analysis. It allows collaborators to use an elastic pool of compute servers and run cross-organizational experiments without up-front capital expense, IT development effort, ongoing maintenance, or significant lead time. Local GNOS-enabled compute databases, a pre-installed set of analysis tools, a stored set of reference genomes, and specialized data access greatly simplify genomic data gathering and analysis.

The BioCompute Farm's unique efficiencies reduce the resources and time needed to accomplish complex genomic data analysis. Researchers can instantly activate virtual machines in the highly secure BioCompute Farm and collaborate with colleagues across the globe. Data input and output is free on the BioCompute Farm. Its high-speed network transfer capability removes the need to ship hard disks containing potentially sensitive data between organizations, with the attendant risks and delays. The BioCompute Farm's flexible storage allows researchers to import large volumes of data for analysis and discard it afterwards, avoiding the difficulties and delays of expanding local IT infrastructure to cope with moving and processing large volumes of sequencing data.

FIGURE 4. The BioCompute Farm offers high-performance computing, storage, and networking resources in a virtualized computing environment (researchers at the customer site connect through access control to the compute console and reQuest portal; transfer control moves sequence data over the data-center fabric between CGHub at the San Diego Supercomputer Center, the Internet, and the Annai BioCompute Farm, which hosts genome analysis tools and GTFuse).

Technical Specifications

The BioCompute Farm has the following key functionality:
- High-performance compute power, including 10G networking, 100 GB of memory, and highly scalable storage capacity, delivering performance optimized for bioinformatics applications
- Complete user control over virtual instances; additional instances, memory, and storage capacity can be added as needed
- Custom user tracking and reporting can be enabled
- Instances include bioinformatics and data extraction tools for large-scale and complex genomic analysis; users can add tools and save them for future reuse, and workflows can be set up to launch automatically

There are two primary uses of the BioCompute Farm. One is serving clients who need to do analytical research with repositories such as CGHub and do not need to store data at the compute center. Typically, they want to analyze primary sequence data in the BioCompute Farm and pull results datasets back to their local environments. By using GTFuse, researchers can extract the genes or regions of interest instead of bulk-copying whole sequence files. This is one of the most significant advantages of using GTFuse in conjunction with the BioCompute Farm. In particular cases where a handful of genes are studied across many genomes, TCGA researchers use up to one hundred times less compute and storage capacity by working only with the actively used TCGA data.

Use Case 1: CGHub BioCompute Farm

The CGHub BioCompute Farm is co-located with CGHub, home of genomic data from The Cancer Genome Atlas, within the San Diego Supercomputer Center. The BioCompute Farm has a 10 Gb/sec connection to CGHub and the Internet.
Annai's reQuest web portal enables users to rapidly browse the genomic data sets via customized and automated searches, and to bring the desired data into user applications running in the BioCompute Farm.
Use Case 2: Private BioCompute Farm

A private BioCompute Farm can be co-located with an in-house GNOS-enabled data repository tailored to meet the particular requirements of a research organization. Annai provides installation, configuration, and GeneTorrent training to researchers. Optionally, mapping, alignment, and variant calling tools can also be pre-installed in the BioCompute Farm. Having data analysis capacity co-located with in-house data can substantially reduce costs and speed up genomic data analysis.

Conclusion

Advancing translational research and genomic medicine requires distilling valuable, actionable information from hundreds or thousands of genomic sequence files, and raises a unique set of big data challenges. Responding to these challenges, Annai Systems has developed the Annai-GNOS platform, which drives robust repository operations to meet the real-world needs of users by providing metadata-based indexing, search query, and access to multiple distributed data sets; high-speed file transfer; rapid extraction of designated elements from multiple files; and a user-friendly alternative to the command line interface.

Annai Systems Inc., Alberto Way, Suite 120, Los Gatos, California
More informationCGHub Web-based Metadata GUI Statement of Work
CGHub Web-based Metadata GUI Statement of Work Mark Diekhans Version 1 April 23, 2012 1 Goals CGHub stores metadata and data associated from NCI cancer projects. The goal of this project
More informationLifeScope Genomic Analysis Software 2.5
USER GUIDE LifeScope Genomic Analysis Software 2.5 Graphical User Interface DATA ANALYSIS METHODS AND INTERPRETATION Publication Part Number 4471877 Rev. A Revision Date November 2011 For Research Use
More informationCloudCenter Full Lifecycle Management. An application-defined approach to deploying and managing applications in any datacenter or cloud environment
CloudCenter Full Lifecycle Management An application-defined approach to deploying and managing applications in any datacenter or cloud environment CloudCenter Full Lifecycle Management Page 2 Table of
More informationORACLE HEALTH SCIENCES INFORM ADVANCED MOLECULAR ANALYTICS
ORACLE HEALTH SCIENCES INFORM ADVANCED MOLECULAR ANALYTICS INCORPORATE GENOMIC DATA INTO CLINICAL R&D KEY BENEFITS Enable more targeted, biomarker-driven clinical trials Improves efficiencies, compressing
More informationCisco Virtualized Multiservice Data Center Reference Architecture: Building the Unified Data Center
Solution Overview Cisco Virtualized Multiservice Data Center Reference Architecture: Building the Unified Data Center What You Will Learn The data center infrastructure is critical to the evolution of
More informationPowerful analytics. and enterprise security. in a single platform. microstrategy.com 1
Powerful analytics and enterprise security in a single platform microstrategy.com 1 Make faster, better business decisions with easy, powerful, and secure tools to explore data and share insights. Enterprise-grade
More informationUCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production
Page 1 of 6 UCLA Team Sequences Cell Line, Puts Open Source Software Framework into Production February 05, 2010 Newsletter: BioInform BioInform - February 5, 2010 By Vivien Marx Scientists at the department
More informationHow to Ingest Data into Google BigQuery using Talend for Big Data. A Technical Solution Paper from Saama Technologies, Inc.
How to Ingest Data into Google BigQuery using Talend for Big Data A Technical Solution Paper from Saama Technologies, Inc. July 30, 2013 Table of Contents Intended Audience What you will Learn Background
More informationProduct Brief SysTrack VMP
for VMware View Product Brief SysTrack VMP Benefits Optimize VMware View desktop and server virtualization and terminal server projects Anticipate and handle problems in the planning stage instead of postimplementation
More informationW H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
More informationRelocating Windows Server 2003 Workloads
Relocating Windows Server 2003 Workloads An Opportunity to Optimize From Complex Change to an Opportunity to Optimize There is much you need to know before you upgrade to a new server platform, and time
More informationDesktop Virtualization for the Banking Industry. Resilient Desktop Virtualization for Bank Branches. A Briefing Paper
Desktop Virtualization for the Banking Industry Resilient Desktop Virtualization for Bank Branches A Briefing Paper September 2012 1 Contents Introduction VERDE Cloud Branch for Branch Office Management
More informationBusiness-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000
Business-centric Storage FUJITSU Hyperscale Storage System ETERNUS CD10000 Clear the way for new business opportunities. Unlock the power of data. Overcoming storage limitations Unpredictable data growth
More informationObject Storage: A Growing Opportunity for Service Providers. White Paper. Prepared for: 2012 Neovise, LLC. All Rights Reserved.
Object Storage: A Growing Opportunity for Service Providers Prepared for: White Paper 2012 Neovise, LLC. All Rights Reserved. Introduction For service providers, the rise of cloud computing is both a threat
More informationGeneProf and the new GeneProf Web Services
GeneProf and the new GeneProf Web Services Florian Halbritter florian.halbritter@ed.ac.uk Stem Cell Bioinformatics Group (Simon R. Tomlinson) simon.tomlinson@ed.ac.uk December 10, 2012 Florian Halbritter
More informationIBM Global Technology Services September 2007. NAS systems scale out to meet growing storage demand.
IBM Global Technology Services September 2007 NAS systems scale out to meet Page 2 Contents 2 Introduction 2 Understanding the traditional NAS role 3 Gaining NAS benefits 4 NAS shortcomings in enterprise
More informationDELL s Oracle Database Advisor
DELL s Oracle Database Advisor Underlying Methodology A Dell Technical White Paper Database Solutions Engineering By Roger Lopez Phani MV Dell Product Group January 2010 THIS WHITE PAPER IS FOR INFORMATIONAL
More informationEnd-to-End E-Clinical Coverage with Oracle Health Sciences InForm GTM
End-to-End E-Clinical Coverage with InForm GTM A Complete Solution for Global Clinical Trials The broad market acceptance of electronic data capture (EDC) technology, coupled with an industry moving toward
More informationTABLE OF CONTENTS THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY FOR SHAREPOINT DATA. Introduction. Examining Third-Party Replication Models
1 THE SHAREPOINT MVP GUIDE TO ACHIEVING HIGH AVAILABILITY TABLE OF CONTENTS 3 Introduction 14 Examining Third-Party Replication Models 4 Understanding Sharepoint High Availability Challenges With Sharepoint
More informationHow In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time
SCALEOUT SOFTWARE How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time by Dr. William Bain and Dr. Mikhail Sobolev, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 T wenty-first
More informationDigital Asset Management
A collaborative digital asset management system for marketing organizations that improves performance, saves time and reduces costs. MarketingPilot provides powerful digital asset management software for
More informationBig Data at Cloud Scale
Big Data at Cloud Scale Pushing the limits of flexible & powerful analytics Copyright 2015 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For
More informationMedia Exchange really puts the power in the hands of our creative users, enabling them to collaborate globally regardless of location and file size.
Media Exchange really puts the power in the hands of our creative users, enabling them to collaborate globally regardless of location and file size. Content Sharing Made Easy Media Exchange (MX) is a browser-based
More informationScalable Cloud Computing Solutions for Next Generation Sequencing Data
Scalable Cloud Computing Solutions for Next Generation Sequencing Data Matti Niemenmaa 1, Aleksi Kallio 2, André Schumacher 1, Petri Klemelä 2, Eija Korpelainen 2, and Keijo Heljanko 1 1 Department of
More informationDatabricks. A Primer
Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful
More informationThe Recipe for Sarbanes-Oxley Compliance using Microsoft s SharePoint 2010 platform
The Recipe for Sarbanes-Oxley Compliance using Microsoft s SharePoint 2010 platform Technical Discussion David Churchill CEO DraftPoint Inc. The information contained in this document represents the current
More informationUtilizing the SDSC Cloud Storage Service
Utilizing the SDSC Cloud Storage Service PASIG Conference January 13, 2012 Richard L. Moore rlm@sdsc.edu San Diego Supercomputer Center University of California San Diego Traditional supercomputer center
More informationCrossPoint for Managed Collaboration and Data Quality Analytics
CrossPoint for Managed Collaboration and Data Quality Analytics Share and collaborate on healthcare files. Improve transparency with data quality and archival analytics. Ajilitee 2012 Smarter collaboration
More informationWE RUN SEVERAL ON AWS BECAUSE WE CRITICAL APPLICATIONS CAN SCALE AND USE THE INFRASTRUCTURE EFFICIENTLY.
WE RUN SEVERAL CRITICAL APPLICATIONS ON AWS BECAUSE WE CAN SCALE AND USE THE INFRASTRUCTURE EFFICIENTLY. - Murari Gopalan Director, Technology Expedia Expedia, a leading online travel company for leisure
More informationDatabricks. A Primer
Databricks A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to dramatically
More informationCisco Unified Data Center
Solution Overview Cisco Unified Data Center Simplified, Efficient, and Agile Infrastructure for the Data Center What You Will Learn The data center is critical to the way that IT generates and delivers
More informationHow To Create A Large Enterprise Cloud Storage System From A Large Server (Cisco Mds 9000) Family 2 (Cio) 2 (Mds) 2) (Cisa) 2-Year-Old (Cica) 2.5
Cisco MDS 9000 Family Solution for Cloud Storage All enterprises are experiencing data growth. IDC reports that enterprise data stores will grow an average of 40 to 60 percent annually over the next 5
More informationebook Utilizing MapReduce to address Big Data Enterprise Needs Leveraging Big Data to shorten drug development cycles in Pharmaceutical industry.
Utilizing MapReduce to address Big Data Enterprise Needs Leveraging Big Data to shorten drug development cycles in Pharmaceutical industry. www.persistent.com 3 4 5 5 7 9 10 11 12 13 From the Vantage Point
More informationHow To Build A Cloud Computer
Introducing the Singlechip Cloud Computer Exploring the Future of Many-core Processors White Paper Intel Labs Jim Held Intel Fellow, Intel Labs Director, Tera-scale Computing Research Sean Koehl Technology
More informationClodoaldo Barrera Chief Technical Strategist IBM System Storage. Making a successful transition to Software Defined Storage
Clodoaldo Barrera Chief Technical Strategist IBM System Storage Making a successful transition to Software Defined Storage Open Server Summit Santa Clara Nov 2014 Data at the core of everything Data is
More informationGlobus Research Data Management: Introduction and Service Overview
Globus Research Data Management: Introduction and Service Overview Kyle Chard chard@uchicago.edu Ben Blaiszik blaiszik@uchicago.edu Thank you to our sponsors! U. S. D E P A R T M E N T OF ENERGY 2 Agenda
More informationCluster, Grid, Cloud Concepts
Cluster, Grid, Cloud Concepts Kalaiselvan.K Contents Section 1: Cluster Section 2: Grid Section 3: Cloud Cluster An Overview Need for a Cluster Cluster categorizations A computer cluster is a group of
More informationData processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
More informationHow To Build A Clustered Storage Area Network (Csan) From Power All Networks
Power-All Networks Clustered Storage Area Network: A scalable, fault-tolerant, high-performance storage system. Power-All Networks Ltd Abstract: Today's network-oriented computing environments require
More informationUnderstanding the Benefits of IBM SPSS Statistics Server
IBM SPSS Statistics Server Understanding the Benefits of IBM SPSS Statistics Server Contents: 1 Introduction 2 Performance 101: Understanding the drivers of better performance 3 Why performance is faster
More informationStorReduce Technical White Paper Cloud-based Data Deduplication
StorReduce Technical White Paper Cloud-based Data Deduplication See also at storreduce.com/docs StorReduce Quick Start Guide StorReduce FAQ StorReduce Solution Brief, and StorReduce Blog at storreduce.com/blog
More informationPentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System
Pentaho High-Performance Big Data Reference Configurations using Cisco Unified Computing System By Jake Cornelius Senior Vice President of Products Pentaho June 1, 2012 Pentaho Delivers High-Performance
More informationGlobus Research Data Management: Introduction and Service Overview. Steve Tuecke Vas Vasiliadis
Globus Research Data Management: Introduction and Service Overview Steve Tuecke Vas Vasiliadis Presentations and other useful information available at globus.org/events/xsede15/tutorial 2 Thank you to
More informationData management challenges in todays Healthcare and Life Sciences ecosystems
Data management challenges in todays Healthcare and Life Sciences ecosystems Jose L. Alvarez Principal Engineer, WW Director Life Sciences jose.alvarez@seagate.com Evolution of Data Sets in Healthcare
More informationcloud functionality: advantages and Disadvantages
Whitepaper RED HAT JOINS THE OPENSTACK COMMUNITY IN DEVELOPING AN OPEN SOURCE, PRIVATE CLOUD PLATFORM Introduction: CLOUD COMPUTING AND The Private Cloud cloud functionality: advantages and Disadvantages
More informationWOS Cloud. ddn.com. Personal Storage for the Enterprise. DDN Solution Brief
DDN Solution Brief Personal Storage for the Enterprise WOS Cloud Secure, Shared Drop-in File Access for Enterprise Users, Anytime and Anywhere 2011 DataDirect Networks. All Rights Reserved DDN WOS Cloud
More informationUNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure
UNINETT Sigma2 AS: architecture and functionality of the future national data infrastructure Authors: A O Jaunsen, G S Dahiya, H A Eide, E Midttun Date: Dec 15, 2015 Summary Uninett Sigma2 provides High
More informationScalable Services for Digital Preservation
Scalable Services for Digital Preservation A Perspective on Cloud Computing Rainer Schmidt, Christian Sadilek, and Ross King Digital Preservation (DP) Providing long-term access to growing collections
More informationAnalyzing HTTP/HTTPS Traffic Logs
Advanced Threat Protection Automatic Traffic Log Analysis APTs, advanced malware and zero-day attacks are designed to evade conventional perimeter security defenses. Today, there is wide agreement that
More informationTableau Online. Understanding Data Updates
Tableau Online Understanding Data Updates Author: Francois Ajenstat July 2013 p2 Whether your data is in an on-premise database, a database, a data warehouse, a cloud application or an Excel file, you
More informationIncreased Security, Greater Agility, Lower Costs for AWS DELPHIX FOR AMAZON WEB SERVICES WHITE PAPER
Increased Security, Greater Agility, Lower Costs for AWS DELPHIX FOR AMAZON WEB SERVICES TABLE OF CONTENTS Introduction... 3 Overview: Delphix Virtual Data Platform... 4 Delphix for AWS... 5 Decrease the
More informationPart V Applications. What is cloud computing? SaaS has been around for awhile. Cloud Computing: General concepts
Part V Applications Cloud Computing: General concepts Copyright K.Goseva 2010 CS 736 Software Performance Engineering Slide 1 What is cloud computing? SaaS: Software as a Service Cloud: Datacenters hardware
More informationLuncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
More informationCAREER TRACKS PHASE 1 UCSD Information Technology Family Function and Job Function Summary
UCSD Applications Programming Involved in the development of server / OS / desktop / mobile applications and services including researching, designing, developing specifications for designing, writing,
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More information