Introduction to Arvados. A Curoverse White Paper
- Lindsey Imogene Weaver
Contents
Arvados in a Nutshell
Why Teams Choose Arvados
The Technical Architecture
System Capabilities
Commitment to Open Source

Copyright Curoverse, Inc. All rights reserved. First published April …

We believe the information in this paper is accurate as of the publication date; however, it is subject to change without notice. The Curoverse 1.0 release described in this paper is still under development, with a planned release in … We make no representations or warranties of any kind with respect to the content. Curoverse, the Curoverse logo, Arvados, and the Arvados logo are trademarks of Curoverse. All other trademarks used herein are the property of their respective owners.
Research IT leaders face significant challenges in building systems that can scale to new genomic and biomedical datasets. Along with this data tsunami, an array of changes is transforming data center architectures and service delivery models. At the same time, researchers' expectations are shifting. They want more self-service, better computational reproducibility, and faster performance, and they still don't care about the little things, such as massive data duplication, unpredictable scaling, and the limits of your budget. We built Arvados to address these challenges.*

* Yes, that's a bold claim. In reality, we won't solve all your problems, but we can put a big dent in them. Read on to see how.
Arvados in a Nutshell

Curoverse solutions are all based on Arvados (…). Arvados is an open source platform for managing, processing, and sharing genomic and biomedical data. It provides the capabilities that bioinformaticians and computational biologists use to manage and analyze their data. We call it a platform because you can run pipelines and applications on top of it.

Arvados is built for big data. We designed it for sequencing data such as genomes, tumor/normal pairs, and microbiomes, and people also use it for imaging, sensor, and other data. We call these data big because individual files can be large (for example, 100+ GB), there can be a great many of them (for example, billions of mass spectrometer files), and the total volume ranges from tens of terabytes to petabytes.

You can download and run Arvados yourself, but in production it works best in a modern, hyperconverged, elastic computing environment. We provide Arvados as a SaaS solution on public cloud providers, and we support deploying Arvados clusters on-premise in your data center.

Why Teams Choose Arvados

Today, Arvados is used by bioinformaticians, computational biologists, and developers (informaticians, for short). In the future, biologists, geneticists, pathologists, and ultimately clinicians will make discoveries and deliver precision medicine with applications that run on Arvados.

For Informaticians

Informaticians use Arvados to track, organize, and manage their data. They use it both as a workspace for developing new analyses and as a platform for scaled, distributed computational analysis with custom and common pipelines. Arvados offers informaticians four major benefits:

1. Streamlined Work: With the functionality in Arvados, informaticians work more productively, accomplish more, and gain easy access to powerful computational capacity.
2. Efficient Computation: Many computational pipelines combine tools that each use widely different compute resources and runtime environments. Arvados handles this by provisioning the right resources for each tool.
3. Reproducible Analyses: Arvados provides a breakthrough in computational reproducibility. Its combination of data and job management capabilities makes it radically easier to track, record, and reproduce complex pipelines on large datasets.
4. Efficient Collaboration: An array of features in Arvados helps informaticians and researchers collaborate with each other, sharing pipelines, data, and results in ways that are fast, reliable, and secure.

For IT

Arvados isn't just for informaticians. We've built the service to help IT managers solve their problems too. Arvados offers IT leaders four major benefits:

1. Addresses Researcher Needs: With Arvados, IT can give researchers a flexible, scalable, self-service platform that meets their requirements.
2. Delivers Operational Excellence: With clear visibility into how the system is being used and strong support from the Arvados administration team, you can identify issues before they become problems and better manage user expectations.
3. Uses Infrastructure Cost-Efficiently: With Arvados, you can lower your total cost of ownership (TCO). Elastic computing, user-level compute management, and usage tracking let you manage compute costs. A data management system helps you manage storage costs by automatically eliminating duplication and by helping you identify datasets that can be deleted. Finally, fine-grained usage tracking makes budgeting and billing easy.
4. Avoids Vendor Lock-in: Arvados is built entirely from open source software, chief among it the Arvados platform itself. That means you always have the option to stop using our service and deploy the same software yourself.
The Technical Architecture

Arvados uses a multilayer, integrated stack of technologies built with proven services and open source software. All the layers work together as a complete solution, as described below.

[Figure: Curoverse platform architecture]

1. Cloud Infrastructure: The Cloud Infrastructure layer includes raw storage, compute management, network services, and low-level security.
2. System Services: The System Services layer includes innovative approaches to data and job management that make it simple to manage massive datasets and implement large-scale, easily reproducible distributed computations.
3. API: All the services in the system are accessed through a RESTful API. The API can be used directly from any language or from a command-line interface. In addition, we provide SDKs for Python, Perl, Ruby, Java, and Go.
4. System Interfaces: At the Interface layer, Arvados provides several ways for users and administrators to access the capabilities of the system.
5. Security: Security is woven throughout the system. At the Cloud Infrastructure layer, security is implemented with a mix of physical and network security controls, as well as encryption. At the Access Control level, a flexible system governs who can access different datasets, pipelines, and other objects in the system. The Authentication layer is implemented with industry-standard federated identity protocols.
System Capabilities

Working together, the layers in the system solve the major problems informaticians face as they organize and analyze data.

Cloud Infrastructure

Curoverse hosts Arvados as a SaaS offering on public cloud providers, such as Amazon Web Services (AWS) and Google Cloud Platform (GCP). We also support on-premise clusters that run in your data center. On-premise, we deploy Arvados in state-of-the-art hyperconverged clusters that combine storage, compute, RAM, and high-performance networking. Placed in your data center, our clusters are designed to scale easily.

Most informatics is still done on a traditional high-performance computing architecture that combines network-attached storage (NAS) with a storage area network and a compute cluster. While the scientific community has stuck with this architecture, other industries working with very large datasets have moved to cloud computing architectures. These industries have significantly reduced their costs and gained powerful new functionality without sacrificing performance.

Arvados is designed for a hyperconverged, elastic computing architecture that leverages low-cost hardware and relies on software for fault tolerance. It takes advantage of virtualization and of nodes that scale more seamlessly, allowing the system to move computation closer to the data instead of moving data to the computation.

Arvados clusters can be integrated with existing HPC systems. This offers a new way to increase utilization and improve data management on existing infrastructure, while also creating a pathway to a new, more scalable, lower-cost architecture.

Storage Management with Keep

Keep is a data management system designed to solve the challenges of managing biomedical big data for scientific and clinical analysis.

Content Addressing

Keep identifies datasets using content addresses: globally unique cryptographic hashes generated from the bits in a dataset.
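A content address can be sketched in a few lines of Python. The sketch assumes a locator format modeled on Keep's block locators, which, as we understand it, combine the hex MD5 digest of a block's bytes with its size in bytes. The function names `content_address` and `verify` are hypothetical. This is an illustration of the idea, not Keep's code.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a Keep-style locator: hex MD5 digest, '+', size in bytes.

    The digest depends only on the bytes, so identical content always
    gets the same address, whatever the file happens to be called.
    (Hypothetical helper; modeled on, not taken from, Keep.)
    """
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


def verify(locator: str, data: bytes) -> bool:
    """Check that retrieved bytes are exactly the content the locator names."""
    return content_address(data) == locator


loc = content_address(b"foo")
print(loc)                  # acbd18db4cc2f85cedef654fccc4a4d8+3
print(verify(loc, b"foo"))  # True
print(verify(loc, b"fop"))  # False
```

Because the name is derived from the content, a reader can recompute it after retrieval. That is what makes the Data Validation property possible without trusting file names or paths.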
With content addressing, Keep can provide a number of data management services:
6 Benefits of Keep
1. Reliable File Addresses: Ensures reliable and durable data retrieval.
2. De-duplication: Eliminates duplicate data storage by checking for duplication on write.
3. Origin and Use: Tracks the origin of datasets and how they are used across the system.
4. Fast Throughput: Manages data distribution within the underlying file system to optimize for distributed computation.
5. Flexible Metadata: Enables the application of multiple metadata schemas without file duplication.
6. Portable API: Provides a consistent API across cloud providers.

Data Validation

By design, the system ensures that when a dataset is retrieved, it is in fact the dataset requested. This makes reproducible computations possible without depending on inherently impermanent file names or directory paths.

De-duplication

Content addressing automatically eliminates file duplication. If a user tries to save data that already exist in the system, Keep will not save another copy.

Flexible Organization

The most popular way to organize metadata in traditional POSIX file systems is to use the directory structure (for example, /study1/participantid/). When users want to reorganize data, they typically duplicate it into a new directory structure. Within Arvados, they can reorganize data and change how it's tagged without ever making duplicate copies.

Content addressing is a powerful storage technology that has been used for many years in other fields.

Data Organization with Datasets

Keep gives informaticians the ability to create datasets from multiple files without physically reorganizing those files on disk. Keep defines datasets with a simple manifest that contains a structured list of the content addresses for the files in the dataset. Each manifest is itself content addressed, providing a cryptographically verifiable canonical reference for the dataset.
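The de-duplication and dataset ideas above can be combined in a toy, in-memory store. Everything here is an assumption for illustration. `BlockStore` and `make_manifest` are hypothetical names, and the one-line-per-file manifest is deliberately simpler than Keep's actual manifest format. What it shows is that a dataset is just a small text listing of content addresses, so two datasets can share the same stored blocks, and the manifest itself gets a content address.

```python
import hashlib


def content_address(data: bytes) -> str:
    # Keep-style locator (assumed format): hex MD5 digest, '+', size in bytes.
    return f"{hashlib.md5(data).hexdigest()}+{len(data)}"


class BlockStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        loc = content_address(data)
        # De-duplication on write: identical bytes yield the same address,
        # so a second copy is never stored.
        self._blocks.setdefault(loc, data)
        return loc

    def get(self, loc: str) -> bytes:
        data = self._blocks[loc]
        # Validation on read: reject bytes that no longer match their address.
        if content_address(data) != loc:
            raise ValueError(f"block {loc} failed validation")
        return data

    def __len__(self) -> int:
        return len(self._blocks)


def make_manifest(files: dict[str, str]) -> str:
    """Simplified manifest: one 'path locator' line per file, sorted by path.

    Keep's real manifest format is richer; this only illustrates the idea.
    """
    return "".join(f"{path} {loc}\n" for path, loc in sorted(files.items()))


store = BlockStore()
loc_a = store.put(b"reads for sample A")
loc_b = store.put(b"reads for sample B")
store.put(b"reads for sample A")  # duplicate write: nothing new is stored
print(len(store))  # 2

# Two datasets organize the same blocks differently without copying them.
by_study = make_manifest({"study1/A.fastq": loc_a, "study1/B.fastq": loc_b})
by_sample = make_manifest({"sampleA/reads.fastq": loc_a})

# The manifest is itself content addressed, giving the dataset a stable ID.
study_id = store.put(by_study.encode())
print(len(store))  # 3
```

Reorganizing the data into `by_sample` costs only a few bytes of manifest text; no file content is copied.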
This approach yields both inexpensive descriptions of datasets (for instance, petabytes of data can be described in megabytes of manifest) and highly durable representations. At the same time, Keep datasets eliminate a common source of unnecessary file duplication: informaticians copying files to reorganize them into new datasets for different computations.

Distributed Storage

Keep uses a well-established pattern for storing large datasets and large files, first developed for the Google File System. Large files are split into blocks of up to 64 MiB, and small files are packed together into blocks of up to 64 MiB. These blocks are then replicated across multiple disks on multiple nodes. As a result, Keep is highly tolerant of disk and node failures. The design also makes it possible to move distributed computations close to the data for faster throughput.

High Throughput

Keep is optimized for file-access throughput, using several strategies for disk and network management. Keep does not maintain a name node or specialized database to reference file locations. Clients can find files algorithmically across
nodes through their content addresses, which increases system reliability by eliminating another potential point of failure.

Provenance (Origin) and Usage Tracking

Working with Crunch (see below), the platform tracks metrics on dataset creation and usage. (For example, it may track who created a dataset, how much compute and time it took to create, how often it is used, and whether it can be reliably reproduced.) These metrics help informaticians manage temporary data and redundant datasets more easily and efficiently.

Integration with Existing Storage

Keep can be tightly integrated with existing storage systems using a variety of approaches. For example, you could use an Isilon NAS as primary storage. In this scenario, Keep would index the data on the NAS, load it when it needs to be processed, and then write the output files back to the NAS.

Compute Management with Crunch

Crunch is a distributed job manager designed to ensure computational reproducibility. Crunch makes it easy for informaticians to create, schedule, provision, and track distributed computing jobs on large datasets.

10 Benefits of Crunch
1. Reproducibility: Reliably reproduce complex analyses.
2. Origin and Use Tracking: Record the origin and use of every dataset.
3. Fault Tolerance: Automatically recover from disk and node failures.
4. Portability: Easily move computations between clouds.
5. Sharing: Quickly and reliably share pipeline templates between users.
6. Self-service: Run jobs without assistance in cluster management.
7. Scaling: Easily scale jobs to run in parallel on multiple nodes.
8. Status Reporting: Access job status reports during and after job execution.
9. Optimized Re-running: Save time and money by skipping jobs that don't need to be re-run.
10. Pipeline Comparisons: Quickly compare multiple pipeline runs.

Creation and Invocation

Users can run almost any algorithm written in any language as a Crunch job; Crunch is particularly well suited to distributed jobs that can run in parallel.
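The Optimized Re-running benefit listed above can be sketched as a cache keyed on everything that determines a job's output. This is an assumed scheme, not Crunch's actual reuse logic. `job_signature` and `JobCache` are hypothetical. A job is identified here by a script version, the content addresses of its inputs, and its parameters, and it is assumed to be deterministic. Content addresses matter because they identify inputs by their bytes rather than by file names that can change.

```python
import hashlib
import json
from typing import Callable


def job_signature(script_version: str, inputs: list[str], params: dict) -> str:
    """Hash everything that determines a job's output (hypothetical scheme)."""
    spec = {"script_version": script_version, "inputs": inputs, "params": params}
    # sort_keys gives a canonical encoding, so equal specs hash identically.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


class JobCache:
    """Remember each finished job's output address and reuse it on re-runs."""

    def __init__(self) -> None:
        self._outputs: dict[str, str] = {}
        self.executions = 0  # how many times a job body actually ran

    def run(self, script_version: str, inputs: list[str], params: dict,
            execute: Callable[[], str]) -> str:
        sig = job_signature(script_version, inputs, params)
        if sig in self._outputs:
            return self._outputs[sig]  # identical job already ran: skip it
        self.executions += 1
        output = execute()
        self._outputs[sig] = output
        return output


def fake_variant_call() -> str:
    # Stand-in for real work; returns a made-up output address.
    return "output-of-variant-call"


cache = JobCache()
reads = ["acbd18db4cc2f85cedef654fccc4a4d8+3"]  # input content address
# "caller@v1.2" is a hypothetical script version identifier.
first = cache.run("caller@v1.2", reads, {"min_quality": 20}, fake_variant_call)
again = cache.run("caller@v1.2", reads, {"min_quality": 20}, fake_variant_call)
changed = cache.run("caller@v1.2", reads, {"min_quality": 30}, fake_variant_call)
print(first == again, cache.executions)  # True 2
```

A real system must also decide what counts as the "same" job (for example, whether runtime settings such as thread count belong in the signature). This sketch simply hashes whatever it is given.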
A user invokes a job by simply specifying the desired script version, inputs, and parameters, and optionally the worker node configuration. Crunch handles everything else. Pipelines (computational workflows) can be written for Crunch using a Python script or a JSON document. We plan to add support for the Common Workflow Language in …

Scheduling

Crunch schedules jobs, deciding which jobs to run and when to run them based on the established prioritization rules.

Provisioning

Crunch sets up and configures nodes, attaches storage, and ensures the runtime environment is properly configured. Jobs run inside Linux containers using Docker. This design provides a reliable transition from testing to scaled deployment and enables reproduction of the complete runtime environment. More importantly, Crunch efficiently manages complex heterogeneous
pipelines where each job requires different computing resources. Many pipelines use tools that require different types of nodes and runtime environments. For example, one job may be a single-threaded Perl app with specialized libraries, the next a multi-threaded Java tool that needs more RAM, and the third a distributed process that can run in parallel on multiple nodes. For each job, Crunch dynamically provisions the correct computing resources and ensures they are properly configured at runtime.

Supervising

As jobs run, Crunch supervises their operation. It reports status, identifies problems, and automatically restarts jobs when nodes fail. Users can watch the provenance graph in real time as it shows the results of each job in a pipeline as that job completes.

System Interfaces

Users access Arvados through several interfaces.

Virtual Private Servers (VPS)

In a typical configuration, we give each user their own virtual machine, or virtual private server, on the Arvados cluster, with an operating system, informatics tools, popular pipelines, and a command-line interface to the API. Users have root access in their VPS, and several configurations are available to accommodate different use cases.

Workbench

Users and administrators can use the browser-based tools in Workbench to manage data, initiate and track jobs, visualize pipeline provenance, administer security, view operating data, and complete other tasks associated with using the service.

Data Transfer Interfaces

Users can transfer data into and out of their Arvados account through several mechanisms.

Virtual Private Clouds
- SFTP for routine, manual, or automated data ingestion
- Import/Export (sending drives directly) for large batch data ingestion

On-premise Private Clouds
- SFTP for routine, manual, or automated data ingestion
- Data ingestion from, or export to, NFS-mounted volumes
Third-party Applications

Arvados is a platform, and it supports deploying third-party applications that use the API to provide a wide range of functionality to users.

Security

We've woven security and compliance capabilities throughout the Arvados platform. Arvados can be operated in compliance with HIPAA, SOC 2, and FISMA.

Infrastructure Security

At the infrastructure layer, we've taken several steps to create a secure environment.

Virtual Private Cloud

We leverage the security levels that AWS and GCP have achieved, including significant physical and network security capabilities (see the summary of AWS compliance or the summary of GCP compliance). In Team Accounts, we provide a single-instance VPC that isolates data and network access. User and administrator access follows a least-privilege model. Data are encrypted in transit and at rest.

On-premise Private Cloud

On-premise clouds live on your network, usually behind a firewall or in a DMZ. We can work with your team to ensure compliance with your HIPAA standards and help enforce physical and digital access controls. Arvados local clouds can be integrated with your SSO; they also use the Secure Shell (SSH) cryptographic network protocol for VPS access (see below).

Authentication

User authentication for Workbench and the data transfer interfaces uses OAuth 2.0. If you use another SSO or identity standard (such as SAML or LDAP), we can work with you to support it. VPS access is authenticated with SSH, and users can manage their public keys through Workbench.

Access Control

There are several access control mechanisms, including API keys. At the data management level, users and administrators can control access to specific datasets. For example, Researcher 1 could share a 20,000-file dataset with Researcher 2. Researcher 1 could then create a second dataset that includes 10,000 files from the first dataset and 5,000 new files from a different dataset, and
share it with Researcher 3 without duplicating data on disk, managing file-level permissions on thousands of files, or moving data between directories.

HIPAA-specific Controls

Curoverse plans to sign business associate agreements (BAAs) for HIPAA compliance. Users who want to store and use data covered by HIPAA will be required to go through additional manual verification.

Commitment to Open Source

Arvados is open source. The platform was first developed at Harvard Medical School to handle the challenges of large-scale distributed computing with genomic and other biomedical data. The project is managed through an open source community, and we plan to form a nonprofit foundation to oversee it.

The core system is licensed under the AGPL v3. All of the SDKs and client libraries are licensed under the Apache 2.0 license, so you can confidently deploy proprietary applications on the platform.

Arvados is deployed as an integrated solution that also leverages a wide range of other open source software, such as Debian, Xen, and Docker. Curoverse on-premise clusters use standard hardware components and don't rely on custom ASICs or esoteric components that are not readily available. As a result, you never face vendor lock-in.

Curoverse, Inc.
212 Elm St., 3rd Floor
Somerville, MA
More informationModern App Architecture for the Enterprise Delivering agility, portability and control with Docker Containers as a Service (CaaS)
Modern App Architecture for the Enterprise Delivering agility, portability and control with Docker Containers as a Service (CaaS) Executive Summary Developers don t adopt locked down platforms. In a tale
More information2012 LABVANTAGE Solutions, Inc. All Rights Reserved.
LABVANTAGE Architecture 2012 LABVANTAGE Solutions, Inc. All Rights Reserved. DOCUMENT PURPOSE AND SCOPE This document provides an overview of the LABVANTAGE hardware and software architecture. It is written
More informationMigration Scenario: Migrating Backend Processing Pipeline to the AWS Cloud
Migration Scenario: Migrating Backend Processing Pipeline to the AWS Cloud Use case Figure 1: Company C Architecture (Before Migration) Company C is an automobile insurance claim processing company with
More informationIBM Cognos TM1 on Cloud Solution scalability with rapid time to value
IBM Solution scalability with rapid time to value Cloud-based deployment for full performance management functionality Highlights Reduced IT overhead and increased utilization rates with less hardware.
More informationDouble-Take Replication in the VMware Environment: Building DR solutions using Double-Take and VMware Infrastructure and VMware Server
Double-Take Replication in the VMware Environment: Building DR solutions using Double-Take and VMware Infrastructure and VMware Server Double-Take Software, Inc. 257 Turnpike Road; Suite 210 Southborough,
More informationENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE. October 2013
ENABLING DATA TRANSFER MANAGEMENT AND SHARING IN THE ERA OF GENOMIC MEDICINE October 2013 Introduction As sequencing technologies continue to evolve and genomic data makes its way into clinical use and
More informationProtecting Big Data Data Protection Solutions for the Business Data Lake
White Paper Protecting Big Data Data Protection Solutions for the Business Data Lake Abstract Big Data use cases are maturing and customers are using Big Data to improve top and bottom line revenues. With
More informationQuantum Q-Cloud Backup-as-a-Service Reference Architecture
Quantum Q-Cloud Backup-as-a-Service Reference Architecture NOTICE This Technology Brief may contain proprietary information protected by copyright. Information in this Technology Brief is subject to change
More informationRelocating Windows Server 2003 Workloads
Relocating Windows Server 2003 Workloads An Opportunity to Optimize From Complex Change to an Opportunity to Optimize There is much you need to know before you upgrade to a new server platform, and time
More informationcloud functionality: advantages and Disadvantages
Whitepaper RED HAT JOINS THE OPENSTACK COMMUNITY IN DEVELOPING AN OPEN SOURCE, PRIVATE CLOUD PLATFORM Introduction: CLOUD COMPUTING AND The Private Cloud cloud functionality: advantages and Disadvantages
More informationUsing DeployR to Solve the R Integration Problem
DEPLOYR WHITE PAPER Using DeployR to olve the R Integration Problem By the Revolution Analytics DeployR Team March 2015 Introduction Organizations use analytics to empower decision making, often in real
More informationFREE computing using Amazon EC2
FREE computing using Amazon EC2 Seong-Hwan Jun 1 1 Department of Statistics Univ of British Columbia Nov 1st, 2012 / Student seminar Outline Basics of servers Amazon EC2 Setup R on an EC2 instance Stat
More informationDatabricks. A Primer
Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful
More informationIntroduction to Gluster. Versions 3.0.x
Introduction to Gluster Versions 3.0.x Table of Contents Table of Contents... 2 Overview... 3 Gluster File System... 3 Gluster Storage Platform... 3 No metadata with the Elastic Hash Algorithm... 4 A Gluster
More informationThe SparkWeave Private Cloud & Secure Collaboration Suite. Core Features
The SparkWeave Private Cloud & Secure Collaboration Suite The SparkWeave Private Cloud is a virtual platform hosted in the customer s data center. SparkWeave is storage agnostic, autonomously providing
More informationDelivering the power of the world s most successful genomics platform
Delivering the power of the world s most successful genomics platform NextCODE Health is bringing the full power of the world s largest and most successful genomics platform to everyday clinical care NextCODE
More informationCUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series www.cumulux.com
` CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS Review Business and Technology Series www.cumulux.com Table of Contents Cloud Computing Model...2 Impact on IT Management and
More informationEMC DATA DOMAIN OPERATING SYSTEM
ESSENTIALS HIGH-SPEED, SCALABLE DEDUPLICATION Up to 58.7 TB/hr performance Reduces protection storage requirements by 10 to 30x CPU-centric scalability DATA INVULNERABILITY ARCHITECTURE Inline write/read
More informationThe Virtualization Practice
The Virtualization Practice White Paper: Managing Applications in Docker Containers Bernd Harzog Analyst Virtualization and Cloud Performance Management October 2014 Abstract Docker has captured the attention
More informationWHY DO I NEED FALCONSTOR OPTIMIZED BACKUP & DEDUPLICATION?
WHAT IS FALCONSTOR? FalconStor Optimized Backup and Deduplication is the industry s market-leading virtual tape and LAN-based deduplication solution, unmatched in performance and scalability. With virtual
More informationEMC SOLUTION FOR SPLUNK
EMC SOLUTION FOR SPLUNK Splunk validation using all-flash EMC XtremIO and EMC Isilon scale-out NAS ABSTRACT This white paper provides details on the validation of functionality and performance of Splunk
More informationBuild A private PaaS. www.redhat.com
Build A private PaaS WITH Red Hat CloudForms and JBoss Enterprise Middleware www.redhat.com Introduction Platform-as-a-service (PaaS) is a cloud service model that provides consumers 1 with services for
More informationScaling LS-DYNA on Rescale HPC Cloud Simulation Platform
Scaling LS-DYNA on Rescale HPC Cloud Simulation Platform Joris Poort, President & CEO, Rescale, Inc. Ilea Graedel, Manager, Rescale, Inc. 1 Cloud HPC on the Rise 1.1 Background Engineering and science
More information19.10.11. Amazon Elastic Beanstalk
19.10.11 Amazon Elastic Beanstalk A Short History of AWS Amazon started as an ECommerce startup Original architecture was restructured to be more scalable and easier to maintain Competitive pressure for
More informationCluster, Grid, Cloud Concepts
Cluster, Grid, Cloud Concepts Kalaiselvan.K Contents Section 1: Cluster Section 2: Grid Section 3: Cloud Cluster An Overview Need for a Cluster Cluster categorizations A computer cluster is a group of
More informationAmazon Cloud Storage Options
Amazon Cloud Storage Options Table of Contents 1. Overview of AWS Storage Options 02 2. Why you should use the AWS Storage 02 3. How to get Data into the AWS.03 4. Types of AWS Storage Options.03 5. Object
More informationAvailability Digest. www.availabilitydigest.com. @availabilitydig. HPE Helion Private Cloud and Cloud Broker Services February 2016
the Availability Digest @availabilitydig HPE Helion Private Cloud and Cloud Broker Services February 2016 HPE Helion is a complete portfolio of cloud products and services that offers enterprise security,
More informationIBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE
White Paper IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE Abstract This white paper focuses on recovery of an IBM Tivoli Storage Manager (TSM) server and explores
More informationCloud Computing Now and the Future Development of the IaaS
2010 Cloud Computing Now and the Future Development of the IaaS Quanta Computer Division: CCASD Title: Project Manager Name: Chad Lin Agenda: What is Cloud Computing? Public, Private and Hybrid Cloud.
More informationDevOps with Containers. for Microservices
DevOps with Containers for Microservices DevOps is a Software Development Method Keywords Communication, collaboration, integration, automation, measurement Goals improved deployment frequency faster time
More informationWe look beyond IT. Cloud Offerings
Cloud Offerings cstor Cloud Offerings As today s fast-moving businesses deal with increasing demands for IT services and decreasing IT budgets, the onset of cloud-ready solutions has provided a forward-thinking
More informationHigh Availability of the Polarion Server
Polarion Software CONCEPT High Availability of the Polarion Server Installing Polarion in a high availability environment Europe, Middle-East, Africa: Polarion Software GmbH Hedelfinger Straße 60 70327
More informationProtecting the Microsoft Data Center with NetBackup 7.6
Protecting the Microsoft Data Center with NetBackup 7.6 Amit Sinha NetBackup Product Management 1 Major Components of a Microsoft Data Center Software Hardware Servers Disk Tape Networking Server OS Applications
More informationCloud Computing and Open Source: Watching Hype meet Reality
Cloud Computing and Open Source: Watching Hype meet Reality Rich Wolski UCSB Computer Science Eucalyptus Systems Inc. May 26, 2011 Exciting Weather Forecasts 99 M 167 M 6.5 M What is a cloud? SLAs Web
More informationCloud Service Model. Selecting a cloud service model. Different cloud service models within the enterprise
Cloud Service Model Selecting a cloud service model Different cloud service models within the enterprise Single cloud provider AWS for IaaS Azure for PaaS Force fit all solutions into the cloud service
More informationLandscape Design and Integration. SAP Mobile Platform 3.0 SP02
Landscape Design and Integration SAP Mobile Platform 3.0 SP02 DOCUMENT ID: DC01916-01-0302-01 LAST REVISED: February 2014 Copyright 2014 by SAP AG or an SAP affiliate company. All rights reserved. No part
More informationCloudera in the Public Cloud
Cloudera in the Public Cloud Deployment Options for the Enterprise Data Hub Version: Q414-102 Table of Contents Executive Summary 3 The Case for Public Cloud 5 Public Cloud vs On-Premise 6 Public Cloud
More informationArchitecture Overview
Qubell Adaptive Platform-as-a-Service, Enterprise Edition Architecture Overview 4600 Bohannon Drive, Menlo Park, CA 94025 T 888 855-8940 http://qubell.com Introduction Introduction Qubell Adaptive Platform-as-a-Service
More informationEnhanced Research Data Management and Publication with Globus
Enhanced Research Data Management and Publication with Globus Vas Vasiliadis Jim Pruyne Presented at OR2015 June 8, 2015 Presentations and other useful information available at globus.org/events/or2015/tutorial
More informationEMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise
EMC ISILON OneFS OPERATING SYSTEM Powering scale-out storage for the new world of Big Data in the enterprise ESSENTIALS Easy-to-use, single volume, single file system architecture Highly scalable with
More informationReal-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising
Real-Time Analytics on Large Datasets: Predictive Models for Online Targeted Advertising Open Data Partners and AdReady April 2012 1 Executive Summary AdReady is working to develop and deploy sophisticated
More informationScientific and Technical Applications as a Service in the Cloud
Scientific and Technical Applications as a Service in the Cloud University of Bern, 28.11.2011 adapted version Wibke Sudholt CloudBroker GmbH Technoparkstrasse 1, CH-8005 Zurich, Switzerland Phone: +41
More informationIaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures
IaaS Cloud Architectures: Virtualized Data Centers to Federated Cloud Infrastructures Dr. Sanjay P. Ahuja, Ph.D. 2010-14 FIS Distinguished Professor of Computer Science School of Computing, UNF Introduction
More informationediscovery and Search of Enterprise Data in the Cloud
ediscovery and Search of Enterprise Data in the Cloud From Hype to Reality By John Patzakis & Eric Klotzko ediscovery and Search of Enterprise Data in the Cloud: From Hype to Reality Despite the enormous
More informationAxceleon s CloudFuzion Turbocharges 3D Rendering On Amazon s EC2
Axceleon s CloudFuzion Turbocharges 3D Rendering On Amazon s EC2 In the movie making, visual effects and 3D animation industrues meeting project and timing deadlines is critical to success. Poor quality
More informationOracle Database Backup Service. Secure Backup in the Oracle Cloud
Oracle Database Backup Service Secure Backup in the Oracle Cloud Today s organizations are increasingly adopting cloud-based IT solutions and migrating on-premises workloads to public clouds. The motivation
More informationA programming model in Cloud: MapReduce
A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value
More informationWhite Paper: Cloud Identity is Different. World Leading Directory Technology. Three approaches to identity management for cloud services
World Leading Directory Technology White Paper: Cloud Identity is Different Three approaches to identity management for cloud services Published: March 2015 ViewDS Identity Solutions A Changing Landscape
More informationCloud Server. Parallels. An Introduction to Operating System Virtualization and Parallels Cloud Server. White Paper. www.parallels.
Parallels Cloud Server White Paper An Introduction to Operating System Virtualization and Parallels Cloud Server www.parallels.com Table of Contents Introduction... 3 Hardware Virtualization... 3 Operating
More information