Parallel Computing with the Bioconductor Amazon Machine Image
1 Parallel Computing with the Bioconductor Amazon Machine Image
Valerie Obenchain
Fred Hutchinson Cancer Research Center, Seattle, WA
August 2014
2 Outline
- Introduction
- Parallel programming
- Parallel infrastructure in R / Bioconductor packages
- BiocParallel back-ends
- Bioconductor Amazon Machine Image (AMI)
- Exercise 1: Instance resources
- Exercise 2: S3 buckets
- Copy Number Analysis
- Exercise 3: Set-up
- Exercise 4: Shared memory vs non-shared
- Exercise 5A: Nested evaluation (built-in)
- Exercise 5B: Nested evaluation (roll your own)
- Resources
- Acknowledgements
3 Introduction
This presentation provides an introduction to parallel computing with the Bioconductor Amazon Machine Image (AMI). It is designed for those who are familiar with R and Bioconductor. We'll review the parallel infrastructure available in Bioconductor packages, specifically BiocParallel and GenomicFiles. Problem types well suited to parallel execution are discussed, as well as the advantages and disadvantages of computing on a single machine vs in a cluster environment. The exercises develop familiarity with the AMI resources and explore data import and storage options. Steps in a copy number analysis are used to demonstrate parallel applications over different dimensions and problem configurations.
4 Parallel programming
Computationally intensive problems that involve many independent calculations.
Involves:
- Decomposing an algorithm or data into parts ("split")
- Distributing the tasks to multiple processors to be worked on simultaneously ("apply")
- Gathering results ("combine")
Considerations:
- Type of parallel architecture
- Type of processor communication
Candidate problems:
- Simulations, bootstrap, cross validation, local convergence algorithms
- Repeated operations on many genomic ranges or files, such as coverage, counting, normalization, etc.
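The split / apply / combine pattern can be sketched with base R's parallel package (a minimal illustration, not taken from the slides; mclapply is fork-based and so Unix-only):

```r
library(parallel)

x <- 1:1e6
chunks <- split(x, cut(seq_along(x), 4))         # split: decompose data into parts
partial <- mclapply(chunks, sum, mc.cores = 4)   # apply: workers run simultaneously
total <- Reduce(`+`, partial)                    # combine: gather results
stopifnot(total == sum(x))
```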
5 Other approaches
Limiting resource consumption:
- Restriction: appropriate when the query only requires a fraction of the data (e.g., ScanBamParam)
- Compression: represent the same data with fewer resources (Rle, Views, CompressedList classes)
- Iterating: chunking through data to meet resource constraints (yieldSize)
... and more at
6 Packages with parallel infrastructure
- parallel: incorporates the multicore and snow packages; included in base R
- BatchJobs: schedules jobs on batch systems
- foreach: framework with do* back-ends: doMC, doSNOW, doMPI, etc.
- BiocParallel: incorporates the parallel and BatchJobs packages
(... and there are others)
7 BiocParallel
Why use BiocParallel?
- Load one package for all parallel back-ends
- Mirrored terminology of existing *apply functions: bplapply, bpmapply, bpsapply, bpvec, ...
- Unified interface to back-ends via a param object

> registered()
$MulticoreParam
class: MulticoreParam; bpisup: TRUE; bpworkers: 4; catch.errors: TRUE
setseed: TRUE; recursive: TRUE; cleanup: TRUE; cleanupsignal: 15; verbose: FALSE

$SnowParam
class: SnowParam; bpisup: FALSE; bpworkers: 4; catch.errors: TRUE
cluster spec: 4; type: PSOCK

$BatchJobsParam
class: BatchJobsParam; bpisup: TRUE; bpworkers: NA; catch.errors: TRUE
cleanup: TRUE; stop.on.error: FALSE; progressbar: TRUE

$SerialParam
class: SerialParam; bpisup: TRUE; bpworkers: 1; catch.errors: TRUE
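In practice, switching back-ends means changing only the BPPARAM argument (a minimal sketch; MulticoreParam is not available on Windows):

```r
library(BiocParallel)

## Same call, different back-ends; results match serial lapply()
res1 <- bplapply(1:4, sqrt, BPPARAM = SerialParam())
res2 <- bplapply(1:4, sqrt, BPPARAM = MulticoreParam(workers = 2))
stopifnot(identical(res1, lapply(1:4, sqrt)))
```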
8 GenomicFiles package (devel branch)
- reduceByFile(), reduceByRange(): use BiocParallel under the hood to perform parallel computations by file or by range; MAP and REDUCE concepts reduce the dimension of the data
- reduceByYield(): iterates through a file, reducing output to a single result (no built-in parallelism)
9 Single machine with multiple CPUs
- Multi-core approach uses the fork system call to create workers
- Shared memory enables data sharing and code initialization
- No Windows support
- Appropriate for compute-limited problems, not memory-limited ones

> bpworkers()
[1] 4
> MulticoreParam()
class: MulticoreParam; bpisup: TRUE; bpworkers: 4; catch.errors: TRUE
setseed: TRUE; recursive: TRUE; cleanup: TRUE; cleanupsignal: 15; verbose: FALSE
10 Clusters
Ad hoc cluster of multiple machines:
- Create a cluster of workers communicating via MPI or sockets
- Memory is not shared; all data must be passed and code initialized on each worker
- Need SSH access to all machines

> SnowParam(workers = 4, type = "PSOCK")
class: SnowParam; bpisup: FALSE; bpworkers: 4; catch.errors: TRUE
cluster spec: 4; type: PSOCK
> SnowParam(workers = 4, type = "MPI")
class: SnowParam; bpisup: FALSE; bpworkers: 4; catch.errors: TRUE
cluster spec: 4; type: MPI

Cluster with a formal scheduler:
- Uses a software application to submit, control and monitor jobs
- Specify resources and a script; the software creates the cluster

> BatchJobsParam(workers = 4, resources = list(ncpus = 1))
class: BatchJobsParam; bpisup: TRUE; bpworkers: 4; catch.errors: TRUE
cleanup: TRUE; stop.on.error: FALSE; progressbar: TRUE
11 Bioconductor AMI
What is it?
- An Amazon Machine Image that runs in the Elastic Compute Cloud (EC2)
- Comes pre-loaded with many Bioconductor software and annotation packages and their dependencies
Why use it?
- AMIs for different versions of R
- Add CPUs and/or memory in flexible configurations
- Run from any device: PC, tablet or other
12 Bioconductor AMI: Getting started
General steps:
- Create an AWS account
- Create a key pair
- Single machine: launch the AMI, choose resources, create a stack
- Cluster: install StarCluster, edit the config file, start the cluster
- Access the AMI via SSH or RStudio
Full details: bioconductor-cloud-ami/#overview
13 Bioconductor AMI: Available resources
- Instances optimized for compute, memory, storage, GPU...
- Instance details:
- Pricing details:
- Free usage tier to encourage new users
Max resources currently available:
- Up to 32 virtual cores per instance
- Up to 244 GB RAM per instance
14 Selecting and allocating resources
Experiment with a small example to understand the cpu (time) and memory (space) needs of your program. A general rule of thumb is to specify 1 worker per core. You may want to allocate more workers than cores if a process is rate-limiting or irregular (e.g., data input); a pool of idle workers waiting for the next available chunk of data is preferable to a stalled pipeline. What usually happens is that all workers run at the same time, swapping in and out of the available processors as they compete for time. The overhead seen here is that swapping - sometimes referred to as over-subscribing.

> workers <- c(bpworkers(), bpworkers() + 5, bpworkers() + 10)
> FUN <- function(i)
+     system.time(bplapply(1:1000, sqrt, BPPARAM = MulticoreParam(i)))
> do.call(rbind, lapply(workers, FUN))
     user.self sys.self elapsed user.child sys.child
[1,]
[2,]
[3,]
15 Exercise 1: AMI resources
In this exercise we investigate local resources associated with the c3.4xlarge AMI created for this lab. Full descriptions of the AWS instances can be found at
a. Processors: Load the BiocParallel package and check the number of workers with bpworkers.
b. Local storage: Check the amount of local storage on the instance by selecting Shell from the Tools drop-down menu. In the box type df -h. The df command stands for disk filesystem and the -h option shows disk space in human-readable form (i.e., units along with the raw numbers).
c. RAM: Again using the Shell, type free -m -h.
d. Compare the statistics for the c3.4xlarge AMI to those of the conference AMI (m3.large).
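One way to work Exercise 1 entirely from the R console (a sketch; the shell commands can also be run via system(), and the output will differ by instance type):

```r
library(BiocParallel)

bpworkers()        # a. number of workers available to the default back-end
system("df -h")    # b. local storage, human-readable units
system("free -m")  # c. RAM, in megabytes
```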
16 Data storage options
EC2 instance store:
- Storage physically attached to the instance
- Evaporates when the instance is shut down
EBS:
- Persistent block-level storage for use with EC2 instances
- Network attached, so some latency can be expected
- Volume is attached to an instance in the same availability zone
S3:
- Data storage service, not a filesystem
- Transfer between S3 and EC2 is free
- Best for storing items that must persist - push nightly backups, results, etc.
17 Data storage options
No clear consensus regarding best I/O performance.
- "A Systematic Look at EC2 I/O": the take-home was that performance varies widely across instances; offers a few general guidelines depending upon the I/O situation.
- "Cloud Application Architectures"

Table: Comparison of EC2 data storage options
             S3          Instance       EBS
Speed        Low         Unpredictable  High
Reliability  Medium      High           High
Durability   Super high  Super low      High
18 Data access from and transfer to EC2
RStudio:
- Import Dataset: imports a file into the workspace
- Upload Files: uploads a file to local EC2 storage
SSH:
- Use wget, ftp, or Dropbox to transfer files to local storage
Other command line tools and web applications:
- wget
- s3cmd
- aws cli
- RAmazonS3 package
19 Exercise 2: S3 buckets
This exercise explores the data in Amazon S3 storage. We use the RAmazonS3 package instead of a command line tool because it allows us to browse and download public S3 data without authorization credentials. Package details are available at
a. Load the RAmazonS3 package.
b. Look at the man page for listBucket. Get a list of items in the S3 1000genomes bucket by calling listBucket with arguments name = "1000genomes", auth = NA, and maxkeys = 500.
c. Retrieve the HG00096 chromosome 20 low coverage BAM file. First construct a url composed of the key (file path) and bucket name.

> bucketname <- "1000genomes"
> key <- "data/HG00096/alignment/HG00096.chrom20.ILLUMINA.bwa.GBR.low_coverage bam"
> url <- paste0("...", bucketname, "/", key)

Upload the file to EC2 local storage using the Upload file tab in RStudio or the download.file function.
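A sketch of the remaining steps, assuming the RAmazonS3 package is installed. The argument names follow the slide; the S3 endpoint is elided on the slide, so url below is the (incomplete) value constructed there and must be completed before the download will work:

```r
library(RAmazonS3)

## b. Browse the public bucket without credentials
keys <- listBucket(name = "1000genomes", auth = NA, maxkeys = 500)
head(keys)

## c. Download the BAM file to local EC2 storage
## (url as constructed on the slide, endpoint filled in)
download.file(url, destfile = basename(url), mode = "wb")
```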
20 Copy Number Analysis
The following exercises use data from the copy number analysis performed in "High-resolution mapping of copy-number alterations with massively parallel sequencing". We use one of the three matched tumor/normal pairs presented in the paper, the HCC1954 cell line from breast adenocarcinoma. Thanks to Sonali Arora, who downloaded the raw data from the Short Read Archive, converted the files to fastq, checked quality and aligned the reads with bowtie2 to create the BAM files used here.
21 Exercise 3: Set-up
a. Load the BiocParallel, Rsamtools, cn.mops, and DNAcopy packages.
b. Locate the 4 BAM files (2 tumor, 2 normal): exp srx bam, exp srx bam, exp srx bam and exp srx bam.
c. Create a BamFileList from the BAM filenames.
d. Create a character vector group of "tumor" and "normal" corresponding to the order of the files in the BamFileList.
e. Extract the Seqinfo by calling seqinfo on one of the BamFiles.
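A sketch of the set-up; the filenames are placeholders since the numeric identifiers are elided on the slide, and the object names (bfl, group, si) are choices made here, not mandated by the exercise:

```r
library(BiocParallel)
library(Rsamtools)
library(cn.mops)
library(DNAcopy)

## Placeholder paths; substitute the four exp srx BAM files on the AMI
fls <- c("tumor1.bam", "tumor2.bam", "normal1.bam", "normal2.bam")
bfl <- BamFileList(fls)                          # c.
group <- c("tumor", "tumor", "normal", "normal") # d. matches file order
si <- seqinfo(bfl[[1]])                          # e.
```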
22 Exercise 4: Shared memory vs non-shared
Before we assess copy number, let's look at the read coverage in a region of interest on chromosome 4.
a. Extract chromosome 4 from the Seqinfo object created in Exercise 3. Create a tiling of ranges over chromosome 4 with tileGenome; use the chromosome 4 Seqinfo as the seqlengths argument and set tilewidth = 1e4.
b. Create a ScanBamParam using the tiled ranges as the which argument. which requires a GRanges, so the tiled ranges (a GRangesList) need to be unlisted first.
c. Load the GenomicAlignments package.
d. Create a MulticoreParam and a SnowParam with enough workers to iterate over the BAM files.
23 Exercise 4 continued...
e. Create two versions of a function, FUN1 and FUN2, that compute coverage on the BAM files in the regions specified by the ScanBamParam. FUN1 should take advantage of shared memory by
- not passing (i.e., making a copy of) large objects defined in the workspace
- not loading / initializing libraries currently loaded in the workspace.
The man page for coverage on a BamFile is at ?`coverage,BamFile-method` (NOTE: you must type the backquotes.)
f. Use bplapply to compute coverage on each file in the BamFileList with FUN1 and the MulticoreParam as the BPPARAM argument to bplapply.
g. Repeat the coverage calculation with FUN2 and the SnowParam. We don't have a cluster available for testing; configuring the problem with the SnowParam simulates the non-shared memory environment of a cluster.
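One possible shape for the two functions (a sketch, assuming the BamFileList is named bfl and the ScanBamParam from step b is named sbp). FUN1 relies on fork-based shared memory, so it references bfl and sbp from the enclosing workspace; FUN2 must receive everything as arguments and load its own libraries:

```r
## FUN1: multicore workers inherit the workspace via fork
FUN1 <- function(i)
    coverage(bfl[[i]], param = sbp)

## FUN2: snow workers start fresh; pass data and load libraries explicitly
FUN2 <- function(i, bfl, sbp) {
    library(GenomicAlignments)
    coverage(bfl[[i]], param = sbp)
}

cov1 <- bplapply(seq_along(bfl), FUN1, BPPARAM = MulticoreParam(4))
cov2 <- bplapply(seq_along(bfl), FUN2, bfl = bfl, sbp = sbp,
                 BPPARAM = SnowParam(4))
```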
24 Exercise 5A: Nested evaluation (built-in)
Next we count reads with getReadCountsFromBAM and compute copy number with referencecn.mops. Read counting is an independent operation and can be done separately by file. Computing copy number with referencecn.mops is not independent and requires that the case/control data be analyzed together. Many R / Bioconductor functions offer built-in parallel evaluation; getReadCountsFromBAM can compute in parallel over the files. In this example we couple parallel evaluation over the chromosomes with the built-in parallel evaluation over the files.
a. FUN3 counts reads with getReadCountsFromBAM and computes copy number with referencecn.mops.

> FUN3 <- function(chrom, files, WL, group, ...) {
+     library(cn.mops)
+     counts <- getReadCountsFromBAM(files, WL = WL, mode = "unpaired",
+                                    refSeqName = chrom,
+                                    parallel = length(files))
+     referencecn.mops(cases = counts[, group == "tumor"],
+                      controls = counts[, group == "normal"])
+ }
25 Exercise 5A continued...
b. Create a vector of chromosomes of interest: chrom <- c("chr1", "chr4", "chr17").
c. Create a SnowParam with one worker per chromosome.
d. How many total workers are required (SnowParam + parallel argument)? Are there enough workers on the machine for the job?
e. Use bplapply to apply FUN3 over the chrom vector.
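Steps b-e might look like the following (a sketch; WL = 1e4 is an assumed window length, and bfl / group are the objects from Exercise 3):

```r
chrom <- c("chr1", "chr4", "chr17")
snow <- SnowParam(workers = length(chrom))

## Total workers: 3 snow workers, each spawning length(files) = 4
## counting processes, so 12 concurrent workers at peak.
res <- bplapply(chrom, FUN3, files = path(bfl), WL = 1e4,
                group = group, BPPARAM = snow)
```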
26 Exercise 5B: Nested evaluation (roll your own)
Let's assume getReadCountsFromBAM does not have built-in parallel evaluation. In this case we can roll our own second level of execution.
a. Modify FUN3 by wrapping getReadCountsFromBAM in a bplapply statement with a MulticoreParam as the BPPARAM. (Remove the parallel argument from getReadCountsFromBAM.) Using a SnowParam to distribute the first level of parallel work followed by a MulticoreParam simulates how a job might be run on a particular cluster configuration. The first distribution passes data across cluster workers (could be individual nodes; no shared memory) while the second round of workers is spawned in a shared memory environment.
b. Call bplapply over chrom with the modified FUN3.
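A sketch of the modified function. The inner bplapply counts one file per multicore worker; how the per-file count objects are recombined for referencecn.mops is schematic here (one sample per metadata column is assumed) and would need checking against the cn.mops return value:

```r
FUN3b <- function(chrom, files, WL, group, ...) {
    library(cn.mops)
    library(BiocParallel)
    ## Inner level: one multicore worker per file (no 'parallel' argument)
    counts <- bplapply(files, function(fl)
            getReadCountsFromBAM(fl, WL = WL, mode = "unpaired",
                                 refSeqName = chrom),
        BPPARAM = MulticoreParam(length(files)))
    ## Schematic recombination: bind per-file sample columns together
    combined <- counts[[1]]
    mcols(combined) <- do.call(cbind, lapply(counts, mcols))
    referencecn.mops(cases = combined[, group == "tumor"],
                     controls = combined[, group == "normal"])
}

## b. Outer level distributed with SnowParam, as in Exercise 5A
res <- bplapply(chrom, FUN3b, files = path(bfl), WL = 1e4,
                group = group, BPPARAM = SnowParam(length(chrom)))
```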
27 Resources
- CRAN task view: HighPerformanceComputing.html
- Scalable Genomic Computing and Visualization with R and Bioconductor (2014)
- State of the Art in Parallel Computing with R (2009)
- Scalable Integrative Bioinformatics with Bioconductor /ISMB2014/
28 Acknowledgements
- Bioconductor AMI author: Dan Tenenbaum
- BiocParallel authors: Martin Morgan, Michel Lang, and Ryan Thompson
- Bioconductor team: Marc Carlson, Hervé Pagès, Dan Tenenbaum, Sonali Arora, Nate Hayden, Paul Shannon, Martin Morgan
- Technical advisory council: Vincent Carey, Wolfgang Huber, Robert Gentleman, Rafael Irizarry, Sean Davis, Kasper Hansen, Michael Lawrence
- Scientific advisory board: Simon Tavaré, Vivian Bonazzi, Vincent Carey, Wolfgang Huber, Robert Gentleman, Rafael Irizarry, Paul Flicek, Simon Urbanek
- NIH / NHGRI U41HG and the Bioconductor community
Parallel Analysis and Visualization on Cray Compute Node Linux
Parallel Analysis and Visualization on Cray Compute Node Linux David Pugmire, Oak Ridge National Laboratory and Hank Childs, Lawrence Livermore National Laboratory and Sean Ahern, Oak Ridge National Laboratory
A Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
Computational infrastructure for NGS data analysis. José Carbonell Caballero Pablo Escobar
Computational infrastructure for NGS data analysis José Carbonell Caballero Pablo Escobar Computational infrastructure for NGS Cluster definition: A computer cluster is a group of linked computers, working
Amazon Web Services EC2 & S3
2010 Amazon Web Services EC2 & S3 John Jonas FireAlt 3/2/2010 Table of Contents Introduction Part 1: Amazon EC2 What is an Amazon EC2? Services Highlights Other Information Part 2: Amazon Instances What
IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud
IBM Platform Computing Cloud Service Ready to use Platform LSF & Symphony clusters in the SoftLayer cloud February 25, 2014 1 Agenda v Mapping clients needs to cloud technologies v Addressing your pain
Hadoop. Bioinformatics Big Data
Hadoop Bioinformatics Big Data Paolo D Onorio De Meo Mattia D Antonio [email protected] [email protected] Big Data Too much information! Big Data Explosive data growth proliferation of data capture
File Protection using rsync. Setup guide
File Protection using rsync Setup guide Contents 1. Introduction... 2 Documentation... 2 Licensing... 2 Overview... 2 2. Rsync technology... 3 Terminology... 3 Implementation... 3 3. Rsync data hosts...
DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing WHAT IS CLOUD COMPUTING? 2
DISTRIBUTED SYSTEMS [COMP9243] Lecture 9a: Cloud Computing Slide 1 Slide 3 A style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet.
Parallel Options for R
Parallel Options for R Glenn K. Lockwood SDSC User Services [email protected] Motivation "I just ran an intensive R script [on the supercomputer]. It's not much faster than my own machine." Motivation "I
Using SUSE Studio to Build and Deploy Applications on Amazon EC2. Guide. Solution Guide Cloud Computing. www.suse.com
Using SUSE Studio to Build and Deploy Applications on Amazon EC2 Guide Solution Guide Cloud Computing Cloud Computing Solution Guide Using SUSE Studio to Build and Deploy Applications on Amazon EC2 Quickly
Fault-Tolerant Computer System Design ECE 695/CS 590. Putting it All Together
Fault-Tolerant Computer System Design ECE 695/CS 590 Putting it All Together Saurabh Bagchi ECE/CS Purdue University ECE 695/CS 590 1 Outline Looking at some practical systems that integrate multiple techniques
Continuous Delivery on AWS. Version 1.0 DO NOT DISTRIBUTE
Continuous Version 1.0 Copyright 2013, 2014 Amazon Web Services, Inc. and its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written
Bringing Big Data Modelling into the Hands of Domain Experts
Bringing Big Data Modelling into the Hands of Domain Experts David Willingham Senior Application Engineer MathWorks [email protected] 2015 The MathWorks, Inc. 1 Data is the sword of the
StorReduce Technical White Paper Cloud-based Data Deduplication
StorReduce Technical White Paper Cloud-based Data Deduplication See also at storreduce.com/docs StorReduce Quick Start Guide StorReduce FAQ StorReduce Solution Brief, and StorReduce Blog at storreduce.com/blog
ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy
ZEN LOAD BALANCER EE v3.04 DATASHEET The Load Balancing made easy OVERVIEW The global communication and the continuous growth of services provided through the Internet or local infrastructure require to
Big Data Primer. 1 Why Big Data? Alex Sverdlov [email protected]
Big Data Primer Alex Sverdlov [email protected] 1 Why Big Data? Data has value. This immediately leads to: more data has more value, naturally causing datasets to grow rather large, even at small companies.
COURSE CONTENT Big Data and Hadoop Training
COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop
Migration Scenario: Migrating Batch Processes to the AWS Cloud
Migration Scenario: Migrating Batch Processes to the AWS Cloud Produce Ingest Process Store Manage Distribute Asset Creation Data Ingestor Metadata Ingestor (Manual) Transcoder Encoder Asset Store Catalog
GeoCloud Project Report GEOSS Clearinghouse
GeoCloud Project Report GEOSS Clearinghouse Qunying Huang, Doug Nebert, Chaowei Yang, Kai Liu 2011.12.06 Description of Application GEOSS clearinghouse is a FGDC, GEO, and NASA project that connects directly
