Cloud Computing @ JPL Science Data Systems



Similar documents
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Creating A Galactic Plane Atlas With Amazon Web Services

Cloud 101. Mike Gangl, Caltech/JPL, 2015 California Institute of Technology. Government sponsorship acknowledged

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

Cloud Computing Technology

Cloud Computing. Adam Barker

Keystone Image Management System

Deploying a Geospatial Cloud

Hadoop Distributed File System. Dhruba Borthakur Apache Hadoop Project Management Committee

Open Cloud System. (Integration of Eucalyptus, Hadoop and AppScale into deployment of University Private Cloud)

Eucalyptus-Based. GSAW 2010 Working Group Session 11D. Nehal Desai

Viswanath Nandigam Sriram Krishnan Chaitan Baru

Open source software for building a private cloud

Using WebSphere Application Server on Amazon EC2. Speaker(s): Ed McCabe, Arthur Meloy

Introduction to Cloud Computing

White Paper on CLOUD COMPUTING

Data Refinery with Big Data Aspects

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect


Cloud Computing. Course: Designing and Implementing Service Oriented Business Processes

SOLUTION WHITE PAPER. Building a flexible, intelligent cloud

Cluster, Grid, Cloud Concepts

Cloud Models and Platforms

ESPRESSO: An Encryption as a Service for Cloud Storage Systems

Ganzheitliches Datenmanagement

Cloud computing - Architecting in the cloud

Final Project Proposal. CSCI.6500 Distributed Computing over the Internet

Clodoaldo Barrera Chief Technical Strategist IBM System Storage. Making a successful transition to Software Defined Storage

White Paper. Cloud Native Advantage: Multi-Tenant, Shared Container PaaS. Version 1.1 (June 19, 2012)

Sistemi Operativi e Reti. Cloud Computing

Session 3. the Cloud Stack, SaaS, PaaS, IaaS

Cloud Computing for Spacecraft Operations

Cloud Computing and Amazon Web Services

ENABLING GLOBAL HADOOP WITH EMC ELASTIC CLOUD STORAGE

Scalable Architecture on Amazon AWS Cloud

A CLOUD-BASED FRAMEWORK FOR ONLINE MANAGEMENT OF MASSIVE BIMS USING HADOOP AND WEBGL

Cloud Computing Trends

Permanent Link:

Leveraging BlobSeer to boost up the deployment and execution of Hadoop applications in Nimbus cloud environments on Grid 5000

Investigation of Cloud Computing: Applications and Challenges

PROPOSAL To Develop an Enterprise Scale Disease Modeling Web Portal For Ascel Bio Updated March 2015

See Appendix A for the complete definition which includes the five essential characteristics, three service models, and four deployment models.

C Examcollection.Premium.Exam.34q

Cloud Lifecycle Management

Migration Scenario: Migrating Backend Processing Pipeline to the AWS Cloud

How AWS Pricing Works

Sriram Krishnan, Ph.D.

How To Compare Cloud Computing To Cloud Platforms And Cloud Computing

Session 5. Mixing and matching Public, Private and Hybrid Clouds for maximum benefits

A STUDY ON CLOUD STORAGE

Relocating Windows Server 2003 Workloads

Building Blocks of the Private Cloud

CiteSeer x in the Cloud

Cloud-based Infrastructures. Serving INSPIRE needs

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

STeP-IN SUMMIT June 18 21, 2013 at Bangalore, INDIA. Performance Testing of an IAAS Cloud Software (A CloudStack Use Case)

Hybrid Cloud Delivery Managing Cloud Services from Request to Retirement SOLUTION WHITE PAPER

HPC technology and future architecture

Data Management in the Cloud: Limitations and Opportunities. Annies Ductan

Can High-Performance Interconnects Benefit Memcached and Hadoop?

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Cloud computing: the IBM point of view

ediscovery and Search of Enterprise Data in the Cloud

How AWS Pricing Works May 2015

Technology and Cost Considerations for Cloud Deployment: Amazon Elastic Compute Cloud (EC2) Case Study

Apache Hadoop. Alexandru Costan

Bringing Big Data Modelling into the Hands of Domain Experts

Cloud Computing Training

International Journal of Advance Research in Computer Science and Management Studies

StorReduce Technical White Paper Cloud-based Data Deduplication

Learning Management Redefined. Acadox Infrastructure & Architecture

Duke University

A Service for Data-Intensive Computations on Virtual Clusters

Reference Architecture and Best Practices for Virtualizing Hadoop Workloads Justin Murray VMware

Big data blue print for cloud architecture

Technical aspects of Cloud computing. Luís Ferreira Pires University of Twente Meeting of the NVvIR, 17 June 2010

Enterprise GIS Architecture Deployment Options. Andrew Sakowicz

Volunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy. Derrick Kondo INRIA, France

Cloud Computing Summary and Preparation for Examination

How cloud computing can transform your business landscape

Oracle Big Data SQL Technical Update

Data Mining for Data Cloud and Compute Cloud

Planning, Provisioning and Deploying Enterprise Clouds with Oracle Enterprise Manager 12c Kevin Patterson, Principal Sales Consultant, Enterprise

SAS BIG DATA SOLUTIONS ON AWS SAS FORUM ESPAÑA, OCTOBER 16 TH, 2014 IAN MEYERS SOLUTIONS ARCHITECT / AMAZON WEB SERVICES

CUMULUX WHICH CLOUD PLATFORM IS RIGHT FOR YOU? COMPARING CLOUD PLATFORMS. Review Business and Technology Series

Cloud Computing Submitted By : Fahim Ilyas ( ) Submitted To : Martin Johnson Submitted On: 31 st May, 2009

Role of Cloud Computing in Big Data Analytics Using MapReduce Component of Hadoop

Analyzing Cloud Costs

9/26/2011. What is Virtualization? What are the different types of virtualization.

Microsoft Private Cloud Fast Track

Optimizing IT Deployment Issues

CSE 344 Introduction to Data Management. Section 9: AWS, Hadoop, Pig Latin TA: Yi-Shu Wei

NASA Earth System Science: Structure and data centers

Powerful analytics. and enterprise security. in a single platform. microstrategy.com 1

Transcription:

Cloud Computing @ JPL Science Data Systems Emily Law, GSAW 2011

Outline Science Data Systems (SDS) Space & Earth SDSs SDS Common Architecture Components Key Components using Cloud Computing Use Case 1: LMMP Use Case 2: ACCE Use Case 3: PO.DAAC Conclusion 03/02/2011 GSAW 2011 Cloud Computing Workshop 2

National Aeronautics and Space Administration Jet Propulsion Laboratory Science Data Systems (SDS) California Institute of Technology Pasadena, California Covers a wide variety of domain disciplines Solar system exploration, Astrophysics, Earth science, Biomedicine etc, Biomedicine, etc Each has its own communities, standards and systems But, there is a set of common components Some can greatly benefit from proven cloud computing technology 03/02/2011 GSAW 2011 Cloud Computing Workshop 3

Space Science Data Systems 03/02/2011 GSAW 2011 Cloud Computing Workshop 4

Earth Science Data Systems TDRS Network Network w/ Cloud Storage & Computation NASA Mission/Multi Mission Data & Science Centers Archive Data Centers Other Data Systems (e.g. NOAA) 03/02/2011 GSAW 2011 Cloud Computing Workshop 5

SDS Common Architecture Components Users WebService / API Analysis Visualization Tools Client Portal Presentation Ground Processing Center Search Transformation Transport Data Processing Application Service Data Processing Ingest File Manager Resource Manager Data Distribution Data Management Legends: System Layer Data System Component System Resources Catalog Metadata Data Repository Resource External Entity Interface Workflow Tracking Monitoring Security System Service 03/02/2011 GSAW 2011 Cloud Computing Workshop 6

Key components using Cloud Computing Data Analysis Portal Data Processing Data Repository Data Distribution Tools 03/02/2011 GSAW 2011 Cloud Computing Workshop 7

National Aeronautics and Space Administration Jet Propulsion Laboratory California Institute of Technology Pasadena, California Use Case 1: Lunar Mapping & Modeling Project (LMMP) Provides science and exploration community a suite of lunar mapping pp g and modeling tools and products that support the lunar exploration activities The tools and products are made available through a common common, intuitive NASA portal A public release of the system is scheduled at the end of March 2011 03/02/2011 GSAW 2011 Cloud Computing Workshop 8

Challenge The image files LMMP manages range from a few gigabytes to hundreds d of gigabytes in size with new data arriving every day Lunar surface images are too large to efficiently load and manipulate in memory LMMP must make the data readily available in a timely manner for users to view and analyze LMMP needs to accommodate large numbers of users with minimal latency 03/02/2011 GSAW 2011 Cloud Computing Workshop 9

Cloud Computing Solutions Slice a large image into many small images and to merge and resize until the last merge and reduce yields a reasonably sized image that depicts the entire image Amazon E2C/S3 Used distributed approach with Elastic MapReduce to tile images Developed a hybrid solution (multi-tiered data access approach) to serve images to users by cloud storage 03/02/2011 GSAW 2011 Cloud Computing Workshop 10

Use Case 1: Results Computing performance Comparable especially for the new machines with significant processing capability EC2 s rental model offers better performance per dollar than having to purchase and maintain local servers Storage Pay for just the bandwidth consumed Eliminate the need to purchase extra hardware and bandwidth to handle the occasional spikes in usage Security Store only publicly available data on cloud Host private data on local servers to eliminate potential security hole 03/02/2011 GSAW 2011 Cloud Computing Workshop 11

Use Case 2: Airborne Cloud Computing Environment (ACCE) Multi-mission capability providing distributed SDS services applicable to space-borne missions File Management Workflow Management Resource Management Extend the existing services to utilize cloud services, commercial, community and private Storage Compute Resources A full service stack is deployed for each mission, utilizing cloud computing infrastructure. 03/02/2011 GSAW 2011 Cloud Computing Workshop 12

Cloud Computing @ ACCE Explore the benefits of performing science data processing for airborne missions in the cloud Evaluate different cloud technologies Amazon EC2/S3 Elastic compute resources and on-demand d storage Eucalyptus Infrastructure software for establishing a private cloud Hadoop Distributed ib t File System (DFS) and MapReduce Increased processing performance on large data sets 03/02/2011 GSAW 2011 Cloud Computing Workshop 13

Prototype @ ACCE A typical science data processing job, Product Generation Executable (PGE), is an executable that performs a task on n inputs to produce outputs. The ACCE Cloud must be able find an available resource for the PGE, transfer both the PGE and its input files to that resource, execute the PGE, then retrieve the output files from that resource. Initiate Processing Review Results Science Data System Services File Mgmt Workflow Mgmt Resource Mgmt Input File(s) PGE PGE Local Data Storage Output File(s) localhost.jpl.nasa.gov 03/02/2011 GSAW 2011 Cloud Computing Workshop 14

Use Case 2: Results Challenges Host Environment Security Network Processing cost reduction No investment in capital required (upfront or refresh costs) Pay only for what you use Possible limiting factors for public cloud viability Support for ITAR-sensitive data Data transfer rates between JPL and commercial cloud JPL Firewall More work ahead Amazon EC2/S3 reported an ITAR Region available in Fall 2011 Continued benchmarking and optimization has demonstrated increased data transfer rates, 250 400 mbps JPL developing a Virtual Private Cloud connection to Amazon, causing EC2 nodes to appear inside the JPL Firewall 03/02/2011 GSAW 2011 Cloud Computing Workshop 15

Use Case 3: Physical Oceanography Distributed Active Archive Center (PO.DAAC) Preserve NASA s ocean data and make these and related information accessible and meaningful 03/02/2011 GSAW 2011 Cloud Computing Workshop 16

Cloud Computing @ PO.DAAC Understand and articulate the cost/benefit of cloud technologies for the NASA Data Centers Evaluate the cloud services provided by Amazon Pricing Data transfer procedures es Software deployment and monitoring procedures Conduct a pilot study using cloud service for processing oceanographic climatology, data management and data access Utilize MapReduce parallel processing algorithm Benchmark between cloud and standalone processing 03/02/2011 GSAW 2011 Cloud Computing Workshop 17

Science Approach Extract daily time-series and climatologies for different regions over Antarctica (from the cloud) Compute climate proxies: Spatial extend, onset and length of melting on the ice sheet Anomalous ous events e Temporal changes/trend in sigma-0 Examine patterns in the climate proxies and look for correlations with ocean/atmosphere indices 03/02/2011 GSAW 2011 Cloud Computing Workshop 18

Cloud Computing (Hadoop) solution Transfer Copy data into Hadoop HDFS from S3 26GB Uncompressed data (550M data points) 3x replication MapReduce Computation Jobs Compute daily/monthly climatologies Store climatology data, all time data points in HBase Add point average & standard dev. in future Query/Aggregate Data Scan ranges of data (lat/long bounding box) Aggregate and return average pixel values for given regions 03/02/2011 GSAW 2011 Cloud Computing Workshop 19

National Aeronautics and Space Administration Jet Propulsion Laboratory Use Case 3: Results California Institute of Technology Pasadena, California 10 Years at a glance Time In Seconds Time (in seconds) to complete climatology jobs 50000 40000 30000 20000 10000 0 Monthly Dailyy DB HZ 5 node AllTime 10 node Process Type Monthly Daily AllTime 03/02/2011 Oracle 9239 11502 42158 GSAW 2011 Cloud Computing Workshop Desktop 5453 7169 36627 5 node 1200 11204 14143 10 node 1238 1848 10540 20

Conclusion The Silver Bullet? Cloud Computing 03/02/2011 GSAW 2011 Cloud Computing Workshop 21

Many Benefits Broad network access Accessible from anywhere On demand self service Increase/decrease number of machines based on user defined parameters Resource Pooling Shared pool of configurable computing resources Rapid elasticity Resizable compute capacity for unlimited growth Increase/decrease number of machines based on user defined parameters Measured Service Utility Computing, pay by the drink, rapidly provisioned 03/02/2011 GSAW 2011 Cloud Computing Workshop 22

But. There are disadvantages to watch out Privacy / Ownership Security Reliability Feasibility Standards 03/02/2011 GSAW 2011 Cloud Computing Workshop 23

Thank You Question 03/02/2011 GSAW 2011 Cloud Computing Workshop 24