Cloud Computing @ JPL Science Data Systems Emily Law, GSAW 2011
Outline Science Data Systems (SDS) Space & Earth SDSs SDS Common Architecture Components Key Components using Cloud Computing Use Case 1: LMMP Use Case 2: ACCE Use Case 3: PO.DAAC Conclusion 03/02/2011 GSAW 2011 Cloud Computing Workshop 2
National Aeronautics and Space Administration Jet Propulsion Laboratory Science Data Systems (SDS) California Institute of Technology Pasadena, California Covers a wide variety of domain disciplines Solar system exploration, Astrophysics, Earth science, Biomedicine etc, Biomedicine, etc Each has its own communities, standards and systems But, there is a set of common components Some can greatly benefit from proven cloud computing technology 03/02/2011 GSAW 2011 Cloud Computing Workshop 3
Space Science Data Systems 03/02/2011 GSAW 2011 Cloud Computing Workshop 4
Earth Science Data Systems TDRS Network Network w/ Cloud Storage & Computation NASA Mission/Multi Mission Data & Science Centers Archive Data Centers Other Data Systems (e.g. NOAA) 03/02/2011 GSAW 2011 Cloud Computing Workshop 5
SDS Common Architecture Components Users WebService / API Analysis Visualization Tools Client Portal Presentation Ground Processing Center Search Transformation Transport Data Processing Application Service Data Processing Ingest File Manager Resource Manager Data Distribution Data Management Legends: System Layer Data System Component System Resources Catalog Metadata Data Repository Resource External Entity Interface Workflow Tracking Monitoring Security System Service 03/02/2011 GSAW 2011 Cloud Computing Workshop 6
Key components using Cloud Computing Data Analysis Portal Data Processing Data Repository Data Distribution Tools 03/02/2011 GSAW 2011 Cloud Computing Workshop 7
National Aeronautics and Space Administration Jet Propulsion Laboratory California Institute of Technology Pasadena, California Use Case 1: Lunar Mapping & Modeling Project (LMMP) Provides science and exploration community a suite of lunar mapping pp g and modeling tools and products that support the lunar exploration activities The tools and products are made available through a common common, intuitive NASA portal A public release of the system is scheduled at the end of March 2011 03/02/2011 GSAW 2011 Cloud Computing Workshop 8
Challenge The image files LMMP manages range from a few gigabytes to hundreds d of gigabytes in size with new data arriving every day Lunar surface images are too large to efficiently load and manipulate in memory LMMP must make the data readily available in a timely manner for users to view and analyze LMMP needs to accommodate large numbers of users with minimal latency 03/02/2011 GSAW 2011 Cloud Computing Workshop 9
Cloud Computing Solutions Slice a large image into many small images and to merge and resize until the last merge and reduce yields a reasonably sized image that depicts the entire image Amazon E2C/S3 Used distributed approach with Elastic MapReduce to tile images Developed a hybrid solution (multi-tiered data access approach) to serve images to users by cloud storage 03/02/2011 GSAW 2011 Cloud Computing Workshop 10
Use Case 1: Results Computing performance Comparable especially for the new machines with significant processing capability EC2 s rental model offers better performance per dollar than having to purchase and maintain local servers Storage Pay for just the bandwidth consumed Eliminate the need to purchase extra hardware and bandwidth to handle the occasional spikes in usage Security Store only publicly available data on cloud Host private data on local servers to eliminate potential security hole 03/02/2011 GSAW 2011 Cloud Computing Workshop 11
Use Case 2: Airborne Cloud Computing Environment (ACCE) Multi-mission capability providing distributed SDS services applicable to space-borne missions File Management Workflow Management Resource Management Extend the existing services to utilize cloud services, commercial, community and private Storage Compute Resources A full service stack is deployed for each mission, utilizing cloud computing infrastructure. 03/02/2011 GSAW 2011 Cloud Computing Workshop 12
Cloud Computing @ ACCE Explore the benefits of performing science data processing for airborne missions in the cloud Evaluate different cloud technologies Amazon EC2/S3 Elastic compute resources and on-demand d storage Eucalyptus Infrastructure software for establishing a private cloud Hadoop Distributed ib t File System (DFS) and MapReduce Increased processing performance on large data sets 03/02/2011 GSAW 2011 Cloud Computing Workshop 13
Prototype @ ACCE A typical science data processing job, Product Generation Executable (PGE), is an executable that performs a task on n inputs to produce outputs. The ACCE Cloud must be able find an available resource for the PGE, transfer both the PGE and its input files to that resource, execute the PGE, then retrieve the output files from that resource. Initiate Processing Review Results Science Data System Services File Mgmt Workflow Mgmt Resource Mgmt Input File(s) PGE PGE Local Data Storage Output File(s) localhost.jpl.nasa.gov 03/02/2011 GSAW 2011 Cloud Computing Workshop 14
Use Case 2: Results Challenges Host Environment Security Network Processing cost reduction No investment in capital required (upfront or refresh costs) Pay only for what you use Possible limiting factors for public cloud viability Support for ITAR-sensitive data Data transfer rates between JPL and commercial cloud JPL Firewall More work ahead Amazon EC2/S3 reported an ITAR Region available in Fall 2011 Continued benchmarking and optimization has demonstrated increased data transfer rates, 250 400 mbps JPL developing a Virtual Private Cloud connection to Amazon, causing EC2 nodes to appear inside the JPL Firewall 03/02/2011 GSAW 2011 Cloud Computing Workshop 15
Use Case 3: Physical Oceanography Distributed Active Archive Center (PO.DAAC) Preserve NASA s ocean data and make these and related information accessible and meaningful 03/02/2011 GSAW 2011 Cloud Computing Workshop 16
Cloud Computing @ PO.DAAC Understand and articulate the cost/benefit of cloud technologies for the NASA Data Centers Evaluate the cloud services provided by Amazon Pricing Data transfer procedures es Software deployment and monitoring procedures Conduct a pilot study using cloud service for processing oceanographic climatology, data management and data access Utilize MapReduce parallel processing algorithm Benchmark between cloud and standalone processing 03/02/2011 GSAW 2011 Cloud Computing Workshop 17
Science Approach Extract daily time-series and climatologies for different regions over Antarctica (from the cloud) Compute climate proxies: Spatial extend, onset and length of melting on the ice sheet Anomalous ous events e Temporal changes/trend in sigma-0 Examine patterns in the climate proxies and look for correlations with ocean/atmosphere indices 03/02/2011 GSAW 2011 Cloud Computing Workshop 18
Cloud Computing (Hadoop) solution Transfer Copy data into Hadoop HDFS from S3 26GB Uncompressed data (550M data points) 3x replication MapReduce Computation Jobs Compute daily/monthly climatologies Store climatology data, all time data points in HBase Add point average & standard dev. in future Query/Aggregate Data Scan ranges of data (lat/long bounding box) Aggregate and return average pixel values for given regions 03/02/2011 GSAW 2011 Cloud Computing Workshop 19
National Aeronautics and Space Administration Jet Propulsion Laboratory Use Case 3: Results California Institute of Technology Pasadena, California 10 Years at a glance Time In Seconds Time (in seconds) to complete climatology jobs 50000 40000 30000 20000 10000 0 Monthly Dailyy DB HZ 5 node AllTime 10 node Process Type Monthly Daily AllTime 03/02/2011 Oracle 9239 11502 42158 GSAW 2011 Cloud Computing Workshop Desktop 5453 7169 36627 5 node 1200 11204 14143 10 node 1238 1848 10540 20
Conclusion The Silver Bullet? Cloud Computing 03/02/2011 GSAW 2011 Cloud Computing Workshop 21
Many Benefits Broad network access Accessible from anywhere On demand self service Increase/decrease number of machines based on user defined parameters Resource Pooling Shared pool of configurable computing resources Rapid elasticity Resizable compute capacity for unlimited growth Increase/decrease number of machines based on user defined parameters Measured Service Utility Computing, pay by the drink, rapidly provisioned 03/02/2011 GSAW 2011 Cloud Computing Workshop 22
But. There are disadvantages to watch out Privacy / Ownership Security Reliability Feasibility Standards 03/02/2011 GSAW 2011 Cloud Computing Workshop 23
Thank You Question 03/02/2011 GSAW 2011 Cloud Computing Workshop 24