Big data, big deal? Presented by: David Stanley, CTO PCI Geomatics



Similar documents
On Demand Satellite Image Processing

Best Practices for Sharing Imagery using Amazon Web Services. Peter Becker

How To Use Arcgis For Free On A Gdb (For A Gis Server) For A Small Business

Advanced Image Management using the Mosaic Dataset

Technical Aspects to GIS in the Cloud

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Cloud Computing. Chapter 1 Introducing Cloud Computing

Cloud Computing and Amazon Web Services

ArcGIS for Server in the Amazon Cloud. Michele Lundeen Esri

Cloud Computing. Chapter 1 Introducing Cloud Computing

Enterprise Architectures for Large Tiled Basemap Projects. Tommy Fauvell

Using ArcGIS for Server in the Amazon Cloud

Scalable Application. Mikalai Alimenkou

Cloud Computing. Chapter 1 Introducing Cloud Computing

GeoCloud Project Report USGS/EROS Spatial Data Warehouse Project

QLIKVIEW INTEGRATION TION WITH AMAZON REDSHIFT John Park Partner Engineering

TECHNOLOGY WHITE PAPER Jan 2016

Enterprise Applications on AWS

Use of Cloud Computing for scalable geospatial data processing and access

Creating A Galactic Plane Atlas With Amazon Web Services

Amazon Cloud Storage Options

Findings in High-Speed OrthoMosaic

STeP-IN SUMMIT June 18 21, 2013 at Bangalore, INDIA. Performance Testing of an IAAS Cloud Software (A CloudStack Use Case)

Technology and Cost Considerations for Cloud Deployment: Amazon Elastic Compute Cloud (EC2) Case Study

Using Cloud Services for Test Environments A case study of the use of Amazon EC2

TECHNOLOGY WHITE PAPER Jun 2012

An Esri White Paper January 2011 Estimating the Cost of a GIS in the Amazon Cloud

Virtualization and Cloud Computing A Testing Solution. Reema Majumdar

Mark Bennett. Search and the Virtual Machine

Axceleon s CloudFuzion Turbocharges 3D Rendering On Amazon s EC2

Description of Application

Amazon Hosted ESRI GeoPortal Server. GeoCloud Project Report

Cloud computing - Architecting in the cloud

Cloud Computing For Bioinformatics

ArcGIS for Server: In the Cloud

Estimating the Cost of a GIS in the Amazon Cloud. An Esri White Paper August 2012

Open source Google-style large scale data analysis with Hadoop

How AWS Pricing Works May 2015

Overview: X5 Generation Database Machines

Simplifying Storage Operations By David Strom (published 3.15 by VMware) Introduction

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Recognization of Satellite Images of Large Scale Data Based On Map- Reduce Framework

Large-Scale Data Engineering. Cloud Computing - Computing as a Service

StorReduce Technical White Paper Cloud-based Data Deduplication

ediscovery and Search of Enterprise Data in the Cloud

A programming model in Cloud: MapReduce

Solution for private cloud computing

GeoImaging Accelerator Pansharp Test Results

Deploying ArcGIS for Server using Managed Services

Oracle Applications and Cloud Computing - Future Direction

Matsu Workflow: Web Tiles and Analytics over MapReduce for Multispectral and Hyperspectral Images

How AWS Pricing Works

Deploying for Success on the Cloud: EBS on Amazon VPC. Phani Kottapalli Pavan Vallabhaneni AST Corporation August 17, 2012

AWS Account Setup and Services Overview

GeoCloud Project Report GEOSS Clearinghouse

Bringing Big Data Modelling into the Hands of Domain Experts

IOS110. Virtualization 5/27/2014 1

BIG DATA USING HADOOP

Background on Elastic Compute Cloud (EC2) AMI s to choose from including servers hosted on different Linux distros

SEAIP 2009 Presentation

Enterprise GIS Architecture Deployment Options. Andrew Sakowicz

Amazon Elastic Beanstalk

Colgate-Palmolive selects SAP HANA to improve the speed of business analytics with IBM and SAP

Volunteer Computing, Grid Computing and Cloud Computing: Opportunities for Synergy. Derrick Kondo INRIA, France

Migration Scenario: Migrating Batch Processes to the AWS Cloud

Introduction What is the cloud

Intro to AWS: Storage Services

How To Create A Grid On A Microsoft Web Server On A Pc Or Macode (For Free) On A Macode Or Ipad (For A Limited Time) On An Ipad Or Ipa (For Cheap) On Pc Or Micro

Vendor Questions Infrastructure Products and Services RFP #

Cloud+ Experiences with New Technologies bringing GIS to the World. Eamon Walsh CTO, espatial

National Center for Education Statistics. Amazon Hosted ESRI ArcGIS Servers Project Final Report

Big Data and Market Surveillance. April 28, 2014

Last time. Today. IaaS Providers. Amazon Web Services, overview

Drupal in the Cloud. Scaling with Drupal and Amazon Web Services. Northern Virginia Drupal Meetup

GIS IN THE CLOUD THE ESRI EXAMPLE DAVID CHAPPELL SEPTEMBER 2010 SPONSORED BY ESRI

Leveraging Public Clouds to Ensure Data Availability

Amazon EC2 Product Details Page 1 of 5

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Introduction to Imagery and Raster Data in ArcGIS

Laurence Liew General Manager, APAC. Economics Is Driving Big Data Analytics to the Cloud

Web Application Deployment in the Cloud Using Amazon Web Services From Infancy to Maturity

Understanding ArcGIS Deployments in Public and Private Cloud. Marwa Mabrouk

Automated Application Provisioning for Cloud

Analysis and Research of Cloud Computing System to Comparison of Several Cloud Computing Platforms

NoSQL and Hadoop Technologies On Oracle Cloud

CONDOR CLUSTERS ON EC2

High Performance Compu2ng Facility

Keystone Image Management System

Migrating a (Large) Science Database to the Cloud

Capacity Management for Oracle Database Machine Exadata v2

Controlling the Linux ecognition GRID server v9 from a ecognition Developer client

Understanding AWS Storage Options

A PERFORMANCE ANALYSIS of HADOOP CLUSTERS in OPENSTACK CLOUD and in REAL SYSTEM

Expert Reference Series of White Papers. Introduction to Amazon Relational Database Service (Amazon RDS)

Chapter 7. Using Hadoop Cluster and MapReduce

INCREASING THE CLOUD PERFORMANCE WITH LOCAL AUTHENTICATION

ECE6130 Grid and Cloud Computing

Cloud Simulator for Scalability Testing

Getting Started with Amazon EC2 Management in Eclipse

A New Strategy for Data Management in a Time of Hyper Change

Transcription:

Big data, big deal? Presented by: David Stanley, CTO PCI Geomatics

A quick word about me Ah, the early 1980s

Aha! Moments in my Career First personal computer (Pet 2001, 1978) First windowing OS (SUN, ~1985) First Web Browser (Mosaic 1992) Mobile computing (iphone 2007) And now The Cloud

What s the Cloud?

What about Esri Maps? Esri recognized the need for high quality imagery on its online platform ArcGIS Online Imagery is a key component of Esri s new Cloud Services providing high resolution, high quality coverage is very important Wanted a comprehensive 1m imagery base map Not an easy thing to do witness Apples recent ios 6 map application problems

The Project: King Kong In 2011, Esri approached PCI to assist with given its 30 year history in image processing and high capacity computing Project named King Kong, represented a considerable challenge for both Esri and PCI Huge data volumes Large data management issues Massive processing requirements Difficult, finicky algorithms One Year (12 months) time frame

Big Data 50 million km 2 of imagery at 1m resolution covering world 300,000+ IKONOS Scenes Hundreds of Terabytes of raw input data Thousands of Terabytes (Petabytes) in intermediate data Blue Marble Credits: NASA Goddard Space Flight Center Image

The Traditional Approach Find a large amount of space Purchase 200+ compute servers Purchase farm of network data servers Setup a complex network/infrastructure Hire 30+ person team for processing, QA and IT After project done in 12 months try to figure out what to do with it all

The Bold Approach Small permanent team : 4 from ESRI, 1 from PCI Rely on enormous blocks of imagery (1000 s) and huge number of tie-points (100,000 s) to average out errors in colour and geometry to dramatically reduce labour. Rely on massive parallelism for computation and speed Gamble on clever people and algorithms Go with the Amazon Cloud!

Why consider the cloud? Reason High Availability Highly Accessible Scalable Fault Tolerant Running costs Avoid Capital outlays The Bad thing though Data? Description Always on Access via any web portal/browser Expand or contract processing needs as required. Can get 100x more power for a while, then let go. 1000 s of shared servers, a few going down is not even noticed. Pay as you go for what you actually use. No processing = no cost. No need to buy hardware and maintain it. Moving large amounts of imagery on and off the cloud can be challenging, slow and perhaps costly

Amazon Cloud Components Simple Storage Service (S3) Buckets Elastic Block Storage (EBS) Volumes (1 TB Max) EBS Snapshot (S3) Amazon Machine Instance (AMI) Config Operation S/S Source Data Transport Scaled Instances Internet S3 Object (5GB Limit) Object Key

Standard compute node Standard Compute Node 15 GB memory 8 EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each) 1,690 GB ephemeral storage ( free ) 8 GB EC2 Volume 64-bit Linux platform I/O Performance: High Out of many available configurations this was best

Processing workflow Ingest QA Tie Point & Bundle QA Ortho Pan Sharp Color Balance Seam Line Tile Cache PCI GXL/Geomatica ArcGIS

Why the cloud worked Ingest (25+ Nodes) Tie Point (20+ Nodes) Bundle (1 Node) Ortho / Pan Sharpen (20+ Nodes) Color Balance (1 Node) Tiling (25+ Nodes) Caching (8+ Nodes)

The Project in Action 1 st 6 months: getting it working Struggled with our own software when scaling from hundreds to thousands Struggled with finding the right cloud configuration for computing and storage Attempting to reduce, or automate, manual QA steps 2 nd 6 months: crank it up At the end processing 70,000+ scenes per month

A big data set: Afghanistan Chart Title Arial 18 PT 16 9,500 Ikonos scenes. 15TB of Data. 1,000,000+ km 2

Colour Balancing: Before Chart Title Arial 18 PT 17

Colour Balancing: after 18

World Coverage 50 million km2

World Coverage Examples

Sample output Taj Mahal, India

Sample output Corsica, France

Sample output Athens, Greece

Sample output Abu Dhabi, UAE

Summary Approximately 4m accuracy on the ground Alignment between images good Now that done, could probably redo in 3 months. The cloud is effective! As with most leading edge projects: it was an amazing, stressful and fun experience. Enjoy the new 1m image data layers!

Q & A THANK YOU David Stanley, CTO PCI Geomatics stanley@pcigeomatics.com