Berlin 2015. Storage, Backup and Disaster Recovery in the Cloud AWS Customer Case Study: HERE Maps for Life




Transcription:


Storage, Backup and Disaster Recovery in the Cloud
Robert Schmid, Storage Business Development, AWS
Ali Abbas, Principal Architect, HERE
Case Study: AWS Customer HERE Maps for Life: Satellite Imagery - S3
© 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.

What we will cover in this session
Amazon storage options
Amazon Elastic File System
Use cases (Backup, Archive, DR)
Customer Use Case: HERE Maps for Life, Satellite Imagery - S3

S3 usage: 102% year-over-year increase in data transfer to and from S3 (Q4 2014 vs Q4 2013, not including Amazon use)

Amazon S3 (Simple Storage Service)
99.999999999% durability
$0.03 per GB-month ($360 per TB/year)

Amazon Glacier: low-cost archiving service
$0.01 per GB-month ($120 per TB/year)
99.999999999% durability
3-5 hours data retrieval
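The per-TB/year figures on the S3 and Glacier slides follow directly from the per-GB-month prices. A minimal sketch of that arithmetic, assuming 1 TB = 1,000 GB and 12 months per year (the slide figures only work out under the decimal convention); the function name is illustrative:

```python
def per_tb_year(price_per_gb_month, gb_per_tb=1000, months=12):
    """Convert a $/GB-month price into $/TB-year.

    Assumption: 1 TB = 1,000 GB, which is what the slide figures imply.
    """
    return round(price_per_gb_month * gb_per_tb * months, 2)


s3_cost = per_tb_year(0.03)       # S3 price from the slide
glacier_cost = per_tb_year(0.01)  # Glacier price from the slide
print(s3_cost, glacier_cost)
```

At these 2015 list prices, archiving to Glacier costs one third of keeping the same data in S3, at the expense of the 3-5 hour retrieval time.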

Amazon EBS (Elastic Block Store)
General Purpose (SSD): up to 16 TB, 10,000 IOPS, $0.10 per GB-month
Provisioned IOPS (SSD): up to 16 TB, 20,000 IOPS, $0.125 per GB-month plus $0.065 per provisioned IOPS

Amazon Storage Gateway
Your on-ramp to AWS cloud storage:
Back up into S3
Archive into Amazon Glacier
iSCSI or VTL interface

Summary: AWS Storage Options
Object Storage (S3, Glacier)
Elastic Block Store (EBS)
Storage Gateway (iSCSI, VTL)
Elastic File System for EC2 (EFS)

Introducing Amazon Elastic File System for EC2 Instances
Pilot availability later this summer in US-WEST (Oregon)

What is EFS?
Fully managed file system for EC2 instances
Provides standard file system semantics (NFSv4)
Elastically grows to petabyte scale and shrinks elastically
Delivers performance for a wide variety of workloads
Highly available and durable
1. Simple 2. Elastic 3. Scalable

Amazon Storage Use Cases: Backup, Archive, Disaster Recovery

[Diagram: Backup, Archive, Disaster Recovery. A customer data center (block, file, archive, backup, disaster recovery) and a colocation data center (customer/CSP assets, private storage for AWS) connect to S3 and Glacier in the AWS Cloud through Storage Gateways (SGW), the Internet, or AWS Direct Connect.]

AWS Customer Case Study
HERE: Maps for Life
Ali Abbas, Principal Architect
High Resolution Satellite Imagery, Predictive Analytics/Machine Learning
ali.abbas@here.com
http://www.here.com

Explore: HERE Maps, HERE Drive, HERE Transit, HERE City Lens

Maps for Life
Web and mobile app available on Android, iOS, and Windows Phone

Offline Map
Save the maps of your country or state on your phone
Use your phone offline
Explore anywhere without an internet connection

Pocket Nav Sat
Unified route planning
Route alternatives
Turn-by-turn navigation

Urban Navigation
Route alternatives
Step-by-step transit
Turn-by-turn walk guidance

Personal Maps
Collections
Easy location sharing

Interactive Maps
Train schedule
Traffic incidents
3D maps

Reality Capture Processing Satellite/Aerial Delivery Enterprise Businesses End to End User Integration

99.99% availability, 99.999999999% durability
High throughput and good performance for most use cases
Good price ratio
Design simplifies creating integration pipelines

The case of Satellite Imagery

Continuously increasing global coverage with a higher refresh frequency

Challenges
Billions of tiles
Huge storage requirements due to high-resolution content across zoom levels
A large number of small tiles to keep track of and deliver
Exponential growth rate (today some billions, tomorrow some trillions)
Increased refresh rate of the data volume
Maintain low-latency requirements and service level agreements

Behind the curtain
Specialized spatial file system to deliver tile imagery with sub-ms lookup time over the network.
Simple architecture with CDN caches and core sites (with the full dataset).
Remote sites had CDN-type caches with geospatial sharding placement algorithms.
Some cache regions occasionally suffered from intercontinental network latency due to non-optimized routing.
The scale of the data implies a massive storage infrastructure to maintain on top.

[Diagram: core sites and caches. Components shown: Mercator-based sharding layer, specialized spatial blob store, intelligent filter layer, specialized adaptive spatial blob store, shared store, singleton store.]

Given the success of S3 usage across HERE and the recent enhancements to the offering, we started to look at S3 to solve two main problems with one solution:
Simplify the storage handling layer by removing the storage compute from our architecture, which also simplifies operations.
Reduce the network latency from core data to our delivery instances by adding a core data presence in each availability region.

Satellite on S3
Easy life-cycle management for recurring updates
Big Data store capacity on demand (eases capacity planning)
Easy pipeline integration with SQS/SNS for background jobs
Good performance out of the box, but it did not fulfill our requirements: too much variation in response time (average ~150-300 ms).

S3 load constraint
"Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored lexicographically across multiple partitions in the index. That is, Amazon S3 stores key names in alphabetical order. The key name dictates which partition the key is stored in. Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition."
http://docs.aws.amazon.com/amazons3/latest/dev/request-rate-perf-considerations.html



S3 load constraint + Satellite
Keys are stored lexicographically across S3 partitions.
Satellite example tile IDs (z/x/y) and their quadkey representation:
15/18106/11272 -> 302013232331232
15/18089/11275 -> 302013232321201
17/72409/45094 -> 30201323233033003
Each zoom level has 4^level_detail tiles; the length of a quadkey equals the level of detail of the corresponding tile.
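The three example tile IDs can be reproduced with the usual quadkey bit interleave (one base-4 digit per zoom level, x contributing 1 and y contributing 2), provided the y coordinate is counted from the bottom of the map. That bottom-up convention is inferred from the examples, not stated on the slide, so treat the sketch below as an illustration under that assumption. It also shows why quadkeys are a poor fit for a lexicographically partitioned index: neighbouring tiles share a long common prefix.

```python
import os


def tile_to_quadkey(z, x, y, y_from_bottom=True):
    """Interleave the bits of x and y into a base-4 quadkey of length z.

    Assumption: the slide's tile IDs count y from the bottom of the map,
    so y is flipped before interleaving. Pass y_from_bottom=False for
    top-origin tile coordinates.
    """
    if y_from_bottom:
        y = (1 << z) - 1 - y
    digits = []
    for level in range(z, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def tiles_at_level(level):
    """Number of tiles at a zoom level: 4^level."""
    return 4 ** level


keys = [
    tile_to_quadkey(15, 18106, 11272),
    tile_to_quadkey(15, 18089, 11275),
    tile_to_quadkey(17, 72409, 45094),
]
shared_prefix = os.path.commonprefix(keys)
print(keys, shared_prefix, tiles_at_level(15))
```

All three keys start with the same ten digits, so requests for tiles in the same area land in the same part of the key space.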

S3 load constraint + Satellite
Keys are stored lexicographically across S3 partitions.
Alternative to quadkeys: use a random hash and increase the base number.
Remaining problem: at the scale of satellite imagery, the ratio of requests to the lexicographic overlap produced with a random hash was still significant and would not scale well. Performance was still unacceptable in light of our requirements. Billions of PUT requests would considerably increase the cost of recurring updates.
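One common way to apply the random-hash alternative is to prepend a short hash-derived prefix to each object key, so that lexicographically adjacent quadkeys no longer sort next to each other. The sketch below is an illustration only: the key layout, the prefix length, and the choice of SHA-256 are assumptions, and the slides do not say which hash or layout HERE evaluated. Using hexadecimal digits (base 16) instead of quadkey digits (base 4) is one reading of "increase the base number".

```python
import hashlib


def hashed_key(quadkey, prefix_len=4):
    """Build an object key with a hash-derived prefix (hypothetical layout).

    The prefix is the first `prefix_len` hex digits of SHA-256(quadkey),
    so it is deterministic: the same tile always maps to the same key.
    """
    digest = hashlib.sha256(quadkey.encode("ascii")).hexdigest()
    return f"{digest[:prefix_len]}/{quadkey}"


# Two neighbouring tiles from the slide's example.
key_a = hashed_key("302013232331232")
key_b = hashed_key("302013232321201")
print(key_a)
print(key_b)
```

Because the prefix is derived from the quadkey, a reader can recompute the full key without a lookup table. As the slide notes, this still leaves one object and one PUT per tile.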

S3 load constraint + Satellite
Keys are stored lexicographically across S3 partitions.
Better solution: reduce the number of files by creating binary blobs on S3, index the tiles inside the blobs, and use HTTP range requests for access.
New challenge: managing updates got more complicated, more logic is required to distribute tiles inside the blobs, and, more importantly, the predicted index size was on the order of terabytes and growing, adding cost and complexity overhead.
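The blob approach can be sketched as follows: tiles are concatenated into one binary object, an index records each tile's byte offset and length, and a reader asks for just that byte span with an HTTP `Range` header. This is a minimal local illustration, not HERE's implementation; the function names are hypothetical, the blob is an in-memory `bytes` object standing in for an S3 object, and tiles are assumed to be non-empty.

```python
def pack_tiles(tiles):
    """Concatenate tiles into one blob and index them by (offset, length).

    `tiles` maps a tile ID (here a quadkey string) to its image bytes.
    """
    blob = bytearray()
    index = {}
    for tile_id in sorted(tiles):
        data = tiles[tile_id]
        index[tile_id] = (len(blob), len(data))
        blob.extend(data)
    return bytes(blob), index


def range_header(index, tile_id):
    """Value of the HTTP Range header for one tile (end byte is inclusive)."""
    offset, length = index[tile_id]
    return f"bytes={offset}-{offset + length - 1}"


def read_range(blob, header):
    """Local stand-in for the server side of a single-range request."""
    start, end = header[len("bytes="):].split("-")
    return blob[int(start):int(end) + 1]


tiles = {"302013232331232": b"AAAA", "302013232321201": b"BB"}
blob, index = pack_tiles(tiles)
header = range_header(index, "302013232331232")
tile = read_range(blob, header)
print(header, tile)
```

Even in this toy form the index holds one entry per tile, which is why, at billions of tiles, the slide estimates the index itself at terabytes.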

Back to the whiteboard

New Pseudo-Quad Index (PQI)
New compact O(1) data structure to work around the performance constraints of S3.
It minimizes the size of the index needed to keep track of tiles and random hashes: 194.605% size reduction in comparison to generic optimized hash tables.
It reduces and sets boundaries for proximity regions, giving better dispersion under the n-gram load split algorithm used by S3.
Simplified imagery updates; geometrical consistency across all S3 buckets.
Performance: S3: >150-300 ms; S3 + PQI: <26 ms

With S3 and PQI we have simplified our architecture
[Diagram: PQI backend with a tiny ref file]
Simple infrastructure delivering from a few billion up to a few trillion images

Impact on architecture and on the day-to-day operation of our services
Brings us geographically closer to our customers without having to compromise on design patterns to work around network latencies.
Allows us to focus only on our core business and technologies while offloading compute/storage to AWS.

Thank you! Please meet our Sponsors/Partners and see us in the EXPO area.

Further information: http://aws.amazon.com/solutions/ http://aws.amazon.com/efs/details/