Berlin 2015 Storage, Backup and Disaster Recovery in the Cloud AWS Customer Case Study: HERE Maps for Life
Storage, Backup and Disaster Recovery in the Cloud. Robert Schmid, Storage Business Development, AWS; Ali Abbas, Principal Architect, HERE. AWS Customer Case Study: HERE Maps for Life: Satellite Imagery - S3. © 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved.
What we will cover in this session: Amazon storage options; Amazon Elastic File System; use cases (Backup, Archive, DR); customer use case: HERE Maps for Life, Satellite Imagery - S3
S3 usage: 102% year-over-year increase in data transfer to and from S3 (Q4 2014 vs. Q4 2013, not including Amazon use)
Amazon S3 Simple Storage Service
Amazon S3 Simple Storage Service: 99.999999999% durability; $0.03 per GB-month; $360 per TB/year
Amazon Glacier Low-cost archiving service
Amazon Glacier Low-cost archiving service: $0.01 per GB-month; $120 per TB/year; 99.999999999% durability; 3-5 hours data retrieval
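The per-TB yearly figures on the S3 and Glacier slides follow directly from the per-GB monthly prices. A minimal sketch of that arithmetic, assuming 1 TB = 1,000 GB (the convention that reproduces the slide numbers); the function name is illustrative and not from the talk:

```python
def yearly_cost_per_tb(price_per_gb_month, gb_per_tb=1000, months=12):
    """Convert a $/GB-month price into a $/TB-year price.

    gb_per_tb=1000 is an assumption: it is the convention that
    matches the $360 and $120 figures quoted on the slides.
    """
    return round(price_per_gb_month * gb_per_tb * months, 2)


s3_yearly = yearly_cost_per_tb(0.03)       # S3 price from the slide
glacier_yearly = yearly_cost_per_tb(0.01)  # Glacier price from the slide
print(s3_yearly, glacier_yearly)
```

At these 2015 list prices, archiving to Glacier costs one third of keeping the same data in S3.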
Amazon EBS Elastic Block Store
EBS General Purpose (SSD): up to 16 TB, 10,000 IOPS, $0.10 per GB-month. Provisioned IOPS (SSD): up to 16 TB, 20,000 IOPS, $0.125 per GB-month plus $0.065 per provisioned IOPS
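The two EBS volume types are priced differently: General Purpose volumes are billed on capacity only, while Provisioned IOPS volumes add a charge per provisioned IOPS. A rough sketch of the monthly cost, assuming the $0.065 charge applies per provisioned IOPS per month; the 1,000 GB volume and the 5,000 IOPS figure are hypothetical inputs chosen for illustration:

```python
def gp_ssd_monthly(size_gb, price_per_gb=0.10):
    """Monthly cost of a General Purpose (SSD) volume: capacity only."""
    return round(size_gb * price_per_gb, 2)


def piops_ssd_monthly(size_gb, iops, price_per_gb=0.125, price_per_iops=0.065):
    """Monthly cost of a Provisioned IOPS (SSD) volume: capacity plus IOPS.

    Billing the IOPS charge per month is an assumption; the slide only
    says "0.065/provisioned IOPS".
    """
    return round(size_gb * price_per_gb + iops * price_per_iops, 2)


gp_cost = gp_ssd_monthly(1000)              # hypothetical 1,000 GB volume
piops_cost = piops_ssd_monthly(1000, 5000)  # same size with 5,000 provisioned IOPS
print(gp_cost, piops_cost)
```

For this hypothetical volume the provisioned IOPS charge, not the capacity, dominates the bill.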
Amazon Storage Gateway
Storage Gateway: your on-ramp to AWS cloud storage. Back up into S3; archive into Amazon Glacier; iSCSI or VTL interface
Summary: AWS Storage Options: Object Storage (S3, Glacier); Elastic Block Store (EBS); Storage Gateway (iSCSI, VTL); Elastic File System for EC2 (EFS)
Introducing Amazon Elastic File System for EC2 Instances: pilot availability later this summer in US-WEST (Oregon)
What is EFS? Fully managed file system for EC2 instances. Provides standard file system semantics (NFSv4). Grows elastically to petabyte scale and shrinks elastically. Delivers performance for a wide variety of workloads. Highly available and durable. In short: (1) simple, (2) elastic, (3) scalable.
Amazon Storage Use Cases: Backup, Archive, Disaster Recovery
Backup, Archive, Disaster Recovery [Diagram: Customer Data Center (Block, File, Archive, Backup, Disaster Recovery) and Colocation Data Center (Customer/CSP Assets) connected via Storage Gateways (AWS SGW), Private Storage for AWS, Internet and AWS Direct Connect to S3 and Glacier in the AWS Cloud]
AWS Customer Case Study: HERE: Maps for Life. Ali Abbas, Principal Architect, High Resolution Satellite Imagery, Predictive Analytics/Machine Learning. ali.abbas@here.com http://www.here.com
HERE Maps HERE Drive HERE Transit HERE City Lens Explore
Maps for Life: Web and Mobile App available on Android/iOS/Windows Phone
Offline Map: Save the maps of your country or state on your phone. Use your phone offline. Explore anywhere without an internet connection.
Pocket Nav Sat: Unified Route Planning; Route Alternatives; Turn-by-turn Navigation
Urban Navigation: Route Alternatives; Step-by-step transit; Turn-by-turn walk guidance
Personal Maps: Collections; Easy location sharing
Interactive Maps: Train Schedule; Traffic incidents; 3D Maps
Reality Capture Processing Satellite/Aerial Delivery Enterprise Businesses End to End User Integration
99.99% availability, 99.999999999% durability; high throughput and good performance for most use cases; good price ratio; design simplifies creating integration pipelines
The case for Satellite Imagery
Continuously increasing global coverage with a higher refresh frequency
Challenges: Billions of tiles. Huge storage requirements due to high-resolution content across zoom levels. A large number of small tiles to keep track of and deliver. Exponential growth rate (today some billions, tomorrow some trillions). Increased data volume refresh rate. Maintaining low-latency requirements and service level agreements.
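The jump from billions to trillions follows from how a tile pyramid grows: as a later slide notes, each zoom level holds 4^level tiles. A small sketch of that growth, treating full global coverage at every level as an upper bound; the zoom levels 15 and 20 are illustrative choices, not figures from the talk:

```python
def tiles_at_zoom(z):
    """Number of tiles in a full quadtree pyramid at zoom level z."""
    return 4 ** z


def tiles_up_to_zoom(max_z):
    """Total tiles across zoom levels 0..max_z (upper bound, full coverage)."""
    return sum(tiles_at_zoom(z) for z in range(max_z + 1))


level_15 = tiles_at_zoom(15)        # tiles at zoom 15 alone
pyramid_15 = tiles_up_to_zoom(15)   # roughly 1.4 billion
pyramid_20 = tiles_up_to_zoom(20)   # roughly 1.5 trillion
print(level_15, pyramid_15, pyramid_20)
```

Each additional zoom level multiplies the tile count by four, so five more levels of detail turn a billion-scale dataset into a trillion-scale one.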
Behind the curtain: Specialized spatial file system to deliver tile imagery with sub-ms lookup time over the network. Simple architecture with CDN caches and core sites (holding the full dataset). Remote sites had CDN-type caches with geospatial sharding placement algorithms. Some cache regions sometimes suffered from intercontinental network latency due to non-optimized routing. On top of that, the scale of the data implies a massive storage infrastructure to maintain.
[Architecture diagram, Core and Caches: Mercator-based sharding layer; Specialized Spatial Blob Store; Intelligent Filter layer; Specialized Adaptive Spatial Blob Store; Shared Store; Singleton Store]
Given the success of S3 usage across HERE and the recent enhancements to the offering, we started to look at S3 to solve two main problems with one solution: (1) Simplify the storage handling layer by removing the storage compute from our architecture, which also simplifies operations. (2) Reduce the network latency from core data to our delivery instances by adding core data presence in each AWS region.
Satellite on S3: Easy life-cycle management for recurring updates. On-demand big data storage (eases capacity planning). Easy pipeline integration with SQS/SNS for background jobs. Good performance out of the box, but it did not fulfill our requirements: too much variation in response time, averaging roughly 150-300 ms.
S3 load constraint: Amazon S3 maintains an index of object key names in each AWS region. Object keys are stored lexicographically across multiple partitions in the index. That is, Amazon S3 stores key names in alphabetical order. The key name dictates which partition the key is stored in. Using a sequential prefix, such as timestamp or an alphabetical sequence, increases the likelihood that Amazon S3 will target a specific partition for a large number of your keys, overwhelming the I/O capacity of the partition. http://docs.aws.amazon.com/amazons3/latest/dev/request-rate-perf-considerations.html
S3 load constraint + Satellite: stored lexicographically across S3 partitions. Satellite example tile IDs (z/x/y) and their quadkey representations: 15/18106/11272 = 302013232331232; 15/18089/11275 = 302013232321201; 17/72409/45094 = 30201323233033003. Each zoom level has 4^level_of_detail tiles, and the length of a quadkey equals the level of detail of the corresponding tile.
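The slide does not show how a z/x/y tile ID becomes a quadkey. The sketch below uses the standard quadkey bit interleaving and assumes the y coordinate is counted from the bottom of the map (TMS-style); that assumption is inferred from the numbers, since it is the convention that reproduces the quadkeys on the slide. Function names are illustrative:

```python
import os


def tile_to_quadkey(z, x, y, y_from_bottom=True):
    """Interleave the bits of x and y into a base-4 quadkey of length z.

    y_from_bottom=True flips y to a top-origin row first; this is an
    assumption that matches the example values on the slide.
    """
    if y_from_bottom:
        y = (1 << z) - 1 - y
    digits = []
    for i in range(z, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


qk_a = tile_to_quadkey(15, 18106, 11272)
qk_b = tile_to_quadkey(15, 18089, 11275)
qk_c = tile_to_quadkey(17, 72409, 45094)

# Neighbouring tiles share a long leading prefix, so their keys sort
# next to each other in a lexicographic index.
shared = os.path.commonprefix([qk_a, qk_b, qk_c])
print(qk_a, qk_b, qk_c, shared)
```

Because nearby tiles share most of their leading digits, requests for one geographic area land on keys that sit side by side in a lexicographically ordered index, which is exactly the hot-partition pattern the S3 documentation warns about.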
S3 load constraint + Satellite: stored lexicographically across S3 partitions. Alternative to quadkeys: use a random hash and increase the base number. Remaining problem: at the scale of satellite imagery, the ratio of requests to the lexicographic overlap produced by a random hash was still significant and would not scale well. Performance was still unacceptable in light of our requirements. Billions of PUT requests would considerably increase the cost of recurring updates.
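The hash-based alternative can be sketched as prepending a few characters of a digest to each key, so that neighbouring tiles no longer share a leading prefix. This is a generic illustration of the technique, not HERE's actual key scheme; the choice of MD5, the 4-character prefix and the key layout are all assumptions:

```python
import hashlib


def hashed_key(tile_id, prefix_len=4):
    """Prefix a tile ID with the first hex characters of its MD5 digest.

    MD5 is used here only to scatter keys, not for security.
    """
    digest = hashlib.md5(tile_id.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{tile_id}"


key = hashed_key("15/18106/11272")

# 100 vertically adjacent tiles: sequential IDs, scattered prefixes.
prefixes = {hashed_key(f"15/18106/{y}")[:4] for y in range(11272, 11372)}
print(key, len(prefixes))
```

A hex prefix also spreads the leading character over 16 values instead of the 4 used by quadkeys, which may be what "increase base number" refers to. As the slide notes, this still was not enough at satellite scale, and it does nothing to reduce the number of PUT requests.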
S3 load constraint + Satellite: stored lexicographically across S3 partitions. Better solution: reduce the number of files by creating binary blobs on S3, index the tiles inside the blobs, and use HTTP range requests for access. New challenge: managing updates got more complicated, more logic was required to distribute tiles inside the blobs and, more importantly, the predicted index size was on the order of terabytes and growing, adding cost and complexity overhead.
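The blob approach can be illustrated without any network access: pack several tiles into one byte string, remember each tile's offset and length, and turn an index entry into an HTTP Range header. The blob layout, the index format and the helper names below are assumptions made for the sketch; the talk does not describe them. `read_range` is a local stand-in for a ranged GET:

```python
def pack_tiles(tiles):
    """Concatenate tile payloads into one blob; record (offset, length) per tile."""
    blob = bytearray()
    index = {}
    for tile_id, payload in tiles.items():
        index[tile_id] = (len(blob), len(payload))
        blob.extend(payload)
    return bytes(blob), index


def range_header(offset, length):
    """HTTP Range header value; the end position is inclusive."""
    return f"bytes={offset}-{offset + length - 1}"


def read_range(blob, header):
    """Local stand-in for a ranged GET: parse 'bytes=a-b' and slice the blob."""
    start, end = header[len("bytes="):].split("-")
    return blob[int(start):int(end) + 1]


tiles = {
    "15/18106/11272": b"tile-A-bytes",  # placeholder payloads
    "15/18089/11275": b"tile-B",
}
blob, index = pack_tiles(tiles)
offset, length = index["15/18089/11275"]
header = range_header(offset, length)
tile = read_range(blob, header)
print(header, tile)
```

With boto3, the same header value could be passed as the `Range` argument of `get_object`. The sketch also shows where the new challenge comes from: every tile needs an index entry, so the index grows with the tile count, and updating one tile means rewriting or re-indexing its blob.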
Back to the whiteboard
New Pseudo-Quad Index (PQI): a new compact O(1) data structure to work around the performance constraints of S3. It minimizes the index size required to keep track of tiles and random hashes: 194.605% size reduction in comparison to generic optimized hash tables. It reduces and sets boundaries for proximity regions to cause better dispersion on the n-gram load split algorithm used by S3. Simplified imagery updates; geometrical consistency across all S3 buckets. Performance: S3: >150-300 ms; S3 + PQI: <26 ms.
With S3 and PQI we have simplified our architecture: PQI backend; tiny ref file. Simple infrastructure delivering from a few billion up to a few trillion images.
Impact on architecture and on the day-to-day operation of our services: Brings us geographically closer to our customers while not compromising on design patterns to work around network latencies. Allows us to focus only on our core business and technologies while offloading compute/storage to AWS.
Thank you! Please meet our Sponsors/Partners and see us in the EXPO area
Further information: http://aws.amazon.com/solutions/ http://aws.amazon.com/efs/details/