MOVING THE ELEPHANT IN THE ROOM: Data Migration at Scale

WHO AM I?
- BDPA Los Angeles Chapter
- 4-year HSCC participant
- Columbia University, CC '14
- Conductor, Inc.
- linkedin.com/in/calltyrone

CONDUCTOR, INC.
- Web Presence Management SaaS
- Big data
- Collect 6TB of raw web data a week
- Scalable collection and ETL pipelines
- Final product: reports
- 6 years running
- Tons of data!

WHY WE CARE ABOUT SCALABILITY
- More users
- More data
- Systems have to keep up!

SCALABILITY IN THE REAL WORLD
- Yesterday's solution is tomorrow's problem
- Under-prioritized
- It's hard!
- Can require massive changes
- No cure-all

WHY REPLACE AN UNSCALABLE SYSTEM?
- Save money
- Improve performance
- Improve reliability
- Clear the way for progress

WHY NOT?
- If it ain't broke
- Significant Resource Investment
  - Time
  - Money
- Software Downtime
- Data Quality Concerns

BUT IT'S SO SIMPLE!
- Identify an unscalable system
- Discover and vet a suitable successor
- Replace the legacy system with the new system, while minimizing risk and cost

TALKING ABOUT THE ELEPHANT: Identifying an Unscalable System

CASE STUDY: LEGACY REPORTING DATABASE
Overview
- MySQL
- Multi-dimensional report data stored in a normalized manner across many tables
- Helpful for initial modeling of our problem space
- Hosted by a single, very powerful machine
Talking about the Elephant: Diagnosing an Unscalable System

CASE STUDY: LEGACY REPORTING DATABASE
Unsustainable
- Powerful EC2 hardware isn't cheap.
- Vertical scaling: capacity issues? Get a bigger machine.
- Obsolete schema
- Difficult to back up
- Queries aren't getting any faster.

SEE FOR YOURSELF
If your solution:
- Scales vertically
- Prevents progress
- Can't perform at scale
- Is difficult/slow/expensive to upgrade
It's time for a change!

FINDING A BIGGER ROOM: Vetting Scalable Alternatives

WHAT TO LOOK FOR
- Price-efficient
- Ease of maintenance
- Horizontal scaling

CASE STUDY: AWS S3 DATASTORE
Our Use Case
- Write once, read many
- De-normalized reports
- High storage capacity
- High availability

CASE STUDY: AWS S3 DATASTORE
Technical Overview
- Poor write performance, great read performance
- Flat files
- No defined space limit
- Configurable file replication
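
To make the "de-normalized reports in flat files" idea concrete, here is a minimal Python sketch. It is not the talk's actual pipeline: the table names (`reports`, `accounts`, `metrics`), their fields, and the JSON Lines layout are all assumptions chosen for illustration. It folds rows that a normalized schema would spread across several tables into one self-contained record per report, which suits a write-once, read-many store because a reader needs no joins.

```python
import json

# Hypothetical normalized tables, standing in for a legacy relational schema.
reports = [{"report_id": 1, "account_id": 10, "period": "2014-W01"}]
accounts = {10: {"name": "Example Co"}}
metrics = [
    {"report_id": 1, "keyword": "shoes", "rank": 3},
    {"report_id": 1, "keyword": "boots", "rank": 7},
]


def denormalize(report, accounts, metrics):
    """Fold every row related to one report into a single self-contained record."""
    return {
        "report_id": report["report_id"],
        "period": report["period"],
        "account": accounts[report["account_id"]]["name"],
        "metrics": [
            {"keyword": m["keyword"], "rank": m["rank"]}
            for m in metrics
            if m["report_id"] == report["report_id"]
        ],
    }


def to_flat_file(reports, accounts, metrics):
    """Serialize reports as JSON Lines: one denormalized report per line."""
    return "\n".join(
        json.dumps(denormalize(r, accounts, metrics), sort_keys=True)
        for r in reports
    )


# The resulting string is what would be uploaded once and then only read.
flat = to_flat_file(reports, accounts, metrics)
```

The trade-off is duplication: the account name is copied into every report, which is acceptable when records are never updated in place.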

CASE STUDY: AWS S3 DATASTORE
Benefits
- Cheap
- Elastic
- Architecture facilitates testing
- Easy to back up

CASE STUDY: AWS S3 DATASTORE
Caveats
- Eventual consistency
- Switching to a non-relational solution is nontrivial
  - Application code must change
  - Migration path gets complicated

MOVING THE ELEPHANT: Migrating Legacy Data to the New System

INITIAL CONSIDERATIONS
- Time frame
- Scheduling constraints
- Operational cost
- Resource constraints
- Standards for data parity

CASE STUDY: OUR UPFRONT PLANNING
- Two-month finish line
- Developed COGS models
- Built data validation software
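
The slides do not describe how the data validation software worked. As one possible approach, a parity check can compare a row count and an order-independent digest of the legacy rows against the migrated rows. The sketch below assumes rows are available as Python dicts with string keys; `fingerprint` and `has_parity` are hypothetical names, not part of any tool mentioned in the talk.

```python
import hashlib


def fingerprint(rows):
    """Return (row count, order-independent digest) for an iterable of dict rows."""
    count = 0
    combined = 0
    for row in rows:
        # Sort the key/value pairs so that column order does not affect the hash.
        canonical = repr(sorted(row.items())).encode("utf-8")
        digest = hashlib.sha256(canonical).digest()
        # Modular addition is commutative, so row order does not matter, and
        # unlike XOR it does not cancel out when the same row appears twice.
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, combined


def has_parity(legacy_rows, migrated_rows):
    """True when both sides hold the same multiset of rows."""
    return fingerprint(legacy_rows) == fingerprint(migrated_rows)
```

Because each side is reduced to a small fingerprint, the two stores never need to be loaded into memory together; only the fingerprints are compared. A real standard for data parity may also need tolerance rules (for example, for floating-point rounding), which this sketch does not cover.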

IDEAL MIGRATION SOFTWARE CHARACTERISTICS
- Can be scaled up or down
  - Speed up to save time
  - Slow down to save resources
- Can be run in a testing capacity
- Configurable data sources/sinks
- Configurable hardware resource use
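
These characteristics can be sketched as a small configuration object. This is an illustrative Python sketch only; it is not the Oozie and Hive setup the talk goes on to describe, and `MigrationConfig`, `run_migration`, and their fields are hypothetical names. The source and sink are plain callables so they can be swapped, a pause between batches trades time for resources, and a dry-run flag lets the job run in a testing capacity without writing anything.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class MigrationConfig:
    read_batches: Callable[[], Iterable[List[dict]]]  # configurable source
    write_batch: Callable[[List[dict]], None]         # configurable sink
    pause_seconds: float = 0.0  # raise to slow down and save resources
    dry_run: bool = False       # read and count rows, but never write


def run_migration(config: MigrationConfig) -> int:
    """Copy every batch from source to sink; return the number of rows handled."""
    handled = 0
    for batch in config.read_batches():
        if not config.dry_run:
            config.write_batch(batch)
        handled += len(batch)
        if config.pause_seconds > 0:
            time.sleep(config.pause_seconds)
    return handled


# Usage with in-memory stand-ins for the real source and sink.
source_data = [[{"id": 1}, {"id": 2}], [{"id": 3}]]
sink: List[dict] = []
rehearsal = MigrationConfig(lambda: source_data, sink.extend, dry_run=True)
rows_seen = run_migration(rehearsal)  # counts rows, leaves sink empty
```

Keeping the throttle and the dry-run switch in configuration, rather than in code, means the same job can be rehearsed in a test environment and then run for real without modification.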

OUR MIGRATION SOFTWARE
- Oozie and Hive
- Controllable time/resource tradeoff
- Testable in a QA environment

AN INCREMENTAL MIGRATION: PARTITIONING DATA
- Easy to track progress
- Enables concurrency
- Dilutes failure risks
- E.g. Conductor Time Periods
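
A minimal Python sketch of a partitioned migration follows, assuming the data can be split by a time-period label. The period strings and the `migrate_one` callable are hypothetical; the talk does not show its own implementation. Each partition is migrated independently, so progress is a simple list of finished periods, partitions can run concurrently, and a failure affects only its own partition.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def migrate_partitions(periods, migrate_one, max_workers=4):
    """Migrate each time period independently; return (done, failed) lists."""
    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its period so results can be attributed.
        futures = {pool.submit(migrate_one, p): p for p in periods}
        for future in as_completed(futures):
            period = futures[future]
            try:
                future.result()
                done.append(period)
            except Exception:
                # The failure is contained to this partition; others continue.
                failed.append(period)
    return sorted(done), sorted(failed)
```

A caller can pass the returned `failed` list back into `migrate_partitions` to retry only the partitions that did not complete, and `max_workers` is one place to apply the time/resource tradeoff.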

AN INCREMENTAL RELEASE
- Limit client exposure to bugs
- Crowd-source intensive QA
- Incorporate customer feedback
- Demonstrate progress early
- E.g. Conductor Searchlight 3.0 Beta Program
  - Got customers excited
  - Helped to find bugs

YOU CAN DO IT!

Thanks for Listening! QUESTIONS?

(We're Hiring!)