A Case Study of eBay UTF-8 Database Migration




A Case Study of eBay UTF-8 Database Migration
Nelson Ng
Chief Globalization Architect

What is the difficulty in migrating from ISO Latin 1 to UTF-8?

eBay: The World's Online Marketplace
  An online trading platform where everyone can trade almost anything from anywhere
  A global presence in 33 international markets
  203 million registered users
  $44.3 billion in annualized gross merchandise volume (value of goods sold)
  724,000 small businesses and individuals in the U.S. use eBay as a primary or secondary sales channel
  36,000 developers via the eBay API
  Data as of Q2 2006.

Agenda
  Reasons for UTF-8 Migration
  Challenges of UTF-8 Migration
  Approaches to UTF-8 Migration
  Lessons Learned

Reasons for UTF-8 Migration

Ambiguity in Textual Data Representation

  Raw bytes (hex)   ISO 8859-1                 CP1252                  UTF-8
  63                c                          c                       c
  61                a                          a                       a
  C3 91             Ã + control (0x91)         Ã‘                      Ñ
  E4 B8 AD          ä¸ + soft hyphen           ä¸ + soft hyphen        中
  E5 9C 8B          å + controls (0x9C 0x8B)   åœ‹                     國
  64                d                          d                       d
  C3 81             Ã + control (0x81)         Ã + undefined (0x81)    Á
  E2 82 AC          â + control (0x82) + ¬     â‚¬                     €
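The ambiguity on this slide can be reproduced in a few lines. The following is a minimal sketch, not part of the original deck: it decodes each raw byte sequence from the slide under all three encodings using Python's standard codecs. The function name `interpretations` is hypothetical, and undecodable bytes are replaced with U+FFFD rather than raising an error.

```python
# Decode the same raw bytes under the three encodings shown on the slide.
SAMPLES = ["63", "61", "C3 91", "E4 B8 AD", "E5 9C 8B", "64", "C3 81", "E2 82 AC"]


def interpretations(hex_bytes: str) -> dict:
    """Return the text obtained by reading the same bytes three different ways."""
    raw = bytes.fromhex(hex_bytes)
    return {
        # ISO 8859-1 maps every byte to a code point, including C1 controls.
        "iso-8859-1": raw.decode("iso-8859-1"),
        # Cp1252 leaves a few bytes (such as 0x81) undefined; replace those.
        "cp1252": raw.decode("cp1252", errors="replace"),
        # Invalid UTF-8 sequences are replaced instead of raising.
        "utf-8": raw.decode("utf-8", errors="replace"),
    }


if __name__ == "__main__":
    for sample in SAMPLES:
        shown = {name: repr(text) for name, text in interpretations(sample).items()}
        print(sample, shown)
```

Nothing in the bytes themselves says which reading is correct, which is why the encoding has to be tracked outside the data.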

eBay Marketplace Textual Data Encoding Model
  eBay Marketplace supports both Latin 1 and UTF-8 sites on a single platform:
    Latin 1 Sites: US, CA, UK, AU, FR, DE, IT, BE (nl), BE (fr), NL, ES, CH, AT, IE, NZ
    UTF-8 Sites: TW, IN, CN, HK, MY, PH, PL, SG, SE
  Textual data for all sites is stored in the same set of tables and DB hosts
  A combination of ISO Latin 1 and UTF-8 DB hosts
    ISO Latin 1 DB host:
      Latin 1 Site records: Encoded in Cp1252
      UTF-8 Site records: Encoded in UTF-8
    UTF-8 DB host:
      All Site records: Encoded in UTF-8

Textual Data Encoding Flow

Issues with Textual Data Encoding Model
  Cross-Border Trade Impediments
  Increased code complexity and development costs to support the mixed textual data encoding model
  Increased 3rd-party integration costs to implement the eBay textual data encoding model
  Textual data encoding model not supported by Oracle

Challenges of UTF-8 Migration

Summary of Challenges
  Stringent Availability
    Site Uptime Requirement: 99.94%
  Magnitude of Data
    Total Site DB Hosts: ~160
    Total Active DB Tables: ~2000
    Total DB Storage Usage: ~200TB
    Total Redo Log Generation: ~2TB/day
  Extreme Usage
    Peak time SQL Execution: ~462K/sec
    Avg. time for SELECT: ~7ms
    Avg. time for INSERT/UPDATE: ~15ms
    Total Application Servers: ~5K
  Frequency of Change
    Release Cycle: Every two weeks
  Notes: DB-related metrics are for primary site-facing DBs only. They do not include non-site-facing functions such as DR, Archive, DW, etc.
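To put the 99.94% uptime requirement in concrete terms, the arithmetic below converts it into a downtime budget. This is an illustrative calculation only: the function name is hypothetical, and the choice of a 365-day year and a 14-day release cycle as reporting periods is an assumption, since the slide does not say over which period uptime is measured.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: float) -> float:
    """Minutes of downtime allowed in a period at a given uptime percentage."""
    minutes_in_period = period_days * 24 * 60
    return (100.0 - uptime_percent) / 100.0 * minutes_in_period


if __name__ == "__main__":
    # Assumed reporting periods: one year and one two-week release cycle.
    print(round(downtime_budget_minutes(99.94, 365), 2))
    print(round(downtime_budget_minutes(99.94, 14), 2))
```

Under these assumptions the budget is roughly 315 minutes per year, or about 12 minutes per two-week release cycle, shared across every cause of outage.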

Approach to UTF-8 Migration

How Big was the Problem?
  Scanning DB hosts with Oracle CSSCAN
  Identify different types of textual data encoding
    7 bit ASCII (0x0 to 0x7F): No conversion. Same value as-is in UTF-8
    8 bit Cp1252 (0x80 to 0xFF): Converted into 2 bytes or 3 bytes for UTF-8.
    UTF-8 (2 bytes or 3 bytes): No conversion. Same value as-is for UTF-8
  * Data was gathered in Oct 2003
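The three categories above can be approximated with a simple byte-level check. The sketch below is a heuristic written for illustration, not a description of how Oracle CSSCAN works internally: it treats any non-ASCII value that is valid UTF-8 as UTF-8 and everything else as Cp1252. The function names are hypothetical.

```python
def classify(raw: bytes) -> str:
    """Bucket a stored value into one of the three categories from the slide."""
    if all(b < 0x80 for b in raw):
        return "ascii"      # identical in Cp1252 and UTF-8, no conversion needed
    try:
        raw.decode("utf-8")
        return "utf-8"      # already valid multi-byte UTF-8
    except UnicodeDecodeError:
        return "cp1252"     # 8-bit data that must be converted


def utf8_length_of_cp1252_byte(value: int) -> int:
    """Number of UTF-8 bytes a single Cp1252 byte occupies after conversion."""
    return len(bytes([value]).decode("cp1252").encode("utf-8"))
```

For example, `utf8_length_of_cp1252_byte(0xE9)` (é) gives 2 and `utf8_length_of_cp1252_byte(0x80)` (the euro sign) gives 3, matching the "2 bytes or 3 bytes" note. The heuristic can misclassify Cp1252 text whose bytes happen to form valid UTF-8, which is the ambiguity described in the earlier slide.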

Approaches
  1. Conversion with External File
  2. Conversion with Secondary DB
  3. In-line Conversion WITHOUT Status flag
  4. In-line Conversion WITH Status flag

Stages of UTF-8 DB Migration

Pre Conversion and Data Conversion Stages
  1. Create a UTF-8 Status column with default value NULL in each table.
  2. Modify code base to support UTF-8 Migration to check DB Charset
     a. If DB is UTF-8, then use UTF-8 rules to process textual data for all sites
     b. If DB is ISO Latin 1, then process textual data according to the UTF-8 Status column
  3. Activate Implicit Conversion
  4. Run conversion script for Explicit Conversion.
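The stages above can be sketched as three small functions. This is a minimal illustration under stated assumptions, not eBay's code: the flag value `"Y"`, the function names, and the `site_is_utf8` parameter are hypothetical, and "implicit conversion" is assumed to mean that any record the application writes during the transition is stored as UTF-8 with the status flag set. The slides do not spell out these details.

```python
from typing import Optional, Tuple

CONVERTED = "Y"  # hypothetical flag value; the slides only state the default is NULL


def read_text(raw: bytes, db_charset: str, utf8_status: Optional[str],
              site_is_utf8: bool) -> str:
    """Step 2: pick the decoding rule from the DB charset and the status column."""
    if db_charset == "UTF8":
        return raw.decode("utf-8")        # 2a: migrated host, UTF-8 for all sites
    if utf8_status == CONVERTED or site_is_utf8:
        return raw.decode("utf-8")        # 2b: record already holds UTF-8
    return raw.decode("cp1252")           # 2b: legacy Latin 1 site record


def write_text(text: str) -> Tuple[bytes, str]:
    """Step 3 (assumed meaning): every application write stores UTF-8 and sets the flag."""
    return text.encode("utf-8"), CONVERTED


def convert_row(raw: bytes, utf8_status: Optional[str],
                site_is_utf8: bool) -> Tuple[bytes, str]:
    """Step 4: explicit conversion of one record by the background script."""
    if utf8_status == CONVERTED or site_is_utf8:
        return raw, CONVERTED             # already UTF-8, only the flag changes
    return raw.decode("cp1252").encode("utf-8"), CONVERTED
```

Because the flag travels with each record, the conversion script can stop and resume at any point without the application misreading a row.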

DB Migration Stage
  1. Stop Incoming Traffic to DB Host
  2. ALTER DATABASE CHARSET from WE8ISO8859P1 to UTF8
  3. Restart Incoming Traffic to DB Host

Textual Data Encoding Flow Modification

Reasons for this Approach
  Pros
    No DB downtime during data conversion. Only minimal DB outage during ALTER DB
    Implicit Conversion to reduce Explicit Conversion effort
    Ability to pause Data Conversion if necessary
    Ability to roll back ALTER DB if necessary
    Does not require additional hardware for secondary DB
  Cons
    A new UTF-8 Status column has to be created for each table
    All DB access code has to be modified to handle the new UTF-8 Status column during the transition period of data conversion

Lessons Learned

Lessons Learned
  Planning! Planning! Planning!
  Upfront Data Analysis
  DB Environment Preparation
  Leverage Site Maintenance Windows
  Gradual Activation of Data Conversion
  Data Monitoring
  Data Environment Changes
  Integrity of Data Conversion
  Problematic SQL in DB Log
  Throttle Data Conversion Speed
  Dedicated support resources from Development and Operations teams

Metrics of Migration
  Duration of Explicit Conversion: ~7 months
    Number of Records: ~18 billion
    Number of Columns: ~600
    Number of Tables: ~200
  Duration of DB Migration: ~7 months
    Number of DB Hosts: ~100
  Site Uptime for Migration Period: > 99.94%
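As a rough sense of scale, the explicit conversion figures imply a sustained average conversion rate. The calculation below is illustrative only: it assumes 30-day months and continuous, evenly paced conversion, neither of which is stated in the slides, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(records: float, months: float, days_per_month: float = 30.0) -> float:
    """Average records converted per second over the whole conversion period."""
    return records / (months * days_per_month * SECONDS_PER_DAY)


if __name__ == "__main__":
    # ~18 billion records over ~7 months, figures taken from the slide.
    print(round(average_rate(18e9, 7)))
```

Under these assumptions the average works out to roughly a thousand records per second, on top of normal site traffic.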

Q & A

Appendix

Alternative 1: Conversion with External File
  1. Identify all records with 8 bit Cp1252 (0x80 to 0xFF) textual data
  2. Extract identified records into a file
  3. Use a script to convert the textual data in the file to UTF-8
  4. ALTER DATABASE CHARSET from WE8ISO8859P1 to UTF8
  5. Reinsert all extracted records back into DB
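Step 3 of this alternative, the file conversion, could look like the sketch below. It assumes the extract is a plain file whose bytes are entirely Cp1252 text. The function name and the file layout are hypothetical, since the slides do not describe the extract format.

```python
import tempfile
from pathlib import Path


def convert_extract(src: Path, dst: Path) -> int:
    """Re-encode a Cp1252 extract file as UTF-8 and return the output size in bytes."""
    text = src.read_bytes().decode("cp1252")  # raises if a byte is undefined in Cp1252
    data = text.encode("utf-8")
    dst.write_bytes(data)
    return len(data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp) / "extract.cp1252", Path(tmp) / "extract.utf8"
        src.write_bytes(b"caf\xe9\n")       # "café" stored as Cp1252
        print(convert_extract(src, dst))    # output is one byte longer than the input
```

The converted file grows because each 8-bit Cp1252 character becomes two or three bytes in UTF-8, so column widths may need checking before the records are reinserted.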

Alternative 1: Conversion with External File
  Pros
    Avoids creating UTF-8 Status column in each table
    Code change to handle transition period during data migration is not required.
    Avoids requiring additional hardware for secondary DB
  Cons
    Data conversion must be completed in a single outage window
    Prolonged DB downtime because data conversion takes place during the outage window

Alternative 2: Conversion with Secondary DB
  1. Create Last Modified Date (LMD) column in each table
  2. Replicate Primary DB to Secondary DB and save the timestamp
  3. In Secondary DB, identify all records with 8 bit Cp1252 (0x80 to 0xFF) textual data
  4. In Secondary DB, use a script to convert the identified records to UTF-8
  5. In Primary DB, run post conversion script to sync up modified records since DB replication
  6. In Secondary DB, ALTER DB CHARSET from WE8ISO8859P1 to UTF8
  7. Redirect site traffic to Secondary DB
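Step 5, the post-conversion sync, relies on the Last Modified Date column to find records that changed after replication. The sketch below models this with in-memory structures for illustration. The `Row` type, the function name, and the use of a dictionary as the secondary DB are all hypothetical stand-ins for real tables.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass
class Row:
    row_id: int
    text: bytes               # Cp1252 bytes as stored in the primary DB
    last_modified: datetime   # the LMD column added in step 1


def sync_modified(primary: Iterable[Row], secondary: Dict[int, bytes],
                  replicated_at: datetime) -> int:
    """Copy rows changed since replication into the secondary, converted to UTF-8."""
    synced = 0
    for row in primary:
        if row.last_modified > replicated_at:
            secondary[row.row_id] = row.text.decode("cp1252").encode("utf-8")
            synced += 1
    return synced
```

On a busy site the set of rows modified since replication keeps growing while the sync runs, which is the overhead listed among the cons of this alternative.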

Alternative 2: Conversion with Secondary DB
  Pros
    Avoids creating UTF-8 Status column in each table
    Code change to handle transition period during data migration is not required.
    Data conversion can be restarted if necessary
    No data quality impact to source during data conversion
    No DB outage for ALTER DB. Minimal time to redirect application server traffic
  Cons
    High hardware setup and administration cost for secondary DB
    Overhead to create and maintain Last Modified Date column
    High overhead to sync up all modified records since secondary DB replication

Alternative 3: In-line Conversion WITHOUT Status column
  1. Scan to identify all records with 8 bit Cp1252 (0x80 to 0xFF) textual data to be converted
  2. Use a script to convert all identified records to UTF-8
  3. Repeat Steps 1 and 2 to convert newly generated records with Cp1252 data until there is only a small number of Cp1252 records
  4. During DB outage window, scan for remaining records with 8 bit Cp1252 textual data and convert them to UTF-8
  5. ALTER DATABASE CHARSET from WE8ISO8859P1 to UTF8
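The scan-and-convert loop in steps 1 to 4 can be sketched as follows. This is an illustration under assumptions, not the procedure eBay evaluated: the table is modelled as a dictionary of raw bytes, charset detection is reduced to "not valid UTF-8", and the function names, `threshold`, and `max_passes` are hypothetical.

```python
from typing import Dict


def needs_conversion(raw: bytes) -> bool:
    """Heuristic charset detection: anything that is not valid UTF-8 is treated as Cp1252."""
    try:
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True


def convert_pending(table: Dict[int, bytes]) -> int:
    """One scan-and-convert pass (steps 1 and 2). Returns how many rows were converted."""
    pending = [key for key, raw in table.items() if needs_conversion(raw)]
    for key in pending:
        table[key] = table[key].decode("cp1252").encode("utf-8")
    return len(pending)


def online_passes(table: Dict[int, bytes], threshold: int, max_passes: int = 10) -> int:
    """Step 3: repeat passes while the site is live until few Cp1252 rows remain."""
    remaining = sum(needs_conversion(raw) for raw in table.values())
    for _ in range(max_passes):
        if remaining <= threshold:
            break
        convert_pending(table)
        # New Cp1252 rows may have been written meanwhile, so scan again.
        remaining = sum(needs_conversion(raw) for raw in table.values())
    return remaining
```

During the outage window (step 4), one more call to `convert_pending` would handle whatever `online_passes` left behind before the charset is altered. Without a status flag, every pass depends on detection being right, which is the reliability concern raised in the cons for this alternative.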

Alternative 3: In-line Conversion WITHOUT Status flag
  Pros
    Creation of UTF-8 Status column in each table is not required
    Ability to pause Data Conversion if necessary
    Ability to roll back ALTER DB if necessary
    Does not require additional hardware for secondary DB
  Cons
    Implementing charset detection logic in the site code base could increase code complexity and performance overhead
    Charset detection to process each record is less reliable than an explicit UTF-8 Status flag
    Multiple iterations of the data conversion process
    Potentially longer DB outage to convert remaining Cp1252 records during the final step of data conversion