HPE Vertica Integration with Tungsten: Connection Guide

HPE Vertica Analytic Database

Legal Notices

Warranty

The only warranties for Hewlett Packard Enterprise products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Hewlett Packard Enterprise shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice.

Restricted Rights Legend

Confidential computer software. Valid license from Hewlett Packard Enterprise required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

Copyright Notice

Copyright 2015 Hewlett Packard Enterprise Development L.P.

Trademark Notices

Adobe is a trademark of Adobe Systems Incorporated. Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation. UNIX is a registered trademark of The Open Group. This product includes an interface of the 'zlib' general purpose compression library, which is Copyright 1995-2002 Jean-loup Gailly and Mark Adler.

Table of Contents

About Vertica Connection Guides
Tungsten Replicator Overview
Replication to Vertica
Download and Install Tungsten
Download and Install the Vertica Client Drivers
Download Vertica Client Drivers
Install Vertica Client Drivers
Set up MySQL to Vertica Replication
Configure Hosts
Prepare Schemas in Vertica
Install Vertica Replication
Monitor Vertica Deployment
Test MySQL with Vertica Replication
Example: Use the SimpleBatchApplier
Known Limitations
Data Type Mapping: MySQL to Vertica
For More Information

About Vertica Connection Guides

HPE Vertica connection guides provide basic information about setting up connections to Vertica from software that our technology partners create. These documents provide guidance using one specific version of Vertica and one specific version of the third-party vendor's software. Other versions of the third-party product may work with Vertica. However, Hewlett Packard Enterprise may not have tested these other versions. This document provides guidance using the latest versions of Vertica and Tungsten as of the date of publication.

Tungsten Replicator Overview

The Tungsten Replicator is an open source replication engine that supports many different extractor and applier modules. Data can be extracted from MySQL and Oracle, and applied to transactional stores. Data can also be applied to NoSQL stores such as MongoDB, and data warehouse stores such as Vertica.

The core of the replication functionality lies in three major components:

Transaction History Log (THL): The THL is the data that Tungsten takes from the master's binary logs and transports to its servers, with the addition of some metadata.

Extractor: The extractor component reads data from the source data server and writes that information to the THL. The extractor is also responsible for writing the data into the THL in the native or derived format, either as a SQL statement or as row-based information.

Applier: Appliers within the Tungsten Replicator convert the THL information and write it to a destination data server. The applier can work with a number of different target databases, such as Vertica or MySQL.

This document demonstrates how users can replicate data from MySQL to Vertica using Tungsten. Tungsten uses the JDBC driver to connect to Vertica. This document describes replication from MySQL 5.6 to Vertica 7.2.x using Tungsten 4.0. Hewlett Packard Enterprise has tested Vertica 7.1.x and 7.2.x with Tungsten 4.0.0.18 on a Linux platform.

Replication to Vertica

In heterogeneous systems, it is not possible to execute the DDL and DML statements from source systems into target systems because the SQL dialects are different. Tungsten uses row-based replication when performing replication from MySQL to Vertica. Replication to Vertica follows this flow:

1. Data is extracted from the source database into the THL.

2. While extracting data from the THL, the Tungsten Replicator writes the data into CSV files based on the names of the source tables. These files contain all of the row-based data, including the unique global transaction ID generated by the Tungsten Replicator during replication. The operation type (Insert or Delete) is also listed as part of the CSV file.

3. Vertica loads the CSV files into staging tables.

4. Tungsten executes SQL statements to perform updates on live versions of the tables. The statements use the CSV, batch-loaded information to delete old rows and insert new data into tables. The statements also perform updates as necessary to work effectively with Vertica.

Download and Install Tungsten

Follow these steps to download and install the Tungsten Replicator.

1. Download the software for Tungsten 4.0.

2. Unpack the file into the Tungsten Replicator deployment directory using the following command:

    Tungsten: shell $> tar xvf tungsten-replicator-4.0.0-18.tar.gz

Download and Install the Vertica Client Drivers

Before you can connect to Vertica using Tungsten, you must install the Vertica client package. This package includes the JDBC client driver that Tungsten uses to connect to Vertica.

Download Vertica Client Drivers

1. Go to the Vertica Client Drivers page.

2. Download the version of the Vertica client package that is compatible with the architecture of your operating system and Vertica server version.

Note: Vertica drivers are forward compatible, so you can connect to the Vertica server using previous versions of the client. For more information about client and server compatibility, see Client Driver and Server Version Compatibility in the Vertica documentation.

Install Vertica Client Drivers

Based on the client package you downloaded, follow the steps for installation from the Vertica documentation. Then copy the Vertica JDBC driver into the lib directory of the Tungsten Replicator deployment:

    <Tungsten_Home>/tungsten-replicator-4.0.0-18/tungsten-replicator/lib

Set up MySQL to Vertica Replication

Before you begin, you must configure the staging host that is responsible for setting up the replication services involved in the replication process. For details on configuring the staging host, see Staging Host Configuration in the Tungsten documentation.

Configure Hosts

MySQL uses row-based replication to replicate data to Vertica. The tables to be replicated are not created automatically in Vertica; you must create them yourself, using the ddlscan utility to generate the table definitions. To prepare the MySQL and Vertica hosts, follow the steps in Preparing Hosts for Vertica Deployments.

Prepare Schemas in Vertica

The Tungsten Replicator does not prepare target database schemas and tables based on the source database. Instead, it includes a tool called ddlscan, which reads the schema definition from MySQL and translates that information into the schema definition required for the target database. Prepare the source database tables you want to replicate to Vertica first, and then prepare the table definitions for your Vertica database.

Follow these steps to create the schema and tables for the staging tables and target tables:

1. Create a table in MySQL using the following command:

    mysql -utungsten DBNAME
    ...
    mysql> CREATE TABLE Char1_Table(
               Id int primary key,
               DataTypeSet VARCHAR(20) DEFAULT 'Char1',
               ValueDesc VARCHAR(50),
               Char1_Column CHAR
           ) default charset=utf8;

2. Execute the ddlscan tool twice, from the directory that holds the templates:

    Shell $> cd <Tungsten_Home>/tungsten-replicator-4.0.0-18/tungsten-replicator/samples/extensions/velocity

The first run generates the live table definitions:

    Tungsten: shell $> <Tungsten_Home>/tungsten-replicator-4.0.0-18/tungsten-replicator/bin/ddlscan -user tungsten -url jdbc:mysql:thin://host1:3306/dbname -pass password -template ddl-mysql-vertica.vm -db DBNAME >>ddl.sql

The second run creates the table definitions for the staging data using the staging template:

    Tungsten: shell $> <Tungsten_Home>/tungsten-replicator-4.0.0-18/tungsten-replicator/bin/ddlscan -user tungsten -url jdbc:mysql:thin://host1:3306/dbname -pass password -template ddl-mysql-vertica-staging.vm -db DBNAME >>ddl.sql

Both runs append to the same ddl.sql file.

3. Edit the ddl.sql file and update it as needed for the target database.

4. Run the ddl.sql file against the Vertica database using the following command:

    dbadmin: shell $> vsql -U dbadmin -w <PASSWORD> < ddl.sql

Install Vertica Replication

1. Change to the staging directory using the following command:

    Tungsten: shell $> cd <Tungsten_Home>/tungsten-replicator-4.0.0-18

2. Configure the main parameters for the replicator service:

    Tungsten: shell $> ./tools/tpm configure alpha \
        --master=host1 \
        --members=host1,host2 \
        --install-directory=/opt/continuent \
        --disable-relay-logs=true \
        --skip-validation-check=hostsfilecheck \
        --enable-heterogenous-service=true \
        --start

3. Configure and install the MySQL master:

    Tungsten: shell $> ./tools/tpm update alpha \
        --master=host1 \
        --hosts=host1 \
        --datasource-host=host1 \
        --datasource-user=tungsten \
        --datasource-password=password \
        --datasource-mysql-conf=/usr/my.cnf \
        --home-directory=/opt/continuent \
        --java-file-encoding=utf8 \
        --java-user-timezone=gmt \
        --svc-extractor-filters=colnames,pkey \
        --property=replicator.filter.pkey.addcolumnstodeletes=true \
        --property=replicator.filter.pkey.addpkeytoinserts=true \
        --mysql-use-bytes-for-string=false \
        --start-and-report

Note: The preceding command has some essential settings that help with heterogeneous replication:

The Java VM file encoding and time zone are UTF-8 and GMT, respectively. Standardizing these values is required to avoid corrupting data in batch loads.

Tungsten translates string values to UTF-8 rather than passing these values to slaves as bytes.

Tungsten inserts filters to add column names and identify the primary key on tables. These additions are required for batch loading to work properly.

4. Configure and install the Vertica slave server:

    Tungsten: shell $> ./tools/tpm update alpha \
        --hosts=host2 \
        --replication-user=dbadmin \
        --replication-password=password \
        --batch-enabled=true \
        --batch-load-language=js \
        --batch-load-template=vertica6 \
        --datasource-type=vertica \
        --vertica-dbname=dbname \
        --replication-host=host2 \
        --replication-port=5433 \
        --skip-validation-check=installermasterslavecheck \
        --svc-applier-block-commit-size=25000 \
        --svc-applier-block-commit-interval=30s \
        --start-and-report

Monitor Vertica Deployment

Monitoring a Vertica replication scenario requires checking the status of both the master, which extracts data from MySQL, and the slave, which retrieves the remote THL information and applies it to Vertica.

The following graphic shows the master server. The output of trepctl shows the current sequence number and applier status.

[Graphic not reproduced: trepctl output on the master server]

The following graphic shows the slave server. The output of trepctl shows the current sequence number and applier status.

[Graphic not reproduced: trepctl output on the slave server]
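The status check described above can also be scripted. The following is a minimal sketch and not part of the Tungsten distribution: a hypothetical helper, status_field, pulls one value out of "name : value" lines of the kind that trepctl status prints. The sample text and its values are made up for illustration, and the field names appliedLastSeqno and state are assumed to match the output on your hosts.

```shell
#!/bin/sh
# status_field NAME: read "name : value" lines on stdin and print the value
# of the first field whose name equals NAME. (Hypothetical helper.)
status_field() {
    awk -v name="$1" '
        {
            sep = index($0, ":")          # first colon separates name and value
            if (sep == 0) next            # skip lines without a separator
            key = substr($0, 1, sep - 1)
            val = substr($0, sep + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == name) { print val; exit }
        }'
}

# Illustrative sample only; real text would come from "trepctl status".
sample_status='appliedLastSeqno       : 42
state                  : ONLINE'

seqno=$(printf '%s\n' "$sample_status" | status_field appliedLastSeqno)
state=$(printf '%s\n' "$sample_status" | status_field state)
echo "seqno=$seqno state=$state"
```

Against a live replicator, the same helper could be fed from trepctl status on the master and on the slave, and the two sequence numbers compared to see whether the slave is keeping up.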

Test MySQL with Vertica Replication

To replicate data, you need a table on MySQL to hold some data. Follow these steps to test MySQL with Vertica using the Tungsten Replicator. The following example inserts rows into a MySQL table and checks that they arrive in Vertica:

1. Log in to MySQL and insert some rows:

    mysql -utungsten DBNAME
    ...
    mysql> INSERT INTO Char1_Table VALUES(1,default, 'Empty', '');
    Query OK, 1 row affected (0.00 sec)
    mysql> INSERT INTO Char1_Table VALUES(2,default, 'Typical', 'a');
    Query OK, 1 row affected (0.00 sec)
    mysql> INSERT INTO Char1_Table VALUES(3,default, 'Max', 'Z');
    Query OK, 1 row affected (0.00 sec)

2. If you configured things properly, you should see the following on the Vertica side:

    dbadmin=> SELECT * from tungsten.char1_table;
     Id | DataTypeSet | ValueDesc | Char1_Column
    ----+-------------+-----------+--------------
      1 | default     | Empty     |
      2 | default     | Typical   | a
      3 | default     | Max       | Z
    (3 rows)

Example: Use the SimpleBatchApplier

The Tungsten Replicator applies data to Vertica using a new applier class called the SimpleBatchApplier. It loads data by way of CSV files through the following process:

1. As new transactions arrive, the Replicator writes them to CSV files named after the corresponding Vertica tables. For example, if you have updates for a table named simple_tab in a schema test, the format would look like the following:

    Schema   | Table       | Column       | Type        | Size
    ---------+-------------+--------------+-------------+------
    tungsten | Char1_Table | id           | int         | 8
    tungsten | Char1_Table | DataTypeSet  | varchar(20) | 20
    tungsten | Char1_Table | ValueDesc    | varchar(40) | 40
    tungsten | Char1_Table | Char1_Column | char        | 1

2. The updates go into a file named test.simple_tab. The following is an example of the data in the CSV file:

    "64087","I","5","Some data","some data","b","1"
    "64087","I","6","more data","more data","c","2"
    "64088","D","3","default","Max","z","3"

3. The CSV file includes a unique global transaction ID, an operation code (I for insert and D for delete), and the primary key. For inserts, there are additional columns that contain data; for deletes, columns contain nulls. The last column is a row number, which allows for ordering after the data is loaded into Vertica.

4. The Tungsten Replicator applies transactions to replicas in serial order without deviations. If you INSERT and then UPDATE a row, it always works because the Replicator applies them to the slave server in the same order.

5. The Tungsten Replicator continues writing transactions until it reaches the block commit maximum. The Replicator then closes each CSV file and loads the content into a staging table that is named according to the base name. The staging table format mimics the CSV file columns. For example, the staging table could look like the following example:

    Schema   | Table                 | Column          | Type        | Size
    ---------+-----------------------+-----------------+-------------+------
    tungsten | stage_xxx_char1_table | tungsten_seqno  | int         | 8
    tungsten | stage_xxx_char1_table | tungsten_opcode | char(1)     | 1
    tungsten | stage_xxx_char1_table | id              | int         | 8
    tungsten | stage_xxx_char1_table | DataTypeSet     | varchar(20) | 20
    tungsten | stage_xxx_char1_table | ValueDesc       | varchar(40) | 40
    tungsten | stage_xxx_char1_table | Char1_Column    | char        | 1
    tungsten | stage_xxx_char1_table | tungsten_row_id | int         | 8

6. Finally, the Replicator applies the deletes and inserts to the table test.simple_tab by executing SQL commands as shown in the following example:

    DELETE FROM tungsten.char1_table
    WHERE id IN
        (SELECT id FROM tungsten.stage_xxx_char1_table
         WHERE tungsten_opcode = 'D');

    INSERT INTO tungsten.char1_table(id, DataTypeSet, ValueDesc, Char1_Column)
    SELECT id, DataTypeSet, ValueDesc, Char1_Column
    FROM tungsten.stage_xxx_char1_table AS stage_a
    WHERE tungsten_opcode = 'I'
      AND tungsten_row_id IN
        (SELECT MAX(tungsten_row_id)
         FROM tungsten.stage_xxx_char1_table
         GROUP BY id);
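The delete-then-insert merge in step 6 can be traced by hand on a small file. The following sketch is an illustration only and is not how Tungsten or Vertica run the merge: a hypothetical shell function, staged_net_effect, applies the same two rules to staging rows with awk. It assumes a simplified, unquoted CSV layout with the columns seqno, opcode, id, data, and row_id, and the sample rows are made up.

```shell
#!/bin/sh
# staged_net_effect: read staging rows "seqno,opcode,id,data,row_id" on stdin
# and print the net effect of the merge:
#   "DELETE <id>"        for every id that has a staged D row
#   "INSERT <id> <data>" for every id whose highest-numbered row is an insert
# (Hypothetical helper; the column layout is an assumption for illustration.)
staged_net_effect() {
    awk -F, '
        $2 == "D" { deleted[$3] = 1 }
        {
            # Remember the row with the highest row_id for each id,
            # mirroring MAX(tungsten_row_id) ... GROUP BY id.
            if (!($3 in last_row) || $5 + 0 > last_row[$3] + 0) {
                last_row[$3] = $5
                last_op[$3] = $2
                last_data[$3] = $4
            }
        }
        END {
            for (id in deleted) print "DELETE " id
            for (id in last_op)
                if (last_op[id] == "I") print "INSERT " id " " last_data[id]
        }' | LC_ALL=C sort    # sorting lists the deletes before the inserts
}

# Made-up staging rows: id 5 is inserted, then deleted and re-inserted
# (which is how an UPDATE arrives); id 3 is deleted; id 6 is inserted once.
staging='100,I,5,first,1
100,I,6,hello,2
101,D,3,,3
101,D,5,,4
101,I,5,second,5'

result=$(printf '%s\n' "$staging" | staged_net_effect)
printf '%s\n' "$result"
```

For the sample rows, the function reports deletes for ids 3 and 5 and inserts for ids 5 and 6. Only the latest staged version of id 5 is inserted, which is the reason the SQL filters on the maximum row number per key.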

Known Limitations

Tungsten currently has some important limitations for batch loading, namely:

Primary keys must be a single column only. Tungsten does not handle multi-column keys.

You must define the primary keys.

Binary data may cause problems when converted to CSV as it converts to Unicode.

Data Type Mapping: MySQL to Vertica

The following table shows data type mapping between MySQL data types and Vertica data types.

    MySQL Data Type                        | Vertica Data Type
    ---------------------------------------+-------------------
    DATETIME                               | DATETIME
    TIMESTAMP                              | TIMESTAMP
    DATE                                   | DATE
    TIME                                   | TIME
    TINYINT                                | TINYINT
    SMALLINT                               | SMALLINT
    MEDIUMINT                              | INT
    INT                                    | INT
    BIGINT                                 | INT
    VARCHAR                                | VARCHAR
    CHAR                                   | CHAR
    BINARY                                 | BINARY
    VARBINARY                              | VARBINARY
    TEXT, TINYTEXT, MEDIUMTEXT, LONGTEXT   | VARCHAR(65000)
    BLOB, TINYBLOB, MEDIUMBLOB, LONGBLOB   | VARBINARY(65000)
    FLOAT                                  | FLOAT
    DOUBLE                                 | DOUBLE PRECISION
    ENUM                                   | VARCHAR
    SET                                    | VARCHAR(4000)
    BIT(1)                                 | BOOLEAN
    BIT                                    | CHAR(64)

For More Information

    For More Information About           | See
    -------------------------------------+---------------------------------------------------------------------------------
    Tungsten                             | http://pubs.vmware.com/continuent/tungsten-replicator-4.0/index.html
    Vertica Community Edition            | https://my.vertica.com/community/
    Vertica Documentation                | http://my.vertica.com/docs/7.2.x/HTML/index.htm
    Big Data and Analytics Community     | https://community.dev.hpe.com/t5/big-Data-and-Analytics/ctp/bigdata_analytics