HP Vertica Integration with Informatica: Connection Guide. HP Vertica Analytic Database

HP Vertica Integration with Informatica: Connection Guide
HP Vertica Analytic Database
HP Big Data
Document Release Date: September, 2015

Legal Notices

Warranty

The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice.

Restricted Rights Legend

Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

Copyright Notice

Copyright 2006-2015 Hewlett-Packard Development Company, L.P.

Trademark Notices

Adobe is a trademark of Adobe Systems Incorporated. Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation. UNIX is a registered trademark of The Open Group.

Contents

About HP Vertica Connection Guides
About This Document
Informatica PowerCenter Overview
History of Integration Between Informatica and HP Vertica
PowerExchange Features for HP Vertica
Configure and Connect to HP Vertica using the PWX Connector
Install and Configure PowerExchange for HP Vertica
Configure HP Vertica as Source and Target
Create a Mapping
Create a Workflow
Configure the Workflow
Run the Workflow
Pushdown Optimization
Pushdown Considerations
Support for Pushdown Optimization Expressions
Operators
Variables
Functions
Transformations
For More Information

About HP Vertica Connection Guides

HP Vertica connection guides provide basic information about setting up connections to HP Vertica from software that our technology partners create. These documents provide guidance using one specific version of HP Vertica and one specific version of the third-party vendor's software. Other versions of the third-party product may work with HP Vertica. However, Hewlett-Packard may not have tested these other versions.

About This Document

This document provides guidance using the latest versions of HP Vertica and Informatica as of September, 2015.

Informatica PowerCenter Overview

Informatica offers the PowerCenter platform for connecting to HP Vertica. PowerCenter is a scalable, high-performance enterprise ETL platform that provides the following major capabilities:

- Data delivery: Performing ETL (extract, transform, load) and ELT (extract, load, transform) operations, moving data among various data sources
- Data transformation: Basic, intermediate, and complex data transformation
- Design and development environment: Team-based development capabilities
- Metadata and modeling: Physical-to-logical data mapping, including graphical attribute-level mapping

PowerCenter 9.6.1 HotFix 2 includes the PowerExchange (PWX) Connector for HP Vertica. The PWX Connector for HP Vertica includes additional capabilities and performance improvements when connected to your HP Vertica database.

This document assumes that the reader is familiar with both Informatica PowerCenter and HP Vertica. For this document, Hewlett-Packard tested connecting to HP Vertica 7.x using Informatica PowerCenter 9.6.1 HotFix 2.

History of Integration Between Informatica and HP Vertica

Prior to PowerExchange (PWX) for HP Vertica, Hewlett-Packard developed and supported two connectors for Informatica and HP Vertica:

- In 2009, Hewlett-Packard developed the HP Vertica Plug-in for Informatica. This connector used the native method of loading data from Informatica into HP Vertica.
- In 2013, Hewlett-Packard replaced the earlier HP Vertica Plug-in for Informatica with a Java plug-in that supports all operating system platforms. This plug-in runs on generic JDBC and ODBC connections and includes new and improved features compared to the native plug-in.

In 2014, Informatica released PWX Connector for HP Vertica. This connector includes enhancements to the partitioning and pushdown capabilities of Informatica.

Hewlett-Packard recommends that you use the new PWX Connector for HP Vertica with Informatica PowerCenter 9.6.1 HotFix 2 to connect to your HP Vertica database.

[Figure: timeline showing the history of plug-ins for connecting from Informatica to HP Vertica]

PowerExchange Features for HP Vertica

PowerExchange Connector for HP Vertica is a high-performance tool that lets IT organizations access data sources without having to develop custom data access utilities. It provides connectivity between Informatica PowerCenter and HP Vertica. PWX uses the HP Vertica ODBC driver to write large volumes of data into HP Vertica.

The major features of PWX for HP Vertica include:

PWX for HP Vertica Feature
    Description

Bulk mode or relational mode for processing HP Vertica data
    Bulk mode: Write large amounts of data to HP Vertica from multiple data sources. Bulk mode is only available for writing to HP Vertica.
    Relational mode: Read data from an HP Vertica source and write data to an HP Vertica target. Relational mode supports pushdown optimization.

Design-time features
    Preview source and target data before mapping.
    Generate and execute DDL on HP Vertica.
    Import HP Vertica tables.

Run-time features
    Set the commit type, interval sources, and interval targets.
    Update strategy: Data-driven approach to inserting, deleting, and updating data in HP Vertica.

Partitioning support
    Key range: One or more ports make up a partition key.
    Round robin: Data is distributed to one or more partitions.
    Hash-based: Data is distributed to partitions in groups.
    Uses HP Vertica COPY LOCAL statement for various partitions in the mapping pipeline.

SQL transformation
    Flexibility to process logic per input row.

Lookup (cached and uncached)
    Supports HP Vertica lookup in the existing data integration logic.

Connection resiliency
    High availability

Compatibility with HP Vertica Java Plug-in for Informatica
    Support for existing features

SQL pushdown
    Source-side
    Target-side
    Full pushdown optimization

Supported pushdown transformations
    Aggregator, filter, joiner, union, sort, router, and lookup transformation as source.
    Expression transformation as source, target, and full.
    Update transformation as full.
    (When the source and target database are the same, all transformations are pushed to the target.)

Configure and Connect to HP Vertica using the PWX Connector

PowerCenter includes the PowerExchange for HP Vertica connector. Informatica supports this connector from Informatica PowerCenter 9.6.1 HotFix 2 onwards.

Install and Configure PowerExchange for HP Vertica

Before you start these procedures, you must install HP Vertica using the instructions in the HP Vertica Installation Guide.

To install and configure PowerExchange for HP Vertica, perform the following steps on your Informatica server:

1. Ask Informatica Support for a license file that includes the PowerExchange for HP Vertica license.

2. Ask Informatica Support for the Informatica PowerCenter 9.6.1 HotFix 2 software.

3. Using the license file that contains PWX for HP Vertica, install PowerCenter 9.6.1 HotFix 2.

4. Install the HP Vertica ODBC driver (version 7.0 or higher). For HP Vertica driver/server compatibility, review the information in HP Vertica 7.1.x Client Drivers in the HP Vertica documentation.

5. Create a configuration file named vertica.ini. This file defines the HP Vertica-specific settings required by the ODBC drivers. Here is an example vertica.ini file:

    [VerticaDriverName]
    DriverManagerEncoding=UTF-16
    ODBCInstLib=<Informatica install Dir>/ODBC7.1/lib/libodbcinst.so
    ErrorMessagesPath=/opt/vertica/lib64 ## Installed Vertica Client Directory
    LogLevel=4
    LogPath=/tmp

   For more information about the content of the vertica.ini file, see Location of the Additional Driver Settings in the HP Vertica documentation.

6. Set the VERTICAINI environment variable to the path to the vertica.ini file.

7. To use the PowerCenter bulk mode, register the PWX for HP Vertica connector:

   a. Make sure you have the PWX for HP Vertica XML file: <Informatica_installation_folder>\server\bin\plugin\VerticaConnector.xml.

   b. Follow the instructions to register the plug-in on the Informatica server in Registering the Plug-In's Metadata. You must specify VerticaConnector.xml as the plug-in file.

   c. After you have registered the plug-in, in the Repository, on the Plug-Ins tab, make sure you see the line that reads PowerExchange for HP Vertica.

   d. Before you continue, on the Repository Properties tab, you must switch the Operating mode back to Normal mode.

For detailed instructions about registering PWX for HP Vertica, see the Informatica PowerExchange for HP Vertica User Guide for PowerCenter (Version 9.6.1 HotFix 2 and above) document, available from Informatica Support.

Configure HP Vertica as Source and Target

Use PowerCenter Designer to import HP Vertica source and target definitions. Before you import a source or target definition, you must create an ODBC data source for your HP Vertica database:

1. To configure an ODBC data source on Windows, set up a DSN for your HP Vertica database(s) using the Windows Control Panel. Under Administrative Tools, click Data Sources (ODBC). Make sure to test the connection to verify that you can connect.

2. To configure an ODBC data source on Linux or another non-Windows platform, add the HP Vertica database entries to the <Informatica_Installation_Directory>/ODBC7.1/odbc.ini file. Here is an example odbc.ini file:

    [ODBC Data Sources]
    vertica_odbc=libverticaodbc.so

    [ODBC]
    InstallDir=/opt/infa/Informatica/9.6.1/ODBC7.1
    Trace=0
    TraceFile=/opt/infa/Informatica/9.6.1/odbctrace.out
    TraceDll=/opt/infa/Informatica/9.6.1/ODBC7.1/lib/DWtrc27.so

    [vertica_odbc]
    Description = Vmart Database
    Driver = /opt/vertica/lib64/libverticaodbc.so
    Database = <database_name>
    Servername = <server_name>
    UID = <user_id>
    PWD = <password>
    Port = <port_number>
    ConnSettings=
    SSLKeyFile=
    SSLCertFile=
    Locale=UTF-8

3. Set the ODBCINI environment variable to the path to the odbc.ini file.

4. Using Informatica Designer, import the HP Vertica source and target definition. In the Source Analyzer, on the Source menu, select Import from the Database.

5. Select the tables you want to import and click OK. The table names you imported now include the (ODBC) designation.

6. In the Target Analyzer, on the Target menu, select Import from the Database.

7. Select the tables you want to import and click OK. The table names you imported include the (ODBC) designation.

Create a Mapping

After you have configured the source and target tables, use PowerCenter Designer to create a mapping and create the desired transformations:

1. Open the Mapping Designer.

2. On the Mapping menu, select Create.

3. Name the new mapping and click OK.

4. Drag the source and target tables into the Mapping Designer. Create the transformation between the source and the target.

5. Save the mapping.

Create a Workflow

To create a workflow, you can use the Workflow Manager or the Mapping Designer. The following example shows how to use the Mapping Designer to create the workflow and configure the PWX Connector:

1. In the Mapping Designer, right-click and select Generate Workflow. The Workflow Generation wizard opens.

2. Step 1: Select the desired Workflow Generation Option and click Next.

3. Step 2: From the drop-down list, select the Informatica Integration Service you want to run the workflow on. If needed, under Connection Object, change the connection setting for the HP Vertica database. Click Next.

4. Step 3: Make any necessary changes to the workflow settings and click Next.

5. Step 4: When you see that the workflow generated successfully, click Finish.

6. Make changes to the connection, memory, properties, and other settings using the Workflow Manager.

7. After you have saved the changes, right-click the folder for the connector and click Connect.

8. In the Workflow Manager, reconnecting to the folder allows you to view the newly created workflow.
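Before running the generated workflow on Linux, it may help to confirm that the ODBC data source described under Configure HP Vertica as Source and Target is complete. The following is a minimal sketch using only Python's standard configparser module. The function name check_dsn, the list of required keys, and the sample values (database name, host name, user, and port) are assumptions made for illustration; they are not part of PowerCenter or the HP Vertica client tools.

```python
import configparser

# Keys this sketch treats as required. The list is taken from the example
# odbc.ini in this guide and is an assumption, not an official Vertica list.
REQUIRED_KEYS = ("driver", "database", "servername", "uid", "port")


def check_dsn(ini_text: str, dsn: str) -> list[str]:
    """Return a list of problems found for `dsn` in the given odbc.ini text."""
    # Disable interpolation so a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)

    if not parser.has_section(dsn):
        return [f"section [{dsn}] is missing"]

    problems = []
    if not parser.has_option("ODBC Data Sources", dsn):
        problems.append(f"{dsn} is not listed under [ODBC Data Sources]")
    for key in REQUIRED_KEYS:
        # configparser lower-cases option names by default.
        if not parser.get(dsn, key, fallback="").strip():
            problems.append(f"{key} is missing or empty in [{dsn}]")
    return problems


# Hypothetical sample values; replace them with your own settings.
SAMPLE = """
[ODBC Data Sources]
vertica_odbc=libverticaodbc.so

[vertica_odbc]
Description = Vmart Database
Driver = /opt/vertica/lib64/libverticaodbc.so
Database = vmart
Servername = vertica01.example.com
UID = dbadmin
PWD =
Port = 5433
"""

print(check_dsn(SAMPLE, "vertica_odbc"))  # prints [] when nothing is missing
```

To check a real file, the text could be read from the path stored in the ODBCINI environment variable and passed to check_dsn. The sketch only inspects the file contents; it does not open a connection, so testing the DSN itself is still necessary.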

Configure the Workflow

1. Drag the workflow into the Workflow Designer.

2. Double-click the task. The Edit Tasks window opens.

3. Select the Mapping tab.

4. In the left-hand pane, select your source. Under Readers, the value should be Relational Reader. Under Connections > Value, check that the HP Vertica connector is listed.

5. If needed, under Properties, change the attribute values.

6. In the left-hand pane, select your target. Under Writers, the value should be Relational Writer. Under Connections > Value, check that the HP Vertica connector is listed.

7. If needed, under Properties, change the attribute values.

8. In the Edit Tasks dialog box, select the Properties tab. Change the Commit Interval from the default (10000) to 100000, 1000000, or 10000000, depending on how many data rows you are loading. When writing large amounts of data to HP Vertica, larger commit intervals can improve the load performance.

9. Make any other desired changes to the attributes. When you have completed your changes, click Apply and OK.

10. Save the workflow.

Run the Workflow

1. To run your workflow, right-click in the Workflow Manager and select Start Workflow.

2. To check the status of your running workflow, open the Workflow Monitor and select the folder for your workflow. The Workflow Monitor displays the status of the running workflow.

   [Figure: Workflow Monitor window showing the wf_m_demo_mapping workflow running with no errors and another workflow that has completed successfully]

3. When the running workflow completes, the Status reads either Succeeded or Failed.

4. To review the workflow logs, right-click s_m_<mapping_name> and select Get Session Log.

5. Review the session logs.

   [Figure: sample session log from a successful workflow run]
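The effect of the Commit Interval chosen in Configure the Workflow can be estimated with simple arithmetic. The sketch below is a rough illustration, not a description of how PowerCenter schedules commits: it assumes one commit per full or partial interval of rows, and the row count of 25,000,000 is a hypothetical load size.

```python
def commits_needed(total_rows: int, commit_interval: int) -> int:
    """Estimate commits, assuming one commit per (partial) interval of rows."""
    if total_rows < 0:
        raise ValueError("total_rows must not be negative")
    if commit_interval <= 0:
        raise ValueError("commit_interval must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (total_rows + commit_interval - 1) // commit_interval


rows = 25_000_000  # hypothetical number of rows to load
for interval in (10_000, 100_000, 1_000_000, 10_000_000):
    print(f"interval {interval:>10,}: about {commits_needed(rows, interval):,} commits")
```

Under these assumptions, raising the interval from the default of 10000 to 1000000 reduces the estimate from 2,500 commits to 25 for the same load, which is the intuition behind using larger commit intervals for large loads.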

The session logs list any error or issue that occurred while the workflow was running. The session logs also show the number of rows that the workflow committed to the HP Vertica target, depending on the value of the commit interval.

Pushdown Optimization

PowerExchange for HP Vertica supports three types of pushdown optimization:

- Source-side: PowerCenter pushes certain transformations to the source database only. Use source-side pushdown optimization when you download data from an HP Vertica database into a file.
- Target-side: PowerCenter pushes certain transformations to the target database only. Use target-side pushdown optimization when you want to load data from a file or other database into HP Vertica.
- Full: PowerCenter pushes all transformations to the target database when the source and target database are the same. This situation requires a relational mode connection. Use full pushdown optimization when you read data from HP Vertica, use Informatica to transform the data, and then write it back to the database.

Pushdown Considerations

Consider the following when using pushdown optimization with Informatica and HP Vertica:

- HP Vertica strips the padding spaces from CHAR column values when you push down a function that takes a CHAR column as an argument.
- For pushdown compatibility, the following connection properties must be identical for the source and target databases:
    - Code Page
    - Connect String
    - Connection environment SQL
    - Transaction environment SQL
- The qualifying name for a table is <database_name>.<schema_name>.<table_name>.

Support for Pushdown Optimization Expressions

Operators

+ - * / % = > < >= <= <> != ^= NOT AND OR

Variables

SESSSTARTTIME
SYSDATE

Functions

Functions listed with a * can be pushed to HP Vertica using source-side pushdown optimization.

ABS()             FLOOR()           MIN()             SYSDATE()
ADD_TO_DATE()     GET_DATE_PART()   MOD()             SYSTIMESTAMP()
ASCII()           IIF()             POWER()           TAN()
AVG()             INITCAP()         ROUND(DATE)*      TANH()
CEIL()*           INSTR()           ROUND(NUMBER)*    TO_BIGINT
CHR()             ISNULL()          RPAD()            TO_CHAR(DATE)
CONCAT()          LAST_DAY()        RTRIM()           TO_CHAR(NUMBER)
COS()             LENGTH()          SIGN()*           TO_DATE()
COSH()            LN()              SIN()             TO_DECIMAL()
COUNT()           LOG()*            SINH()            TO_FLOAT()
DATE_COMPARE()    LOOKUP            SOUNDEX()         TO_INTEGER()
DATE_DIFF()       LOWER()           SQRT()            TRUNC(DATE)*
DECODE()          LPAD()            STDDEV()          TRUNC(NUMBER)*
EXP()             LTRIM()           SUBSTR()          UPPER()
                  MAX()             SUM()             VARIANCE()

Transformations

PWX for HP Vertica supports the following transformations:

Transformation                        Pushdown type
Aggregator transformation             Source-side, Full
Expression transformation             Source-side, Target-side, Full
Filter transformation                 Source-side, Full
Joiner transformation                 Source-side, Full
Lookup transformation                 Source-side, Full
Router transformation                 Source-side, Full
Sorter transformation                 Source-side, Full
Source qualifier transformation       Source-side, Full
Target transformation                 Target-side, Full
Union transformation                  Source-side, Full
Update strategy transformation        Full

For more details, see the Informatica PowerCenter Advanced Workflow Guide (Version 9.6.1 HotFix 2 onwards).

For More Information

For More Information About
    See

Tips and techniques for optimizing your HP Vertica and Informatica connection
    <T&T document on community>

Informatica
    http://www.informatica.com/

PowerCenter
    https://www.informatica.com/products/dataintegration/powercenter.html#fbid=4sohvlwepta

HP Vertica Community Edition
    https://my.vertica.com/community/

HP Vertica
    https://www.vertica.com/

Big Data and Analytics Community
    https://community.dev.hp.com/t5/big-data-and-analytics/ctp/bigdata_analytics
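As a companion to the Transformations table in the Pushdown Optimization section, the supported pushdown types can be written as a small lookup to see which types every transformation in a mapping has in common. This is a sketch for planning purposes only. The names PUSHDOWN_SUPPORT and common_pushdown_types are hypothetical, the data is copied from the table in this guide, and the result does not predict what PowerCenter pushes down at run time, which also depends on the expressions used and on the connection properties.

```python
# Pushdown types per transformation, copied from the Transformations table.
PUSHDOWN_SUPPORT = {
    "aggregator": {"source", "full"},
    "expression": {"source", "target", "full"},
    "filter": {"source", "full"},
    "joiner": {"source", "full"},
    "lookup": {"source", "full"},
    "router": {"source", "full"},
    "sorter": {"source", "full"},
    "source qualifier": {"source", "full"},
    "target": {"target", "full"},
    "union": {"source", "full"},
    "update strategy": {"full"},
}


def common_pushdown_types(transformations):
    """Return the pushdown types that every listed transformation supports."""
    supported = {"source", "target", "full"}
    for name in transformations:
        key = name.strip().lower()
        if key not in PUSHDOWN_SUPPORT:
            raise KeyError(f"transformation not in the table: {name}")
        supported &= PUSHDOWN_SUPPORT[key]
    return supported


print(sorted(common_pushdown_types(["Filter", "Expression", "Aggregator"])))
# ['full', 'source']
print(sorted(common_pushdown_types(["Filter", "Update Strategy"])))
# ['full']
```

An empty result would mean that, according to the table, no single pushdown type covers every transformation in the list.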