HP Vertica Integration with Informatica: Connection Guide HP Vertica Analytic Database HP Big Data Document Release Date: September, 2015
Legal Notices Warranty The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice. Restricted Rights Legend Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. Copyright Notice Copyright 2006-2015 Hewlett-Packard Development Company, L.P. Trademark Notices Adobe is a trademark of Adobe Systems Incorporated. Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation. UNIX is a registered trademark of The Open Group. HP Vertica Integration with Informatica: Connection Guide Page 2
Contents About HP Vertica Connection Guides... 4 About This Document... 4 Informatica PowerCenter Overview... 4 History of Integration Between Informatica and HP Vertica... 4 PowerExchange Features for HP Vertica... 5 Configure and Connect to HP Vertica using the PWX Connector... 6 Install and Configure PowerExchange for HP Vertica... 6 Configure HP Vertica as Source and Target... 7 Create a Mapping... 8 Create a Workflow... 9 Configure the Workflow... 11 Run the Workflow... 13 Pushdown Optimization... 15 Pushdown Considerations... 15 Support for Pushdown Optimization Expressions... 16 Operators... 16 Variables... 16 Functions... 16 Transformations... 17 For More Information... 17 HP Vertica Integration with Informatica: Connection Guide Page 3
About HP Vertica Connection Guides HP Vertica connection guides provide basic information about setting up connections to HP Vertica from software that our technology partners create. These documents provide guidance using one specific version of HP Vertica and one specific version of the third-party vendor s software. Other versions of the third-party product may work with HP Vertica. However, Hewlett-Packard may not have tested these other versions. About This Document This document provides guidance using the latest versions of HP Vertica and Informatica as of September, 2015. Informatica PowerCenter Overview Informatica offers the PowerCenter platform for connecting to HP Vertica. PowerCenter is a scalable, highperformance enterprise ETL platform that provides the following major capabilities: Data delivery: Performing ETL (extract, transform, load) and ELT (extract, load, transform) operations, moving data among various data sources Data transformation: Basic, intermediate, and complex data transformation Design and development environment: Team-based development capabilities Metadata and modeling: Physical-to-logical data mapping, including graphical attribute-level mapping PowerCenter 9.6.1 HotFix 2 includes the PowerExchange (PWX) Connector for HP Vertica. The PWX Connector for HP Vertica includes additional capabilities and performance improvements when connected to your HP Vertica database. This document assumes that the reader is familiar with both Informatica PowerCenter and HP Vertica. For this document, Hewlett-Packard tested connecting to HP Vertica 7.x using Informatica PowerCenter 9.6.1 HotFix 2. History of Integration Between Informatica and HP Vertica Prior to PowerExchange (PWX) for HP Vertica, Hewlett-Packard developed and supported two connectors for Informatica and HP Vertica: In 2009, Hewlett-Packard developed the HP Vertica Plug-in for Informatica. This connector used the native method of loading data from Informatica into HP Vertica. In 2013, Hewlett-Packard replaced the earlier HP Vertica Plug-in for Informatica with a Java plug-in that supports all operating system platforms. This plug-in runs on generic JDBC and ODBC connections and includes new and improved features compared to the native plug-in. In 2014, Informatica released PWX Connector for HP Vertica. This connector includes enhancements to the partitioning and pushdown capabilities of Informatica. Hewlett-Packard recommends that you use the new PWX Connector for HP Vertica with Informatica PowerCenter 9.6.1 HotFix 2 to connect to your HP Vertica database. The following timeline shows the history of plug-ins for connecting from Informatica to HP Vertica: HP Vertica Integration with Informatica: Connection Guide Page 4
PowerExchange Features for HP Vertica IT organizations can access data sources without having to develop custom data access utilities using a highperformance tool PowerExchange Connector for HP Vertica. PowerExchange Connector for HP Vertica provides connectivity between Informatica PowerCenter and HP Vertica. PWX uses the HP Vertica ODBC driver to write large volumes of data into HP Vertica. The major features of PWX for HP Vertica include: PWX for HP Vertica Feature Bulk mode or relational mode for processing HP Vertica data Description Bulk mode: Write large amounts of data to HP Vertica from multiple data sources. Bulk mode is only available for writing to HP Vertica. Relational mode: Read data from an HP Vertica source and write data to an HP Vertica target. Relational mode supports pushdown optimization. Design-time features Preview source and target data before mapping. Generate and execute DDL on HP Vertica. Import HP Vertica tables. Run-time features Set the commit type, interval sources, and interval targets. Update strategy: Data-driven approach to inserting, deleting, and updating data in HP Vertica. Partitioning support Key range: One or more ports make up a partition key. SQL transformation Lookup (cached and uncached) Connection resiliency Round robin: Data is distributed to one or more partitions. Hash-based: Data is distributed to partitions in groups. Uses HP Vertica COPY LOCAL statement for various partitions in the mapping pipeline. Flexibility to process logic per input row. Supports HP Vertica lookup in the existing data integration logic. High availability HP Vertica Integration with Informatica: Connection Guide Page 5
PWX for HP Vertica Feature Compatibility with HP Vertica Java Plug-in for Informatica Description Support for existing features SQL pushdown Source-side Target-side Full pushdown optimization Supported pushdown transformations Aggregator, filter, joiner, union, sort, router, and lookup transformation as source. Expression transformation as source, target, and full. Update transformation as full. (When the source and target database are the same, all transformations are pushed to the target.) Configure and Connect to HP Vertica using the PWX Connector PowerCenter includes the PowerExchange for HP Vertica connector. Informatica supports this connector from Informatica PowerCenter 9.6.1 HotFix 2 onwards. Install and Configure PowerExchange for HP Vertica Before you start these procedures, you must install HP Vertica using the instructions in the HP Vertica Installation Guide. To install and configure PowerExchange for HP Vertica, perform the following steps on your Informatica server: 1. Ask Informatica Support for a license file that includes PowerExchange for HP Vertica license. 2. Ask Informatica Support for the Informatica PowerCenter 9.6.1 HotFix 2 software. 3. Using the license file that contains PWX for HP Vertica, install PowerCenter 9.6.1 HotFix 2. 4. Install the HP Vertica ODBC driver (7.0 and higher). For HP Vertica driver/server compatibility, review the information in HP Vertica 7.1.x Client Drivers in the HP Vertica documentation. 5. Create a configuration file named vertica.ini. This file defines the HP Vertica-specific settings required by the ODBC drivers. Here s an example vertica.ini file: [VerticaDriverName] DriverManagerEncoding=UTF-16 ODBCInstLib=<Informatica install Dir>/ODBC7.1/lib/libodbcinst.so ErrorMessagesPath=/opt/vertica/lib64 ## Installed Vertica Client Directory LogLevel=4 LogPath=/tmp For more information about the content of the vertica.ini file, see Location of the Additional Driver Settings in the HP Vertica documentation. 6. Set the VERTICAINI environment variable to the path to the vertica.ini file. 7. To use the PowerCenter bulk mode, register the PWX for HP Vertica connector: a. Make sure you have the PWX for HP Vertica XML file: <Informatica_installation_folder>\server\bin\plugin\VerticaConnector.xml. HP Vertica Integration with Informatica: Connection Guide Page 6
b. Follow the instructions to register the plug-in on the Informatica server in Registering the Plug-In's Metadata. You must specify Verticaconnector.xml as the plug-in file. c. After you have registered the plug-in, in the Repository, on the Plug-Ins tab, make sure you see the line that reads PowerExchange for HP Vertica. d. Before you continue, on the Repository Properties tab, you must switch the Operating mode back to Normal mode. For detailed instructions about registering PWX for HP Vertica, see the Informatica PowerExchange for HP Vertica User Guide for PowerCenter (Version 9.6.1 Hotfix 2 and above) document, available from Informatica Support. Configure HP Vertica as Source and Target Use PowerCenter Designer to import HP Vertica source and target definitions. Before you import a source or target definition, you must create an ODBC data source for your HP Vertica database: 1. To configure an ODBC data source on Windows, set up a DSN for your HP Vertica database(s) using the Informatica Control Panel. Under Administrative Tool, click Data Source (ODBC). Make sure to test the connection to verify that you can connect. 2. To configure an ODBC data source on Linux or other non-windows platform, add the HP Vertica database entries to the <Informatica_Installation_Directory>\ODBC7.1\odbc.ini file. Here is an example odbc.ini file: [ODBC Data Sources] vertica_odbc=libverticaodbc.so [ODBC] InstallDir=/opt/infa/Informatica/9.6.1/ODBC7.1 Trace=0 TraceFile=/opt/infa/Informatica/9.6.1/odbctrace.out TraceDll=/opt/Infa/Informatica/9.6.1/ODBC7.1/lib/DWtrc27.so [vertica_odbc] Description = Vmart Database Driver = /opt/vertica/lib64/libverticaodbc.so Database = <database_name> Servername = <server_name> UID= <user_id> PWD= <password> Port = <port_number> ConnSettings= HP Vertica Integration with Informatica: Connection Guide Page 7
SSLKeyFile= SSLCertFile= Locale=UTF-8 3. Set the ODBCINI environment variable to the path to the odbc.ini file. 4. Using Informatica Designer, import the HP Vertica source and target definition. In the Source Analyzer, on the Source menu, select Import from the Database. 5. Select the tables you want to import and click OK. The table names you imported now include the (ODBC) designation. 6. In the Target Analyzer, on the Target menu, select Import from the Database. 7. Select the tables you want to import and click OK. The table names you imported include the (ODBC) designation. Create a Mapping After you have configured the source and target tables, use PowerCenter Designer to create a mapping and create the desired transformations: 1. Open the Mapping Designer. 2. On the Mapping menu, select Create. HP Vertica Integration with Informatica: Connection Guide Page 8
3. Name the new mapping and click OK. 4. Drag the source and target tables into the Mapping Designer. Create the transformation between the source and the target. 5. Save the mapping. Create a Workflow To create a workflow, you can use the Workflow Manager or the Mapping Designer. The following example shows how to use the Mapping Designer to create the workflow and configure the PWX Connector: HP Vertica Integration with Informatica: Connection Guide Page 9
1. In the Mapping Designer, right-click and select Generate Workflow. The Workflow Generation wizard opens. 2. Step 1: Select the desired Workflow Generation Option and click Next. 3. Step 2: From the drop-down list, select the Informatica Integration Service you want to run the workflow on. If needed, under Connection Object, change the connection setting for HP Vertica database. Click Next. 4. Step 3: Make any necessary changes to the workflow settings and click Next. 5. Step 4: When you see that the workflow generated successfully, click Finish. 6. Make changes to the connection, memory, properties, and other settings using the Workflow Manager. 7. After you have saved the changes, right-click the folder for the connector and click Connect. 8. In the Workflow Manager, reconnecting to the folder allows you to view the newly created workflow. HP Vertica Integration with Informatica: Connection Guide Page 10
Configure the Workflow 1. Drag the workflow into the Workflow Designer. 2. Double-click the task. The Edit Tasks window opens. 3. Select the Mapping tab. HP Vertica Integration with Informatica: Connection Guide Page 11
4. In the left-hand pane, select your source. Under Readers, the value should be Relational Reader. Under Connections > Value, check that the HP Vertica connector is listed. 5. If needed, under Properties, change the attribute values. 6. In the left-hand pane, select your target. Under Writers, the value should be Relational Writer. Under Connections > Value, check that the HP Vertica connector is listed. 7. If needed, under Properties, change the attribute values. 8. In the Edit Tasks dialog box, select the Properties tab. Change the Commit Interval from the default (10000) to 100000, 1000000, or 10000000, depending on how many data rows you are loading. When writing large HP Vertica Integration with Informatica: Connection Guide Page 12
amounts of data to HP Vertica, larger commit intervals can improve the load performance. 9. Make any other desired changes to the attributes. When you have completed your changes, click Apply and OK. 10. Save the workflow. Run the Workflow 1. To run your workflow, right-click in the Workflow Manager and select Start Workflow. 2. To check the status of your running workflow, open the Workflow Monitor and select the folder for your workflow. The Workflow Monitor displays the status of the running workflow. The following window shows that the wf_m_demo_mapping workflow is running (no errors have occurred) and another workflow has completed successfully. HP Vertica Integration with Informatica: Connection Guide Page 13
3. When the running workflow completes, the Status reads either Succeeded or Failed. 4. To review the workflow logs, right-click s_m_<mapping_name> and select Get Session Log. 5. Review the session logs. Here is a sample from a successful workflow run: HP Vertica Integration with Informatica: Connection Guide Page 14
The session logs list any error or issue that occurred while the workflow was running. The session logs also show the number of rows that the workflow committed to the HP Vertica target, depending on the value of the commit interval. Pushdown Optimization PowerExchange for HP Vertica supports three types of pushdown optimization: Source Side PowerCenter pushes certain transformations to the source database only. Use source-side pushdown optimization where you are downloading data into a file from an HP Vertica database. Target Side PowerCenter pushes certain transformations to the target database only. Use target-side pushdown optimization when you want to load data from a file or other database into HP Vertica. Full PowerCenter pushes all tranformations to the target database when the source and target database are the same. This situation requires a relational mode connection. Use full pushdown optimization to load data from HP Vertica, use Informatica to transform the data, and then write it back to the database. Pushdown Considerations Consider the following when using pushdown optimization with Informatica and HP Vertica: HP Vertica strips the padding spaces from CHAR column values when you push down a function that takes a CHAR column as an argument. Pushdown compatibility for connection properties have to be identical for the source and target databases. The connection properties are: Code Page Connect String Connection environment SQL HP Vertica Integration with Informatica: Connection Guide Page 15
Transaction environment SQL Qualifying name for a table is <database_name>.<schema_name>.<table_name> Support for Pushdown Optimization Expressions Operators + - * / % = > < >= <= <>!= ^= NOT AND OR Variables SESSSTARTTIME SYSDATE Functions Functions listed with a * can be pushed to HP Vertica using source-side pushdown optimization. ABS() FLOOR() MIN() SYSDATE() ADD_TO_DATE() GET_DATE_PART() MOD() SYSTIMESTAMP() ASCII() IIF() POWER() TAN() AVG() INITCAP() ROUND(DATE)* TANH() CEIL()* INSTR() ROUND(NUMBER)* TO_BIGINT CHR() ISNULL() RPAD() TO_CHAR(DATE) CONCAT() LAST_DAY() RTRIM() TO_CHAR(NUMBER) COS() LENGTH() SIGN()* TO_DATE() COSH() LN() SIN() TO_DECIMAL() COUNT() LOG()* SINH() TO_FLOAT() HP Vertica Integration with Informatica: Connection Guide Page 16
DATE_COMPARE() LOOKUP SOUNDEX() TO_INTEGER() DATE_DIFF() LOWER() SQRT() TRUNC(DATE)* DECODE() LPAD() STDDEV() TRUNC(NUMBER)* EXP() LTRIM() SUBSTR() UPPER() MAX() SUM() VARIANCE() Transformations PWX for HP Vertica supports the following transformations: Transformation Aggregator transformation Expression transformation Filter transformation Joiner transformation Lookup transformation Router transformation Sorter transformation Source qualifier transformation Target transformation Union transformation Update strategy transformation Pushdown type Source-side, Full Source-side, Target-side, Full Source-side, Full Source-side, Full Source-side, Full Source-side, Full Source-side, Full Source-side, Full Target-side, Full Source-side, Full Full For more details, see the Informatica PowerCenter Advanced Workflow Guide (Version 9.6.1 HotFix 2 onwards). For More Information For More Information About Tips and techniques for optimizing your HP Vertica and Informatica connection Informatica See <T&T document on community> http://www.informatica.com/ HP Vertica Integration with Informatica: Connection Guide Page 17
PowerCenter HP Vertica Community Edition HP Vertica Big Data and Analytics Community https://my.vertica.com/community/ https://www.vertica.com/ https://www.informatica.com/products/dataintegration/powercenter.html#fbid=4sohvlwepta https://community.dev.hp.com/t5/big-data-and-analytics/ctp/bigdata_analytics HP Vertica Integration with Informatica: Connection Guide Page 18