PWX CONNECTORS FOR INFORMATICA FOR EMC GREENPLUM DATABASES

Similar documents
Plug-In for Informatica Guide

Greenplum Database (software-only environments): Greenplum Database (4.0 and higher supported, or higher recommended)

DEPLOYING EMC DOCUMENTUM BUSINESS ACTIVITY MONITOR SERVER ON IBM WEBSPHERE APPLICATION SERVER CLUSTER

Connect to an SSL-Enabled Microsoft SQL Server Database from PowerCenter on UNIX/Linux

EMC VoyenceControl Integration Module. BMC Atrium Configuration Management Data Base (CMDB) Guide. version P/N REV A01

Working with the Cognos BI Server Using the Greenplum Database

EMC AVAMAR 6.0 GUIDE FOR IBM DB2 P/N REV A01 EMC CORPORATION CORPORATE HEADQUARTERS: HOPKINTON, MA

Integration Module for BMC Remedy Helpdesk

Setting Up a Unisphere Management Station for the VNX Series P/N Revision A01 January 5, 2010

Technical Notes. EMC NetWorker Performing Backup and Recovery of SharePoint Server by using NetWorker Module for Microsoft SQL VDI Solution

EMC NetWorker Module for Microsoft Exchange Server Release 5.1

Configuring and Integrating Oracle

IBM WEBSPHERE LOAD BALANCING SUPPORT FOR EMC DOCUMENTUM WDK/WEBTOP IN A CLUSTERED ENVIRONMENT

Using Microsoft Windows Authentication for Microsoft SQL Server Connections in Data Archive

Secure Agent Quick Start for Windows

Reconfiguring VMware vsphere Update Manager

Use QNAP NAS for Backup

Dell Statistica Statistica Enterprise Installation Instructions

Reconfiguration of VMware vcenter Update Manager

SETTING UP ACTIVE DIRECTORY (AD) ON WINDOWS 2008 FOR EROOM

SMTP POP3 SETUP FOR EMC DOCUMENTUM eroom

Front-Office Server 2.7

STATISTICA VERSION 10 STATISTICA ENTERPRISE SERVER INSTALLATION INSTRUCTIONS

STATISTICA VERSION 9 STATISTICA ENTERPRISE INSTALLATION INSTRUCTIONS FOR USE WITH TERMINAL SERVER

Alteryx Predictive Analytics for Oracle R

PigCHAMP Knowledge Software. Enterprise Edition Installation Guide

Sage Intelligence Financial Reporting for Sage ERP X3 Version 6.5 Installation Guide

EMC ApplicationXtender Server

Operating System Installation Guide

Installing and Configuring vcloud Connector

Installing and Configuring vcloud Connector

Installing Management Applications on VNX for File

Using Windows Administrative Tools on VNX

Dell Statistica Document Management System (SDMS) Installation Instructions

Informatica Corporation Proactive Monitoring for PowerCenter Operations Version 3.0 Release Notes May 2014

Configuring Avaya Aura Communication Manager and Avaya Call Management System Release 16.3 with Avaya Contact Center Control Manager Issue 1.

English ETERNUS CS800 S3. Backup Exec OST Guide

Foglight. Foglight for Virtualization, Free Edition Installation and Configuration Guide

Enabling Backups for Windows and MAC OS X

Deploying EMC Documentum WDK Applications with IBM WebSEAL as a Reverse Proxy

EMC Celerra Network Server

Managing the SSL Certificate for the ESRS HTTPS Listener Service Technical Notes P/N REV A01 January 14, 2011

EMC NetWorker Module for Microsoft for Windows Bare Metal Recovery Solution

HP Device Manager 4.6

INTEROPERABILITY OF SAP BUSINESS OBJECTS 4.0 WITH GREENPLUM DATABASE - AN INTEGRATION GUIDE FOR WINDOWS USERS (64 BIT)

Technical Note P/N REV A02 May 07, 2010

Installing and Configuring DB2 10, WebSphere Application Server v8 & Maximo Asset Management

Reconfiguring VMware vsphere Update Manager

EventTracker: Configuring DLA Extension for AWStats Report AWStats Reports

VMWARE PROTECTION USING VBA WITH NETWORKER 8.1

WhatsUp Gold v16.1 Installation and Configuration Guide

Allworx OfficeSafe Operations Guide Release 6.0

Copyright 2015 SolarWinds Worldwide, LLC. All rights reserved worldwide. No part of this document may be reproduced by any means nor modified,

User Guide. Informatica Smart Plug-in for HP Operations Manager. (Version 8.5.1)

ENABLING SINGLE SIGN-ON FOR EMC DOCUMENTUM WDK-BASED APPLICATIONS USING IBM WEBSEAL ON AIX

EMC Celerra Version 5.6 Technical Primer: Control Station Password Complexity Policy Technology Concepts and Business Considerations

EMC ViPR Controller Add-in for Microsoft System Center Virtual Machine Manager

Snow Inventory. Installing and Evaluating

Informatica Cloud & Redshift Getting Started User Guide

Technical Note. Performing Exchange Server Granular Level Recovery by using the EMC Avamar 7.1 Plug-in for Exchange VSS with Ontrack PowerControls

EMC Documentum Repository Services for Microsoft SharePoint

User Document. Adobe Acrobat 7.0 for Microsoft Windows Group Policy Objects and Active Directory

Installation Instruction STATISTICA Enterprise Small Business

Vodafone PC SMS (Software version 4.7.1) User Manual

SOA Software: Troubleshooting Guide for Agents

SAS 9.3 Foundation for Microsoft Windows

MS SQL Express installation and usage with PHMI projects

NSi Mobile Installation Guide. Version 6.2

Practice Fusion API Client Installation Guide for Windows

Release Bulletin Sybase ETL Small Business Edition 4.2

Using Group Policy to Manage and Enforce ACL on VNX for File P/N REV A01 February 2011

Jive Connects for Microsoft SharePoint: Troubleshooting Tips

KeyAdvantage System DMS Integration. Software User Manual

Bulk Downloader. Call Recording: Bulk Downloader

Symantec AntiVirus Corporate Edition Patch Update

Oracle Enterprise Manager

Parallels Transporter Agent

STATISTICA VERSION 12 STATISTICA ENTERPRISE SMALL BUSINESS INSTALLATION INSTRUCTIONS

Archive Server for MDaemon disaster recovery & database migration

Configure an ODBC Connection to SAP HANA

EMC Documentum Interactive Delivery Services Accelerated: Step-by-Step Setup Guide

2. Installation Instructions - Windows (Download)

ATT8367-Novell GroupWise 2014 and the Directory Labs

Suite. How to Use GrandMaster Suite. Exporting with ODBC

Sage 100 ERP. Installation and System Administrator s Guide

HP Data Protector Integration with Autonomy IDOL Server

Setting up SQL Translation Framework OBE for Database 12cR1

ilaw Installation Procedure

Cloud Services. Introduction...2 Overview...2. Security considerations Installation...3 Server Configuration...4

Installing the BlackBerry Enterprise Server Management console with a remote database

McAfee Network Threat Response (NTR) 4.0

Installation and Configuration Guide

NetIQ Sentinel Quick Start Guide

MobiLink Synchronization with Microsoft SQL Server and Adaptive Server Anywhere in 30 Minutes

EMC Documentum Business Process Suite

Adeptia Suite 6.2. Application Services Guide. Release Date October 16, 2014

Exercise Safe Commands and Audit Trail

CA SiteMinder. Directory Configuration - OpenLDAP. r6.0 SP6

Administration guide. Océ LF Systems. Connectivity information for Scan-to-File

Transcription:

White Paper PWX CONNECTORS FOR INFORMATICA FOR EMC GREENPLUM DATABASES Abstract This white paper explains how the EMC Greenplum PowerExchange (PWX) connector is used in conjunction with the Informatica Workflow Manager to create tasks that leverage the bulk load capability of the Greenplum database. It explains the uses, configuration, setup procedure, and best practices for the PWX connectors. April 2011

Copyright 2011 EMC Corporation. All Rights Reserved. EMC believes the information in this publication is accurate of its publication date. The information is subject to change without notice. The information in this publication is provided as is. EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com. Part Number h8226 2

Table of Contents Executive summary... 4 Audience... 4 Organization of this paper... 4 The PWX connector... 4 Supported versions... 6 Installing the PWX connector... 6 Using the PWX connector in Informatica Workflow Manager... 7 Best practices and tips... 8 Registration... 10 Conclusion... 10 References... 10 3

Executive summary Informatica PowerCenter is a popular extract-transform-load (ETL) tool for EMC Greenplum database customers. By using the Informatica Workflow Manager, customers can easily create tasks that load the source data into the target Greenplum database. Greenplum also created a PowerExchange (PWX) connector for Informatica. This PWX connector facilitates the database loading process and integrates the Informatica PowerCenter Workflow Manager with the Greenplum database. This leveraging of the bulk loading capabilities of the Greenplum utility results in better data loading performance. Audience This white paper is intended for EMC field-facing employees such as sales, technical consultants, support, and customers who will be using the Greenplum PWX connector for Informatica in their daily work. It documents the PWX connector s capabilities, and shows the readers how it can be used in conjunction with Informatica s Workflow Manager in creating a work task. The reader is not expected to have any prior knowledge of the Greenplum PWX connector but should be familiar with how the Informatica Workflow Manager operates. This is not an installation guide. Organization of this paper This paper covers the following topics: The PWX connector Supported versions of Informatica Installation and registration of the PWX connector Use of the PWX connector Best practices Registration The PWX connector Informatica is one of the most popular data integration tools in the market. Its products include the PowerCenter and PowerExchange family of products. Customers use PowerCenter to design workflow tasks that can be used to extract input data from files or databases, and then loaded into Greenplum databases. PowerCenter also has the capability to run transformations on the input data while massaging the data into the desired input format for the target databases. Informatica PowerExchange is a family of data access tools that help customers access, load, and deliver data, as part of the ETL process. It has customized graphical user interfaces that help customers access native data types and special 4

features and capabilities. For example, PowerExchange supports source and target data from Adabas, DB2, Informix, POP and IMAP email formats, and many others. The PWX connector for Greenplum is a special adapter written by EMC Greenplum. The PWX connector fills a gap in the PowerExchange support line, specifically by supporting the Greenplum database as the target database. The connector blends in seamlessly with the Workflow Manager in PowerCenter as a target writer. On the Mapping tab of the Edit Tasks frame, you can click on the target s properties, and select Greenplum Writer as the default writer. The other choices may be File Writer and Relational Writer. Figure 1. Workflow Manager target mapping By using the PWX connector, customers receive another benefit. The Greenplum utility is used internally for bulk load of data into the database directly from Informatica s PowerCenter. The utility is an interface to the Greenplum external table parallel-loading feature. By using a load specification control file, executes data loading by: 5

Invoking the Greenplum parallel file server program Creating an external table definition based on the source data definition Executing the SQL commands to load the data Supported versions The PWX connector for Greenplum is supported on many versions across various operating systems: Table 1. Informatica PWX connector versions for Greenplum PowerCenter 8.1.1 PowerCenter 8.6.1 GPDB 3.2 PWX 1.1.0.0 PWX 1.1.0.0 GPDB 3.3.0 3.3.6 PWX 1.2.0.x PWX 1.2.0.x GPDB 3.3.7 PWX 1.2.1.0 GPDB 4.0 PWX 2.0.0.0 GPDB 4.1 * * GPDB 4.1 is generally available, but testing has been completed on only 64-bit Windows 2003 servers and 64-bit Red Hat Linux servers at the time of publication. Testing on GPDB 4.1 with PowerCenter 9.0.1 is in progress. Table 2. PowerCenter versions per operating system Operating Systems PowerCenter Versions Windows x86 32-bit 8.11, 8.5.1,8.6,8.6.1 Windows x86 64-bit 8.6.1 Red Hat Linux 4 x86 32-bit 8.1.1, 8.5.1, 8.6, 8.6.1 Solaris Sparc 10 32-bit 8.1.1, 8.5.1, 8.6, 8.6.1 Solaris Space 10 64-bit 8.1.1, 8.5.1, 8.6, 8.6.1 Note: You can use a 32-bit version of the PWX connector if you run a 32-bit version of PowerCenter on a 64-bit platform. Installing the PWX connector To use the PWX connector, you must install the following Greenplum software packages for the specific platform where the PowerCenter software is running: Table 3. Greenplum software packages Greenplum software Purpose greenplum-loader-<version><platform>.bin Installs the and utilities greenplum-connectivity-<version><platform>.bin Installs the ODBC driver greenplum-pwx-<version><platform>.bin Installs PWX for Greenplum 6

Note: The Greenplum load and Greenplum connectivity packages are freely available for download from the Greenplum Network website (http://gpn.greenplum.com), while the Greenplum PWX package is available to EMC customers only, and available for download from an internal website. EMC will be transitioning all Greenplum articles to EMC Powerlink in the near future. The Greenplum loader package installs the and utilities that are used to bulk load data into the Greenplum database. The connectivity package is used to import the target table schema from the Greenplum database. Importing the schema of the desired table into the Greenplum database target table is necessary in order to create the mapping between the source data and the target Greenplum database. Using the PWX connector in Informatica Workflow Manager Using Informatica PowerCenter as an ETL tool involves the following steps: Use the PowerCenter Repository Manager to add a repository and folder. Use the PowerCenter Designer to define the source and target data, and the mapping and transformations in between. Use the Workflow Manager to define tasks and workflow, and to run the tasks. Use the Workflow Monitor to monitor the progress of the tasks. In the Workflow Manager, you define where the data that was extracted is to be loaded in the Greenplum database. As shown in Figure 1 on page 5, the usual method is to use the Relational Writer. Greenplum has presented our customers with an alternative: the Greenplum Writer. To use the Greenplum Writer, open the properties of the task by double-clicking the task icon. On the Mapping tab, click the target icon to expose the Writers drop-down list. Selecting Greenplum Writer will load the PWX connector into the task. Figure 2. Using the Greenplum Writer 7

When Informatica Workflow Manager starts a session or task, it performs the following sequence of actions: The session is initialized. The repository is opened, integration service is contacted, and a folder is opened. The workflow is opened, and run ID is issued. The mapping is opened. At this point, PowerCenter has all the information it needs to start the session. The parallel pipeline engine is started. The reader starts reading the source data. At this point, the initialization task is completed. The target writer is initialized. If a relational writer is selected, it will be started and the target database will be contacted. If a PWX connecter is used, a control file (YAML file) will be created, and will start. The is a dataloading utility that acts as an interface to the external table parallel-loading feature. The utility reads the source data and creates external tables. The gpload now calls (Greenplum file distribution program) to load the data into the Greenplum database on the least used segment servers, then balances the data loading as evenly between the segment servers as possible. Best practices and tips When you run PowerCenter at the client computer, is invoked. The utility calls up to send rows of records to the segment servers. It is therefore important to have the network set up correctly in order to facilitate data communications between the load server, the integration service server, and the segment servers. For example, the segment servers must be able to reach the client server and the integration server. You can do this through DNS setup or through IP addresses set up in /etc/hosts (or its Windows equivalent). If you are using DCA, then on the DCA master you can use or to set up all segment servers all at once. Use character datatypes (char, vchar) whenever possible. Character datatypes are more efficient to process than other datatypes. For instance, if you are reading an input field of numeric type, and are not using it as a numeric datatype in the target data field, then it is more efficient to use a character datatype field as the target field. For example, if the input field is zip codes (all numerals), but you are really just storing it as is and not really using the field as a numeral, then make the target field a character datatype field. 8

For debugging PWX errors, the session log is your best friend. To get the log, go to the Workflow Monitor. On the task list pane, right-click on the task, and select session log. Look for the entries that say [ERROR]. You will probably see errors such as: Short write error This error seems to be a catch-all. Read further down and solve the next error to remove this error. Unable to connect to the database Check the username and password specified in the ODBC manager, and also in the Connection in the Workflow Manager. Verify that the hostname or the host s IP address is correct, and that you can ping it from all the hosts. No privilege to create external tables The user specified in the Greenplum connection does not have sufficient privileges to create the external tables. Log in to the database and use alter role to grant the user sufficient privileges (for example, the createdb privilege.) Source file cannot be found This is not a PWX error. If the source file is not in the default directory, PowerCenter expects it to be in the SrcFiles directory under server > infa_shared directory of the Informatica installation directory of the integration server. Verify that the source file has been stored there. Unable to create the process Python is not installed, or a wrong version of Python was installed. Additionally, it could be caused by an incorrect environment variable. Ask for verbose logs while you are debugging the workflow sessions. Open the task properties by right-clicking the task icon and select Edit, or just double-click the task icon: On the Properties tab, select the Write Backward Compatible Session Log File checkbox. On the Config Object tab, in the Error handling group, go to line 2: Override tracing. By default, None is selected. Click on the line and select Verbose Data. Each row of data should appear. On the Mapping tab, define the debug level for the source file and the destination table. On the Sources or Targets page, in the Properties group, the Tracing Level line defaults to Normal. You can select from Normal, Verbose Initialization to Verbose Data. If the workflow sessions do not run successfully, try eliminating as a source of the run errors. Run at the system (command) prompt. Create a temporary directory where you put a small representation of your source file and a control (YAML) file, and then run interactively. 9

Use the v (for verbose output) switch or the V (for very verbose output), and follow the log entries for hints of what may have gone wrong. If does not run, then: On Windows systems, it could be that Python is not installed. Install Python (currently Python 2.5.4 is recommended). On Linux systems, Python is automatically installed. Check the environment variables for PATH, GPHOME_LOADERS, and PYTHONPATH. Verify that they contain the necessary paths. If you reinstall a different version of Python, the path to the previous version may still be in the pathname s variable PATH. You should edit the variable and remove the path to the previous version. If you make changes to the environment variable, restart the Informatica services so that the new values are included into the system. On Windows servers, sometimes it is necessary to reboot the PWX client computer for this to take effect. Registration After you have installed the PWX connector, you will need to register the connector plug-in to the repository. You have to start up the Admin Console, and go to the plug-in tab to register it. Failure to do so will cause PWX not to run in Informatica. Also, if the plug-in is not registered, you will not be able to see a Greenplum writer in the Workflow Manager Connections. Conclusion Informatica customers should take advantage of the Greenplum PWX connector when loading their source data into a Greenplum database. Having to bulk load the data enhances the performance of the data significantly. References The following can be found on Powerlink: Greenplum Database 4.1 Administrator Guide (P/N: 300-012-428) Greenplum Database 4.1 Load Tools for Windows (P/N: 300-012-437) 10