Configuring Hadoop Distributed File System as an Optimized File Archive Store




© 2013 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. All other company and product names may be trade names or trademarks of their respective owners and/or copyrighted materials of such owners.

Abstract

This article provides information on how to configure the Hadoop Distributed File System (HDFS) as an optimized file archive store in Data Archive. Data Archive uses the libhdfs API to archive and access data in HDFS.

Supported Versions

Data Archive 6.1.x

Table of Contents

Overview
Step 1. Install the libhdfs API Files
Step 2. Create a Directory in HDFS
Step 3. Create the Target Connection
Step 4. Run the Create Archive Folder Job
Step 5. Copy the Hadoop Connection to Other File Archive Service Configuration Files
Step 6. Validate the Connection to HDFS

Overview

You can use the Hadoop Distributed File System (HDFS) as an optimized file archive store in Data Archive. To create an optimized file archive in HDFS, complete the following tasks:

1. Install the libhdfs API files.
2. Create a directory in HDFS.
3. Create an optimized file archive target connection.
4. Run the Create Archive Folder job.
5. Copy the connection to other File Archive Service configuration files.
6. Validate the connection to HDFS.

Step 1. Install the libhdfs API Files

The libhdfs API provides access to files in a Hadoop file system. Data Archive requires the libhdfs API files to access an optimized file archive in HDFS. The Hadoop installation includes the libhdfs API.

The File Archive Service requires the following libhdfs files:

- commons-logging-api-1.0.4.jar
- hadoop-0.20.2-core.jar
- libhdfs.so (UNIX) or libhdfs.dll (Windows)

To install the libhdfs API, copy the libhdfs files to the machines where the following File Archive Service components are installed:

File Archive Service
On Windows, copy the files to the root of the File Archive Service directory.

On UNIX, copy the files to <File Archive Service Directory>/odbc.

File Archive Service agent
On Windows or UNIX, copy the files to the root of the File Archive Service agent directory. If the File Archive Service agent is installed on multiple machines, copy the libhdfs API files to all machines that host a File Archive Service agent.

File Archive Service plug-in for Data Archive
On Windows, copy the files to <Data Archive Directory>\webapp\file_archive. On UNIX, copy the files to <Data Archive Directory>/webapp/file_archive/odbc.

After the installation, verify that the CLASSPATH environment variable includes the location of the libhdfs files.

Step 2. Create a Directory in HDFS

In HDFS, create a directory for the optimized file archive.

Step 3. Create the Target Connection

In Data Archive, create a target connection to the optimized file archive and set the archive store type to Hadoop HDFS. The following list describes the properties that you need to set for the target connection:

Staging Directory
Directory in which the file archive loader temporarily stores data as it completes the archive process. Enter the absolute path for the directory. The directory must be accessible to the ILM Server.

Number of Rows Per File
Maximum number of rows that the file archive loader stores in a file in the optimized file archive. Default is 1 million rows.

File Archive Data Directory
Directory in which the file archive loader creates the optimized file archive. Enter the absolute path for the directory. You can set up the directory on local storage or use Network File System (NFS) to connect to a directory on any of the following types of storage devices:

- Direct-attached storage (DAS)
- Network-attached storage (NAS)
- Storage area network (SAN)

You can specify a different directory for each optimized file archive target connection. The directory must be accessible to the ILM Server and the File Archive Service.
If you select an archive store in the Archive Store Type property, the file archive loader archives data to the archive store, not to the location specified in the File Archive Data Directory property. Instead, the file archive loader uses the file archive data directory as a staging location when it writes data to the archive store.

File Archive Folder Name
Name of the folder in the optimized file archive in which to store the archived data. The optimized file archive folder corresponds to the database in the archive source.

File Archive Host
Host name or IP address of the machine that hosts the File Archive Service.

File Archive Port
Port number used by the ssasql command line program and other clients, such as the SQL Worksheet and ODBC applications, to connect to the File Archive Service. Default is 8500.

File Archive Administration Port
Port number used by the File Archive Service agent and the File Archive Administrator tool to connect to the File Archive Service. Default is 8600.

File Archive User
Name of the administrator user account that connects to the File Archive Service. You can use the default administrator user account created during the File Archive Service installation. The user name for the default administrator user account is dba.

File Archive User Password
Password for the administrator user account.

Confirm Password
Verification of the password for the administrator user account.

Add-On URL
URL for the File Archive Service for External Attachments component, which converts external attachments from the archived format to the source format. Required to restore encrypted attachments from the optimized file archive to the source database.

Maintain Imported Schema Name
Uses schema names from the source data imported through the Enterprise Data Manager. By default, this option is enabled. The file archive loader creates a schema structure in the optimized file archive folder that corresponds to the source schema structure imported through the Enterprise Data Manager. It adds the transactional tables to the schemas within the structure. The file archive loader also creates a dbo schema and adds the metadata tables to the dbo schema.

The imported schema structure is based on the data source. If source connections contain similar structures but use different schema names, you must import the source schema structure for each source connection. For example, you import the schema structure from a development instance.
You export metadata from the development instance and import the metadata into the production instance. If the schema names are different in development and production, you must import the schema structure from the production instance. You cannot use the schema structure imported from the development instance.

If this option is not enabled, the file archive loader creates the dbo schema in the file archive folder. The file archive loader adds all transactional tables for all schemas and all metadata tables to the dbo schema.

Archive Store Type
Storage platform for the optimized file archive. Select the Hadoop HDFS archive store.

HDFS URL
Host name or IP address of the HDFS server.

HDFS Port
Port number to connect to HDFS. The default HDFS port number is 54310.

Command
Path to the directory for the optimized file archive in HDFS. Do not include the HDFS prefix or host name.

Step 4. Run the Create Archive Folder Job

In Data Archive, run the Create Archive Folder job to create the file archive folder and the connection to HDFS.

The Create Archive Folder job creates the file archive folder and adds a Hadoop connection entry to the ssa.ini file in the File Archive Service plug-in in Data Archive. The job sets the name of the file archive folder and the name of the Hadoop connection based on the folder name property specified in the target connection. For example, if the File Archive Folder Name property in the target connection is set to HDFS_Sales, the Create Archive Folder job creates a file archive folder named HDFS_Sales and adds a Hadoop connection named HDFS_Sales to the ssa.ini file.

The following example shows an entry for a Hadoop connection named HDFS_Sales in the ssa.ini file:

    [HADOOP_CONNECTION HDFS_Sales]
    URL = 10.17.40.25
    PORT = 54310

Step 5. Copy the Hadoop Connection to Other File Archive Service Configuration Files

The Hadoop connection definition on the machine that hosts Data Archive must match the Hadoop connection definition on the machines that host other File Archive Service components. Copy the Hadoop connection definition from the ssa.ini file on the machine that hosts Data Archive to the ssa.ini files on the machines that host the File Archive Service and the File Archive Service agent.

After you run the Create Archive Folder job, go to the File Archive Service plug-in directory in Data Archive and find the ssa.ini file. Copy the Hadoop connection definition to the ssa.ini file on the machine that hosts the File Archive Service. Additionally, if you have installed the File Archive Service agent on another machine, copy the Hadoop connection definition to the ssa.ini file on the machine that hosts the File Archive Service agent. If you have installed the File Archive Service agent on multiple machines, copy the Hadoop connection definition to the ssa.ini file on each machine that hosts a File Archive Service agent.

Step 6. Validate the Connection to HDFS

Verify the connection to HDFS from the File Archive Service and from the File Archive Service plug-in in Data Archive. Use the File Archive Service ssadrv administration command to validate the connection to the HDFS file archive store. On the machine that hosts the File Archive Service, run the following command:

    ssadrv -a hdfs://<connection name>/<path to the file archive folder in HDFS>

For example, run the following command:

    ssadrv -a hdfs://HDFS_Sales/data/sandqa1/infa_archive

In this command, HDFS_Sales is the name of the Hadoop connection defined in the ssa.ini file and data/sandqa1/infa_archive is the path to the optimized file archive folder named infa_archive in the HDFS file archive store.

You can run the same command on the machine that hosts Data Archive to test the connection from Data Archive to HDFS.

Author
Marissa Johnston
Staff Technical Writer
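Because Step 5 amounts to replicating one INI section across several machines, it can be scripted. The following is a minimal Python sketch, not part of Data Archive: it assumes the ssa.ini file parses as standard INI syntax (consistent with the HDFS_Sales example above), and the function name and any paths you pass to it are illustrative only. Note that configparser rewrites the target file in normalized form, so back up the target ssa.ini before using anything like this.

```python
# Hypothetical helper (illustration only): copy a [HADOOP_CONNECTION <name>]
# section from the ssa.ini on the Data Archive machine to the ssa.ini of
# another File Archive Service component. Assumes standard INI syntax.
import configparser


def copy_hadoop_connection(source_ini, target_ini, name):
    section = f"HADOOP_CONNECTION {name}"

    src = configparser.ConfigParser()
    src.optionxform = str              # preserve key case (URL, PORT)
    src.read(source_ini)
    if not src.has_section(section):
        raise ValueError(f"{section!r} not found in {source_ini}")

    dst = configparser.ConfigParser()
    dst.optionxform = str
    dst.read(target_ini)               # a missing target file is treated as empty
    if not dst.has_section(section):
        dst.add_section(section)
    for key, value in src.items(section):
        dst.set(section, key, value)   # overwrite so both files stay in sync

    with open(target_ini, "w") as f:
        dst.write(f)                   # rewrites the file in normalized INI form
```

Run once per target machine (or per mounted target path); both ssa.ini paths are placeholders that depend on where each File Archive Service component is installed.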

Acknowledgements
Thanks to Vassiliy Truskov for his help in completing this article.