Informatica Intelligent Data Lake (Version 10.1) Installation and Configuration Guide
Informatica Intelligent Data Lake Installation and Configuration Guide Version 10.1 June 2016 Copyright (c) 1993-2016 Informatica LLC. All rights reserved. This software and documentation contain proprietary information of Informatica LLC and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica LLC. This Software may be protected by U.S. and/or international Patents and other Patents Pending. Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7702-3(a) (1995), DFARS 252.227-7013 (1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable. The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing. Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange Informatica On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging, Informatica Master Data Management, and Live Data Map are trademarks or registered trademarks of Informatica LLC in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners. 
Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright Sun Microsystems. All rights reserved. Copyright RSA Security Inc. All Rights Reserved. Copyright Ordinal Technology Corp. All rights reserved. Copyright Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright Isomorphic Software. All rights reserved. Copyright Meta Integration Technology, Inc. All rights reserved. Copyright Intalio. All rights reserved. Copyright Oracle. All rights reserved. Copyright Adobe Systems Incorporated. All rights reserved. Copyright DataArt, Inc. All rights reserved. Copyright ComponentSource. All rights reserved. Copyright Microsoft Corporation. All rights reserved. Copyright Rogue Wave Software, Inc. All rights reserved. Copyright Teradata Corporation. All rights reserved. Copyright Yahoo! Inc. All rights reserved. Copyright Glyph & Cog, LLC. All rights reserved. Copyright Thinkmap, Inc. All rights reserved. Copyright Clearpace Software Limited. All rights reserved. Copyright Information Builders, Inc. All rights reserved. Copyright OSS Nokalva, Inc. All rights reserved. Copyright Edifecs, Inc. All rights reserved. Copyright Cleo Communications, Inc. All rights reserved. Copyright International Organization for Standardization 1986. All rights reserved. Copyright ejtechnologies GmbH. All rights reserved. Copyright Jaspersoft Corporation. All rights reserved. Copyright International Business Machines Corporation. All rights reserved. Copyright yworks GmbH. All rights reserved. Copyright Lucent Technologies. All rights reserved. Copyright (c) University of Toronto. All rights reserved. Copyright Daniel Veillard. All rights reserved. Copyright Unicode, Inc. Copyright IBM Corp. All rights reserved. Copyright MicroQuill Software Publishing, Inc. All rights reserved. Copyright PassMark Software Pty Ltd. 
All rights reserved. Copyright LogiXML, Inc. All rights reserved. Copyright 2003-2010 Lorenzi Davide, All rights reserved. Copyright Red Hat, Inc. All rights reserved. Copyright The Board of Trustees of the Leland Stanford Junior University. All rights reserved. Copyright EMC Corporation. All rights reserved. Copyright Flexera Software. All rights reserved. Copyright Jinfonet Software. All rights reserved. Copyright Apple Inc. All rights reserved. Copyright Telerik Inc. All rights reserved. Copyright BEA Systems. All rights reserved. Copyright PDFlib GmbH. All rights reserved. Copyright Orientation in Objects GmbH. All rights reserved. Copyright Tanuki Software, Ltd. All rights reserved. Copyright Ricebridge. All rights reserved. Copyright Sencha, Inc. All rights reserved. Copyright Scalable Systems, Inc. All rights reserved. Copyright jqwidgets. All rights reserved. Copyright Tableau Software, Inc. All rights reserved. Copyright MaxMind, Inc. All Rights Reserved. Copyright TMate Software s.r.o. All rights reserved. Copyright MapR Technologies Inc. All rights reserved. Copyright Amazon Corporate LLC. All rights reserved. Copyright Highsoft. All rights reserved. Copyright Python Software Foundation. All rights reserved. Copyright BeOpen.com. All rights reserved. Copyright CNRI. All rights reserved. This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and/or other software which is licensed under various versions of the Apache License (the "License"). You may obtain a copy of these Licenses at http://www.apache.org/licenses/. Unless required by applicable law or agreed to in writing, software distributed under these Licenses is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the Licenses for the specific language governing permissions and limitations under the Licenses. 
This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright 1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under various versions of the GNU Lesser General Public License Agreement, which may be found at http:// www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright ( ) 1993-2006, all rights reserved. This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org and http://www.openssl.org/source/license.html. This product includes Curl software which is Copyright 1996-2013, Daniel Stenberg, <daniel@haxx.se>. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies. The product includes software copyright 2001-2005 ( ) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/ license.html. The product includes software copyright 2004-2007, The Dojo Foundation. All Rights Reserved. 
Permissions and limitations regarding this software are subject to terms available at http://dojotoolkit.org/license. This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html. This product includes software copyright 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http:// www.gnu.org/software/ kawa/software-license.html. This product includes OSSP UUID software which is Copyright 2002 Ralf S. Engelschall, Copyright 2002 The OSSP Project Copyright 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php. This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject to terms available at http:/ /www.boost.org/license_1_0.txt. This product includes software copyright 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http:// www.pcre.org/license.txt. This product includes software copyright 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http:// www.eclipse.org/org/documents/epl-v10.php and at http://www.eclipse.org/org/documents/edl-v10.php.
This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?license, http:// www.stlport.org/doc/ license.html, http://asm.ow2.org/license.html, http://www.cryptix.org/license.txt, http://hsqldb.org/web/hsqllicense.html, http:// httpunit.sourceforge.net/doc/ license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/ license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/opensourcelicense.html, http://fusesource.com/downloads/licenseagreements/fuse-message-broker-v-5-3- license-agreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/; http://www.bouncycastle.org/licence.html; http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/license.txt; http://jotm.objectweb.org/bsd_license.html;. http://www.w3.org/consortium/legal/ 2002/copyright-software-20021231; http://www.slf4j.org/license.html; http://nanoxml.sourceforge.net/orig/copyright.html; http://www.json.org/license.html; http:// forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html, http://www.tcl.tk/software/tcltk/license.html, http:// www.jaxen.org/faq.html, http://www.jdom.org/docs/faq.html, http://www.slf4j.org/license.html; http://www.iodbc.org/dataspace/iodbc/wiki/iodbc/license; http:// www.keplerproject.org/md5/license.html; http://www.toedter.com/en/jcalendar/license.html; http://www.edankert.com/bounce/index.html; http://www.net-snmp.org/about/ license.html; http://www.openmdx.org/#faq; http://www.php.net/license/3_01.txt; http://srp.stanford.edu/license.txt; http://www.schneier.com/blowfish.html; http:// www.jmock.org/license.html; http://xsom.java.net; http://benalman.com/about/license/; https://github.com/createjs/easeljs/blob/master/src/easeljs/display/bitmap.js; 
http://www.h2database.com/html/license.html#summary; http://jsoncpp.sourceforge.net/license; http://jdbc.postgresql.org/license.html; http:// protobuf.googlecode.com/svn/trunk/src/google/protobuf/descriptor.proto; https://github.com/rantav/hector/blob/master/license; http://web.mit.edu/kerberos/krb5- current/doc/mitk5license.html; http://jibx.sourceforge.net/jibx-license.html; https://github.com/lyokato/libgeohash/blob/master/license; https://github.com/hjiang/jsonxx/ blob/master/license; https://code.google.com/p/lz4/; https://github.com/jedisct1/libsodium/blob/master/license; http://one-jar.sourceforge.net/index.php? page=documents&file=license; https://github.com/esotericsoftware/kryo/blob/master/license.txt; http://www.scala-lang.org/license.html; https://github.com/tinkerpop/ blueprints/blob/master/license.txt; http://gee.cs.oswego.edu/dl/classes/edu/oswego/cs/dl/util/concurrent/intro.html; https://aws.amazon.com/asl/; https://github.com/ twbs/bootstrap/blob/master/license; https://sourceforge.net/p/xmlunit/code/head/tree/trunk/license.txt; https://github.com/documentcloud/underscore-contrib/blob/ master/license, and https://github.com/apache/hbase/blob/master/license.txt. This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License (http://www.opensource.org/licenses/cddl1.php) the Common Public License (http://www.opensource.org/licenses/cpl1.0.php), the Sun Binary Code License Agreement Supplemental License Terms, the BSD License (http:// www.opensource.org/licenses/bsd-license.php), the new BSD License (http://opensource.org/ licenses/bsd-3-clause), the MIT License (http://www.opensource.org/licenses/mit-license.php), the Artistic License (http://www.opensource.org/licenses/artisticlicense-1.0) and the Initial Developer s Public License Version 1.0 (http://www.firebirdsql.org/en/initial-developer-s-public-license-version-1-0/). 
This product includes software copyright 2003-2006 Joe WaInes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further information please visit http://www.extreme.indiana.edu/. This product includes software Copyright (c) 2013 Frank Balluffi and Markus Moeller. All rights reserved. Permissions and limitations regarding this software are subject to terms of the MIT license. See patents at https://www.informatica.com/legal/patents.html. DISCLAIMER: Informatica LLC provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of noninfringement, merchantability, or use for a particular purpose. Informatica LLC does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice. NOTICES This Informatica product (the "Software") includes certain drivers (the "DataDirect Drivers") from DataDirect Technologies, an operating company of Progress Software Corporation ("DataDirect") which are subject to the following terms and conditions: 1. THE DATADIRECT DRIVERS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. 2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE. 
THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS. Part Number: IDL-INS-1000-0001
Table of Contents

Preface
    Informatica Resources
        Informatica Network
        Informatica Knowledge Base
        Informatica Documentation
        Informatica Product Availability Matrixes
        Informatica Velocity
        Informatica Marketplace
        Informatica Global Customer Support
Chapter 1: Introduction to Intelligent Data Lake Installation
    Intelligent Data Lake Installation Overview
    Live Data Map Installation
    Intelligent Data Lake Installation
    Installation Process
Chapter 2: Before You Install
    Before You Install Overview
    Read the Release Notes
    Verify the License Key
    Install and Configure Live Data Map
        Before Installation
        During Installation
        After Installation
    Install Big Data Management Packages
    Create HDFS and Hive Connections for the Data Lake
    Verify System Requirements
        Verify Temporary Disk Space Requirements
        Verify Database Requirements
        Verify Services Installation Requirements
        Verify Hardware Requirements
    Set Up the Database for the Data Preparation Service
    Set Up the Keystore and Truststore Files
Chapter 3: Intelligent Data Lake Installation
    Overview of the Intelligent Data Lake Installation
    Installing Intelligent Data Lake in Console Mode
    Intelligent Data Lake Installation in Silent Mode
        Configuring the Silent Install Properties File
        Running the Silent Installer
        Secure the Passwords in the Properties File
    Troubleshooting
Chapter 4: After You Install Intelligent Data Lake
    After You Install Overview
    Create the Application Services
    Install Python
    Enable Logging of User Activity Events
Index
Preface

The Intelligent Data Lake Installation and Configuration Guide contains information about the installation and setup of Intelligent Data Lake. It includes information about the Informatica domain requirements, the system requirements, and the installation and configuration process. This guide assumes that you have knowledge of databases and Hadoop clusters. It also assumes that you are familiar with your enterprise systems and network.

Informatica Resources

Informatica Network

Informatica Network hosts Informatica Global Customer Support, the Informatica Knowledge Base, and other product resources. To access Informatica Network, visit https://network.informatica.com.

As a member, you can:
- Access all of your Informatica resources in one place.
- Search the Knowledge Base for product resources, including documentation, FAQs, and best practices.
- View product availability information.
- Review your support cases.
- Find your local Informatica User Group Network and collaborate with your peers.

Informatica Knowledge Base

Use the Informatica Knowledge Base to search Informatica Network for product resources such as documentation, how-to articles, best practices, and PAMs. To access the Knowledge Base, visit https://kb.informatica.com. If you have questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base team at KB_Feedback@informatica.com.
Informatica Documentation

To get the latest documentation for your product, browse the Informatica Knowledge Base at https://kb.informatica.com/_layouts/productdocumentation/page/productdocumentsearch.aspx. If you have questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email at infa_documentation@informatica.com.

Informatica Product Availability Matrixes

Product Availability Matrixes (PAMs) indicate the versions of operating systems, databases, and other types of data sources and targets that a product release supports. If you are an Informatica Network member, you can access PAMs at https://network.informatica.com/community/informatica-network/product-availability-matrices.

Informatica Velocity

Informatica Velocity is a collection of tips and best practices developed by Informatica Professional Services. Developed from the real-world experience of hundreds of data management projects, Informatica Velocity represents the collective knowledge of our consultants who have worked with organizations from around the world to plan, develop, deploy, and maintain successful data management solutions. If you are an Informatica Network member, you can access Informatica Velocity resources at http://velocity.informatica.com. If you have questions, comments, or ideas about Informatica Velocity, contact Informatica Professional Services at ips@informatica.com.

Informatica Marketplace

The Informatica Marketplace is a forum where you can find solutions that augment, extend, or enhance your Informatica implementations. By leveraging any of the hundreds of solutions from Informatica developers and partners, you can improve your productivity and speed up time to implementation on your projects. You can access Informatica Marketplace at https://marketplace.informatica.com.

Informatica Global Customer Support

You can contact a Global Support Center by telephone or through Online Support on Informatica Network.
To find your local Informatica Global Customer Support telephone number, visit the Informatica website at the following link: http://www.informatica.com/us/services-and-training/support-services/global-support-centers. If you are an Informatica Network member, you can use Online Support at http://network.informatica.com.
Chapter 1: Introduction to Intelligent Data Lake Installation

This chapter includes the following topics:
- Intelligent Data Lake Installation Overview
- Live Data Map Installation
- Intelligent Data Lake Installation
- Installation Process

Intelligent Data Lake Installation Overview

Intelligent Data Lake is a data preparation platform that provides a way for analysts and non-technical users to discover, access, analyze, and structure data without IT support. Analysts can use the Intelligent Data Lake interface to search for the data they need and prepare the data for use in their task or project.

Intelligent Data Lake requires Live Data Map and an Informatica domain. Intelligent Data Lake uses Live Data Map to search for data and discover data lineage and relationships. You must install Live Data Map and create and configure the Live Data Map services before you install Intelligent Data Lake. For more information, see the Live Data Map Installation and Configuration Guide.

Complete the pre-installation tasks to prepare for the installation. You can install the Intelligent Data Lake services only on a Red Hat Enterprise Linux 6 or later machine. You can run the installer in console or silent mode.

Live Data Map Installation

Informatica provides an installer that installs Informatica services and Live Data Map. You must install Live Data Map 2.0 on an external cluster and configure the services before you install Intelligent Data Lake.

Note: You must select the external cluster option during the Informatica Live Data Map installation. Intelligent Data Lake cannot use an internal Live Data Map Hadoop cluster.

The installer installs Live Data Map with the option to create the following services:
1. Data Integration Service
2. Model Repository Service
3. Catalog Service
4. Content Management Service. You must configure this service if you want to use the data domain discovery feature.

For more information about the Live Data Map installation, see the Live Data Map Installation and Configuration Guide.

Intelligent Data Lake Installation

Before you install Intelligent Data Lake, you must have an Informatica domain with Live Data Map and the following services:
- Data Integration Service
- Model Repository Service
- Catalog Service

Use the Intelligent Data Lake installer to install the application. You can install Intelligent Data Lake in the following modes:
- Console mode. If you install Intelligent Data Lake in console mode, the installer files are copied to the installation directory. Only the Model Repository Service content is created and enabled. The Data Preparation Service and the Intelligent Data Lake Service are not created. You must create the Data Preparation Service and the Intelligent Data Lake Service after installation by using the Administrator tool.
- Silent mode. If you install Intelligent Data Lake in silent mode, you can choose to create the following services during installation:
  - Data Preparation Service
  - Intelligent Data Lake Service
  If you do not want to create the services during the silent mode installation, you can create them after installation by using the Administrator tool.

Note: During the Intelligent Data Lake installation, the files are copied to the Live Data Map installation directory. To uninstall Intelligent Data Lake, you must uninstall Live Data Map. For more information, see the Live Data Map Installation and Configuration Guide.

Installation Process

The installation process for Intelligent Data Lake consists of the following phases:
1. Install and configure Live Data Map.
2. Create HDFS and Hive connections for the data lake.
3. Verify the system requirements.
4. Set up the database for the Data Preparation Service.
5. Set up the keystore and truststore files.
6. Install Intelligent Data Lake.
7. Complete any required post-installation configuration.
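In silent mode, the service-creation choices described above are supplied through the installer properties file. The fragment below is only an illustration of the pattern; the property names are placeholders, not the actual installer keys, which are documented in the "Configuring the Silent Install Properties File" section.

```
# Hypothetical fragment of a silent-install properties file.
# Property names are illustrative placeholders only; see
# "Configuring the Silent Install Properties File" for the real keys.

# Create the Data Preparation Service during installation (1 = yes, 0 = no)
CREATE_DATA_PREPARATION_SERVICE=1

# Create the Intelligent Data Lake Service during installation (1 = yes, 0 = no)
CREATE_INTELLIGENT_DATA_LAKE_SERVICE=1
```

If you set both flags to 0, you create the services after installation by using the Administrator tool, as in console mode.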
Chapter 2: Before You Install

This chapter includes the following topics:
- Before You Install Overview
- Read the Release Notes
- Verify the License Key
- Install and Configure Live Data Map
- Install Big Data Management Packages
- Create HDFS and Hive Connections for the Data Lake
- Verify System Requirements
- Set Up the Database for the Data Preparation Service
- Set Up the Keystore and Truststore Files

Before You Install Overview

You can install Intelligent Data Lake with Informatica services on a Red Hat Enterprise Linux 6 or later machine. Before you start the installation, set up the machine to meet the requirements to install and run Intelligent Data Lake. If the machine where you install Intelligent Data Lake is not configured correctly, the installation can fail.

Read the Release Notes

Read the Informatica Release Notes for updates to the installation process. You can also find information about known and fixed limitations for the release.
Verify the License Key

Before you install the software, verify that you have the license key available. You can get the license key in the following ways:
- Installation DVD. If you receive the Informatica installation files on a DVD, the license key file is included on the Informatica License Key CD.
- FTP download. If you download the Informatica installation files from the Informatica Electronic Software Download (ESD) site, the license key is in an email message from Informatica.

Copy the license key file to a directory accessible to the user account that installs the product.

You provide the license key for Intelligent Data Lake when you install Live Data Map. You cannot specify the license during the Intelligent Data Lake installation. Ensure that you have the following license options enabled:
- Hive option for the Data Integration Service
- Live Data Map option for the Catalog Service
- Data Lake option for the Data Preparation Service and the Intelligent Data Lake Service

Contact Informatica Global Customer Support if you do not have a license key, or if you have an incremental license key and you want to install Intelligent Data Lake. If you use an incremental license for Intelligent Data Lake, the serial number of the incremental license must match the serial number of an existing license object in the domain. If the serial numbers do not match, the AddLicense command fails.

You can find more information about the contents of the license key file used for installation, including the serial number, version, expiration date, operating systems, and connectivity options, in the installation debug log.

Install and Configure Live Data Map

Intelligent Data Lake uses Live Data Map to search for data and discover data lineage and relationships. You must install Live Data Map and create and configure the Live Data Map services before you install Intelligent Data Lake.

Before Installation

Verify the system requirements and prerequisites for the Live Data Map installation.
You provide the license details for Intelligent Data Lake when you install Live Data Map. You cannot specify the license during the Intelligent Data Lake installation. Ensure that you have the following license options enabled for Intelligent Data Lake:
- Hive option for the Data Integration Service
- Live Data Map option for the Catalog Service
- Data Lake option for the Data Preparation Service and the Intelligent Data Lake Service
During Installation

Install Live Data Map 2.0 with the external cluster option. You must select the external cluster option during the Informatica Live Data Map installation. An internal cluster is not supported for Intelligent Data Lake. For more information, see the Live Data Map Installation and Configuration Guide.

You can create the following services during the Live Data Map installation, or you can create them after installation in the following order using the Administrator tool. For more information, see the Live Data Map Administrator Guide.
- Data Integration Service.
  Note: If you plan to use the operating system profiles option for the Data Integration Service, ensure that you create and associate a separate Data Integration Service for Live Data Map and Intelligent Data Lake. Live Data Map does not support operating system profiles. For more information, see the Intelligent Data Lake Administrator Guide.
- Model Repository Service
- Catalog Service
- Content Management Service. You must configure this service if you want to use the data domain discovery feature.

After Installation

After you install Live Data Map, complete the following tasks:
- Create a Hive resource for the data lake. For more information about creating a Hive resource for Intelligent Data Lake, see the Live Data Map Administrator Guide. You must create the Hive resource with the following settings for Intelligent Data Lake:
  - In the General > Connection Properties > Url field, use the fully qualified domain name (FQDN) of the Hive server in the JDBC connection URL used to access the Hive server.
  - In the Metadata Load Settings > Additional Properties > Schema field, select only the schemas required for Intelligent Data Lake. By default, all the schemas available in the Hadoop cluster are selected.
  - If you are using operating system profiles, the Hive user name that you specify in the Hive resource must be an HDFS superuser.
For more information about operating system profiles, see the Intelligent Data Lake Administrator Guide.
- To extract metadata from Hive sources, import the relevant connectors, modify the scannerdeployer.xml file, and then restart the Catalog Service. For more information, see How To Configure Scanner Deployer for Hive in Live Data Map.
- If you are using operating system profiles, create a new Data Integration Service for Intelligent Data Lake by using the Administrator tool.

Install Big Data Management Packages

The tar.gz file includes the Big Data Management packages and binary files that you need to install Big Data Management on Cloudera or Hortonworks. Intelligent Data Lake uses pushdown mode to run mappings for data upload and publish. To run mappings in pushdown mode, you must install the Big Data Management packages on all nodes of the Hadoop cluster. For more information, see the Big Data Management Installation and Configuration Guide.
Create HDFS and Hive Connections for the Data Lake

If you install Intelligent Data Lake in console mode, you must use the Big Data Management Configuration Utility to create the HDFS and Hive connections for Intelligent Data Lake. You can skip this step if you install Intelligent Data Lake in silent mode.

If you are configuring for high availability, run the Big Data Management Configuration Utility on each node where you install the Data Preparation Service and the Intelligent Data Lake Service. The utility updates the Hadoop configuration files. For more information, see the Big Data Management Installation and Configuration Guide.

You must create the HDFS and Hive connections with the following requirements for Intelligent Data Lake:
- For the Hadoop distribution version, select either Cloudera CDH or Hortonworks HDP.
  - If you select Cloudera CDH, select Cloudera Manager to access files on the Hadoop cluster.
  - If you select Hortonworks HDP, select Apache Ambari to access files on the Hadoop cluster.
- Select the Hive on MapReduce option for running the mappings.
- In the Cluster Configuration Connection type selection panel, select No to use the default Hive Command Line Interface to run mappings.
- Hive connection. In the Connection Details panel, for the Metastore Execution Mode, select remote.
- HDFS connection. In the Connection Details panel, in the HDFS username field, enter a user name with superuser privileges. The user must have access to all the Intelligent Data Lake schemas.

Verify System Requirements

Verify that your planned setup meets the minimum system requirements for the Intelligent Data Lake installation process, temporary disk space, databases, and application service hardware. For more information about product requirements and supported platforms, see the Product Availability Matrix on the Informatica Customer Portal: https://mysupport.informatica.com/community/my-support/product-availability-matrices.
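As a concrete illustration, the connection choices required in "Create HDFS and Hive Connections for the Data Lake" can be summarized as follows. The host name, port, schema, and user values are examples only, not actual utility prompts or property keys:

```
# Illustrative summary of the data lake connection settings.
# Host, port, schema, and user values are example placeholders.

Hadoop distribution:          Cloudera CDH (access files through Cloudera Manager)
                              or Hortonworks HDP (access files through Apache Ambari)
Mapping execution:            Hive on MapReduce
Cluster Configuration
  Connection type selection:  No (use the default Hive Command Line Interface)

Hive connection
  Metastore execution mode:   remote
  JDBC URL (example):         jdbc:hive2://hiveserver01.example.com:10000/default
                              (use the fully qualified domain name of the Hive server)

HDFS connection
  HDFS user name (example):   hdfs
                              (a superuser with access to all Intelligent Data Lake schemas)
```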
Verify Temporary Disk Space Requirements

The installer writes temporary files to the hard disk. Verify that you have enough available disk space on the machine to support the installation. When the installation completes, the installer deletes the temporary files and releases the disk space. The installer requires 2 GB of temporary disk space.
Verify Database Requirements

Verify that the database server has adequate disk space for the databases required by the Data Preparation Service. The data preparation repository database has the following requirements:
- Create the Data Preparation repository database on a MySQL database.
- Allow 5 GB of disk space for the database. Allocate more space based on the amount of metadata you want to store.

Verify Services Installation Requirements

Verify that your machine meets the minimum system requirements to install the Intelligent Data Lake services. The Data Preparation Service requires the following memory and disk space:
- RAM: approximately 512 MB per user and 4 GB for the server. Allocate more space as required.
- Disk space: 60 GB.
- Disk space for local storage: 10 GB.

Verify Hardware Requirements

Verify that your machine meets the minimum hardware requirements to install the Intelligent Data Lake services. The Data Preparation Service requires the following processor:
- 2 CPUs with a minimum of 4 cores. Recommended: 2 CPUs with 8 cores.
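The disk-space and memory figures above can be checked before you start the installer. This is a minimal sketch, not part of the Informatica installer; it assumes a Linux machine with /proc/meminfo and checks only the 2 GB temporary-space requirement.

```shell
#!/bin/sh
# Pre-installation check sketch: temporary disk space and total memory.
# 2048 MB matches the installer's 2 GB temporary disk space requirement.

TMP_REQUIRED_MB=2048

# Available space (MB) on the filesystem that holds /tmp.
tmp_avail_mb=$(df -Pm /tmp | awk 'NR==2 {print $4}')

# Total system memory in MB, from /proc/meminfo.
mem_total_mb=$(awk '/MemTotal/ {printf "%d", $2/1024}' /proc/meminfo)

echo "Available /tmp space: ${tmp_avail_mb} MB (need ${TMP_REQUIRED_MB} MB)"
echo "Total memory:         ${mem_total_mb} MB"

[ "$tmp_avail_mb" -ge "$TMP_REQUIRED_MB" ] || echo "WARNING: not enough temporary disk space"
```

Compare the reported memory against the service sizing above (roughly 512 MB per user plus 4 GB for the Data Preparation Service server process).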
Set Up the Database for the Data Preparation Service

Set up the MySQL server database that the Data Preparation Service connects to. The MySQL database must meet the following requirements:
- Use MySQL version 5.6.26 or higher.
  - For MySQL version 5.6.26 and higher, set lower_case_table_names=1.
  - For MySQL version 5.7 and higher, set explicit_defaults_for_timestamp=1.
- The database user account must have the following permissions:
  - Create tables and views.
  - Drop tables and views.
  - Insert, update, and delete data.

Set Up Keystore and Truststore Files

When you install Live Data Map, you can configure secure communication for the domain and specify the location of the keystore files for the security certificates. The domain must have keystore and truststore files named infa_keystore and infa_truststore in PEM and JKS formats. For more information, see the Live Data Map Installation and Configuration Guide.

If the domain is secure, you must secure the services that you create in Intelligent Data Lake. The following services in the domain and the YARN application must share the same common truststore file:
- Data Integration Service
- Model Repository Service
- Catalog Service
- Data Preparation Service
- Intelligent Data Lake Service

The Data Preparation Service and the Intelligent Data Lake Service must also share the same keystore file. You can use different keystore files for the Data Integration Service, Model Repository Service, and Catalog Service. If you use different keystore files, you must add all the keystore files to the common truststore file.
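Building a common truststore from per-service keystores can be sketched with the Java keytool. This is an illustration only, assuming keytool is on the PATH: the alias, passwords, and self-signed key are placeholders, and in practice you import the certificate from each service keystore listed above, using certificates from your security team.

```shell
#!/bin/sh
# Hedged sketch: export a certificate from a service keystore and import it
# into the single common truststore shared by the Data Integration, Model
# Repository, Catalog, Data Preparation, and Intelligent Data Lake services.
# Alias "infa_dps" and password "changeit" are placeholders.

command -v keytool >/dev/null 2>&1 || { echo "keytool not found"; exit 0; }

# Placeholder self-signed key so the sketch is self-contained; a real
# deployment starts from the existing infa_keystore files instead.
keytool -genkeypair -alias infa_dps -dname "CN=dps.example.com" \
        -keyalg RSA -validity 30 \
        -keystore infa_keystore.jks -storepass changeit -keypass changeit

# Export the certificate from the service keystore.
keytool -exportcert -alias infa_dps -file dps.cer \
        -keystore infa_keystore.jks -storepass changeit

# Import it into the common truststore; repeat for each service keystore.
keytool -importcert -noprompt -alias infa_dps -file dps.cer \
        -keystore infa_truststore.jks -storepass changeit
```

Repeat the export/import pair for every keystore that is not shared, so that the common truststore trusts all of them.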
Chapter 3: Intelligent Data Lake Installation

This chapter includes the following topics:
- Overview of the Intelligent Data Lake Installation
- Installing Intelligent Data Lake in Console Mode
- Intelligent Data Lake Installation in Silent Mode
- Troubleshooting

Overview of the Intelligent Data Lake Installation

You can install the Intelligent Data Lake application on a Red Hat Enterprise Linux 6 or higher machine in console or silent mode. Complete the pre-installation tasks to prepare for the installation. You can install the Intelligent Data Lake services on multiple machines. You must configure and enable the Data Preparation Service before you create the Intelligent Data Lake Service.

If you install Intelligent Data Lake in console mode, the installer does not create the services. You must create the services after installation using the Administrator tool. If you install Intelligent Data Lake in silent mode, you can choose to create the services during installation or create them after the silent installation is complete.

Note: Informatica recommends that you install Intelligent Data Lake in silent mode.

Installing Intelligent Data Lake in Console Mode

You can install the Intelligent Data Lake services in console mode on Red Hat Enterprise Linux 6 or higher. When you run the installer in console mode, the words Quit and Back are reserved words. Do not use them as input text.

1. Log in to the machine with a system user account.
2. Close all other applications.
3. On a shell command line, run the install.sh file from the directory where you extracted the installer files. The installer displays the documentation and copyright information.
4. Press Enter to continue. The installer displays a message about the prerequisites and pre-installation tasks.
5. If the prerequisites or pre-installation tasks are not completed, press n to exit the installer and complete them as required. If the prerequisites and pre-installation tasks are completed, press y to continue.
6. Press Enter to continue.
7. Type the path for the Live Data Map installation directory. The directory names in the path must not contain spaces or the following special characters: @ * $ # ! % ( ) { } [ ] , ; ' . Intelligent Data Lake and Live Data Map must be installed in the same directory.
8. Review the pre-installation summary, and press Enter to continue. The installer copies the Intelligent Data Lake files to the installation directory.
9. To install the Intelligent Data Lake services on the master gateway node, press 2. To install the Intelligent Data Lake services on any other node, press 1.
10. Specify the domain and node details. Set the following domain and node parameters:
- Domain Name: Name of the domain created during the Live Data Map installation.
- Node Name: Name of the node where you want to install Intelligent Data Lake. If you are installing Intelligent Data Lake for the first time, you must install it on the master gateway node. Subsequent installations can be on any node.
- Domain User Name: User name for the domain administrator.
- Domain User Password: Password for the domain administrator. The password must be more than 2 characters and must not exceed 16 characters.
11. Specify the details for the services associated with Intelligent Data Lake. Set the following service parameter:
- Model Repository Service Name: Name of the Model Repository Service associated with the Intelligent Data Lake Service. Note: You must enter the name of the Model Repository Service configured for Live Data Map.
If you do not enter the correct name of the Model Repository Service, the Model repository content is not updated and the Model Repository Service cannot be enabled. The installer displays an error message with the Ok and Continue options. You must press Continue to exit the installer and complete the steps described in the Troubleshooting section.

The Post-installation Summary indicates whether the installation completed successfully. It also shows the status of the installed components and their configuration. You can view the installation log files to get more information about the tasks performed by the installer and to view configuration properties for the installed components.
Intelligent Data Lake Installation in Silent Mode

To install the Intelligent Data Lake services without user interaction, install in silent mode. Use a properties file to specify the installation options. The installer reads the file to determine the installation options. You can use silent mode installation to install Intelligent Data Lake on multiple machines on the network or to standardize the installation across machines.

Copy the Intelligent Data Lake installation files to the hard disk on the machine where you plan to install Intelligent Data Lake. If you install on a remote machine, verify that you can access and create files on the remote machine.

To install in silent mode, complete the following tasks:
1. Configure the installation properties file and specify the installation options in the properties file.
2. Run the installer with the installation properties file.
3. Secure the passwords in the installation properties file.

Configuring the Silent Install Properties File

Informatica provides a sample properties file that includes the parameters that are required by the Intelligent Data Lake installer. You can customize the sample properties file to specify the options for your installation. Then, run the silent installation. The sample SilentInput.properties file is stored in the root directory of the DVD or the installer download location. After you customize the file, save it again with the file name SilentInput.properties.

1. Go to the root of the directory that contains the installation files.
2. Locate the sample SilentInput.properties file.
3. Create a backup copy of the SilentInput.properties file.
4. Use a text editor to open the file and modify the values of the following installation parameters:

- USER_INSTALL_DIR: Directory in which to install Intelligent Data Lake.
Intelligent Data Lake and Live Data Map must be installed in the same directory. Set USER_INSTALL_DIR to the location of the Live Data Map installation directory and ensure that the directory has write permissions. Default is home/informatica/10.1.0.
- DOMAIN_USER: User name for the domain administrator.
  - The name is not case sensitive and cannot exceed 128 characters.
  - The name cannot include a tab, newline character, or the following special characters: % * + \ / ' . ? ; < >
  - The name can include an ASCII space character except for the first and last character. Other space characters are not allowed.
- DOMAIN_PSSWD: Password for the domain administrator. The password must be more than 2 characters but cannot exceed 16 characters.
- FIRST_GATEWAY_NODE: Indicates whether the installation is on the master gateway node. If the value is 1, the services are installed on the master gateway node of the Live Data Map domain. If the value is 0, the services are installed on any other node. If the machine on which you are installing Intelligent Data Lake has 16 GB of RAM or less, Informatica recommends that you create the Data Preparation Service and the Intelligent Data Lake Service on different nodes. If you create the services on different nodes, you must create the Data Preparation Service on the master gateway node. You can create the Intelligent Data Lake Service on any other node. Default is 1.
- CREATE_LAKE_SERVICES: Enables creation of the Data Preparation Service and the Intelligent Data Lake Service during installation. Set the value to 1 to enable service creation during installation. If the value is 0, the Data Preparation Service and the Intelligent Data Lake Service are not created during installation and you must create the services from the Administrator tool. Default is 1.
- DATA_PREP_SERVICE: Enables creation of the Data Preparation Service during installation. Set the value to 1 to enable Data Preparation Service creation during installation. If the value is 0, the Data Preparation Service is not created during installation and you must create the service from the Administrator tool. Default is 0.
- DATA_LAKE_SERVICE: Enables creation of the Intelligent Data Lake Service during installation. The Intelligent Data Lake Service must be associated with a Data Preparation Service. Set the value to 1 to enable Intelligent Data Lake Service creation during installation. If the value is 0, the Intelligent Data Lake Service is not created during installation and you must create the service from the Administrator tool. Default is 0.
- BOTH_LAKEANDPREP_SERVICE: Enables creation of both the Data Preparation Service and the Intelligent Data Lake Service during installation.
Set the value to 1 to enable service creation during installation. If the value is 0, the Data Preparation Service and the Intelligent Data Lake Service are not created during installation and you must create the services from the Administrator tool. Default is 1.
- CREATE_CONNECTION: Indicates whether the installer creates the HDFS connection and Hive connection for the Hadoop cluster. If the value is 1, the HDFS and Hive connections are created during installation. Select this option if you want to configure Hadoop configuration files and create Hive and HDFS connections for the Data Integration Service or Data Preparation Service on this node. If you are configuring a high availability Hadoop cluster, you must update the Hadoop configuration files (for example, core-site.xml and hive-site.xml) on all nodes where the Data Preparation Service is running. If the value is 0, the HDFS and Hive connections are not created during installation. You must create the HDFS and Hive connections from the Administrator tool and set the values for the HDFS_CONNECTION_NAME and HIVE_CONNECTION_NAME properties. Default is 1.
- CLOUDERA_SELECTION: Hadoop distribution for the data lake. To use Cloudera as the Hadoop distribution, set CLOUDERA_SELECTION to 1. To use Hortonworks as the Hadoop distribution, set CLOUDERA_SELECTION to 0. Default is 1.
- CLOUDERA_HOSTNAME: Required if CREATE_CONNECTION=1 and CLOUDERA_SELECTION=1. Host where the Cloudera Manager runs.
- CLOUDERA_USER_NAME: Required if CREATE_CONNECTION=1 and CLOUDERA_SELECTION=1. User name for the Cloudera Manager.
- CLOUDERA_USER_PASSWD: Required if CREATE_CONNECTION=1 and CLOUDERA_SELECTION=1. Password for the Cloudera Manager user.
- CLOUDERA_PORT: Required if CREATE_CONNECTION=1 and CLOUDERA_SELECTION=1. Port for the Cloudera Manager.
- HORTONWORKS_SELECTION: Hadoop distribution for the data lake. To use Hortonworks as the Hadoop distribution, set HORTONWORKS_SELECTION to 1. To use Cloudera as the Hadoop distribution, set HORTONWORKS_SELECTION to 0. Default is 0.
- AMBARI_HOSTNAME: Required if CREATE_CONNECTION=1 and HORTONWORKS_SELECTION=1. Host where the Apache Ambari server runs.
- AMBARI_USER_NAME: Required if CREATE_CONNECTION=1 and HORTONWORKS_SELECTION=1. User name for the Apache Ambari server.
- AMBARI_USER_PASSWD: Required if CREATE_CONNECTION=1 and HORTONWORKS_SELECTION=1. Password for the Apache Ambari server user.
- AMBARI_PORT: Required if CREATE_CONNECTION=1 and HORTONWORKS_SELECTION=1. Web port for the Apache Ambari server.
- UPDATE_DIS: Required if CREATE_CONNECTION=1. The Data Integration Service must be present in the Informatica domain. If the cluster is enabled for Kerberos authentication, copy the krb5.conf file to the {infa_home}/services/shared/security folder on the machine where the Data Integration Service is configured. If the value is 1, the Data Integration Service properties are updated and the service is automatically restarted. If CREATE_CONNECTION=1 and the connection already exists, the Data Integration Service is not updated. Default is 0.
- CLUSTER_INSTALLATION_DIR: Required if UPDATE_DIS=1. Indicates the Red Hat Package Manager (RPM) installation directory for the Hadoop cluster. Default is /opt/informatica.
- DIS_SERVICE_NAME_CONNECTION: Name of the Data Integration Service associated with Live Data Map.
- SAMPLE_HIVE_CONNECTION: Required if CREATE_CONNECTION=1. Name of the Hive connection. If you do not set the SAMPLE_HIVE_CONNECTION property, the installer uses the default name "HIVE".
- HIVE_IMPERSONATION_USER: Required if CREATE_CONNECTION=1. Name of the Hadoop impersonation user that Intelligent Data Lake uses. This user must exist in the Hadoop cluster.
- HDFS_USER_NAME: User name with permissions to access the HBase database.
- SAMPLE_HDFS_CONNECTION: Required if CREATE_CONNECTION=1. Name of the HDFS connection. If you do not set the SAMPLE_HDFS_CONNECTION property, the installer uses the default name "HDFS".
- HIVESERVER2_PRINCIPAL: Required if CREATE_CONNECTION=1 and the Hadoop cluster for the Data Integration Service uses Kerberos authentication.
- KERBEROS_PRINCIPAL_NAME: Required if UPDATE_DIS=1 and the Hadoop cluster uses Kerberos authentication. Service Principal Name (SPN) for the data preparation Hadoop cluster. Specify the service principal name in the following format: user@realm.
- KERBEROS_KEYTAB: Required if UPDATE_DIS=1 and the Hadoop cluster uses Kerberos authentication. File name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster.
- CATALOGUE_SERVICE_NAME: Name of the Catalog Service associated with the Intelligent Data Lake Service. Default is Catalog_Service.
- MRS_SERVICE_NAME: Name of the Model Repository Service associated with the Intelligent Data Lake Service. Default is Model_Repository_Service.
- DIS_SERVICE_NAME: Name of the Data Integration Service associated with the Intelligent Data Lake Service. If you plan to use the operating system profiles option for the Data Integration Service, ensure that you create and associate a different Data Integration Service for Live Data Map and Intelligent Data Lake. Live Data Map does not support operating system profiles. Default is Data_Integration_Service.
- ENABLE_CMS_SERVICE: Required if you want to use the data domain discovery feature in Live Data Map. If the value is true, the Content Management Service is enabled. If the value is false, the Content Management Service is disabled. Default is false.
- CMS_SERVICE_NAME: Name of the Content Management Service associated with the Intelligent Data Lake Service.
- DPS_DB_HOST: Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Host name of the machine that hosts the Data Preparation repository database.
- DPS_DB_USER: Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Database user account to use to connect to the Data Preparation repository.
- DPS_DB_PSSWD: Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Password for the Data Preparation repository database user account.
- DPS_DB_PORT: Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Port number for the database. Default is 3306.
- DPS_DB_SCHEMA: Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Schema or database name of the Data Preparation repository database.
- DPS_SERVICE_NAME: Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Name of the Data Preparation Service associated with the Intelligent Data Lake Service. Default is Data_Preparation_Service.
- ENABLE_DPS_SERVICE: Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. If the value is true, the Data Preparation Service is enabled immediately after creation. If the value is false, you must enable the Data Preparation Service from the Administrator tool. Default is false.
- DPS_NODE_NAME: Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Name of the node where you want to run the Data Preparation Service.
- DPS_LICENSE_SERVICE_NAME: Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. License object with the data lake option that allows use of the Data Preparation Service.
- DPS_PROTOCOL_TYPE: Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. To enable secure communication for the Data Preparation Service, set DPS_PROTOCOL_TYPE to https. To disable secure communication for the Data Preparation Service, set DPS_PROTOCOL_TYPE to http.
- DPS_HTTP_PORT: Required if DPS_PROTOCOL_TYPE=http. If DPS_PROTOCOL_TYPE=https, ensure that this field is blank. HTTP port number.
- DPS_HTTPS_PORT: Required if DPS_PROTOCOL_TYPE=https. If DPS_PROTOCOL_TYPE=http, ensure that this field is blank. HTTPS port number.
- DPS_CUSTOM_HTTPS_ENABLED: Required if DATA_PREP_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Indicates whether the Data Preparation Service uses the default or custom SSL certificate files. To use the default Informatica SSL certificate files, set DPS_CUSTOM_HTTPS_ENABLED to false. To use custom SSL certificate files, set DPS_CUSTOM_HTTPS_ENABLED to true. Default is false.
- DPS_KEYSTORE_DIR: Required if DPS_CUSTOM_HTTPS_ENABLED=true. Path and file name of the keystore file that contains the keys and certificates required for HTTPS communication.
- DPS_KEYSTORE_PSSWD: Required if DPS_CUSTOM_HTTPS_ENABLED=true. Password for the keystore file.
- DPS_TRUSTSTORE_DIR: Required if DPS_CUSTOM_HTTPS_ENABLED=true. Path and file name of the truststore file that contains the authentication certificates for the HTTPS connection.
- DPS_TRUSTSTORE_PSSWD: Required if DPS_CUSTOM_HTTPS_ENABLED=true. Password for the truststore file.
- HDFS_LOCATION: HDFS location for data preparation file storage. If the connection to the local storage fails, the Data Preparation Service recovers data preparation files from the HDFS location.
- LOCAL_STORAGE_DIR: Directory for data preparation file storage on the node on which the Data Preparation Service runs.
- SOLR_PORT: Port number for the Apache Solr server used to provide data preparation recommendations. Default is 8983.
- AUTH_MODE: Set the Hadoop authentication mode to NonSecure or Kerberos.
- HADOOP_IMPERSONATION_USER: Required if AUTH_MODE=Kerberos. User name to use in Hadoop impersonation as set in core-site.xml.
- HDFS_PRINCIPAL_NAME: Required if AUTH_MODE=Kerberos. Service Principal Name (SPN) for the data preparation Hadoop cluster. Specify the service principal name in the following format: user/_host@realm.
- KERBEROS_KEYTAB_FILE: Required if AUTH_MODE=Kerberos. Path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster.
The keytab file must be in a directory on the machine where the Data Preparation Service runs.
- HADOOP_DISTRIBUTION: Required if AUTH_MODE=Kerberos. Set HADOOP_DISTRIBUTION=Cloudera or HADOOP_DISTRIBUTION=HortonWorks to select the Hadoop distribution that you want to configure. Default is Cloudera.
- HDFS_CONNECTION_NAME: Required if AUTH_MODE=Kerberos. HDFS connection for data preparation file storage.
- IDL_SERVICE_NAME: Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Name of the Intelligent Data Lake Service. Default is Intelligent_Data_Lake_Service.
- IDL_NODE_NAME: Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Name of the node where you want to run the Intelligent Data Lake Service.
- IDL_DPS_SERVICE_NAME: Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Name of the Data Preparation Service. Set the IDL_DPS_SERVICE_NAME property to the name of the Data Preparation Service to associate with the Intelligent Data Lake Service specified in the IDL_SERVICE_NAME property.
- IDL_LICENSE_SERVICE_NAME: Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. License object with the data lake option that allows use of the Intelligent Data Lake Service.
- ENABLE_IDL_SERVICE: Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. If the value is true, the Intelligent Data Lake Service is enabled immediately after creation. If the value is false, you must enable the Intelligent Data Lake Service from the Administrator tool. Default is false.
- IDL_PROTOCOL_TYPE: Required if DATA_LAKE_SERVICE=1 or BOTH_LAKEANDPREP_SERVICE=1. Indicates whether the Intelligent Data Lake Service uses secure communication. To enable secure communication for the Intelligent Data Lake Service, set IDL_PROTOCOL_TYPE to https. To disable secure communication for the Intelligent Data Lake Service, set IDL_PROTOCOL_TYPE to http.
- IDL_HTTP_PORT: Required if IDL_PROTOCOL_TYPE=http. If IDL_PROTOCOL_TYPE=https, ensure that this field is blank. HTTP port number.
- IDL_HTPPS_PORT: Required if IDL_PROTOCOL_TYPE=https. If IDL_PROTOCOL_TYPE=http, ensure that this field is blank. HTTPS port number.
- IDL_CUSTOM_HTTPS_ENABLED: Indicates whether the Intelligent Data Lake Service uses the default or custom SSL certificate files. To use the default Informatica SSL certificate files, set IDL_CUSTOM_HTTPS_ENABLED to false. To use custom SSL certificate files, set IDL_CUSTOM_HTTPS_ENABLED to true. Default is false.
- IDL_KEYSTORE_DIR: Required if IDL_CUSTOM_HTTPS_ENABLED=true. Path and file name of the keystore file that contains the keys and certificates required for HTTPS communication.
- IDL_KEYSTORE_PSSWD: Required if IDL_CUSTOM_HTTPS_ENABLED=true. Password for the keystore file.
- IDL_TRUSTSTORE_DIR: Required if IDL_CUSTOM_HTTPS_ENABLED=true. Path and file name of the truststore file that contains the authentication certificates for the HTTPS connection.
- IDL_TRUSTSTORE_PSSWD: Required if IDL_CUSTOM_HTTPS_ENABLED=true. Password for the truststore file.
- LAKE_RESOURCE_NAME: Hive resource for the data lake. You configure the resource in Live Data Map Administrator.
- HDFS_SYSTEM_DIR: HDFS directory where the Intelligent Data Lake Service copies temporary data and files necessary for the service to run.
- HADOOP_DIISTRIBUTION_DIR: Directory that contains the Hadoop distribution files on the machine where the Intelligent Data Lake Service runs. The directory must be within the Informatica directory. The default directory is <Informatica installation directory>/services/shared/hadoop/<hadoop distribution name>.
- HIVE_CONNECTION_NAME: Hive connection for the data lake. If CREATE_CONNECTION=1, you must enter the same value as the SAMPLE_HIVE_CONNECTION value for this field. If CREATE_CONNECTION=0, you must create a Hive connection and enter its name for this field.
- HIVE_LOCALSTORAGE_FORMAT: Data storage format for the Hive tables. Values are DefaultFormat, Parquet, and ORC. Default is DefaultFormat.
- ENABLE_AUDIT_OPTIONS: Indicates whether to log user activity events. To enable event logging, set ENABLE_AUDIT_OPTIONS to true. Default is false.
- ZOOKEEPER_QUORUM: Required if ENABLE_AUDIT_OPTIONS=true. List of host names and port numbers of the ZooKeeper quorum used to log events. Specify the host names as comma-separated values. For example: <hostname1>,<hostname2>.
- ZOOKEEPER_CLIENT_PORT: Required if ENABLE_AUDIT_OPTIONS=true. Port number on which the ZooKeeper server listens for client connections. Default value is 2181.
- ZOOKEEPER_PARENT_ZNODE: Required if ENABLE_AUDIT_OPTIONS=true. Name of the ZooKeeper znode where the Intelligent Data Lake configuration details are stored.
- SECURITY_MODE: Indicates whether the security mode is NonSecure or Kerberos. Set SECURITY_MODE to NonSecure or Kerberos. Default is NonSecure.
- HDFS_KERBEROS_PRINCIPAL: Required if SECURITY_MODE=Kerberos. Service principal name (SPN) of the data lake Hadoop cluster.
- IDL_KERBEROS_PRINCIPAL: Required if SECURITY_MODE=Kerberos. Service principal name (SPN) of the user account to impersonate when connecting to the data lake Hadoop cluster. The user account for impersonation must be set in the <Informatica installation directory>/services/shared/hadoop/<hadoop distribution name>/conf/core-site.xml file.
- IDL_KERBEROS_KEYTAB_FILE: Required if SECURITY_MODE=Kerberos. Path and file name of the SPN keytab file for the user account to impersonate when connecting to the Hadoop cluster. The keytab file must be in a directory on the machine where the Intelligent Data Lake Service runs.
- HBASE_MASTER_PRINCIPAL: Required if SECURITY_MODE=Kerberos. Service principal name (SPN) of the HBase Master Service. Use the hbase.master.kerberos.principal key value set in this file: /etc/hbase/conf/hbase-site.xml. You must replace the _HOST parameter with the actual host name of the server where HBase Master is running.
- HBASE_REGION_PRINCIPAL: Required if SECURITY_MODE=Kerberos. Service principal name (SPN) of the HBase Region Server service.
Use the hbase.regionserver.kerberos.principal key value set in this file: /etc/hbase/conf/hbase-site.xml. You must replace the _HOST parameter with the actual host name of the server where HBase Region Server is running.
- HBASE_USER: Required if ENABLE_AUDIT_OPTIONS=true and SECURITY_MODE=Kerberos. User name with permissions to access the HBase database.
- HBASE_SCHEMA: Required if ENABLE_AUDIT_OPTIONS=true. Namespace for the HBase tables. The default value is default.

5. Save the properties file with the name SilentInput.properties.

Running the Silent Installer

After you configure the properties file, open a command prompt to start the silent installation.

1. Open a Red Hat Enterprise Linux 6 or higher shell.
2. Go to the root of the directory that contains the installation files.
3. Verify that the directory contains the SilentInput.properties file that you edited and resaved.
4. Run silentinstall.sh to start the silent installation.

The silent installer runs in the background. The process can take a while. The silent installation is complete when the Informatica_<Version>_Services_InstallLog.log file is created in the installation directory. The silent installation fails if you incorrectly configure the properties file or if the installation directory is not accessible. View the installation log files and correct the errors. Then, run the silent installation again.

Secure the Passwords in the Properties File

When you configure the properties file for a silent installation, you enter passwords in plain text. After you run the silent installer, use one of the following methods to secure the passwords:
- Remove the passwords from the properties file.
- Delete the properties file.
- Store the properties file in a secure location.

Troubleshooting

For an Intelligent Data Lake installation in console mode, you must enter the name of the Model Repository Service configured for Live Data Map. If you do not enter the correct name of the Model Repository Service, the Model repository content is not updated and the Model Repository Service cannot be enabled.
The installer displays an error message with the Ok and Continue options. You must press Continue to exit the installer and complete the following tasks using the Administrator tool or the command line:

Administrator tool:
- To upgrade the Model Repository Service, select the service in the Navigator, and then click Actions > Repository Contents > Upgrade.
- To enable the Model Repository Service, select the service in the Navigator, and then click Actions > Enable Service.

Command line:
- To upgrade the Model Repository Service:
  <LDM Installation>/isp/bin/infacmd.sh mrs upgradecontents -dn <Domain Name> -un <Domain User Name> -pd <Domain Password> -sn <Valid MRS Name>
- To enable the Model Repository Service:
  <LDM Installation>/isp/bin/infacmd.sh enableservice -dn <Domain Name> -un <Domain User Name> -pd <Domain Password> -sn <Valid MRS Name>
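The silent-install workflow in this chapter can be sketched end to end. The property values below are illustrative only, drawn from the parameter tables above; a real SilentInput.properties must include every property that applies to your topology.

```shell
#!/bin/sh
# Sketch: build a minimal SilentInput.properties and run the silent installer.
# All values are examples; consult the parameter tables for required settings.

cat > SilentInput.properties <<'EOF'
# Install into the existing Live Data Map directory
USER_INSTALL_DIR=/home/informatica/10.1.0
DOMAIN_USER=Administrator
DOMAIN_PSSWD=ExamplePass1
# Installing on the master gateway node
FIRST_GATEWAY_NODE=1
# Create both services during installation
CREATE_LAKE_SERVICES=1
BOTH_LAKEANDPREP_SERVICE=1
# Create connections later from the Administrator tool
CREATE_CONNECTION=0
EOF

# Run from the directory that contains the installation files, then wait for
# Informatica_<Version>_Services_InstallLog.log to appear in USER_INSTALL_DIR.
# ./silentinstall.sh
```

After the run, remember to secure or delete the properties file, because it holds the domain password in plain text.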
Chapter 4: After You Install Intelligent Data Lake

This chapter includes the following topics:
- After You Install Overview
- Create the Application Services
- Install Python
- Enable Logging of User Activity Events

After You Install Overview

After you install Intelligent Data Lake, complete the post-installation tasks.

Create the Application Services

Use the Informatica Administrator tool to create the Intelligent Data Lake services. This task is required if you installed Intelligent Data Lake in console mode, or if you installed Intelligent Data Lake in silent mode without creating the services during installation. Ensure that you complete all the prerequisite tasks before you create the Intelligent Data Lake services.

You must create the Intelligent Data Lake services in the following order:
1. Data Preparation Service
2. Intelligent Data Lake Service

Note: If you plan to use the operating system profiles option for the Data Integration Service, ensure that you create and associate a different Data Integration Service for Live Data Map and Intelligent Data Lake. Live Data Map does not support operating system profiles. For more information, see the Intelligent Data Lake Administrator Guide.
Install Python

Intelligent Data Lake uses the Apache Solr indexing capabilities to provide recommendations of related data assets. Apache Solr requires Python modules. You must install Python with the following modules on the node where the Data Preparation Service is configured:
- argparse
- sys
- getopt
- os
- urllib
- httplib2
- ConfigParser

Enable Logging of User Activity Events

You can audit user activity on the data lake by viewing the events that the Intelligent Data Lake Service writes to HBase. The events include user activity in the Intelligent Data Lake application, such as when a user creates a project, adds data assets to a project, or publishes prepared data. Ensure that you have created an HBase instance in the Hadoop cluster where the data lake is configured.

Follow these steps to enable logging of user activity events:
1. Log in to the Administrator tool.
2. In the Domain Navigator, select the Intelligent Data Lake Service. If the Hadoop cluster uses Kerberos authentication, complete step 3. If the Hadoop cluster does not use Kerberos authentication, skip to step 4.
3. Edit the data lake security options. In the Edit Data Lake Security Options window, enter the following details:

   Property: HBase Master Service Principal Name
   Description: Service principal name (SPN) of the HBase Master service. Use the value set in this file: /etc/hbase/conf/hbase-site.xml.

   Property: HBase RegionServer Service Principal Name
   Description: Service principal name (SPN) of the HBase Region Server service. Use the value set in this file: /etc/hbase/conf/hbase-site.xml.

   Property: HBase User Name
   Description: User name with Create, Read, and Write permissions to access the HBase database.
4. Edit the event logging options. In the Edit Event Logging Options window, enter the following details:

   Property: Log User Activity Events
   Description: Indicates whether the Intelligent Data Lake Service logs the user activity events for auditing. The user activity logs are stored in an HBase instance.

   Property: HBase ZooKeeper Quorum
   Description: List of host names and port numbers of the ZooKeeper quorum used to log events. Specify the host names as comma-separated values. For example: <hostname1>,<hostname2>.

   Property: HBase ZooKeeper Client Port
   Description: Port number on which the ZooKeeper server listens for client connections. Default value is 2181.

   Property: ZooKeeper Parent ZNode
   Description: Name of the ZooKeeper znode where the Intelligent Data Lake configuration details are stored.

   Property: HBase Namespace
   Description: Namespace for the HBase tables.

5. Click OK. You must restart the service for the properties to take effect.
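Before you enable event logging, you can verify that each host in the HBase ZooKeeper quorum is reachable on the client port. The sketch below uses ZooKeeper's standard `ruok` four-letter command; it is not part of this guide, and the host name you pass is your own environment's value. Note that some newer ZooKeeper releases restrict four-letter commands through a whitelist.

```python
import socket

def zookeeper_ok(host, port=2181, timeout=5.0):
    """Send ZooKeeper's 'ruok' four-letter command; True if it answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            return sock.recv(4) == b"imok"
    except OSError:
        # DNS failure, connection refused, or timeout: the quorum
        # member is not reachable on this client port.
        return False
```

Call `zookeeper_ok()` once for each host listed in the HBase ZooKeeper Quorum property, using the port from the HBase ZooKeeper Client Port property.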
Index

A
application services
  installation requirements, 14

C
console mode
  installing Intelligent Data Lake, 16

D
database requirements
  installation requirements, 14
disk space requirements
  installation requirements, 13
domain configuration repository
  requirements, 14

I
installation
  process, 9
  prerequisites, 10
installation requirements
  application service requirements, 14
  database requirements, 14
  disk space, 13
Intelligent Data Lake
  installing in console mode, 16
Intelligent Data Lake services
  installing in silent mode, 18

L
license keys
  verifying, 11

M
minimum system requirements
  nodes, 14

P
prerequisites
  installation, 10

S
silent mode
  installing Intelligent Data Lake services, 18
system requirements
  application services, 14