Integrating with Apache Hadoop
HPE Vertica Analytic Database
Software Version: 7.2.x


Document Release Date: 12/7/2015

Legal Notices

Warranty

The only warranties for Hewlett Packard Enterprise products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HPE shall not be liable for technical or editorial errors or omissions contained herein. The information contained herein is subject to change without notice.

Restricted Rights Legend

Confidential computer software. Valid license from HPE required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

Copyright Notice

Copyright 2015 Hewlett Packard Enterprise Development LP

Trademark Notices

Adobe is a trademark of Adobe Systems Incorporated.
Microsoft and Windows are U.S. registered trademarks of Microsoft Corporation.
UNIX is a registered trademark of The Open Group.
This product includes an interface of the 'zlib' general purpose compression library, which is Copyright Jean-loup Gailly and Mark Adler.

Contents

Introduction to Hadoop Integration
    Hadoop Distributions
    Integration Options
Cluster Layout
    Co-Located Clusters
    Hardware Recommendations
    Configuring Hadoop for Co-Located Clusters
    webhdfs
    YARN
    Hadoop Balancer
    Replication Factor
    Disk Space for Non-HDFS Use
    Separate Clusters
Choosing Which Hadoop Interface to Use
    Creating an HDFS Storage Location
    Using the ORC Reader
    Using the HCatalog Connector
    Using the HDFS Connector
    Using the MapReduce Connector
Using Kerberos with Hadoop
    How Vertica uses Kerberos With Hadoop
    User Authentication
    Vertica Authentication
    See Also
    Configuring Kerberos
    Prerequisite: Setting Up Users and the Keytab File
    HCatalog Connector
    HDFS Connector
    HDFS Storage Location
    Token Expiration
    See Also
Using the ORC Reader
    Syntax
    Supported Data Types
    Timestamps
    Kerberos
    Query Performance
    Examples
Using the HCatalog Connector
    Hive, HCatalog, and WebHCat Overview
    HCatalog Connection Features
    HCatalog Connection Considerations
    How the HCatalog Connector Works
    HCatalog Connector Requirements
    Vertica Requirements
    Hadoop Requirements
    Testing Connectivity
    Installing the Java Runtime on Your Vertica Cluster
    Installing a Java Runtime
    Setting the JavaBinaryForUDx Configuration Parameter
    Configuring Vertica for HCatalog
    Copy Hadoop Libraries and Configuration Files
    Install the HCatalog Connector
    Using the HCatalog Connector with HA NameNode
    Defining a Schema Using the HCatalog Connector
    Querying Hive Tables Using HCatalog Connector
    Viewing Hive Schema and Table Metadata
    Synching an HCatalog Schema With a Local Schema
    Data Type Conversions from Hive to Vertica
    Data-Width Handling Differences Between Hive and Vertica
    Using Non-Standard SerDes
    Determining Which SerDe You Need
    Installing the SerDe on the Vertica Cluster
    Troubleshooting HCatalog Connector Problems
    Connection Errors
    UDx Failure When Querying Data
    SerDe Errors
    Differing Results Between Hive and Vertica Queries
    Preventing Excessive Query Delays
Using the HDFS Connector
    HDFS Connector Requirements
    webhdfs Requirements
    Kerberos Authentication Requirements
    Testing Your Hadoop webhdfs Configuration
    Loading Data Using the HDFS Connector
    The HDFS File URL
    Copying Files in Parallel
    Viewing Rejected Rows and Exceptions
    Creating an External Table Based on HDFS Files
    Load Errors in External Tables
    HDFS Connector Troubleshooting Tips
    User Unable to Connect to Kerberos-Authenticated Hadoop Cluster
    Resolving Transfer Rate Errors
Using HDFS Storage Locations
    Storage Location for HDFS Requirements
    HDFS Space Requirements
    Additional Requirements for Backing Up Data Stored on HDFS
    How the HDFS Storage Location Stores Data
    What You Can Store on HDFS
    What HDFS Storage Locations Cannot Do
    Creating an HDFS Storage Location
    Creating a Storage Location Using Vertica for SQL on Hadoop
    Adding HDFS Storage Locations to New Nodes
    Creating a Storage Policy for HDFS Storage Locations
    Storing an Entire Table in an HDFS Storage Location
    Storing Table Partitions in HDFS
    Moving Partitions to a Table Stored on HDFS
    Backing Up Vertica Storage Locations for HDFS
    Configuring Vertica to Restore HDFS Storage Locations
    Configuration Overview
    Installing a Java Runtime
    Finding Your Hadoop Distribution's Package Repository
    Configuring Vertica Nodes to Access the Hadoop Distribution's Package Repository
    Installing the Required Hadoop Packages
    Setting Configuration Parameters
    Setting Kerberos Parameters
    Confirming that distcp Runs
    Troubleshooting
    Configuring Hadoop and Vertica to Enable Backup of HDFS Storage
    Granting Superuser Status on Hortonworks
    Granting Superuser Status on Cloudera
    Manually Enabling Snapshotting for a Directory
    Additional Requirements for Kerberos
    Testing the Database Account's Ability to Make HDFS Directories Snapshottable
    Performing Backups Containing HDFS Storage Locations
    Removing HDFS Storage Locations
    Removing Existing Data from an HDFS Storage Location
    Moving Data to Another Storage Location
    Clearing Storage Policies
    Changing the Usage of HDFS Storage Locations
    Dropping an HDFS Storage Location
    Removing Storage Location Files from HDFS
    Removing Backup Snapshots
    Removing the Storage Location Directories
    Troubleshooting HDFS Storage Locations
    HDFS Storage Disk Consumption
    Kerberos Authentication When Creating a Storage Location
    Backup or Restore Fails When Using Kerberos
Using the MapReduce Connector
    Vertica Connector for Hadoop Features
    Prerequisites
    Hadoop and Vertica Cluster Scaling
    Installing the Connector
    Accessing Vertica Data From Hadoop
    Selecting VerticaInputFormat
    Setting the Query to Retrieve Data From Vertica
    Using a Simple Query to Extract Data From Vertica
    Using a Parameterized Query and Parameter Lists
    Using a Discrete List of Values
    Using a Collection Object
    Scaling Parameter Lists for the Hadoop Cluster
    Using a Query to Retrieve Parameter Values for a Parameterized Query
    Writing a Map Class That Processes Vertica Data
    Working with the VerticaRecord Class
    Writing Data to Vertica From Hadoop
    Configuring Hadoop to Output to Vertica
    Defining the Output Table
    Writing the Reduce Class
    Storing Data in the VerticaRecord
    Passing Parameters to the Vertica Connector for Hadoop Map Reduce At Run Time
    Specifying the Location of the Connector.jar File
    Specifying the Database Connection Parameters
    Parameters for a Separate Output Database
    Example Vertica Connector for Hadoop Map Reduce Application
    Compiling and Running the Example Application
    Compiling the Example (optional)
    Running the Example Application
    Verifying the Results
    Using Hadoop Streaming with the Vertica Connector for Hadoop Map Reduce
    Reading Data From Vertica in a Streaming Hadoop Job
    Writing Data to Vertica in a Streaming Hadoop Job
    Loading a Text File From HDFS into Vertica
    Accessing Vertica From Pig
    Registering the Vertica.jar Files
    Reading Data From Vertica
    Writing Data to Vertica
Integrating Vertica with the MapR Distribution of Hadoop
Send Documentation Feedback

Introduction to Hadoop Integration

Apache Hadoop, like Vertica, uses a cluster of nodes for distributed processing. The primary component of interest is HDFS, the Hadoop Distributed File System. You can use HDFS from Vertica in several ways:

- You can import HDFS data into locally stored ROS files.
- You can access HDFS data in place, using external tables.
- You can use HDFS as a storage location for ROS files.

Hadoop includes two other components of interest:

- Hive, a data warehouse that provides the ability to query data stored in Hadoop.
- HCatalog, a component that makes Hive metadata available to applications, such as Vertica, outside of Hadoop.

A Hadoop cluster can use Kerberos authentication to protect data stored in HDFS. Vertica integrates with Kerberos to access HDFS data if needed. See Using Kerberos with Hadoop.

Hadoop Distributions

Vertica can be used with Hadoop distributions from Hortonworks, Cloudera, and MapR. See Vertica Integrations for Hadoop for the specific versions that are supported.

Integration Options

Vertica supports two cluster architectures, and which one you use affects the decisions you make about integration:

- You can co-locate Vertica on some or all of your Hadoop nodes. Vertica can then take advantage of local data. This option is supported only for Vertica for SQL on Hadoop.
- You can build a Vertica cluster that is separate from your Hadoop cluster. In this configuration, Vertica can fully use each of its nodes; it does not share resources with Hadoop. This option is not supported for Vertica for SQL on Hadoop.

These layout options are described in Cluster Layout. Both layouts support several interfaces for using Hadoop:

- An HDFS Storage Location uses HDFS to hold Vertica data (ROS files).
- The HCatalog Connector lets Vertica query data that is stored in a Hive database the same way you query data stored natively in a Vertica schema.
- The ORC Reader lets Vertica query data that is stored in the ORC format native to Hadoop. This is faster than using the HCatalog Connector for this type of data.
- The HDFS Connector lets Vertica import HDFS data. It also lets Vertica read HDFS data as an external table without using Hive.
- The MapReduce Connector lets you create Hadoop MapReduce jobs that retrieve data from Vertica. These jobs can also insert data into Vertica.
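Several of these interfaces come down to ordinary SQL statements. The following sketch shows the in-place pattern used by the ORC Reader; the table, column, host, and path names are hypothetical, and the other interfaces are covered in the chapters listed above.

=> -- Define an external table over ORC files in HDFS (hypothetical names);
=> -- the webhdfs URL form and ON ANY NODE are described in Using the ORC Reader
=> CREATE EXTERNAL TABLE web_logs (visit_time TIMESTAMP, url VARCHAR(500))
       AS COPY FROM 'webhdfs://namenode.example.com:50070/data/logs/*.orc' ON ANY NODE ORC;
=> SELECT COUNT(*) FROM web_logs;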

Cluster Layout

Vertica and Hadoop each use a cluster of nodes for distributed processing. These clusters can be co-located, meaning you run both products on the same machines, or separate. Co-located clusters are for use with Vertica for SQL on Hadoop licenses. Separate clusters are for use with Premium Edition and Community Edition licenses.

Co-Located Clusters

With co-located clusters, Vertica is installed on some or all of your Hadoop nodes. The Vertica nodes use a private network in addition to the public network used by all Hadoop nodes, as the following figure shows:

[Figure: co-located Vertica and Hadoop clusters sharing nodes, with a private Vertica network alongside the public Hadoop network]

You might choose to place Vertica on all of your Hadoop nodes or only on some of them. If you are using HDFS storage locations, you should use at least three Vertica nodes, the minimum number for K-safety. Using more Vertica nodes can improve performance because the HDFS data needed by a query is more likely to be local.

Normally, both Hadoop and Vertica use the entire node. Because this configuration uses shared nodes, you must address potential resource contention in your configuration on those nodes. See Configuring Hadoop for Co-Located Clusters for more information. No changes are needed on Hadoop-only nodes.

You can place Hadoop and Vertica clusters within a single rack, or you can span across many racks and nodes. Spreading node types across racks can improve efficiency.

Hardware Recommendations

Hadoop clusters frequently do not have identical provisioning requirements or hardware configurations. However, Vertica nodes should be equivalent in size and capability, per the best-practice standards recommended in General Hardware and OS Requirements and Recommendations in Installing Vertica. Because Hadoop cluster specifications do not always meet these standards, Hewlett Packard Enterprise recommends the following specifications for Vertica nodes in your Hadoop cluster.

Processor: For best performance, run two-socket servers with 8 to 14 core CPUs, clocked at or above 2.6 GHz, for clusters over 10 TB, or single-socket servers with 8 to 12 cores clocked at or above 2.6 GHz for clusters under 10 TB.

Memory: Distribute the memory appropriately across all memory channels in the server. Provision a minimum of 8 GB of memory per physical CPU core in the server; high-performance applications need more memory per physical core. Use at least DDR3-1600 memory; faster memory is preferred.

Storage: Plan for read/write throughput of at least 40 MB/s per physical core of the CPU; higher throughput per core gives better performance. Post-RAID, each node should have 1 to 9 TB of storage. For a production setting, RAID 10 is recommended. In some cases, RAID 50 is acceptable. Because of the heavy compression and encoding that Vertica does, SSDs are not required. In most cases, a RAID of more, less-expensive HDDs performs just as well as a RAID of fewer SSDs. If you intend to use RAID 50 for your data partition, you should keep a spare node in every rack, allowing for manual failover of a Vertica node in the case of a drive failure. A Vertica node recovery is faster than a RAID 50 rebuild. Also, be sure to never put more than 10 TB compressed on any node, to keep node recovery times at an acceptable rate.

Network: Use 10 Gb networking in almost every case. With the introduction of 10 Gb over Cat6a (Ethernet), the cost difference is minimal.

Configuring Hadoop for Co-Located Clusters

If you are co-locating Vertica on any HDFS nodes, there are some additional configuration requirements.

webhdfs

Hadoop has two services that can provide web access to HDFS: webhdfs and httpfs. For Vertica, you must use the webhdfs service.

YARN

The YARN service is available in newer releases of Hadoop. It performs resource management for Hadoop clusters. When co-locating Vertica on YARN-managed Hadoop nodes, you must make some changes in YARN.

HPE recommends reserving at least 16 GB of memory for Vertica on shared nodes. Reserving more will improve performance. How you do this depends on your Hadoop distribution:

- If you are using Hortonworks, create a "Vertica" node label and assign it to the nodes that are running Vertica.
- If you are using Cloudera, enable and configure static service pools.

Consult the documentation for your Hadoop distribution for details. Alternatively, you can disable YARN on the shared nodes.

Hadoop Balancer

The Hadoop Balancer can redistribute data blocks across HDFS. For many Hadoop services, this feature is useful. However, for Vertica this can reduce performance under some conditions.

If you are using HDFS storage locations, the Hadoop Balancer can move data away from the Vertica nodes that are operating on it, which degrades performance. This can also occur when reading ORC files if Vertica is not running on all Hadoop nodes. (If you are using separate Vertica and Hadoop clusters, all Hadoop access is over the network, and the performance cost is less noticeable.)

To prevent the undesired movement of data blocks across the HDFS cluster, consider excluding Vertica nodes from rebalancing. See the Hadoop documentation to learn how to do this.

Replication Factor

By default, HDFS stores three copies of each data block. Vertica is generally set up to store two copies of each data item through K-safety. Thus, lowering the replication factor to 2 can save space and still provide data protection. To lower the number of copies HDFS stores, set the HadoopFSReplication configuration parameter, as explained in Troubleshooting HDFS Storage Locations; an example follows at the end of this section.

Disk Space for Non-HDFS Use

You also need to reserve some disk space for non-HDFS use. To reserve disk space using Ambari, set dfs.datanode.du.reserved to a value in the hdfs-site.xml configuration file. Setting this parameter preserves space for the non-HDFS files that Vertica requires.
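The following minimal sketch lowers the replication factor that Vertica requests for data it writes to HDFS storage locations. The database name is hypothetical; the parameter is set with ALTER DATABASE, the same way other configuration parameters are set in this guide.

=> -- Ask HDFS to keep two copies of blocks written by Vertica (mydb is hypothetical)
=> ALTER DATABASE mydb SET HadoopFSReplication = 2;

As noted above, this saves space while still providing data protection when the database is K-safe.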

Separate Clusters

In the Premium Edition product, your Vertica and Hadoop clusters must be set up on separate nodes, ideally connected by a high-bandwidth network connection. This is different from the configuration for Vertica for SQL on Hadoop, in which Vertica nodes are co-located on Hadoop nodes. The following figure illustrates the configuration for separate clusters:

[Figure: separate Vertica and Hadoop clusters connected by a shared network]

The network is a key performance component of any well-configured cluster. When Vertica stores data to HDFS, it writes and reads data across the network. The layout shown in the figure calls for two networks, and there are benefits to adding a third:

- Database Private Network: Vertica uses a private network for command and control and for moving data between nodes in support of its database functions. In some networks, command and control and the passing of data are split across two networks.
- Database/Hadoop Shared Network: Each Vertica node must be able to connect to each Hadoop data node and the NameNode. Hadoop best practices generally require a dedicated network for the Hadoop cluster. This is not a technical requirement, but a dedicated network improves Hadoop performance. Vertica and Hadoop should share the dedicated Hadoop network.
- Optional Client Network: Outside clients may access the clustered networks through a client network. This is not an absolute requirement, but the use of a third network that supports client connections to either Vertica or Hadoop can improve performance. If the configuration does not support a client network, then client connections should use the shared network.

Choosing Which Hadoop Interface to Use

Vertica provides several ways to interact with data stored in Hadoop. This section explains how to choose among them. Decisions about Cluster Layout can affect the decisions you make about Hadoop interfaces.

Creating an HDFS Storage Location

Using a storage location to store data in the Vertica native file format (ROS) delivers the best query performance among the available Hadoop options. (Storing ROS files on the local disk rather than in Hadoop is faster still.) If you already have data in Hadoop, however, doing this means you are importing that data into Vertica.

For co-located clusters, which do not use local file storage, you might still choose to use an HDFS storage location for better performance. You can use the HDFS Connector to load data that is already in HDFS into Vertica. For separate clusters, which use local file storage, consider using an HDFS storage location for lower-priority data. See Using HDFS Storage Locations and Using the HDFS Connector.

Using the ORC Reader

If your data is stored in the Optimized Row Columnar format, an open format supported by most Hadoop providers, Vertica can query that data directly from HDFS. This is faster than using the HCatalog Connector, but you cannot pull schema definitions from Hive directly into the database. The ORC Reader reads the data in place; no extra copies are made. See Using the ORC Reader.

Using the HCatalog Connector

The HCatalog Connector uses Hadoop services (Hive and HCatalog) to query data stored in HDFS. Like the ORC Reader, it reads data in place rather than making copies. Using this interface you can read all file formats supported by Hadoop, including Parquet and ORC, and Vertica can use Hive's schema definitions. However, performance can be poor in some cases. The HCatalog Connector is also sensitive to changes in the Hadoop libraries on which it depends; upgrading your Hadoop cluster might affect your HCatalog connections. See Using the HCatalog Connector, and the short example at the end of this section.

Using the HDFS Connector

The HDFS Connector can be used to create and query external tables, reading the data in place rather than making copies. The HDFS Connector can be used with any data format for which a parser is available. It does not use Hive data; you have to define the table yourself. Its performance can be poor because, like the HCatalog Connector, it cannot take advantage of the benefits of columnar file formats. See Using the HDFS Connector.

Using the MapReduce Connector

The other interfaces described in this section allow you to read Hadoop data from Vertica or create Vertica data in Hadoop. The MapReduce Connector, in contrast, allows you to integrate with Hadoop's MapReduce jobs. Use this connector to send Vertica data to MapReduce or to have MapReduce jobs create data in Vertica. See Using the MapReduce Connector.
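As a preview of the HCatalog Connector chapter, the following sketch maps a Vertica schema to a Hive schema and queries a Hive table through it. The host, port, user, and table names are hypothetical, and the full parameter list is covered in Defining a Schema Using the HCatalog Connector.

=> -- Map a local schema named hcat to the Hive schema "default"
=> -- (hostname, port, and user are placeholders for your WebHCat service)
=> CREATE HCATALOG SCHEMA hcat WITH HOSTNAME='webhcat.example.com' PORT=50111
       HCATALOG_SCHEMA='default' HCATALOG_USER='hcatuser';
=> -- Hive tables in that schema can then be queried like native tables
=> SELECT * FROM hcat.weblogs LIMIT 10;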

Using Kerberos with Hadoop

If your Hadoop cluster uses Kerberos authentication to restrict access to HDFS, you must configure Vertica to make authenticated connections. The details of this configuration vary, based on which methods you are using to access HDFS data:

- How Vertica uses Kerberos With Hadoop
- Configuring Kerberos

How Vertica uses Kerberos With Hadoop

Vertica authenticates with Hadoop in two ways that require different configurations:

- User authentication: on behalf of the user, by passing along the user's existing Kerberos credentials, as occurs with the HDFS Connector and the HCatalog Connector.
- Vertica authentication: on behalf of system processes (such as the Tuple Mover), by using a special Kerberos credential stored in a keytab file.

User Authentication

To use Vertica with Kerberos and Hadoop, the client user first authenticates with the Kerberos server (Key Distribution Center, or KDC) being used by the Hadoop cluster. A user might run kinit or sign in to Active Directory, for example.

A user who authenticates to a Kerberos server receives a Kerberos ticket. At the beginning of a client session, Vertica automatically retrieves this ticket. The database then uses this ticket to get a Hadoop token, which Hadoop uses to grant access. Vertica uses this token to access HDFS, such as when executing a query on behalf of the user. When the token expires, the database automatically renews it, also renewing the Kerberos ticket if necessary.

The following figure shows how the user, Vertica, Hadoop, and Kerberos interact in user authentication:

[Figure: user authentication flow among the user, Vertica, Kerberos, and Hadoop]

When using the HDFS Connector or the HCatalog Connector, or when reading an ORC file stored in HDFS, Vertica uses the client identity as the preceding figure shows.

Vertica Authentication

Automatic processes, such as the Tuple Mover, do not log in the way users do. Instead, Vertica uses a special identity (principal) stored in a keytab file on every database node. (This approach is also used for Vertica clusters that use Kerberos but do not use Hadoop.) After you configure the keytab file, Vertica uses the principal residing there to automatically obtain and maintain a Kerberos ticket, much as in the client scenario. In this case, the client does not interact with Kerberos.

The following figure shows the interactions required for Vertica authentication:

[Figure: Vertica authentication flow using the keytab principal on each database node]

Each Vertica node uses its own principal; it is common to incorporate the name of the node into the principal name. You can either create one keytab per node, containing only that node's principal, or you can create a single keytab containing all the principals and distribute the file to all nodes. Either way, the node uses its principal to get a Kerberos ticket and then uses that ticket to get a Hadoop token. For simplicity, the preceding figure shows the full set of interactions for only one database node.

When creating HDFS storage locations, Vertica uses the principal in the keytab file, not the principal of the user issuing the CREATE LOCATION statement.

See Also

For specific configuration instructions, see Configuring Kerberos.

Configuring Kerberos

Vertica can connect with Hadoop in several ways, and how you manage Kerberos authentication varies by connection type. This documentation assumes that you are using Kerberos for both your HDFS and Vertica clusters.

Prerequisite: Setting Up Users and the Keytab File

If you have not already configured Kerberos authentication for Vertica, follow the instructions in Configure for Kerberos Authentication. In particular:

- Create one Kerberos principal per node.
- Place the keytab file(s) in the same location on each database node and set that location in the KerberosKeytabFile configuration parameter (see Specify the Location of the Keytab File).
- Set KerberosServiceName to the name of the principal (see Inform Vertica About the Kerberos Principal).

A minimal example of setting these parameters follows the HDFS Connector subsection below.

HCatalog Connector

You use the HCatalog Connector to query data in Hive. Queries are executed on behalf of Vertica users. If the current user has a Kerberos key, then Vertica passes it to the HCatalog Connector automatically. Verify that all users who need access to Hive have been granted access to HDFS.

In addition, in your Hadoop configuration files (core-site.xml in most distributions), make sure that you enable all Hadoop components to impersonate the Vertica user. The easiest way to do this is to set the proxyuser property using wildcards for all users on all hosts and in all groups. Consult your Hadoop documentation for instructions. Make sure you do this before running hcatutil (see Configuring Vertica for HCatalog).

HDFS Connector

The HDFS Connector loads data from HDFS into Vertica on behalf of the user, using a User Defined Source. If the user performing the data load has a Kerberos key, then the UDS uses it to access HDFS. Verify that all users who use this connector have been granted access to HDFS.
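The following sketch shows one way to set the prerequisite parameters listed above. The database name, keytab path, and service name are hypothetical; use the values from your own Kerberos configuration.

=> -- Hypothetical keytab location, identical on every database node
=> ALTER DATABASE mydb SET KerberosKeytabFile = '/etc/vertica/vertica.keytab';
=> -- Hypothetical service (principal) name
=> ALTER DATABASE mydb SET KerberosServiceName = 'vertica';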

HDFS Storage Location

You can create a database storage location in HDFS. An HDFS storage location provides improved performance compared to other HDFS interfaces (such as the HCatalog Connector).

After you create Kerberos principals for each node, give all of them read and write permissions to the HDFS directory you will use as a storage location. If you plan to back up HDFS storage locations, take the following additional steps:

- Grant Hadoop superuser privileges to the new principals.
- Configure backups, including setting the HadoopConfigDir configuration parameter, following the instructions in Configuring Hadoop and Vertica to Enable Backup of HDFS Storage.
- Configure user impersonation to be able to restore from backups, following the instructions in "Setting Kerberos Parameters" in Configuring Vertica to Restore HDFS Storage Locations.

Because the keytab file supplies the principal used to create the location, you must have it in place before creating the storage location. After you deploy keytab files to all database nodes, use the CREATE LOCATION statement to create the storage location as usual; a sketch follows this section.

Token Expiration

Vertica attempts to automatically refresh Hadoop tokens before they expire, but you can also set a minimum refresh frequency if you prefer. The HadoopFSTokenRefreshFrequency configuration parameter specifies the frequency in seconds:

=> ALTER DATABASE exampledb SET HadoopFSTokenRefreshFrequency = '86400';

If the current age of the token is greater than the value specified in this parameter, Vertica refreshes the token before accessing data stored in HDFS.

See Also

How Vertica uses Kerberos With Hadoop
Troubleshooting Kerberos Authentication
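The following sketch outlines the CREATE LOCATION step mentioned under HDFS Storage Location above. The URL, port, and label are hypothetical, and the exact URL form and options for HDFS storage locations are given in Creating an HDFS Storage Location; treat this as an outline rather than authoritative syntax.

=> -- Hypothetical HDFS directory to which the node principals have read/write access
=> CREATE LOCATION 'webhdfs://namenode.example.com:50070/user/dbadmin/verticadata'
       ALL NODES USAGE 'data' LABEL 'hdfs_storage';

After creating the location, you typically assign objects to it with a storage policy, as described in Creating a Storage Policy for HDFS Storage Locations.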

Using the ORC Reader

If your HDFS data is in the Optimized Row Columnar (ORC) format and uses no complex data types, then instead of using the HCatalog Connector you can use the ORC Reader to access the data directly. Reading directly may provide better performance.

The decisions you make when writing ORC files can affect performance when using them. To get the best performance from the ORC Reader, do the following when writing:

- Use the latest available Hive version to write ORC files. (You can still read them with earlier versions.)
- Use a large stripe size; 256 MB or greater is preferred.
- Partition the data at the table level.
- Sort the columns based on frequency of access, most-frequent first.
- Use Snappy or ZLib compression.

Syntax

In the COPY statement, specify a format of ORC as follows:

COPY tablename FROM path ORC;

In the CREATE EXTERNAL TABLE AS COPY statement, specify a format of ORC as follows:

CREATE EXTERNAL TABLE tablename (columns) AS COPY FROM path ORC;

If the file resides on the local file system of the node where you are issuing the command, use a local file path for path. If the file resides elsewhere in HDFS, use the webhdfs:// prefix and then specify the host name, port, and file path. Use ON ANY NODE for files that are not local to improve performance.

COPY t FROM 'webhdfs://somehost:port/opt/data/orcfile' ON ANY NODE ORC;

The ORC Reader supports ZLib and Snappy compression. It does not support GZIP, BZIP, or LZO compression.

The CREATE EXTERNAL TABLE AS COPY statement must consume all of the columns in the ORC file; unlike with some other data sources, you cannot select only the columns of interest. If you omit columns, the ORC Reader aborts with an error and does not copy any data.

If you load from multiple ORC files in the same COPY statement and any of them is aborted, the entire load is aborted. This is different behavior than for delimited files, where the COPY statement loads what it can and ignores the rest.

Supported Data Types

The Vertica ORC file reader can natively read columns of all data types supported in Hive version 0.11 and later except for complex types. If complex types such as maps are encountered, the COPY or CREATE EXTERNAL TABLE AS COPY statement aborts with an error message. The ORC Reader does not attempt to read only some columns; either the entire file is read or the operation fails. For a complete list of supported types, see HIVE Data Types.

Timestamps

Reading timestamps from an ORC file in Vertica might result in different values than reading the same ORC file with another tool such as Hive. ORC files store timestamps without any time zone information. Vertica interprets timestamps without time zones as values in the local time zone, which might not be the time zone from which the ORC file was written. If you know which time zone was used as the reference to write the ORC file, you can set the default time zone in Vertica to correct the problem.

For example, suppose a set of timestamp values was written to an ORC file in the America/New_York time zone. When you read those values into Vertica, the values you see can change, depending on your local time zone:

=> CREATE EXTERNAL TABLE t (ts timestamp) AS COPY FROM '/path/to/file.orc' orc;
CREATE TABLE
=> SELECT ts FROM t;

To adjust ORC time zones, set the TIMEZONE variable in Vertica, and the values are then reported relative to that zone:

=> SET TIMEZONE 'America/New_York';
SET
=> SELECT ts AT TIMEZONE 'GMT' AS tstz FROM t;

The TIMEZONE variable is global to your Vertica cluster and affects all timestamp values that do not include a time zone. If you have ORC files written in more than one time zone, you cannot adjust all of them using this approach. For more information about the TIMEZONE variable, see Using Time Zones With Vertica.

When Hive writes ORC files it converts dates before 1583 to the Julian calendar. Vertica does not perform this conversion. If your ORC file contains dates before this time, values in Hive and the corresponding values in Vertica will differ by up to ten days. This applies to both DATE and TIMESTAMP values.

Kerberos

If the ORC file is located on an HDFS cluster that uses Kerberos authentication, Vertica uses the current user's principal to authenticate. It does not use the database's principal.

Query Performance

When working with external tables in ORC format, Vertica tries to improve performance in two ways: by pushing query execution closer to the data so less has to be read and transmitted, and by taking advantage of data locality in planning the query.

Predicate pushdown moves parts of the query execution closer to the data, reducing the amount of data that must be read from disk or across the network. ORC files have three levels of indexing: file statistics, stripe statistics, and row group indexes. Predicates are applied only to the first two levels.

Predicate pushdown works and is automatically applied for ORC files written with Hive version 0.14 and later. ORC files written with earlier versions of Hive might not contain the required statistics. When executing a query against an ORC file that lacks these statistics, Vertica logs an EXTERNAL_PREDICATE_PUSHDOWN_NOT_SUPPORTED event in the QUERY_EVENTS system table. If you are seeing performance problems with your queries, check this table for these events (an example query follows the Examples section below).

In a cluster where Vertica nodes are co-located on HDFS nodes, the query can also take advantage of data locality. If data is on an HDFS node where a database node is also present, and if the query is not restricted to specific nodes using ON NODE, then the query planner uses that database node to read that data. This allows Vertica to read data locally instead of making a network call.

You can see how much ORC data is being read locally by inspecting the query plan. The label for LoadStep(s) in the plan contains a statement of the form: "X% of ORC data matched with co-located Vertica nodes". To increase the volume of local reads, consider adding more database nodes. HDFS data, by its nature, can't be moved to specific nodes, but if you run more database nodes you increase the likelihood that a database node is local to one of the copies of the data.

Examples

The following example shows how to read from all ORC files in a directory. It uses all supported data types.

CREATE EXTERNAL TABLE t (a1 TINYINT, a2 SMALLINT, a3 INT, a4 BIGINT,
    a5 FLOAT, a6 DOUBLE PRECISION, a7 BOOLEAN, a8 DATE, a9 TIMESTAMP,
    a10 VARCHAR(20), a11 VARCHAR(20), a12 CHAR(20), a13 BINARY(20),
    a14 DECIMAL(10,5))
AS COPY FROM '/data/orc_test_*.orc' ORC;

The following example shows the error that is produced if the file you specify is not recognized as an ORC file:

CREATE EXTERNAL TABLE t (a1 TINYINT, a2 SMALLINT, a3 INT, a4 BIGINT, a5 FLOAT)
AS COPY FROM '/data/not_an_orc_file.orc' ORC;
ERROR 0: Failed to read orc source [/data/not_an_orc_file.orc]: Not an ORC file
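To follow up on the Query Performance discussion above, the following sketch checks whether recent queries skipped ORC predicate pushdown. It assumes only the QUERY_EVENTS system table and the event type named in that section.

=> -- Find queries for which ORC predicate pushdown could not be applied
=> SELECT transaction_id, statement_id, event_description
       FROM query_events
       WHERE event_type = 'EXTERNAL_PREDICATE_PUSHDOWN_NOT_SUPPORTED';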

Using the HCatalog Connector

The Vertica HCatalog Connector lets you access data stored in Apache's Hive data warehouse software the same way you access it within a native Vertica table. If your files are in the Optimized Row Columnar (ORC) format, you might be able to read them directly instead of going through this connector. For more information, see Using the ORC Reader.

Hive, HCatalog, and WebHCat Overview

There are several Hadoop components that you need to understand in order to use the HCatalog Connector:

- Apache's Hive lets you query data stored in a Hadoop Distributed File System (HDFS) the same way you query data stored in a relational database. Behind the scenes, Hive uses a set of serializer and deserializer (SerDe) classes to extract data from files stored on the HDFS and break it into columns and rows. Each SerDe handles data files in a specific format. For example, one SerDe extracts data from comma-separated data files while another interprets data stored in JSON format.
- Apache HCatalog is a component of the Hadoop ecosystem that makes Hive's metadata available to other Hadoop components (such as Pig).
- WebHCat (formerly known as Templeton) makes HCatalog and Hive data available via a REST web API. Through it, you can make an HTTP request to retrieve data stored in Hive, as well as information about the Hive schema.

Vertica's HCatalog Connector lets you transparently access data that is available through WebHCat. You use the connector to define a schema in Vertica that corresponds to a Hive database or schema. When you query data within this schema, the HCatalog Connector transparently extracts and formats the data from Hadoop into tabular data. The data within this HCatalog schema appears as if it is native to Vertica. You can even perform operations such as joins between Vertica-native tables and HCatalog tables. For more details, see How the HCatalog Connector Works.

HCatalog Connection Features

The HCatalog Connector lets you query data stored in Hive using the Vertica native SQL syntax. Some of its main features are:

- The HCatalog Connector always reflects the current state of data stored in Hive.
- The HCatalog Connector uses the parallel nature of both Vertica and Hadoop to process Hive data. The result is that querying data through the HCatalog Connector is often faster than querying the data directly through Hive.
- Since Vertica performs the extraction and parsing of data, the HCatalog Connector does not significantly increase the load on your Hadoop cluster.
- The data you query through the HCatalog Connector can be used as if it were native Vertica data. For example, you can execute a query that joins data from a table in an HCatalog schema with a native table.

HCatalog Connection Considerations

There are a few things to keep in mind when using the HCatalog Connector:

- Hive's data is stored in flat files in a distributed filesystem, requiring it to be read and deserialized each time it is queried. This deserialization causes Hive's performance to be much slower than Vertica's. The HCatalog Connector has to perform the same process as Hive to read the data. Therefore, querying data stored in Hive using the HCatalog Connector is much slower than querying a native Vertica table. If you need to perform extensive analysis on data stored in Hive, you should consider loading it into Vertica through the HCatalog Connector or the WebHDFS connector. Vertica optimization often makes querying data through the HCatalog Connector faster than directly querying it through Hive.
- Hive supports complex data types such as lists, maps, and structs that Vertica does not support. Columns containing these data types are converted to a JSON representation of the data type and stored as a VARCHAR. See Data Type Conversions from Hive to Vertica.

Note: The HCatalog Connector is read only. It cannot insert data into Hive.

How the HCatalog Connector Works

When planning a query that accesses data from a Hive table, the Vertica HCatalog Connector on the initiator node contacts the WebHCat server in your Hadoop cluster to determine if the table exists. If it does, the connector retrieves the table's metadata from the metastore database so the query planning can continue. When the query executes, all nodes in the Vertica cluster directly retrieve the data necessary for completing the query from the Hadoop HDFS. They then use the Hive SerDe classes to extract the data so the query can execute.

This approach takes advantage of the parallel nature of both Vertica and Hadoop. In addition, by performing the retrieval and extraction of data directly, the HCatalog Connector reduces the impact of the query on the Hadoop cluster.

HCatalog Connector Requirements

Before you can use the HCatalog Connector, both your Vertica and Hadoop installations must meet the following requirements.

Vertica Requirements

All of the nodes in your cluster must have a Java Virtual Machine (JVM) installed. See Installing the Java Runtime on Your Vertica Cluster.

You must also add certain libraries distributed with Hadoop and Hive to your Vertica installation directory. See Configuring Vertica for HCatalog.

Hadoop Requirements

Your Hadoop cluster must meet several requirements to operate correctly with the Vertica Connector for HCatalog:

- It must have Hive and HCatalog installed and running. See Apache's HCatalog page for more information.
- It must have WebHCat (formerly known as Templeton) installed and running. See Apache's WebHCat page for details.
- The WebHCat server and all of the HDFS nodes that store HCatalog data must be directly accessible from all of the hosts in your Vertica database. Verify that any firewall separating the Hadoop cluster and the Vertica cluster will pass WebHCat, metastore database, and HDFS traffic.
- The data that you want to query must be in an internal or external Hive table.
- If a table you want to query uses a non-standard SerDe, you must install the SerDe's classes on your Vertica cluster before you can query the data. See Using Non-Standard SerDes.

Testing Connectivity

To test the connection between your database cluster and WebHCat, log into a node in your Vertica cluster. Then, run the following command to execute an HCatalog query:

$ curl http://webhcatserver:port/templeton/v1/status?user.name=hcatusername

Where:

- webhcatserver is the IP address or hostname of the WebHCat server
- port is the port number assigned to the WebHCat service (usually 50111)
- hcatusername is a valid username authorized to use HCatalog

Usually, you want to append ;echo to the command to add a linefeed after the curl command's output. Otherwise, the command prompt is automatically appended to the command's output, making it harder to read. For example:

$ curl http://webhcatserver:port/templeton/v1/status?user.name=hcatusername; echo

If there are no errors, this command returns a status message in JSON format, similar to the following:

{"status":"ok","version":"v1"}

This result indicates that WebHCat is running and that the Vertica host can connect to it and retrieve a result. If you do not receive this result, troubleshoot your Hadoop installation and the connectivity between your Hadoop and Vertica clusters. For details, see Troubleshooting HCatalog Connector Problems.

You can also run some queries to verify that WebHCat is correctly configured to work with Hive. The following example demonstrates listing the databases defined in Hive and the tables defined within a database:

$ curl http://webhcatserver:port/templeton/v1/ddl/database?user.name=hcatusername; echo
{"databases":["default","production"]}
$ curl http://webhcatserver:port/templeton/v1/ddl/database/default/table?user.name=hcatusername; echo
{"tables":["messages","weblogs","tweets","transactions"],"database":"default"}

See Apache's WebHCat reference for details about querying Hive using WebHCat.

Installing the Java Runtime on Your Vertica Cluster

The HCatalog Connector requires a 64-bit Java Virtual Machine (JVM). The JVM must support Java 6 or later, and must be the same version as the one installed on your Hadoop nodes.

Note: If your Vertica cluster is configured to execute User Defined Extensions (UDxs) written in Java, it already has a correctly-configured JVM installed. See Developing User Defined Functions in Java in Extending Vertica for more information.

Installing Java on your Vertica cluster is a two-step process:

1. Install a Java runtime on all of the hosts in your cluster.
2. Set the JavaBinaryForUDx configuration parameter to tell Vertica the location of the Java executable.

Installing a Java Runtime

For Java-based features, Vertica requires a 64-bit Java 6 (Java version 1.6) or later Java runtime. Vertica supports runtimes from either Oracle or OpenJDK. You can choose to install either the Java Runtime Environment (JRE) or Java Development Kit (JDK), since the JDK also includes the JRE.

Many Linux distributions include a package for the OpenJDK runtime. See your Linux distribution's documentation for information about installing and configuring OpenJDK.

To install the Oracle Java runtime, see the Java Standard Edition (SE) Download Page. You usually run the installation package as root in order to install it. See the download page for instructions.

Once you have installed a JVM on each host, ensure that the java command is in the search path and calls the correct JVM by running the command:

$ java -version

This command should print something similar to:

java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) 64-Bit Server VM (build b01, mixed mode)

Note: Any previously installed Java VM on your hosts may interfere with a newly installed Java runtime. See your Linux distribution's documentation for instructions on configuring which JVM is the default. Unless absolutely required, you should uninstall any incompatible version of Java before installing the Java 6 or Java 7 runtime.

Setting the JavaBinaryForUDx Configuration Parameter

The JavaBinaryForUDx configuration parameter tells Vertica where to look for the JRE to execute Java UDxs. After you have installed the JRE on all of the nodes in your cluster, set this parameter to the absolute path of the Java executable. You can use the symbolic link that some Java installers create (for example /usr/bin/java). If the Java executable is in your shell search path, you can get the path of the Java executable by running the following command from the Linux command line shell:

$ which java
/usr/bin/java

If the java command is not in the shell search path, use the path to the Java executable in the directory where you installed the JRE. Suppose you installed the JRE in /usr/java/default (which is where the installation package supplied by Oracle installs the Java 1.6 JRE). In this case the Java executable is /usr/java/default/bin/java.

You set the configuration parameter by executing the following statement as a database superuser:

=> ALTER DATABASE mydb SET JavaBinaryForUDx = '/usr/bin/java';

See ALTER DATABASE for more information on setting configuration parameters.

To view the current setting of the configuration parameter, query the CONFIGURATION_PARAMETERS system table:

=> \x
Expanded display is on.
=> SELECT * FROM CONFIGURATION_PARAMETERS WHERE parameter_name = 'JavaBinaryForUDx';
-[ RECORD 1 ]-----------------+-----------------------------------------------------------
node_name                     | ALL
parameter_name                | JavaBinaryForUDx
current_value                 | /usr/bin/java
default_value                 |
change_under_support_guidance | f
change_requires_restart       | f
description                   | Path to the java binary for executing UDx written in Java

Once you have set the configuration parameter, Vertica can find the Java executable on each node in your cluster.

Note: Since the location of the Java executable is set by a single configuration parameter for the entire cluster, you must ensure that the Java executable is installed in the same path on all of the hosts in the cluster.

Configuring Vertica for HCatalog

Before you can use the HCatalog Connector, you must add certain Hadoop and Hive libraries to your Vertica installation. You must also copy the Hadoop configuration files that specify various connection properties. Vertica uses the values in those configuration files to make its own connections to Hadoop. You need only make these changes on one node in your cluster. After you do this you can install the HCatalog Connector.

Copy Hadoop Libraries and Configuration Files

Vertica provides a tool, hcatutil, to collect the required files from Hadoop. This tool copies selected libraries and XML configuration files from your Hadoop cluster to your Vertica cluster. This tool might also need access to additional libraries:

- If you plan to use Hive to query files that use Snappy compression, you need access to the Snappy native libraries, libhadoop*.so and libsnappy*.so.
- If you plan to use a JSON SerDe with a Hive table, you need access to its library. This is the same library that you used to configure Hive; for example:

  hive> add jar /home/release/json-serde-1.3-jar-with-dependencies.jar;
  hive> create external table nationjson (id int,name string,rank int,text string)
        ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
        LOCATION '/user/release/vt/nationjson';

If either of these cases applies to you, do one of the following:

- Include the path(s) in the path you specify as the value of --hcatlibpath, or
- Copy the file(s) to a directory already on that path.

If Vertica is not co-located on a Hadoop node, you should do the following:

1. Copy /opt/vertica/packages/hcat/tools/hcatutil to a Hadoop node and run it there, specifying a temporary output directory. Your Hadoop, Hive, and HCatalog lib paths might be different; in particular, in newer versions of Hadoop the HCatalog directory is usually a subdirectory under the Hive directory. Use the values from your environment in the following command:

   hcatutil --copyjars
       --hadoophivehome="/hadoop/lib;/hive/lib;/hcatalog/dist/share"
       --hadoophiveconfpath="/hadoop;/hive;/webhcat"
       --hcatlibpath=/tmp/hadoop-files

2. Verify that all necessary files were copied:

   hcatutil --verifyjars --hcatlibpath=/tmp/hadoop-files

3. Copy that output directory (/tmp/hadoop-files, in this example) to /opt/vertica/packages/hcat/lib on the Vertica node you will connect to when installing the HCatalog Connector. If you are updating a Vertica cluster to use a new Hadoop cluster (or a new version of Hadoop), first remove all JAR files in /opt/vertica/packages/hcat/lib except vertica-hcatalogudl.jar.

4. Verify that all necessary files were copied:

   hcatutil --verifyjars --hcatlibpath=/opt/vertica/packages/hcat

If you are using the Vertica for SQL on Hadoop product with co-located clusters, you can do this in one step on a shared node. Your Hadoop, Hive, and HCatalog lib paths might be different; use the values from your environment in the following command:

hcatutil --copyjars
    --hadoophivehome="/hadoop/lib;/hive/lib;/hcatalog/dist/share"
    --hadoophiveconfpath="/hadoop;/hive;/webhcat"
    --hcatlibpath=/opt/vertica/packages/hcat/lib

The hcatutil script has the following arguments:


More information

Connectivity Pack for Microsoft Guide

Connectivity Pack for Microsoft Guide HP Vertica Analytic Database Software Version: 7.0.x Document Release Date: 2/20/2015 Legal Notices Warranty The only warranties for HP products and services are set forth in the express warranty statements

More information

How to Install and Configure EBF15328 for MapR 4.0.1 or 4.0.2 with MapReduce v1

How to Install and Configure EBF15328 for MapR 4.0.1 or 4.0.2 with MapReduce v1 How to Install and Configure EBF15328 for MapR 4.0.1 or 4.0.2 with MapReduce v1 1993-2015 Informatica Corporation. No part of this document may be reproduced or transmitted in any form, by any means (electronic,

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Simba XMLA Provider for Oracle OLAP 2.0. Linux Administration Guide. Simba Technologies Inc. April 23, 2013

Simba XMLA Provider for Oracle OLAP 2.0. Linux Administration Guide. Simba Technologies Inc. April 23, 2013 Simba XMLA Provider for Oracle OLAP 2.0 April 23, 2013 Simba Technologies Inc. Copyright 2013 Simba Technologies Inc. All Rights Reserved. Information in this document is subject to change without notice.

More information

Move Data from Oracle to Hadoop and Gain New Business Insights

Move Data from Oracle to Hadoop and Gain New Business Insights Move Data from Oracle to Hadoop and Gain New Business Insights Written by Lenka Vanek, senior director of engineering, Dell Software Abstract Today, the majority of data for transaction processing resides

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

Kognitio Technote Kognitio v8.x Hadoop Connector Setup

Kognitio Technote Kognitio v8.x Hadoop Connector Setup Kognitio Technote Kognitio v8.x Hadoop Connector Setup For External Release Kognitio Document No Authors Reviewed By Authorised By Document Version Stuart Watt Date Table Of Contents Document Control...

More information

HDFS. Hadoop Distributed File System

HDFS. Hadoop Distributed File System HDFS Kevin Swingler Hadoop Distributed File System File system designed to store VERY large files Streaming data access Running across clusters of commodity hardware Resilient to node failure 1 Large files

More information

New Features... 1 Installation... 3 Upgrade Changes... 3 Fixed Limitations... 4 Known Limitations... 5 Informatica Global Customer Support...

New Features... 1 Installation... 3 Upgrade Changes... 3 Fixed Limitations... 4 Known Limitations... 5 Informatica Global Customer Support... Informatica Corporation B2B Data Exchange Version 9.5.0 Release Notes June 2012 Copyright (c) 2006-2012 Informatica Corporation. All rights reserved. Contents New Features... 1 Installation... 3 Upgrade

More information

EMC Documentum Connector for Microsoft SharePoint

EMC Documentum Connector for Microsoft SharePoint EMC Documentum Connector for Microsoft SharePoint Version 7.1 Installation Guide EMC Corporation Corporate Headquarters Hopkinton, MA 01748-9103 1-508-435-1000 www.emc.com Legal Notice Copyright 2013-2014

More information

Contents. Pentaho Corporation. Version 5.1. Copyright Page. New Features in Pentaho Data Integration 5.1. PDI Version 5.1 Minor Functionality Changes

Contents. Pentaho Corporation. Version 5.1. Copyright Page. New Features in Pentaho Data Integration 5.1. PDI Version 5.1 Minor Functionality Changes Contents Pentaho Corporation Version 5.1 Copyright Page New Features in Pentaho Data Integration 5.1 PDI Version 5.1 Minor Functionality Changes Legal Notices https://help.pentaho.com/template:pentaho/controls/pdftocfooter

More information

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam sastry.vedantam@oracle.com Agenda The rise of Big Data & Hadoop MySQL in the Big Data Lifecycle MySQL Solutions for Big Data Q&A

More information

Hadoop Integration Guide

Hadoop Integration Guide HP Vertica Analytic Database Software Version: 7.0.x Document Release Date: 2/20/2015 Legal Notices Warranty The only warranties for HP products and services are set forth in the express warranty statements

More information

Hadoop Job Oriented Training Agenda

Hadoop Job Oriented Training Agenda 1 Hadoop Job Oriented Training Agenda Kapil CK hdpguru@gmail.com Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module

More information

Symantec Enterprise Solution for Hadoop Installation and Administrator's Guide 1.0

Symantec Enterprise Solution for Hadoop Installation and Administrator's Guide 1.0 Symantec Enterprise Solution for Hadoop Installation and Administrator's Guide 1.0 The software described in this book is furnished under a license agreement and may be used only in accordance with the

More information

docs.hortonworks.com

docs.hortonworks.com docs.hortonworks.com Hortonworks Data Platform: Configuring Kafka for Kerberos Over Ambari Copyright 2012-2015 Hortonworks, Inc. Some rights reserved. The Hortonworks Data Platform, powered by Apache Hadoop,

More information

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments

Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Cloudera Enterprise Reference Architecture for Google Cloud Platform Deployments Important Notice 2010-2015 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, Impala, and

More information

and Hadoop Technology

and Hadoop Technology SAS and Hadoop Technology Overview SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2015. SAS and Hadoop Technology: Overview. Cary, NC: SAS Institute

More information

HP Vertica Integration with SAP Business Objects: Tips and Techniques. HP Vertica Analytic Database

HP Vertica Integration with SAP Business Objects: Tips and Techniques. HP Vertica Analytic Database HP Vertica Integration with SAP Business Objects: Tips and Techniques HP Vertica Analytic Database HP Big Data Document Release Date: June 23, 2015 Legal Notices Warranty The only warranties for HP products

More information

Cloudera Backup and Disaster Recovery

Cloudera Backup and Disaster Recovery Cloudera Backup and Disaster Recovery Important Notice (c) 2010-2013 Cloudera, Inc. All rights reserved. Cloudera, the Cloudera logo, Cloudera Impala, and any other product or service names or slogans

More information

HP Operations Orchestration Software

HP Operations Orchestration Software HP Operations Orchestration Software Software Version: 9.00 HP Service Desk Integration Guide Document Release Date: June 2010 Software Release Date: June 2010 Legal Notices Warranty The only warranties

More information

Metalogix SharePoint Backup. Advanced Installation Guide. Publication Date: August 24, 2015

Metalogix SharePoint Backup. Advanced Installation Guide. Publication Date: August 24, 2015 Metalogix SharePoint Backup Publication Date: August 24, 2015 All Rights Reserved. This software is protected by copyright law and international treaties. Unauthorized reproduction or distribution of this

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

HP reference configuration for entry-level SAS Grid Manager solutions

HP reference configuration for entry-level SAS Grid Manager solutions HP reference configuration for entry-level SAS Grid Manager solutions Up to 864 simultaneous SAS jobs and more than 3 GB/s I/O throughput Technical white paper Table of contents Executive summary... 2

More information

HP Project and Portfolio Management Center

HP Project and Portfolio Management Center HP Project and Portfolio Management Center Software Version: 9.20 RESTful Web Services Guide Document Release Date: February 2013 Software Release Date: February 2013 Legal Notices Warranty The only warranties

More information

Control-M for Hadoop. Technical Bulletin. www.bmc.com

Control-M for Hadoop. Technical Bulletin. www.bmc.com Technical Bulletin Control-M for Hadoop Version 8.0.00 September 30, 2014 Tracking number: PACBD.8.0.00.004 BMC Software is announcing that Control-M for Hadoop now supports the following: Secured Hadoop

More information

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware.

Hadoop. Apache Hadoop is an open-source software framework for storage and large scale processing of data-sets on clusters of commodity hardware. Hadoop Source Alessandro Rezzani, Big Data - Architettura, tecnologie e metodi per l utilizzo di grandi basi di dati, Apogeo Education, ottobre 2013 wikipedia Hadoop Apache Hadoop is an open-source software

More information

OLH: Oracle Loader for Hadoop OSCH: Oracle SQL Connector for Hadoop Distributed File System (HDFS)

OLH: Oracle Loader for Hadoop OSCH: Oracle SQL Connector for Hadoop Distributed File System (HDFS) Use Data from a Hadoop Cluster with Oracle Database Hands-On Lab Lab Structure Acronyms: OLH: Oracle Loader for Hadoop OSCH: Oracle SQL Connector for Hadoop Distributed File System (HDFS) All files are

More information

Deploying Hadoop with Manager

Deploying Hadoop with Manager Deploying Hadoop with Manager SUSE Big Data Made Easier Peter Linnell / Sales Engineer plinnell@suse.com Alejandro Bonilla / Sales Engineer abonilla@suse.com 2 Hadoop Core Components 3 Typical Hadoop Distribution

More information

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks

Hadoop Introduction. Olivier Renault Solution Engineer - Hortonworks Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013

More information

Interworks. Interworks Cloud Platform Installation Guide

Interworks. Interworks Cloud Platform Installation Guide Interworks Interworks Cloud Platform Installation Guide Published: March, 2014 This document contains information proprietary to Interworks and its receipt or possession does not convey any rights to reproduce,

More information

Innovative technology for big data analytics

Innovative technology for big data analytics Technical white paper Innovative technology for big data analytics The HP Vertica Analytics Platform database provides price/performance, scalability, availability, and ease of administration Table of

More information

StreamServe Persuasion SP4

StreamServe Persuasion SP4 StreamServe Persuasion SP4 Installation Guide Rev B StreamServe Persuasion SP4 Installation Guide Rev B 2001-2009 STREAMSERVE, INC. ALL RIGHTS RESERVED United States patent #7,127,520 No part of this document

More information

Upgrading VMware Identity Manager Connector

Upgrading VMware Identity Manager Connector Upgrading VMware Identity Manager Connector VMware Identity Manager This document supports the version of each product listed and supports all subsequent versions until the document is replaced by a new

More information

Quick Deployment Step-by-step instructions to deploy Oracle Big Data Lite Virtual Machine

Quick Deployment Step-by-step instructions to deploy Oracle Big Data Lite Virtual Machine Quick Deployment Step-by-step instructions to deploy Oracle Big Data Lite Virtual Machine Version 3.0 Please note: This appliance is for testing and educational purposes only; it is unsupported and not

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

IBM Campaign and IBM Silverpop Engage Version 1 Release 2 August 31, 2015. Integration Guide IBM

IBM Campaign and IBM Silverpop Engage Version 1 Release 2 August 31, 2015. Integration Guide IBM IBM Campaign and IBM Silverpop Engage Version 1 Release 2 August 31, 2015 Integration Guide IBM Note Before using this information and the product it supports, read the information in Notices on page 93.

More information

How To Install An Aneka Cloud On A Windows 7 Computer (For Free)

How To Install An Aneka Cloud On A Windows 7 Computer (For Free) MANJRASOFT PTY LTD Aneka 3.0 Manjrasoft 5/13/2013 This document describes in detail the steps involved in installing and configuring an Aneka Cloud. It covers the prerequisites for the installation, the

More information

docs.hortonworks.com

docs.hortonworks.com docs.hortonworks.com : Security Administration Tools Guide Copyright 2012-2014 Hortonworks, Inc. Some rights reserved. The, powered by Apache Hadoop, is a massively scalable and 100% open source platform

More information

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture. Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in

More information

ADAM 5.5. System Requirements

ADAM 5.5. System Requirements ADAM 5.5 System Requirements 1 1. Overview The schema below shows an overview of the ADAM components that will be installed and set up. ADAM Server: hosts the ADAM core components. You must install the

More information

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction

More information

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah Pro Apache Hadoop Second Edition Sameer Wadkar Madhu Siddalingaiah Contents J About the Authors About the Technical Reviewer Acknowledgments Introduction xix xxi xxiii xxv Chapter 1: Motivation for Big

More information

Revolution R Enterprise 7 Hadoop Configuration Guide

Revolution R Enterprise 7 Hadoop Configuration Guide Revolution R Enterprise 7 Hadoop Configuration Guide The correct bibliographic citation for this manual is as follows: Revolution Analytics, Inc. 2014. Revolution R Enterprise 7 Hadoop Configuration Guide.

More information

Installation Guide. SAP Control Center 3.3

Installation Guide. SAP Control Center 3.3 Installation Guide SAP Control Center 3.3 DOCUMENT ID: DC01002-01-0330-01 LAST REVISED: November 2013 Copyright 2013 by SAP AG or an SAP affiliate company. All rights reserved. No part of this publication

More information

IBM Campaign Version-independent Integration with IBM Engage Version 1 Release 3 April 8, 2016. Integration Guide IBM

IBM Campaign Version-independent Integration with IBM Engage Version 1 Release 3 April 8, 2016. Integration Guide IBM IBM Campaign Version-independent Integration with IBM Engage Version 1 Release 3 April 8, 2016 Integration Guide IBM Note Before using this information and the product it supports, read the information

More information

Integrating with Apache Kafka HPE Vertica Analytic Database. Software Version: 7.2.x

Integrating with Apache Kafka HPE Vertica Analytic Database. Software Version: 7.2.x HPE Vertica Analytic Database Software Version: 7.2.x Document Release Date: 2/4/2016 Legal Notices Warranty The only warranties for Hewlett Packard Enterprise products and services are set forth in the

More information

PHD Virtual Backup for Hyper-V

PHD Virtual Backup for Hyper-V PHD Virtual Backup for Hyper-V version 7.0 Installation & Getting Started Guide Document Release Date: December 18, 2013 www.phdvirtual.com PHDVB v7 for Hyper-V Legal Notices PHD Virtual Backup for Hyper-V

More information

PATROL Console Server and RTserver Getting Started

PATROL Console Server and RTserver Getting Started PATROL Console Server and RTserver Getting Started Supporting PATROL Console Server 7.5.00 RTserver 6.6.00 February 14, 2005 Contacting BMC Software You can access the BMC Software website at http://www.bmc.com.

More information

HADOOP MOCK TEST HADOOP MOCK TEST I

HADOOP MOCK TEST HADOOP MOCK TEST I http://www.tutorialspoint.com HADOOP MOCK TEST Copyright tutorialspoint.com This section presents you various set of Mock Tests related to Hadoop Framework. You can download these sample mock tests at

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

Parquet. Columnar storage for the people

Parquet. Columnar storage for the people Parquet Columnar storage for the people Julien Le Dem @J_ Processing tools lead, analytics infrastructure at Twitter Nong Li nong@cloudera.com Software engineer, Cloudera Impala Outline Context from various

More information

Installing and Configuring vcenter Multi-Hypervisor Manager

Installing and Configuring vcenter Multi-Hypervisor Manager Installing and Configuring vcenter Multi-Hypervisor Manager vcenter Server 5.1 vcenter Multi-Hypervisor Manager 1.1 This document supports the version of each product listed and supports all subsequent

More information

COURSE CONTENT Big Data and Hadoop Training

COURSE CONTENT Big Data and Hadoop Training COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop

More information

HP LeftHand SAN Solutions

HP LeftHand SAN Solutions HP LeftHand SAN Solutions Support Document Installation Manuals Installation and Setup Guide Health Check Legal Notices Warranty The only warranties for HP products and services are set forth in the express

More information

Document Type: Best Practice

Document Type: Best Practice Global Architecture and Technology Enablement Practice Hadoop with Kerberos Deployment Considerations Document Type: Best Practice Note: The content of this paper refers exclusively to the second maintenance

More information

Quick Install Guide. Lumension Endpoint Management and Security Suite 7.1

Quick Install Guide. Lumension Endpoint Management and Security Suite 7.1 Quick Install Guide Lumension Endpoint Management and Security Suite 7.1 Lumension Endpoint Management and Security Suite - 2 - Notices Version Information Lumension Endpoint Management and Security Suite

More information

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering

MySQL and Hadoop: Big Data Integration. Shubhangi Garg & Neha Kumari MySQL Engineering MySQL and Hadoop: Big Data Integration Shubhangi Garg & Neha Kumari MySQL Engineering 1Copyright 2013, Oracle and/or its affiliates. All rights reserved. Agenda Design rationale Implementation Installation

More information

H2O on Hadoop. September 30, 2014. www.0xdata.com

H2O on Hadoop. September 30, 2014. www.0xdata.com H2O on Hadoop September 30, 2014 www.0xdata.com H2O on Hadoop Introduction H2O is the open source math & machine learning engine for big data that brings distribution and parallelism to powerful algorithms

More information

http://docs.trendmicro.com

http://docs.trendmicro.com Trend Micro Incorporated reserves the right to make changes to this document and to the products described herein without notice. Before installing and using the product, please review the readme files,

More information

Installation and Configuration Guide for Windows and Linux

Installation and Configuration Guide for Windows and Linux Installation and Configuration Guide for Windows and Linux vcenter Operations Manager 5.7 This document supports the version of each product listed and supports all subsequent versions until the document

More information

Veeam Cloud Connect. Version 8.0. Administrator Guide

Veeam Cloud Connect. Version 8.0. Administrator Guide Veeam Cloud Connect Version 8.0 Administrator Guide April, 2015 2015 Veeam Software. All rights reserved. All trademarks are the property of their respective owners. No part of this publication may be

More information

TIBCO ActiveMatrix BusinessWorks Plug-in for TIBCO Managed File Transfer Software Installation

TIBCO ActiveMatrix BusinessWorks Plug-in for TIBCO Managed File Transfer Software Installation TIBCO ActiveMatrix BusinessWorks Plug-in for TIBCO Managed File Transfer Software Installation Software Release 6.0 November 2015 Two-Second Advantage 2 Important Information SOME TIBCO SOFTWARE EMBEDS

More information

Automated Process Center Installation and Configuration Guide for UNIX

Automated Process Center Installation and Configuration Guide for UNIX Automated Process Center Installation and Configuration Guide for UNIX Table of Contents Introduction... 1 Lombardi product components... 1 Lombardi architecture... 1 Lombardi installation options... 4

More information

HYPERION SYSTEM 9 N-TIER INSTALLATION GUIDE MASTER DATA MANAGEMENT RELEASE 9.2

HYPERION SYSTEM 9 N-TIER INSTALLATION GUIDE MASTER DATA MANAGEMENT RELEASE 9.2 HYPERION SYSTEM 9 MASTER DATA MANAGEMENT RELEASE 9.2 N-TIER INSTALLATION GUIDE P/N: DM90192000 Copyright 2005-2006 Hyperion Solutions Corporation. All rights reserved. Hyperion, the Hyperion logo, and

More information

Basic Installation of the Cisco Collection Manager

Basic Installation of the Cisco Collection Manager CHAPTER 3 Basic Installation of the Cisco Collection Manager Introduction This chapter gives the information required for a basic installation of the Cisco Collection Manager and the bundled Sybase database.

More information

HP D2D NAS Integration with HP Data Protector 6.11

HP D2D NAS Integration with HP Data Protector 6.11 HP D2D NAS Integration with HP Data Protector 6.11 Abstract This guide provides step by step instructions on how to configure and optimize HP Data Protector 6.11 in order to back up to HP D2D Backup Systems

More information