Integrating SAP BusinessObjects with Hadoop Using a Multi-Node Hadoop Cluster
May 17, 2013
Contents
1. Installing a Single Node Hadoop Server
2. Configuring a Multi-Node Hadoop Cluster
3. Configuring Hive Data Warehouse
4. Integrating SAP BusinessObjects with Hadoop
1. Installing a Single Node Hadoop Server

Installing a single-node Hadoop server involves the following steps:

1. Install a stable Linux OS (preferably CentOS) with ssh, rsync and a recent JDK from Oracle.
2. Download the Hadoop .rpm package (the Linux equivalent of a Windows .exe installer) from the Apache website.
3. Install the downloaded file with the rpm or yum package manager.
4. Apache provides generic configuration options (described in the steps below) that can be deployed by executing the scripts packaged with the .rpm file.
5. Execute the configuration process by running the hadoop-setup-conf.sh script with root privileges. Select the default options for the config, log, pid, NameNode, DataNode, JobTracker and TaskTracker directories, and provide the system name for the NameNode and DataNode hosts.
6. To install the single-node .conf files, run the hadoop-setup-single-node.sh script with root privileges and select the default option for all categories.
7. Set up the single node and start the Hadoop services by running the hadoop-setup-hdfs.sh script with root privileges. The .rpm file ships with basic examples such as wordcount, pi and teragen, which can be used to test whether all the services are working.
8. Hadoop requires six services to be running for full functionality:
   (a) Hadoop NameNode
   (b) Hadoop DataNode
   (c) Hadoop JobTracker
   (d) Hadoop TaskTracker
   (e) Hadoop Secondary NameNode
   (f) Hadoop History Server
9. If all services are running, the single-node cluster is ready for operation.
10. The status of the Hadoop services can be checked with the following Linux command (the service scripts are located in the /etc/init.d directory):
    $root : service hadoop-namenode status
11. Similarly, the service command can be used to start or stop services:
    $root : service hadoop-datanode start
    $root : service hadoop-jobtracker stop

For more detailed information on Hadoop services: http://www.cloudera.com, http://www.wikipedia.org
For more installation options: http://hadoop.apache.org
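As a rough smoke test of the installation, the bundled examples can be run against HDFS. The sketch below assumes the RPM placed the examples jar at /usr/share/hadoop/hadoop-examples.jar and uses placeholder HDFS paths; adjust both to your system.

# Check that the core daemons respond (repeat for the remaining services).
$ service hadoop-namenode status
$ service hadoop-datanode status

# Put a small text file into HDFS and run the bundled wordcount example.
$ hadoop fs -mkdir /user/test/input
$ hadoop fs -put /etc/hosts /user/test/input/
$ hadoop jar /usr/share/hadoop/hadoop-examples.jar wordcount /user/test/input /user/test/output

# Inspect the result.
$ hadoop fs -cat /user/test/output/part-*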
Running Hadoop services can be monitored through their web interfaces; a quick reachability check is sketched after the screenshots below.

NameNode

DataNode
JobTracker

TaskTracker
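The web interfaces shown above can be probed from the shell. The ports below are the usual Hadoop 1.x defaults and are assumptions; use whatever values were set during configuration.

# Probe each web UI and print the HTTP status code.
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/   # NameNode
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50075/   # DataNode
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50030/   # JobTracker
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50060/   # TaskTracker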
Hadoop Basic Commands
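For reference, a few of the basic HDFS shell commands of the kind shown in the screenshot above; the paths are placeholders.

$ hadoop fs -ls /                           # list the HDFS root
$ hadoop fs -mkdir /user/test               # create a directory in HDFS
$ hadoop fs -put localfile.txt /user/test   # copy a local file into HDFS
$ hadoop fs -cat /user/test/localfile.txt   # print a file stored in HDFS
$ hadoop fs -rm /user/test/localfile.txt    # delete a file from HDFS
$ hadoop fsck /                             # check file system health
$ hadoop dfsadmin -report                   # report DataNode status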
2. Configuring a Multi-Node Hadoop Cluster

A single-node Hadoop server can be expanded into a Hadoop cluster. In cluster mode the Hadoop NameNode will have many live DataNodes and many TaskTrackers.

Steps involved in the installation of a multi-node Hadoop cluster:

1. Install a stable Linux (preferably CentOS) on all machines (master and slaves).
2. Install Hadoop on all machines using the Hadoop RPM from Apache.
3. Update the /etc/hosts file on each machine, so that every node in the cluster knows the IP address of all other nodes.
4. In the master node's /etc/hadoop directory, update the masters and slaves files with the domain names of the master node and the slave nodes respectively.
5. Generate an SSH key pair for the master node and place the public key on all the slave nodes. This enables password-less SSH login from the master to all slaves (see the sketch after this list).
6. Run the hadoop-setup-conf.sh script on all nodes. On the master, let all URLs point to the master. On the slaves, update the NameNode and JobTracker URLs to point to the master node; the other URLs continue to point to localhost.
7. Open the firewall ports needed for communication on both the master and the slave nodes.
8. On the master, run the command start-dfs.sh; this starts the NameNode (on the master) and the DataNodes (on both master and slaves).
9. On the master, run the command start-mapred.sh; this starts the JobTracker (on the master) and the TaskTrackers (on both master and slaves).
10. The NameNode and JobTracker will now report more active nodes than the single-node server.

For more configuration options, refer to:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
http://hadoop.apache.org/docs/stable/cluster_setup.html
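A minimal sketch of the password-less SSH setup and cluster start-up follows. The hostnames master, slave1 and slave2 and the hadoop account are placeholders; substitute the names used in /etc/hosts and the account that runs the Hadoop daemons.

# On the master: generate a key pair and copy the public key to each slave.
[master]$ ssh-keygen -t rsa -P ""
[master]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
[master]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave2

# Start the HDFS and MapReduce daemons from the master.
[master]$ start-dfs.sh      # NameNode on master, DataNodes on master and slaves
[master]$ start-mapred.sh   # JobTracker on master, TaskTrackers on master and slaves

# Confirm that the slave DataNodes have registered with the NameNode.
[master]$ hadoop dfsadmin -report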
Some Screenshots of the Multi-Node Hadoop Cluster at Work

NameNode

DataNode
List of DataNodes

List of TaskTrackers
JobTracker Job Status

TaskTracker Task Status
3. Configuring Hive Data Warehouse

The Hive data warehousing environment runs on top of Hadoop. It performs ETL at run time and makes data available for reporting. Hive has to be installed first and then hosted as a service using the Hive Server option.

Steps involved in configuring Hive:

1. Install and configure Hadoop on all machines and make sure all the services are running.
2. Download Hive from the Apache website.
3. Install MySQL for Hive metadata storage, or simply configure the default Derby database. Any RDBMS can be used for the Hive metastore; this is done by placing the correct JDBC connector in the Hive lib directory. For detailed information on connectivity, follow this link: https://ccp.cloudera.com/display/cdhdoc/hive+installation#hiveinstallation-hiveconfiguration
4. Copy the needed .jar files to the required directories as per the instructions in the above link.
5. Go to the bin directory in the Hive package folder and execute the hive command.
6. Queries can now be executed in the shell.
7. The Hive Web Interface can be started by executing the hive command as: hive --service hwi
8. The Hive Thrift server can be started by executing the hive command as: hive --service hiveserver
9. Open the Hive server port (default 10000) in the firewall for connections through JDBC (see the sketch after this list).
10. If security is needed for the Hive server, configure Kerberos network authentication and bind it to the Hive server.

For more information, refer to http://www.cloudera.com
For more configuration options: http://hive.apache.org
For the Hive JDBC connection: https://cwiki.apache.org/Hive/hiveclient.html#HiveClient-JDBC
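A minimal sketch of starting the Hive services and opening the JDBC port follows. The iptables rule and the sample table name docs are assumptions; your distribution may manage the firewall differently.

# Start the Hive web interface and the Thrift server (each in its own shell
# or as a background service).
$ hive --service hwi &
$ hive --service hiveserver &

# Open the default HiveServer port for JDBC clients (run as root).
$ iptables -I INPUT -p tcp --dport 10000 -j ACCEPT

# Smoke test from the Hive CLI.
$ hive -e "CREATE TABLE IF NOT EXISTS docs (line STRING);"
$ hive -e "SHOW TABLES;"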
Screenshots of the Hive Server

Hive Web Interface

Hive Command Line
4. Integrating SAP BusinessObjects with Hadoop

Universe Design Using IDT

Steps involved in configuring SAP BusinessObjects for use with Hadoop:

1. Configure SAP BusinessObjects with the Hive JDBC drivers if the server version is lower than BO 4.0 SP5; from BO 4.0 SP5 onwards, SAP provides Hive connectivity by default. To configure the JDBC drivers in earlier versions, refer to page 77 of this document: http://help.sap.com/businessobject/product_guides/boexir4/en/xi4sp4_data_acs_en.pdf
2. Create the BO universe:
   1. Open SAP IDT and create a user session with login credentials.
   2. Under the session, open the Connections folder and create a new relational connection.
   3. Under the driver selection menu, select Apache -> Hadoop Hive -> JDBC Drivers.
   4. In the next tab, enter the database URL and port, the username and the password, and click Test Connectivity (a connectivity check is sketched after this list). If the test is successful, save the connection by clicking Finish.
   5. Create a new project in IDT and create a shortcut to the above connection in the project.
   6. Create a new data foundation layer and bind the connection to it.
   7. This connection will be used by the data foundation layer to import data from the Hive server.
   8. In the data foundation layer, drag and drop the tables needed by the universe. Create views in the data foundation if required.
   9. Create a new business layer and bind the data foundation layer to it.
   10. Attributes can be set as measures with suitable aggregation functions in the business layer.
   11. Right-click the business layer and select Publish -> Publish to Repository. Run a check integrity before publishing to verify dependencies.
   12. Log on to the CMC and set the universe access policy for users.
   13. Open the WebI Launch Pad or Rich Client and select a universe as the source. The published universe should be listed.

For detailed information, refer to http://scn.sap.com and http://help.sap.com
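Before (or while) creating the IDT connection, it can help to confirm that the BO server actually reaches the Hive Thrift port. The hostname hiveserver.example.com below is a placeholder; the URL format shown is the one used by the Hive JDBC driver for a HiveServer1 endpoint.

# Check that the Hive host and its Thrift port are reachable from the BO server.
$ ping -c 2 hiveserver.example.com
$ telnet hiveserver.example.com 10000     # should connect if port 10000 is open

# JDBC URL format expected by the Hive driver:
#   jdbc:hive://hiveserver.example.com:10000/default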
Some Screenshots of Universe Design

Data Foundation Layer

Business Layer
Convert To Measure

Publish Universe
3. Create reports

The published universe can be accessed through WebI, Dashboards or Crystal Reports. Select the Hive universe as the data source and build queries using the Query Panel. The universe converts the user queries into HiveQL statements and returns the results to the report (an illustrative statement is sketched below).

Some Screenshots of Text Processing Reports

WEBI Mobile Report on Word Count
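For illustration only, a HiveQL statement of the kind a word-count report might resolve to; the table and column names (words, word) are placeholders and depend on the Hive schema behind the universe.

# Run the statement directly from the Hive CLI to sanity-check the data.
$ hive -e "SELECT word, COUNT(*) AS occurrences FROM words GROUP BY word;"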