Integrating SAP BusinessObjects with Hadoop Using a Multi-Node Hadoop Cluster
May 17, 2013
Contents
1. Installing a Single Node Hadoop Server
2. Configuring a Multi-Node Hadoop Cluster
3. Configuring Hive Data Warehouse
4. Integrating SAP BusinessObjects with Hadoop
1. Installing a Single Node Hadoop Server

Installing a single-node Hadoop server involves the following steps:

1. Install a stable Linux OS (preferably CentOS) with ssh, rsync and a recent JDK from Oracle.
2. Download the Hadoop .rpm package (the Linux equivalent of a Windows .exe installer) from the Apache website.
3. Install the downloaded file with the rpm or yum package manager.
4. Apache provides generic configuration options (described in the steps below) that can be deployed by executing the scripts packaged with the .rpm file.
5. Execute the configuration process by running the hadoop-setup-conf.sh script with root privileges. Select the default options for the config, log, pid, NameNode, DataNode, JobTracker and TaskTracker directories, and provide the system name for the NameNode and DataNode hosts.
6. To install the single-node .conf files, run the hadoop-setup-single-node.sh script with root privileges and select the default option for all categories.
7. Set up the single node and start the Hadoop services by running the hadoop-setup-hdfs.sh script with root privileges. The .rpm file ships with basic examples such as wordcount, pi and teragen, which can be used to test whether all the services are working.
8. Hadoop requires six services to be running for full functionality:
   (a) Hadoop NameNode
   (b) Hadoop DataNode
   (c) Hadoop JobTracker
   (d) Hadoop TaskTracker
   (e) Hadoop Secondary NameNode
   (f) Hadoop History Server
9. If all services are running, the single-node cluster is ready for operation.
10. The status of the Hadoop services can be checked with the following Linux command (the service scripts are located in the /etc/init.d directory):
    $root : service hadoop-namenode status
11. Similarly, the service command can be used to start or stop services:
    $root : service hadoop-datanode start
    $root : service hadoop-jobtracker stop

For more detailed information on Hadoop services: http://www.cloudera.com, http://www.wikipedia.org
For more installation options: http://hadoop.apache.org
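As a rough smoke test of the installation, the bundled examples can be run against HDFS. The sketch below assumes the RPM placed the examples jar at /usr/share/hadoop/hadoop-examples.jar and uses placeholder HDFS paths; adjust both to your system.

# Check that the core daemons respond (repeat for the remaining services).
$ service hadoop-namenode status
$ service hadoop-datanode status

# Put a small text file into HDFS and run the bundled wordcount example.
$ hadoop fs -mkdir /user/test/input
$ hadoop fs -put /etc/hosts /user/test/input/
$ hadoop jar /usr/share/hadoop/hadoop-examples.jar wordcount /user/test/input /user/test/output

# Inspect the result.
$ hadoop fs -cat /user/test/output/part-*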
Running Hadoop services can be monitored through their web interfaces; a quick reachability check is sketched after the screenshots below.

NameNode

DataNode
JobTracker

TaskTracker
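The web interfaces shown above can be probed from the shell. The ports below are the usual Hadoop 1.x defaults and are assumptions; use whatever values were set during configuration.

# Probe each web UI and print the HTTP status code.
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/   # NameNode
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50075/   # DataNode
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50030/   # JobTracker
$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50060/   # TaskTracker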
Hadoop Basic Commands
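For reference, a few of the basic HDFS shell commands of the kind shown in the screenshot above; the paths are placeholders.

$ hadoop fs -ls /                           # list the HDFS root
$ hadoop fs -mkdir /user/test               # create a directory in HDFS
$ hadoop fs -put localfile.txt /user/test   # copy a local file into HDFS
$ hadoop fs -cat /user/test/localfile.txt   # print a file stored in HDFS
$ hadoop fs -rm /user/test/localfile.txt    # delete a file from HDFS
$ hadoop fsck /                             # check file system health
$ hadoop dfsadmin -report                   # report DataNode status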
2. Configuring a Multi-Node Hadoop Cluster

A single-node Hadoop server can be expanded into a Hadoop cluster. In cluster mode the Hadoop NameNode will have many live DataNodes and many TaskTrackers.

Steps involved in the installation of a multi-node Hadoop cluster:

1. Install a stable Linux (preferably CentOS) on all machines (master and slaves).
2. Install Hadoop on all machines using the Hadoop RPM from Apache.
3. Update the /etc/hosts file on each machine, so that every node in the cluster knows the IP address of all other nodes.
4. In the master node's /etc/hadoop directory, update the masters and slaves files with the domain names of the master node and the slave nodes respectively.
5. Generate an SSH key pair for the master node and place the public key on all the slave nodes. This enables password-less SSH login from the master to all slaves (see the sketch after this list).
6. Run the hadoop-setup-conf.sh script on all nodes. On the master, let all URLs point to the master. On the slaves, update the NameNode and JobTracker URLs to point to the master node; the other URLs continue to point to localhost.
7. Open the firewall ports needed for communication on both the master and the slave nodes.
8. On the master, run the command start-dfs.sh; this starts the NameNode (on the master) and the DataNodes (on both master and slaves).
9. On the master, run the command start-mapred.sh; this starts the JobTracker (on the master) and the TaskTrackers (on both master and slaves).
10. The NameNode and JobTracker will now report more active nodes than the single-node server.

For more configuration options, refer to:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
http://hadoop.apache.org/docs/stable/cluster_setup.html
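A minimal sketch of the password-less SSH setup and cluster start-up follows. The hostnames master, slave1 and slave2 and the hadoop account are placeholders; substitute the names used in /etc/hosts and the account that runs the Hadoop daemons.

# On the master: generate a key pair and copy the public key to each slave.
[master]$ ssh-keygen -t rsa -P ""
[master]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
[master]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave2

# Start the HDFS and MapReduce daemons from the master.
[master]$ start-dfs.sh      # NameNode on master, DataNodes on master and slaves
[master]$ start-mapred.sh   # JobTracker on master, TaskTrackers on master and slaves

# Confirm that the slave DataNodes have registered with the NameNode.
[master]$ hadoop dfsadmin -report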
Some Screenshots of the Multi-Node Hadoop Cluster at Work

NameNode

DataNode
List of DataNodes

List of TaskTrackers
JobTracker Job Status

TaskTracker Task Status
3. Configuring Hive Data Warehouse

The Hive data warehousing environment runs on top of Hadoop. It performs ETL at run time and makes data available for reporting. Hive has to be installed first and then hosted as a service using the Hive Server option.

Steps involved in configuring Hive:

1. Install and configure Hadoop on all machines and make sure all the services are running.
2. Download Hive from the Apache website.
3. Install MySQL for Hive metadata storage, or simply configure the default Derby database. Any RDBMS can be used for the Hive metastore; this is done by placing the correct JDBC connector in the Hive lib directory. For detailed information on connectivity, follow this link: https://ccp.cloudera.com/display/cdhdoc/hive+installation#hiveinstallation-hiveconfiguration
4. Copy the needed .jar files to the required directories as per the instructions in the above link.
5. Go to the bin directory in the Hive package folder and execute the hive command.
6. Queries can now be executed in the shell.
7. The Hive Web Interface can be started by executing the hive command as: hive --service hwi
8. The Hive Thrift server can be started by executing the hive command as: hive --service hiveserver
9. Open the Hive server port (default 10000) in the firewall for connections through JDBC (see the sketch after this list).
10. If security is needed for the Hive server, configure Kerberos network authentication and bind it to the Hive server.

For more information, refer to http://www.cloudera.com
For more configuration options: http://hive.apache.org
For the Hive JDBC connection: https://cwiki.apache.org/Hive/hiveclient.html#HiveClient-JDBC
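A minimal sketch of starting the Hive services and opening the JDBC port follows. The iptables rule and the sample table name docs are assumptions; your distribution may manage the firewall differently.

# Start the Hive web interface and the Thrift server (each in its own shell
# or as a background service).
$ hive --service hwi &
$ hive --service hiveserver &

# Open the default HiveServer port for JDBC clients (run as root).
$ iptables -I INPUT -p tcp --dport 10000 -j ACCEPT

# Smoke test from the Hive CLI.
$ hive -e "CREATE TABLE IF NOT EXISTS docs (line STRING);"
$ hive -e "SHOW TABLES;"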
Screenshots of the Hive Server

Hive Web Interface

Hive Command Line
4. Integrating SAP BusinessObjects with Hadoop

Universe Design Using IDT

Steps involved in configuring SAP BusinessObjects for use with Hadoop:

1. Configure SAP BusinessObjects with the Hive JDBC drivers if the server version is lower than BO 4.0 SP5; from BO 4.0 SP5 onwards, SAP provides Hive connectivity by default. To configure the JDBC drivers in earlier versions, refer to page 77 of this document: http://help.sap.com/businessobject/product_guides/boexir4/en/xi4sp4_data_acs_en.pdf
2. Create the BO universe:
   1. Open SAP IDT and create a user session with login credentials.
   2. Under the session, open the Connections folder and create a new relational connection.
   3. Under the driver selection menu, select Apache -> Hadoop Hive -> JDBC Drivers.
   4. In the next tab, enter the database URL and port, the username and the password, and click Test Connectivity (a connectivity check is sketched after this list). If the test is successful, save the connection by clicking Finish.
   5. Create a new project in IDT and create a shortcut to the above connection in the project.
   6. Create a new data foundation layer and bind the connection to it.
   7. This connection will be used by the data foundation layer to import data from the Hive server.
   8. In the data foundation layer, drag and drop the tables needed by the universe. Create views in the data foundation if required.
   9. Create a new business layer and bind the data foundation layer to it.
   10. Attributes can be set as measures with suitable aggregation functions in the business layer.
   11. Right-click the business layer and select Publish -> Publish to Repository. Run a check integrity before publishing to verify dependencies.
   12. Log on to the CMC and set the universe access policy for users.
   13. Open the WebI Launch Pad or Rich Client and select a universe as the source. The published universe should be listed.

For detailed information, refer to http://scn.sap.com and http://help.sap.com
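Before (or while) creating the IDT connection, it can help to confirm that the BO server actually reaches the Hive Thrift port. The hostname hiveserver.example.com below is a placeholder; the URL format shown is the one used by the Hive JDBC driver for a HiveServer1 endpoint.

# Check that the Hive host and its Thrift port are reachable from the BO server.
$ ping -c 2 hiveserver.example.com
$ telnet hiveserver.example.com 10000     # should connect if port 10000 is open

# JDBC URL format expected by the Hive driver:
#   jdbc:hive://hiveserver.example.com:10000/default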
Some Screenshots of Universe Design

Data Foundation Layer

Business Layer
Convert To Measure

Publish Universe
3. Create reports

The published universe can be accessed through WebI, Dashboards or Crystal Reports. Select the Hive universe as the data source and build queries using the Query Panel. The universe converts the user queries into HiveQL statements and returns the results to the report (an illustrative statement is sketched below).

Some Screenshots of Text Processing Reports

WEBI Mobile Report on Word Count
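For illustration only, a HiveQL statement of the kind a word-count report might resolve to; the table and column names (words, word) are placeholders and depend on the Hive schema behind the universe.

# Run the statement directly from the Hive CLI to sanity-check the data.
$ hive -e "SELECT word, COUNT(*) AS occurrences FROM words GROUP BY word;"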