Tableau Spark SQL Setup Instructions




Contents

1. Prerequisites
2. Configuring Hive
3. Configuring Spark & Hive
4. Starting the Spark Service and the Spark Thrift Server
5. Connecting Tableau to Spark SQL
   5A. Install Tableau DevBuild 8.3.1+
   5B. Install the Spark SQL ODBC Driver
   5C. Opening a Spark SQL ODBC Connection
6. Appendix: Spark SQL 1.1.x Installation Steps
   6A. Pre-Requisites
   6B. Apache Hadoop Install
       Install Java
       Install Hadoop
       Edit Config Files
       Start Hadoop and format namenode
       Create HDFS directories
       Install PostgreSQL
       Install and configure Hive
       Configure PostgreSQL as Hive metastore
       Start metastore and hiveserver2 services
       To Shutdown Hadoop
       To Start Hadoop
       Loading TestV1_v2 data via Beeline
       Spark SQL 1.1.x Install

1. Prerequisites

There are a number of prerequisites required to run Tableau with Spark SQL. The main requirements are:

Server Side:
- Spark V1.2 - please use the 1.2 branch at https://github.com/apache/spark/tree/branch-1.2
- Hadoop V2.4 or higher
- Hive V0.12 or V0.13

Client Side:
- Tableau 8.3.1+

- Simba Spark ODBC Driver V1.0.4: http://databricks.com/spark-odbc-driver-download

2. Configuring Hive

There are no special Hive configurations when using Hive with Spark SQL. If you are installing from scratch, you can follow the Appendix 6B steps for our sample Spark cluster configuration.

3. Configuring Spark & Hive

There are no special Spark configurations; the defaults will get you up and running. See Appendix 6B for our sample cluster configuration.

4. Starting the Spark Service and the Spark Thrift Server

Verify that HiveServer2 is running and that you are using PostgreSQL or MySQL as a metastore, then run the following:

$> $SPARK_HOME/sbin/start-master.sh
$> $SPARK_HOME/sbin/start-slaves.sh
$> $SPARK_HOME/sbin/start-thriftserver.sh --master spark://localhost:7077 --driver-class-path $CLASSPATH --hiveconf hive.server2.thrift.bind.host localhost --hiveconf hive.server2.thrift.port 10001

Note: port 10001 was chosen arbitrarily.

5. Connecting Tableau to Spark SQL

5A. Install Tableau DevBuild 8.3.1+

The first thing you must do is install the latest version of Tableau - anything 8.3.1 or later should work. The Spark SQL connection will be hidden in the product unless you install a special license key. Please e-mail Jackie Clough if you do not have the special license key. To install a new key, go to Help -> Manage Product Keys.
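Before connecting a client, it helps to confirm the Thrift server is actually accepting connections. The helper below is not part of the original instructions; it is a small bash sketch that polls the port using bash's built-in /dev/tcp redirection (so it requires bash, not plain sh).

```shell
# Helper (not from the original doc): poll until a TCP port accepts
# connections, so Tableau is not pointed at a server that is still starting.
wait_for_port() {   # usage: wait_for_port HOST PORT TIMEOUT_SECONDS
  local host=$1 port=$2 timeout=$3 i=0
  while [ "$i" -lt "$timeout" ]; do
    # try to open (and immediately drop) a TCP connection via bash's /dev/tcp
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null && return 0
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# After start-thriftserver.sh above, for example:
# wait_for_port localhost 10001 30 && echo "Spark Thrift server is up"
```

If the function times out, check the Thrift server log under $SPARK_HOME/logs before going further.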

5B. Install the Spark SQL ODBC Driver

To install the Spark SQL ODBC driver, simply open the appropriate version of the driver for your system and follow the instructions:

Windows 64-bit: SimbaSparkODBC64.msi
Windows 32-bit: SimbaSparkODBC32.msi
Mac OS X: SimbaSparkODBC.dmg

When installing the Mac driver, you may get a message that says SimbaSparkODBC.dmg "can't be opened because it is from an unidentified developer." To allow the driver to be installed, go to Applications -> System Preferences -> Security & Privacy -> General Tab and click "Open Anyway".

5C. Opening a Spark SQL ODBC Connection

If you have properly installed Tableau and the special license key, you should see Spark SQL (Beta) as one of the connection options after clicking Connect to Data. Select Spark SQL (Beta) and a connection dialog will open.

The parameters you need to enter include:

Server: Server name or IP address of your Spark server
Port: Default value of your Spark Thrift server port
Type: Spark ThriftServer (Spark 1.1 and later)
Authentication: User Name
User Name: <blank>
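For scripted testing outside Tableau (e.g. with isql), the same dialog fields can be expressed as an ODBC connection string. This sketch is not from the original instructions: the key names (Host, Port, SparkServerType, AuthMech, UID) and the values 3 (Spark Thrift Server) and 2 (User Name authentication) are assumptions based on common Simba driver conventions, so check the driver's own documentation before relying on them.

```shell
# Hypothetical Simba-style ODBC connection string built from the dialog fields.
SPARK_HOST="sparkhost.example.com"   # Server: name or IP of your Spark server
SPARK_PORT=10001                     # Port: the Spark Thrift server port from section 4
CONN_STR="Host=${SPARK_HOST};Port=${SPARK_PORT};SparkServerType=3;AuthMech=2;UID=anonymous"
echo "$CONN_STR"
```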

6. Appendix: Spark SQL 1.1.x Installation Steps

6A. Pre-Requisites

Sample environment:

OS: CentOS 6.5
CPU: 2 dual core
RAM: 16GB

6B. Apache Hadoop Install

Install Java:

Copy/scp the Java rpm file jdk-7u25-linux-x64.rpm to /tmp and extract/install it with rpm:

rpm -Uvh jdk-7u25-linux-x64.rpm

Set environment variables:

export JAVA_HOME=/usr/java/jdk1.7.0_25/

Verify the Java version:

java -version

Install Hadoop:

ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
ssh localhost
ssh <actual server name>
useradd hadoop
cd /
wget http://apache.mirror.uber.com.au/hadoop/common/current/hadoop-2.4.1.tar.gz
tar xzvf hadoop-2.4.1.tar.gz
chown -R hadoop:hadoop /hadoop-2.4.1

Edit Config Files

Edit the following config files located in /hadoop-2.4.1/etc/hadoop. These will vary depending on your environment but are provided here as a sample:

core-site.xml:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:8020</value>
    <final>true</final>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop_data</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!-- To increase the number of apps that can run in YARN -->
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>512</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>

In addition, add the following environment variables to ~/.bashrc:

export JAVA_HOME=/usr/java/jdk1.7.0_25
export HADOOP_PREFIX=/hadoop-2.4.1
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export YARN_CONF_DIR=$HADOOP_CONF_DIR
export PATH=$PATH:$HADOOP_PREFIX/bin
export HADOOP_INSTALL=/hadoop-2.4.1
export HADOOP_HOME=/hadoop-2.4.1
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/"
export HIVE_HOME=/usr/local/hive-0.12.0/
export PATH=$PATH:$HIVE_HOME/bin
export SPARK_MASTER_PORT=7077

Start Hadoop and format namenode

/hadoop-2.4.1/bin/hdfs namenode -format
/hadoop-2.4.1/sbin/start-dfs.sh
/hadoop-2.4.1/sbin/start-yarn.sh
/hadoop-2.4.1/sbin/mr-jobhistory-daemon.sh start historyserver

Create HDFS directories

hdfs dfs -mkdir -p /user/root
hdfs dfs -mkdir -p /user/hive
hdfs dfs -mkdir -p /user/hive/metastore
hdfs dfs -mkdir -p /user/anonymous

Install PostgreSQL

Edit /etc/yum.repos.d/CentOS-Base.repo by adding "exclude=postgresql*" to the "[base]" and "[updates]" sections, then:

$>wget http://yum.postgresql.org/9.3/redhat/rhel-6-x86_64/pgdg-centos93-9.3-1.noarch.rpm
$>rpm -Uvh pgdg-centos93-9.3-1.noarch.rpm
$>yum install postgresql93-server
$>service postgresql-9.3 initdb
$>chkconfig postgresql-9.3 on
$>service postgresql-9.3 start

Install and configure Hive

$>cd /usr/local
$>wget http://apache.mirrors.hoobly.com/hive/hive-0.12.0/hive-0.12.0.tar.gz
$>tar xvf hive-0.12.0.tar.gz

Configure /usr/local/hive-0.12.0/conf/hive-site.xml to look something like this:

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:postgresql://localhost/metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.postgresql.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>mypassword</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/metastore</value>
  </property>
</configuration>

Configure PostgreSQL as Hive metastore

Set "standard_conforming_strings" to off and allow remote connections in /var/lib/pgsql/9.3/data/postgresql.conf:

standard_conforming_strings = off
listen_addresses = '*'

Allow remote access by adding the following to /var/lib/pgsql/9.3/data/pg_hba.conf under the IPv6 section:

host all all 0.0.0.0 0.0.0.0 password

Restart the service:

service postgresql-9.3 restart

Install the PostgreSQL JDBC driver:

$>yum install postgresql-jdbc
$>ln -s /usr/share/java/postgresql-jdbc.jar /usr/local/hive-0.12.0/lib/postgresql-jdbc.jar

Create the metastore database and user account:

$>su - postgres
$>psql
CREATE USER hiveuser WITH PASSWORD 'password';
CREATE DATABASE metastore;
\c metastore
\i /usr/local/hive-0.12.0/scripts/metastore/upgrade/postgres/hive-schema-0.12.0.postgres.sql
\o /tmp/grant-privs
SELECT 'GRANT SELECT,INSERT,UPDATE,DELETE ON "' || schemaname || '"."' || tablename || '" TO hiveuser;'
FROM pg_tables
WHERE tableowner = CURRENT_USER AND schemaname = 'public';
\o
\i /tmp/grant-privs

Verify the connection with the hive user:

psql -h myhost -U hiveuser -d metastore

Create a softlink to the hive-site.xml file:

ln -s /hadoop-2.4.1/etc/hadoop/hive-site.xml /usr/local/hive-0.12.0/conf/hive-site.xml

To Shutdown Hadoop:

$>/hadoop-2.4.1/sbin/mr-jobhistory-daemon.sh stop historyserver
$>/hadoop-2.4.1/sbin/stop-yarn.sh
$>/hadoop-2.4.1/sbin/stop-dfs.sh

To Start Hadoop:

$>/hadoop-2.4.1/sbin/start-dfs.sh
$>/hadoop-2.4.1/sbin/start-yarn.sh
$>/hadoop-2.4.1/sbin/mr-jobhistory-daemon.sh start historyserver

Start metastore and hiveserver2 services

$>mkdir /var/log/hive
$>nohup hive --service metastore > /var/log/hive/metastore.log &
$>nohup hive --service hiveserver2 > /var/log/hive/hiveserver2.log &

Spark SQL 1.1.x Install

Build Spark from source with Maven:

$>cd /opt
$>wget http://mirrors.koehn.com/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
$>tar xvf apache-maven-3.2.3-bin.tar.gz
$>mv apache-maven-3.2.3 /opt/maven
$>ln -s /opt/maven/bin/mvn /usr/bin/mvn
$>vim /etc/profile.d/maven.sh

Add the following contents:

#!/bin/bash
MAVEN_HOME=/opt/maven
PATH=$MAVEN_HOME/bin:$PATH
export PATH MAVEN_HOME
export CLASSPATH=.

Save and close the file, then make it executable:

$>chmod +x /etc/profile.d/maven.sh

The script will apply to all future login shells; to load the environment variables into the current shell, source it:

$>source /etc/profile.d/maven.sh

Get the Spark source code:

$>mkdir /usr/local/spark-1.1.x-bin-hadoop2.4
$>cd /usr/local/spark-1.1.x-bin-hadoop2.4
$>wget https://github.com/apache/spark/archive/master.zip
$>unzip master.zip
$>mv spark-master/* /usr/local/spark-1.1.x-bin-hadoop2.4/
$>cd /usr/local/spark-1.1.x-bin-hadoop2.4
$>export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
$>mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package

Wait for the build to finish.

Configure Spark:

The following is optional, as the default settings work just fine. Edit /usr/local/spark-1.1.x-bin-hadoop2.4/conf/spark-env.sh and add the following under the YARN configurations; the values will vary depending on your environment:

SPARK_EXECUTOR_CORES=4   # Number of cores for the workers (Default: 1)
SPARK_EXECUTOR_MEMORY=4G # Memory per worker (e.g. 1000M, 2G) (Default: 1G)
SPARK_DRIVER_MEMORY=4G   # Memory for the master (e.g. 1000M, 2G) (Default: 512M)

Starting Spark:

To start the Spark master/worker and the Hive Thrift server connector, run the following:

$>/usr/local/spark-1.1.x-bin-hadoop2.4/sbin/start-master.sh
$>/usr/local/spark-1.1.x-bin-hadoop2.4/sbin/start-slaves.sh
$>/usr/local/spark-1.1.x-bin-hadoop2.4/sbin/start-thriftserver.sh --master spark://localhost:7077 --driver-class-path $CLASSPATH --hiveconf hive.server2.thrift.bind.host localhost --hiveconf hive.server2.thrift.port 10001
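The commands above and in section 6B can be gathered into one script so the required ordering (Hadoop, then the Hive services, then Spark) is explicit. This is a review sketch, not part of the original instructions: DRY_RUN defaults to 1 here, which only prints each command instead of executing it, and the paths match the sample install above. The Hive services are shown bare; run them under nohup with log redirection as in the main text.

```shell
# Consolidated start-up order: Hadoop -> Hive services -> Spark (dry-run sketch).
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

SPARK_DIR=/usr/local/spark-1.1.x-bin-hadoop2.4

# 1. HDFS, YARN and the job history server
run /hadoop-2.4.1/sbin/start-dfs.sh
run /hadoop-2.4.1/sbin/start-yarn.sh
run /hadoop-2.4.1/sbin/mr-jobhistory-daemon.sh start historyserver
# 2. Hive metastore and HiveServer2
run hive --service metastore
run hive --service hiveserver2
# 3. Spark master, workers and the Thrift server
run "$SPARK_DIR/sbin/start-master.sh"
run "$SPARK_DIR/sbin/start-slaves.sh"
run "$SPARK_DIR/sbin/start-thriftserver.sh" --master spark://localhost:7077 \
    --hiveconf hive.server2.thrift.bind.host localhost \
    --hiveconf hive.server2.thrift.port 10001
```

Run it with DRY_RUN=0 once the printed ordering looks right for your environment.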