Hadoop and Hive: Introduction, Installation and Usage. Saatvik Shah. Data Analytics for Educational Data. May 23, 2014





Outline

1. Big Data
2. Hadoop
   - What is Hadoop?
   - HDFS
   - Installing Hadoop
     - Prerequisites
     - Download and Environment
     - Configuring Hadoop
   - Hadoop Usage
3. Hive
   - Introduction
   - Hive vs. RDBMS
   - Hive Installation
   - Hive Usage
4. References

Big Data: Overview and Analysis

1. Three dimensions [1]
   1. Volume
   2. Velocity
   3. Variety
2. Data is complex, and may be structured or unstructured
3. Examples [2]
   1. The Internet Archive stores around 2 petabytes of data and is growing at a rate of 20 terabytes per month.
   2. The Large Hadron Collider near Geneva, Switzerland, will produce about 15 petabytes of data per year.
   3. Facebook hosts approximately 10 billion photos, taking up one petabyte of storage.

Hadoop: What is Hadoop?

1. Apache Hadoop is an open source project used for managing Big Data.
2. Hadoop Distributed File System (HDFS) [2]
   1. Distributed file system
   2. Fault tolerant
   3. Low-cost hardware: every server is treated as an individual node
3. Hadoop architecture [2]
   1. Processors: 8-12 cores per node
   2. Number of nodes: 8-400
   3. Disk space supported: variable (a few GB to many TB)

Hadoop: HDFS

Keywords: Name Node, Data Node, Rack, Secondary Name Node, Replication Factor, Heartbeats [3]

Hadoop: Installing Hadoop - Prerequisites

Prerequisites [3]

1. Hadoop client/user
       sudo addgroup hadoop
       sudo adduser --ingroup hadoop hduser
       sudo adduser hduser sudo
2. Java JDK (6 or higher)
       sudo apt-get install openjdk-7-jdk
       cd /usr/lib/jvm
       sudo ln -s java-7-openjdk-amd64 jdk
3. SSH
       sudo apt-get install openssh-server
       ssh-keygen -t rsa -P ""
       # authorize the new key so that ssh localhost needs no password
       cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
       ssh localhost
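Before moving on, it can help to confirm that the prerequisite tools are actually on the PATH. The following is a minimal sketch, not part of the original slides; the helper name check_cmd is hypothetical, and the list of tools is an assumption drawn from the prerequisites above.

```shell
#!/bin/sh
# Sketch: report whether each prerequisite command can be found on PATH.
# check_cmd is a hypothetical helper introduced for this example.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        echo "missing: $1"
    fi
}

# Tools assumed from the prerequisites: a JDK and the SSH client utilities
for tool in java ssh ssh-keygen; do
    check_cmd "$tool"
done
```

A "missing" line points back to the corresponding apt-get step.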

Hadoop: Installing Hadoop - Download and Environment [3]

Download [3]

    wget http://apache.mirrors.lucidnetworks.net/hadoop/common/stable/hadoop-2.2.0.tar.gz
    sudo tar vxzf hadoop-2.2.0.tar.gz -C /usr/local
    cd /usr/local
    sudo mv hadoop-2.2.0 hadoop
    sudo chown -R hduser:hadoop hadoop

Environment setup: add the following to .bashrc [3]

    export JAVA_HOME=/usr/lib/jvm/jdk/
    export HADOOP_INSTALL=/usr/local/hadoop
    # sbin holds start-dfs.sh and start-yarn.sh in Hadoop 2.x
    export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
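Editing .bashrc by hand more than once tends to leave duplicate export lines. As a hedged sketch, the exports can be appended only when they are not already present. PROFILE_FILE and add_line are hypothetical names; the script writes to a scratch file by default, and including sbin on the PATH assumes the start scripts live there, as in Hadoop 2.x.

```shell
#!/bin/sh
# Sketch: append each export to a profile file only if it is not there yet.
# PROFILE_FILE defaults to a scratch file; set it to ~/.bashrc to apply for real.
PROFILE_FILE="${PROFILE_FILE:-$(mktemp)}"

add_line() {
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF "$1" "$PROFILE_FILE" || echo "$1" >> "$PROFILE_FILE"
}

add_line 'export JAVA_HOME=/usr/lib/jvm/jdk/'
add_line 'export HADOOP_INSTALL=/usr/local/hadoop'
add_line 'export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin'

# Repeating a call leaves the file unchanged
add_line 'export JAVA_HOME=/usr/lib/jvm/jdk/'

echo "profile written to $PROFILE_FILE"
```

Run `source ~/.bashrc` (or open a new shell) afterwards so the variables take effect.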

Hadoop: Installing Hadoop - Configuration

User-based configuration files [3]

1. /usr/local/hadoop/etc/hadoop/core-site.xml
2. /usr/local/hadoop/etc/hadoop/yarn-site.xml
3. /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
4. /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Launch [3]

    mkdir -p mydata/hdfs/namenode
    mkdir -p mydata/hdfs/datanode
    hdfs namenode -format
    start-dfs.sh
    start-yarn.sh
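The slides name the configuration files but not their contents. The sketch below writes one plausible single-node configuration into a scratch directory so it can be reviewed before anything is copied into /usr/local/hadoop/etc/hadoop. The property names are standard Hadoop 2.x settings; CONF_DIR, DATA_DIR, port 9000 and a replication factor of 1 are assumptions for illustration, not values from the original deck.

```shell
#!/bin/sh
# Sketch: minimal single-node settings, written to a scratch directory.
# CONF_DIR, DATA_DIR, port 9000 and replication 1 are illustrative assumptions.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
DATA_DIR="${DATA_DIR:-$HOME/mydata/hdfs}"

# core-site.xml: address of the default file system (the Name Node)
cat > "$CONF_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>
</configuration>
EOF

# hdfs-site.xml: replication factor and local storage directories
cat > "$CONF_DIR/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
  <property><name>dfs.namenode.name.dir</name><value>file:$DATA_DIR/namenode</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:$DATA_DIR/datanode</value></property>
</configuration>
EOF

# yarn-site.xml: shuffle service needed by MapReduce jobs on YARN
cat > "$CONF_DIR/yarn-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>
EOF

# mapred-site.xml (created from mapred-site.xml.template): run MapReduce on YARN
cat > "$CONF_DIR/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
EOF

echo "wrote configuration sketches to $CONF_DIR"
```

The name node and data node paths are meant to match the directories created with mkdir in the launch step, so DATA_DIR should point at wherever mydata/hdfs was created.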

Hadoop: Hadoop Usage

1. Hadoop commands are similar to Bash commands.
2. Setup
   1. Create a file: touch test.txt
   2. Write to it: echo "Hello World of HDFS" > test.txt
   3. Start HDFS: start-dfs.sh, start-yarn.sh
   4. HDFS details: http://localhost:50070/dfshealth.jsp
3. Make a directory in HDFS
       hadoop fs -mkdir -p /user/hduser/docs/examples/
4. List files in a directory and its subdirectories
       hadoop fs -ls -R /user/hduser/
5. Copy between the local machine and HDFS
       hadoop fs -copyFromLocal test.txt /user/hduser/docs/examples/test.txt
       hadoop fs -copyToLocal /user/hduser/docs/examples/test.txt /tmp/test.txt
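The copy commands above can be combined into a quick round-trip check: copy a file into HDFS, copy it back, and compare the two local copies. This is a sketch under assumptions: HDFS_DIR and the scratch directory are illustrative, the HDFS steps are skipped when no hadoop command is found, and a running HDFS is required for them to succeed.

```shell
#!/bin/sh
# Sketch: round-trip a small file through HDFS and compare the copies.
# HDFS_DIR and WORK_DIR are assumptions chosen for illustration.
HDFS_DIR="${HDFS_DIR:-/user/hduser/docs/examples}"
WORK_DIR="$(mktemp -d)"
REMOTE_FILE="$HDFS_DIR/test-$$.txt"   # process id keeps the name unique per run

echo "Hello World of HDFS" > "$WORK_DIR/test.txt"

if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -mkdir -p "$HDFS_DIR"
    hadoop fs -copyFromLocal "$WORK_DIR/test.txt" "$REMOTE_FILE"
    hadoop fs -copyToLocal "$REMOTE_FILE" "$WORK_DIR/roundtrip.txt"
    # cmp -s is silent and only sets the exit status
    if cmp -s "$WORK_DIR/test.txt" "$WORK_DIR/roundtrip.txt"; then
        echo "round trip ok"
    else
        echo "round trip mismatch"
    fi
    hadoop fs -rm "$REMOTE_FILE"      # clean up the copy left in HDFS
else
    echo "hadoop not found on PATH; skipping the HDFS steps"
fi
```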

Hive: Introduction

1. SQL for Hadoop: HQL (Hive Query Language) supports DDL and DML statements [4, 5].
2. Hive architecture [4, 5]
   1. Hive Metastore: schemas and statistics for data acquisition and query optimization
   2. Tables: analogous to tables in relational databases, with each table having an HDFS directory
   3. Partitions: data in a table directory is partitioned into subdirectories of that directory
   4. Buckets: data in each partition may in turn be divided into buckets based on the hash of a column in the table
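To make the table, partition and bucket hierarchy concrete, the sketch below maps one row to a partition directory and a bucket index. Everything named here is hypothetical: the warehouse path, the partition column year and the bucket count are illustrative, and the modulo rule assumes a non-negative integer bucketing column whose hash is the value itself. Hive's actual hash depends on the column type.

```shell
#!/bin/sh
# Sketch: where a row would land in a partitioned, bucketed table.
# WAREHOUSE, TABLE, the "year" partition column and NUM_BUCKETS are hypothetical.
WAREHOUSE="/user/hive/warehouse"
TABLE="books"
NUM_BUCKETS=4

# Each partition value gets its own subdirectory of the table directory
partition_dir() {
    echo "$WAREHOUSE/$TABLE/year=$1"
}

# Bucket index: hash of the bucketing column modulo the number of buckets.
# Assumes a non-negative integer id whose hash is the value itself.
bucket_for() {
    echo $(( $1 % NUM_BUCKETS ))
}

echo "row (id=10, year=2014) -> $(partition_dir 2014), bucket $(bucket_for 10)"
```

With these assumptions the script prints `row (id=10, year=2014) -> /user/hive/warehouse/books/year=2014, bucket 2`.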

Hive: Hive vs. RDBMS [6]

Relational DBMS

1. Small data sizes
2. Structured data
3. Real-time response and low latency

Hive

1. Large data sizes
2. Structured or unstructured data
3. Scalable, extensible, batch job handling

Hive: Hive Installation

Download and setup [7]

    wget http://apache.mirrors.hoobly.com/hive/stable/apache-hive-0.13.0-bin.tar.gz
    sudo tar -zxvf apache-hive-0.13.0-bin.tar.gz
    sudo mv apache-hive-0.13.0-bin /usr/local/hive

Environment configuration

    export HIVE_PREFIX=/usr/local/hive
    export PATH=$PATH:$HIVE_PREFIX/bin

Launch: hive

Hive: Hive Usage [7]

Launch Hive

    $ hive

Create a table

    CREATE TABLE books (id INT, name STRING, author STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

Load data (keep LOCAL for a file on the local machine; omit it for a file already in HDFS)

    LOAD DATA LOCAL INPATH 'books.txt' INTO TABLE books;

Extract data

    SELECT * FROM books LIMIT 10;
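The LOAD DATA statement expects books.txt to match the table definition: three comma-separated fields per line. The following sketch builds such a file and checks the field count before loading. The rows are made-up sample data, WORK_DIR is a scratch directory, and the load is attempted only if a hive command is found (it also assumes the books table already exists).

```shell
#!/bin/sh
# Sketch: build a comma-delimited books.txt matching
# books(id INT, name STRING, author STRING). Rows are made-up sample data.
WORK_DIR="$(mktemp -d)"

cat > "$WORK_DIR/books.txt" <<'EOF'
1,Sample Book One,A. Author
2,Sample Book Two,B. Writer
3,Sample Book Three,C. Editor
EOF

# Count lines that do not have exactly three comma-separated fields
bad_lines=$(awk -F',' 'NF != 3 { n++ } END { print n + 0 }' "$WORK_DIR/books.txt")
echo "lines with a wrong field count: $bad_lines"

if command -v hive >/dev/null 2>&1; then
    # hive -e runs the quoted statements and exits
    hive -e "LOAD DATA LOCAL INPATH '$WORK_DIR/books.txt' INTO TABLE books;
             SELECT * FROM books LIMIT 10;"
else
    echo "hive not found on PATH; skipping the load"
fi
```

A field that itself contains a comma would break this simple layout, so a different delimiter is worth considering for such data.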

References

[1] IBM Documentation of Big Data. Available at http://www-01.ibm.com/software/in/data/bigdata/. Downloaded in March 2014.
[2] T. White, Hadoop: The Definitive Guide. O'Reilly Media, 2009.
[3] Apache Documentation. Available at http://hadoop.apache.org/docs/r0.18.2/. Downloaded in May 2014.
[4] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy, "Hive: a warehousing solution over a map-reduce framework," Proceedings of the VLDB Endowment, vol. 2, no. 2, pp. 1626-1629, 2009.
[5] A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, and R. Murthy, "Hive - a petabyte scale data warehouse using Hadoop," in Data Engineering (ICDE), 2010 IEEE 26th International Conference on, pp. 996-1005, IEEE, 2010.
[6] W. Chen, K.-C. Yin, D.-L. Yang, and M.-C. Hung, "Data migration from grid to cloud computing," Applied Mathematics & Information Sciences, vol. 7, no. 1, 2013.
[7] Hive Documentation Wiki. Available at https://cwiki.apache.org/confluence/display/hive. Downloaded in May 2014.