Big Data: Hive. 2013-2014. Laurent d'Orazio





Introduction
- Context: parallel computation on large data sets on commodity hardware
- Hadoop [hadoop]
  - Definition: open source implementation of MapReduce [DG08]
  - Objective: processing and generating large-scale data sets
  - Drawbacks
    - Low level (developers are required to write custom programs)
    - Hard to maintain
    - Hard to reuse

Outline
- Data model
- Type system
- Language

Data model
- Principle
  - Data is stored in tables
  - A table is composed of rows
  - A row is composed of columns
  - Each column is associated with a primitive or complex type


Primitive types
- Signed integers
  - bigint (8 bytes)
  - int (4 bytes)
  - smallint (2 bytes)
  - tinyint (1 byte)
- Floating point numbers
  - float (single precision)
  - double (double precision)
- String
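The byte widths above determine the range of values each integer type can hold. As a minimal sketch, assuming the usual two's-complement representation, the ranges can be computed as follows (the names `INT_WIDTHS` and `signed_range` are illustrative, not part of Hive):

```python
# Byte widths as listed on the slide; the range formula assumes two's complement.
INT_WIDTHS = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}


def signed_range(n_bytes):
    """Return (min, max) for a signed integer stored on n_bytes bytes."""
    bits = 8 * n_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, width in INT_WIDTHS.items():
    low, high = signed_range(width)
    print(f"{name:8s} {width} byte(s): {low} .. {high}")
```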

Complex types
- Associative arrays: map<key-type, value-type>
- Lists: list<element-type> (spelled array<element-type> in Hive DDL)
- Structs: struct<field-name: field-type, ...>
- Complex types can be nested
  - Example: list<map<string, struct<p1:int, p2:int>>>

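Operators . and []
- Operator . : access to a field within a struct
- Operator [] : access to a value in a map (by key) or in a list (by index)
- Example
  - Table
      t1(st string, fl float, li list<map<string, struct<p1:int, p2:int>>>);
  - Expressions
      t1.li[0]              first element of the list (a map)
      t1.li[0]['key']       struct associated with 'key' in that map
      t1.li[0]['key'].p2    field p2 of that struct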
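The three access expressions can be mimicked with ordinary Python containers. This is only an analogy for how the operators compose, not Hive code: `Point`, `Row` and the sample values are hypothetical stand-ins for the struct, the table row and its data.

```python
from collections import namedtuple

# Hypothetical stand-in for struct<p1:int, p2:int>
Point = namedtuple("Point", ["p1", "p2"])

# Hypothetical stand-in for one row of t1(st, fl, li)
Row = namedtuple("Row", ["st", "fl", "li"])
t1 = Row(st="a", fl=1.5, li=[{"key": Point(p1=10, p2=20)}])

first_map = t1.li[0]            # t1.li[0]            -> a map
entry = t1.li[0]["key"]         # t1.li[0]['key']     -> a struct
p2_value = t1.li[0]["key"].p2   # t1.li[0]['key'].p2  -> an int

print(first_map)
print(entry)
print(p2_value)
```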

Outline
- Data model
- Type system
- Language
  - DDL
  - DML
  - Extensions

HiveQL
- Principles
  - Subset of SQL
  - Extensions for the specificities of cloud computing

DDL
- Create
- Show
- Describe
- Drop
- Alter

Create
- Objective: create a table
- Syntax
    CREATE TABLE <table_name> (<attribute_name1> <type1>, ..., <attribute_name_n> <type_n>);
- Example: creating the students table with the schema students(num, lastname, firstname, gender, birthdate)
    create table students(num int, lastname string, firstname string, gender string, birthdate date);

Show
- Objective: list all tables
- Syntax
    SHOW TABLES [predicate];
- Examples
  - Listing all tables
      show tables;
  - Listing the tables whose names end with s
      show tables '.*s';
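The effect of the optional pattern can be illustrated with a small Python sketch. It assumes the pattern must match the whole table name and uses Python's `re` module, whose syntax is not identical to the one Hive uses; `show_tables` and the table names are hypothetical.

```python
import re


def show_tables(table_names, pattern=None):
    """Return all names, or only the names that match the whole pattern."""
    if pattern is None:
        return sorted(table_names)
    return sorted(n for n in table_names if re.fullmatch(pattern, n))


tables = ["students", "persons", "course"]  # hypothetical table names
print(show_tables(tables))          # analogue of: show tables;
print(show_tables(tables, ".*s"))   # analogue of: show tables '.*s';
```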

Describe
- Objective: list all columns of a table
- Syntax
    DESCRIBE <table_name>;
- Example: listing the columns of the students table
    describe students;

Drop
- Objective: delete a table
- Syntax
    DROP TABLE <table_name>;
- Example: removing the students table
    drop table students;

Alter
- Objective: update a table
  - Rename
  - Add a column
  - Replace a column

Alter: rename
- Syntax
    ALTER TABLE <old_table_name> RENAME TO <new_table_name>;
- Example: renaming table students to persons
    alter table students rename to persons;

Alter: add column
- Syntax
    ALTER TABLE <table_name> ADD COLUMNS (<attribute_name> <type>);
- Example: adding a column address to table students
    alter table students add columns(address string);

Alter: replace column
- Syntax
    ALTER TABLE <table_name> REPLACE COLUMNS (<attribute_name1> <type1>, ..., <new_attribute_name_i> <new_type_i>, ..., <attribute_name_n> <type_n>);
- Example: replacing column address in table students with column city
    alter table students replace columns(..., city string);

DML
- Data management
  - Insert/load
  - Delete
  - Update
- Querying

Limitations
- Insert
  - Impossible to append into an existing table or data partition
  - Existing data is overwritten
- Lack of
  - INSERT INTO
  - UPDATE
  - DELETE
- Advantages
  - Simple concurrency protocol (no complex locking between readers and writers)
  - Context: data loaded daily or hourly
- Example
    INSERT OVERWRITE TABLE t1 SELECT * FROM t2;
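The overwrite-only behaviour can be pictured with a toy model in which a table is a Python list. This is a sketch of the semantics described above, not of Hive's implementation; `insert_overwrite` and the sample rows are hypothetical.

```python
def insert_overwrite(tables, target, rows):
    """Replace the whole content of `target` with `rows`; nothing is appended."""
    tables[target] = list(rows)


# Hypothetical in-memory "warehouse": table name -> list of rows
tables = {"t1": [("old", 1)], "t2": [("new", 2), ("new", 3)]}

# Analogue of: INSERT OVERWRITE TABLE t1 SELECT * FROM t2;
insert_overwrite(tables, "t1", tables["t2"])
print(tables["t1"])
```

The previous row of `t1` is gone after the call, which is why such a model suits periodic bulk loads rather than row-level updates.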

Load
- Objective: insert data from a file
- Syntax
    LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
- Example: loading data into students from /temp/students.txt
    load data local inpath '/temp/students.txt' into table students;

Insert (1)
- Objective: insert data through a query
- Syntax
    INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement
- Example: inserting the last names of students into a table students_lastname
    insert overwrite table students_lastname select lastname from students;

Insert (2)
- N.B.: query results can also be written to the file system
- Syntax
    INSERT OVERWRITE [LOCAL] DIRECTORY <directory> <query>
- Example: writing the students data into a directory students
    INSERT OVERWRITE LOCAL DIRECTORY '/.../students' select * from students;

DML
- Data management
- Querying
  - Select
  - Project
  - Join
  - Group by
  - Etc.

SQL features
- Subqueries in the FROM clause
- Joins: inner, left outer, right outer and full outer joins
- Cartesian products
- Group by and aggregations
- Union all
- Create table as select
- Functions on primitive and complex types

Join
- Limitations
  - Only equality predicates
  - ANSI join syntax
- Example
    SELECT t1.a1 as c1, t2.b1 as c2 FROM t1 JOIN t2 ON (t1.a2 = t2.b2);
  instead of
    SELECT t1.a1 as c1, t2.b1 as c2 FROM t1, t2 WHERE t1.a2 = t2.b2;
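An equality predicate lets a join be evaluated by grouping rows on the join key, which fits a MapReduce-style shuffle by key. The sketch below illustrates that idea on in-memory rows; it is an assumption about why the restriction is convenient, not a description of Hive's join operator, and `equi_join` and the sample rows are hypothetical.

```python
from collections import defaultdict


def equi_join(t1_rows, t2_rows):
    """Return (c1, c2) = (t1.a1, t2.b1) for every pair with t1.a2 == t2.b2."""
    # Group the t2 rows by join key, as a shuffle on the key would do.
    by_key = defaultdict(list)
    for row in t2_rows:
        by_key[row["b2"]].append(row)
    # Each t1 row only needs to look at the group that shares its key.
    result = []
    for left in t1_rows:
        for right in by_key.get(left["a2"], []):
            result.append((left["a1"], right["b1"]))
    return result


t1 = [{"a1": "x", "a2": 1}, {"a1": "y", "a2": 2}]
t2 = [{"b1": "p", "b2": 1}, {"b1": "q", "b2": 3}]
print(equi_join(t1, t2))
```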

Extensions
- Swapping SELECT and FROM
- Support for MapReduce analysis
- Choice of programming language
- Sorting on non-distribution attributes
- Multiple insertions

SELECT vs FROM
- The FROM clause may be written before the SELECT clause

MapReduce analysis
- The map step or the reduce step is optional

Programming language
- Example: word count with Python programs
    FROM (
      MAP doc USING 'python wc_mapper.py' AS (word, cnt)
      FROM docs
      CLUSTER BY word
    ) a
    REDUCE word, cnt USING 'python wc_reduce.py';
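The scripts wc_mapper.py and wc_reduce.py are referenced in the query but not shown. The following is a minimal sketch of the logic they might contain, written as two functions and run locally; `wc_map`, `wc_reduce` and the sample documents are hypothetical, and sorting stands in for CLUSTER BY word. Real scripts would instead read rows from standard input and write tab-separated rows to standard output.

```python
from itertools import groupby


def wc_map(lines):
    """Mapper logic: emit (word, 1) for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def wc_reduce(pairs):
    """Reducer logic: sum the counts per word; pairs must be grouped by word."""
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(cnt) for _, cnt in group)


docs = ["big data hive", "hive on hadoop", "big hive"]  # hypothetical input
clustered = sorted(wc_map(docs))  # local stand-in for CLUSTER BY word
for word, total in wc_reduce(clustered):
    print(f"{word}\t{total}")
```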

Sort
- Extension: rows can be sorted on a set of columns different from the ones used for the distribution
- Example
    FROM (
      FROM session_table
      SELECT sessionid, tstamp, data
      DISTRIBUTE BY sessionid SORT BY tstamp
    ) a
    REDUCE sessionid, tstamp, data USING 'session_reducer.sh';
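The difference between the distribution column and the sort column can be simulated locally. In this sketch, rows are routed to a reducer by a hash of sessionid and each reducer's rows are then ordered by tstamp; `distribute_and_sort`, the CRC32-based routing and the sample rows are assumptions made for illustration, not Hive's actual partitioning function.

```python
import zlib


def distribute_and_sort(rows, n_reducers):
    """Route (sessionid, tstamp, data) rows by sessionid, then sort by tstamp."""
    partitions = [[] for _ in range(n_reducers)]
    for row in rows:
        # DISTRIBUTE BY sessionid: same sessionid -> same reducer
        key = str(row[0]).encode("utf-8")
        partitions[zlib.crc32(key) % n_reducers].append(row)
    # SORT BY tstamp: ordering applies within each reducer only
    return [sorted(part, key=lambda r: r[1]) for part in partitions]


rows = [(1, 30, "c"), (2, 10, "x"), (1, 10, "a"), (2, 20, "y"), (1, 20, "b")]
for i, part in enumerate(distribute_and_sort(rows, 2)):
    print(i, part)
```

Every row of a given session reaches the same reducer, and that reducer sees its rows in timestamp order.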

Multiple insertions (1)
- Principle: inserting different transformation results, as part of the same query, into different
  - Tables
  - Partitions
  - HDFS directories
  - Local directories
- Objective: reducing the number of scans done on the input data

Multiple insertions (2)
- Example
    FROM t1
    INSERT OVERWRITE TABLE t2
      SELECT t1.c2, count(1)
      WHERE t1.c1 <= 20
      GROUP BY t1.c2
    INSERT OVERWRITE DIRECTORY '/output_dir'
      SELECT t1.c2, avg(t1.c1)
      WHERE t1.c1 > 20 AND t1.c1 <= 30
      GROUP BY t1.c2
    INSERT OVERWRITE LOCAL DIRECTORY '/home/dir'
      SELECT t1.c2, sum(t1.c1)
      WHERE t1.c1 > 30
      GROUP BY t1.c2;
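The benefit of a multiple insertion is that the input is read once while several results are built. The sketch below reproduces the three aggregations of the example in a single pass over (c1, c2) rows; `multi_insert` and the sample rows are hypothetical, and the three returned dictionaries stand in for the table and the two directories.

```python
from collections import defaultdict


def multi_insert(rows):
    """Scan (c1, c2) rows once and build the three outputs of the query."""
    counts = defaultdict(int)              # c1 <= 20       -> count(1) per c2
    avg_acc = defaultdict(lambda: [0, 0])  # 20 < c1 <= 30  -> [sum, n] per c2
    sums = defaultdict(int)                # c1 > 30        -> sum(c1) per c2
    for c1, c2 in rows:
        if c1 <= 20:
            counts[c2] += 1
        elif c1 <= 30:
            acc = avg_acc[c2]
            acc[0] += c1
            acc[1] += 1
        else:
            sums[c2] += c1
    averages = {key: total / n for key, (total, n) in avg_acc.items()}
    return dict(counts), averages, dict(sums)


rows = [(10, "a"), (20, "a"), (25, "b"), (30, "b"), (40, "a")]
print(multi_insert(rows))
```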