Unlocking Hadoop for Your Relational DB. Kathleen Ting @kate_ting Technical Account Manager, Cloudera Sqoop PMC Member BigData.




Unlocking Hadoop for Your Relational DB
Kathleen Ting @kate_ting
Technical Account Manager, Cloudera
Sqoop PMC Member
BigData.be April 4, 2014

Who Am I?
Started 3 years ago as the 1st Cloudera Support Engineer
Now manages Cloudera's 2 largest customers
Sqoop Committer, PMC Member
Co-Author of the Apache Sqoop Cookbook

What is Sqoop?
Apache Top-Level Project
SQL to Hadoop
Tool to transfer data from relational databases
  Teradata, MySQL, PostgreSQL, Oracle, Netezza
To/From Hadoop ecosystem
  HDFS (text, sequence file), Hive, HBase, Avro

Why Sqoop?
Efficient/Controlled resource utilization
  Concurrent connections, Time of operation
Datatype mapping and conversion
  Automatic, and User override
Metadata propagation
  Sqoop Record
  Hive Metastore
  Avro

Agenda
Sqoop 1
  Sqoop 1 Architecture
  Sqoop 1 Command Line
  Sqoop 1 Examples
  Sqoop 1 Challenges
  Troubleshooting Sqoop 1
Common Sqoop 1 Issues
  Protecting Your Password
  Sqoop Works on CLI Not in Oozie
  Choosing Proper Connector
  Overriding Type Mapping
Sqoop 2
  Sqoop 2 Architecture
  Sqoop 2 Design Goals
  Sqoop 2 UI in Hue
Resources


Sqoop 1 Architecture

Sqoop 1 Command Line
sqoop TOOL PROPS ARG [-- EXTRA]
TOOL: import, export
PROPS: Hadoop (Java) properties
  -Dwhatever.whenever=yes
ARG: generic Sqoop arguments
  --table, --connect, ...
EXTRA: connector-specific arguments
  --schema (PostgreSQL and Microsoft SQL Server)
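The ordering on this slide is easy to get wrong, so here is a minimal dry-run sketch that assembles a command in the order TOOL, PROPS, ARG, then EXTRA after a bare "--", and prints it instead of running it. The host `db.example.com` and the schema name `public` are assumptions for illustration; `-Dwhatever.whenever=yes` is the placeholder property from the slide.

```shell
# Dry run: build the command in the order TOOL, PROPS, ARG, -- EXTRA
# and print it rather than execute it.
TOOL="import"
# Hadoop properties go directly after the tool name.
PROPS="-Dwhatever.whenever=yes"
# Generic Sqoop arguments (hypothetical host; -P prompts for the password).
ARGS="--connect jdbc:postgresql://db.example.com/sqoop --username sqoop -P --table cities"
# Connector-specific arguments follow a bare "--" (hypothetical schema name).
EXTRA="--schema public"

SQOOP_CMD="sqoop $TOOL $PROPS $ARGS -- $EXTRA"
echo "$SQOOP_CMD"
```

Printing the command first makes it easy to check that the -D properties precede the generic arguments and that connector-specific options come last.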

Sqoop 1 Example
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop --password sqoop \
  --table cities

sqoop export \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop --password sqoop \
  --table cities \
  --export-dir /temp/cities

Sqoop 1 Challenges
Cryptic, contextual command line arguments
Security concerns
Type mapping is not clearly defined
Client needs access to Hadoop binaries/configuration and database
JDBC model is enforced

Troubleshooting Sqoop 1
Versions: Sqoop, Hadoop, OS, JDBC
Console log after running with the --verbose flag
  Capture the entire output via sqoop import &> sqoop.log
Entire Sqoop command including the options-file if applicable
Expected output and actual output
Table definition
Small input data set that triggers the problem
  Especially with export, malformed data is often the culprit
Hadoop task logs
  Often the task logs contain further information describing the problem
Permissions on input files

Troubleshooting Sqoop 1
Imported table has more rows than source table?
  Data contains characters used as Hive's delimiters
Clean up data
  --hive-drop-import-delims
    Removes \n, \t, and \01 characters
  --hive-delims-replacement SPECIAL
    Replaces \n, \t, and \01 characters with the string SPECIAL
Not restricted to Hive - any import job using text files
  Ensure output files have one line per imported row
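To see why an import can appear to gain rows, the following sketch simulates the effect with plain shell tools; it does not call Sqoop. The sample row and the file names `raw.txt` and `clean.txt` are assumptions for illustration, and the `tr` call only imitates the stripping of the characters listed on the slide.

```shell
# One logical row whose second column contains an embedded newline.
ROW='1,San
Jose'

# Written as-is, the single row occupies two lines in a text file.
printf '%s\n' "$ROW" > raw.txt

# Strip \n, \t and \01 from the value, then write it as one line.
CLEAN=$(printf '%s' "$ROW" | tr -d '\n\t\001')
printf '%s\n' "$CLEAN" > clean.txt

# Count the lines each file would present to a line-oriented reader.
RAW_LINES=$(wc -l < raw.txt | tr -d ' ')
CLEAN_LINES=$(wc -l < clean.txt | tr -d ' ')
echo "raw: $RAW_LINES line(s), cleaned: $CLEAN_LINES line(s)"
```

The raw file has two lines for one source row, which a line-oriented reader such as Hive would count as two rows; the cleaned file has one.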


Common Sqoop 1 Issues
Protecting Your Password
Sqoop Works on CLI Not in Oozie
Choosing Proper Connector
Overriding Type Mapping


Protecting Your Password
sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  -P

sqoop import \
  --connect jdbc:mysql://mysql.example.com/sqoop \
  --username sqoop \
  --table cities \
  --password-file my-sqoop-password
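A password file is only useful if it is written carefully. The sketch below creates the file named on the slide under two assumptions: that a trailing newline in the file may be read as part of the password, and that owner-only read permission is the intended protection. The password value `sqoop` is the sample credential from the earlier example.

```shell
# File name taken from the slide; the password value is the sample one.
PASSWORD_FILE="my-sqoop-password"

# printf '%s' writes the value without appending a trailing newline.
printf '%s' 'sqoop' > "$PASSWORD_FILE"

# Make the file readable by its owner only.
chmod 400 "$PASSWORD_FILE"
```

Compared with --password on the command line, this keeps the secret out of the shell history and the process list.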


Sqoop Works on CLI Not in Oozie
Character parameter ' ' has multiple characters; only the first will be used.
Got error creating database manager: java.io.IOException: No manager for connect string: "jdbc:teradata...

Sqoop Works on CLI Not in Oozie
sqoop import --password "speci@l\$" \
  --connect 'jdbc:x:/yyy;db=sqoop'
Remove all escaping that you've added for the shell
Use <arg> rather than <command> tags, as the content of each <arg> is considered to be one parameter
Put all -D parameters into the configuration section
Install the driver into the workflow's lib/ directory or the shared action library /user/oozie/share/lib/sqoop/
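As an illustration of these points, here is a sketch of an Oozie Sqoop action that passes each parameter in its own <arg> element, with no shell quoting around the connect string and the -D property moved into the configuration section. The action name, the transition targets, the `${jobTracker}` and `${nameNode}` variables, and the table name are assumptions for illustration; `whatever.whenever` is the placeholder property from the command-line slide.

```xml
<action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- -D parameters go here instead of on the command line -->
        <configuration>
            <property>
                <name>whatever.whenever</name>
                <value>yes</value>
            </property>
        </configuration>
        <!-- one <arg> per parameter; no shell quotes or backslashes -->
        <arg>import</arg>
        <arg>--connect</arg>
        <arg>jdbc:x:/yyy;db=sqoop</arg>
        <arg>--table</arg>
        <arg>cities</arg>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

Because each <arg> is handed over as a single parameter, the semicolon in the connect string needs no quoting here, unlike on the shell.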


Choosing Proper Connector
JDBC driver is a dependency for all three connectors
Sqoop automatically chooses the optimal connector (OraOop, built-in, Generic JDBC Connector)
Or explicitly choose:
  --connection-manager com.quest.oraoop.OraOopConnManager


Overriding Type Mapping
--map-column-java parameter
  comma-separated list of key-value pairs
  key = exact column name
  value = target Java type
sqoop import \
  --map-column-java \
  c1=Float,c2=String,c3=String...


Sqoop 2 Architecture

Sqoop 2 Design Goals
Security and Separation of Concerns
  Role-based access and use
Ease of extension
  No low-level Hadoop knowledge needed
  No functional overlap between Connectors
Ease of Use
  Uniform functionality
  Domain-specific interactions

Sqoop 2 UI in Hue
Troubleshooting
  sqoop.log file is located in @LOGDIR@ and the rest should be in server/logs/*
  Look for catalina.out, catalina.log, localhost-*.log



Resources
Sqoop 1
Sqoop 2
  http://archive-primary.cloudera.com/cdh5/cdh/5/sqoop2/