Unlocking Hadoop for Your Rela4onal DB Kathleen Ting @kate_ting Technical Account Manager, Cloudera Sqoop PMC Member BigData.be April 4, 2014
Who Am I? Started 3 yr ago as 1 st Cloudera Support Eng Now manages Cloudera s 2 largest customers Sqoop CommiJer, PMC Member Co- Author of the Apache Sqoop Cookbook
What is Sqoop? Apache Top- Level Project SQl to hadoop Tool to transfer data from rela4onal databases Teradata, MySQL, PostgreSQL, Oracle, Netezza To/From Hadoop ecosystem HDFS (text, sequence file), Hive, HBase, Avro 3
Why Sqoop? Efficient/Controlled resource u4liza4on Concurrent connec4ons, Time of opera4on Datatype mapping and conversion Automa4c, and User override Metadata propaga4on Sqoop Record Hive Metastore Avro
Agenda Sqoop 1 Sqoop 1 Architecture Sqoop 1 Command Line Sqoop 1 Examples Sqoop 1 Challenges Troubleshoo4ng Sqoop 1 Common Sqoop 1 Issues Protec4ng Your Password Sqoop Works on CLI Not in Oozie Choosing Proper Connector Overriding Type Mapping Sqoop 2 Sqoop 2 Architecture Sqoop 2 Design Goals Sqoop 2 UI in Hue Resources
Agenda Sqoop 1 Sqoop 1 Architecture Sqoop 1 Command Line Sqoop 1 Examples Sqoop 1 Challenges Troubleshoo4ng Sqoop 1 Common Sqoop 1 Issues Protec4ng Your Password Sqoop Works on CLI Not in Oozie Choosing Proper Connector Overriding Type Mapping Sqoop 2 Sqoop 2 Architecture Sqoop 2 Design Goals Sqoop 2 UI in Hue Resources
7 Sqoop 1 Architecture
Sqoop 1 Command Line sqoop TOOL PROPS ARG [-- EXTRA] TOOL: import, export PROPS Hadoop (java) proper4es -Dwhatever.whenever=yes ARG Generic SQOOP arguments --table, --connect,... EXTRA connector specific --schema (PostgreSQL and Microsoa SQL Server)
Sqoop 1 Example sqoop import \ --connect jdbc:mysql://mysql.example.com/sqoop \ --username sqoop --password sqoop \ --table cities sqoop export \ --connect jdbc:mysql://mysql.example.com/sqoop \ --username sqoop --password sqoop \ --table cities \ --export-dir /temp/cities
Sqoop 1 Challenges Cryp4c, contextual command line arguments Security concerns Type mapping is not clearly defined Client needs access to Hadoop binaries/configura4on and database JDBC model is enforced 10
Troubleshoo4ng Sqoop 1 Versions: Sqoop, Hadoop, OS, JDBC Console log aaer running with the --verbose flag Capture the en4re output via sqoop import &> sqoop.log En4re Sqoop command including the op4ons- file if applicable Expected output and actual output Table defini4on Small input data set that triggers the problem Especially with export, malformed data is oaen the culprit Hadoop task logs Oaen the task logs contain further informa4on describing the problem Permissions on input files
Troubleshoo4ng Sqoop 1 Imported table has more rows than source table? Data contains char used as Hive s delimiters Clean up data --hive-drop-import-delims Removes \n, \t, and \01 char --hive-delims-replacement SPECIAL Replaces \n, \t, and \01 char with string SPECIAL Not restricted to Hive - any import job using text files Ensure output files have one line per imported row
Agenda Sqoop 1 Sqoop 1 Architecture Sqoop 1 Command Line Sqoop 1 Examples Sqoop 1 Challenges Troubleshoo4ng Sqoop 1 Common Sqoop 1 Issues Protec4ng Your Password Sqoop Works on CLI Not in Oozie Choosing Proper Connector Overriding Type Mapping Sqoop 2 Sqoop 2 Architecture Sqoop 2 Design Goals Sqoop 2 UI in Hue Resources
Common Sqoop 1 Issues Protec4ng Your Password Sqoop Works on CLI Not in Oozie Choosing Proper Connector Overriding Type Mapping
Common Sqoop 1 Issues Protec4ng Your Password Sqoop Works on CLI Not in Oozie Choosing Proper Connector Overriding Type Mapping
Protec4ng Your Password sqoop import \ --connect jdbc:mysql://mysql.example.com/sqoop \ --username sqoop \ --table cities \ -P sqoop import \ --connect jdbc:mysql://mysql.example.com/sqoop \ --username sqoop \ --table cities \ --password-file my-sqoop-password
Common Sqoop 1 Issues Protec4ng Your Password Sqoop Works on CLI Not in Oozie Choosing Proper Connector Overriding Type Mapping
Sqoop Works on CLI Not in Oozie Character parameter ' ' has multiple characters; only the first will be used. Got error creating database manager: java.io.ioexception: No manager for connect string: "jdbc:teradata...
Sqoop Works on CLI Not in Oozie sqoop import --password "speci@l\$" \ connect 'jdbc:x:/yyy;db=sqoop Remove all escaping that you ve added for the shell Use <arg> vs <command> tags as content is considered to be one parameter Put all - D parameters into configura4on sec4on Install driver into workflow s lib/ directory or shared ac4on library /user/oozie/share/lib/sqoop/
Common Sqoop 1 Issues Protec4ng Your Password Sqoop Works on CLI Not in Oozie Choosing Proper Connector Overriding Type Mapping
Choosing Proper Connector JDBC driver is dependency for all three connectors Sqoop automa4cally chooses most op4mal connector (OraOoop, built- in, Generic JDBC Connector) Or explicitly chose: --connection-manager com.quest.oraoop.oraoopconnmanager
Common Sqoop 1 Issues Protec4ng Your Password Sqoop Works on CLI Not in Oozie Choosing Proper Connector Overriding Type Mapping
Overriding Type Mapping - - map- column- java parameter comma separated list of key- value pairs key = exact column name value = target Java type sqoop import \ --map-column-java \ c1=float,c2=string,c3=string...
Agenda Sqoop 1 Sqoop 1 Architecture Sqoop 1 Command Line Sqoop 1 Examples Sqoop 1 Challenges Troubleshoo4ng Sqoop 1 Common Sqoop 1 Issues Protec4ng Your Password Sqoop Works on CLI Not in Oozie Choosing Proper Connector Overriding Type Mapping Sqoop 2 Sqoop 2 Architecture Sqoop 2 Design Goals Sqoop 2 UI in Hue Resources
25 Sqoop 2 Architecture
Sqoop 2 Design Goals Security and Separa4on of Concerns Role based access and use Ease of extension No low- level Hadoop knowledge needed No func4onal overlap between Connectors Ease of Use Uniform func4onality Domain specific interac4ons
Sqoop 2 UI in Hue Troubleshoo4ng sqoop.log file is located in @LOGDIR@ and the rest should be in server/logs/* Look for catalina.out, catalina.log, localhost- *.log
28
29
30
31
32
33
34
35
36
37
Agenda Sqoop 1 Sqoop 1 Architecture Sqoop 1 Command Line Sqoop 1 Examples Sqoop 1 Challenges Troubleshoo4ng Sqoop 1 Common Sqoop 1 Issues Protec4ng Your Password Sqoop Works on CLI Not in Oozie Choosing Proper Connector Overriding Type Mapping Sqoop 2 Sqoop 2 Architecture Sqoop 2 Design Goals Sqoop 2 UI in Hue Resources
Resources Sqoop 1 Sqoop 2 http://archive-primary.cloudera.com/ cdh5/cdh/5/sqoop2/ 39