
Building Big Data Pipelines using OSS
Costin Leau, Staff Engineer, VMware, @CostinL

Costin Leau - Speaker Bio
Spring committer since 2006
Spring Framework (JPA, @Bean, cache abstraction)
Spring OSGi/Dynamic Modules, OSGi Blueprint spec
Spring Data (GemFire, Redis, Hadoop)

Data Landscape

Data Trends http://www.emc.com/leadership/programs/digital-universe.htm

Enterprise Data Trends

Enterprise Data Trends
Unstructured data: no predefined model, often doesn't fit well in an RDBMS
Pre-aggregated data: computed during data collection (counters, running averages)

Cost Trends
Big Iron: $40k/CPU
Commodity cluster: $1k/CPU
Hardware cost halving every 18 months

The Value of Data
Value from data exceeds hardware & software costs
Value in connecting data sets: grouping e-commerce users by user agent, e.g. Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/418.9 (KHTML, like Gecko) Safari/419.3

Big Data
"Big data" refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze
A subjective and moving target
Big data in many sectors today ranges from tens of TB to multiple PB

(Big) Data Pipelines

A Holistic View of a Big Data System
ETL, real-time streams
Real-time processing (S4, Storm)
Real-time structured database (HBase, GemFire, Cassandra)
Analytics, Big SQL (Greenplum, Aster Data, etc.)
Unstructured data (HDFS), batch processing

Big Data problems == Integration problems
Pipeline stages: Collect, Transform, RT Analysis, Ingest, Batch Analysis, Distribute, Use
Real-world big data solutions require workflow across systems
Workflow for big data processing is an integration problem
Share core components of a classic integration workflow
Big data solutions need to integrate with existing data and apps
Event-driven vs. batch workflows
No silver bullet (Michael Stonebraker: "One Size Fits All": An Idea Whose Time Has Come and Gone)
Spring projects can provide the foundation for Big Data workflows

Taming Big Data

Hadoop as a Big Data Platform
Map Reduce Framework (MapRed)
Hadoop Distributed File System (HDFS)

Spring for Hadoop - Goals
Hadoop has a poor out-of-the-box programming model; applications are generally a collection of scripts calling command-line apps
Spring simplifies developing Hadoop applications by providing a familiar and consistent programming and configuration model
Across a wide range of use cases: HDFS usage, data analysis (MR/Pig/Hive/Cascading), workflow, event streams, integration
Allowing you to start small and grow

Relationship with other Spring projects
Spring Framework: web, messaging applications
Spring Data: Redis, MongoDB, Neo4j, GemFire
Spring Batch: on- and off-Hadoop workflows
Spring Integration: event-driven applications, Enterprise Integration Patterns
Spring for Apache Hadoop: simplify Hadoop programming

Capabilities: Spring + Hadoop
Declarative configuration: create, configure, and parameterize Hadoop connectivity and all job types
Environment profiles: easily move from dev to qa to prod
Developer productivity: create well-formed applications, not spaghetti-script applications
Simplify the HDFS and FsShell API with support for JVM scripting
Runner classes for MR/Pig/Hive/Cascading for small workflows
Helper Template classes for Pig/Hive/HBase

Core Hadoop

Core Map Reduce idea

Counting Words M/R

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}

public class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

Counting Words - Configuring M/R

Configuration conf = new Configuration();
Job job = new Job(conf, "wordcount");
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);

Running Hadoop Jars (WordCount 1.0)

Vanilla Hadoop:
bin/hadoop jar hadoop-examples.jar wordcount /wc/input /wc/output

SHDP:
<hdp:configuration/>
<hdp:jar-runner id="wordcount" jar="hadoop-examples.jar">
    <hdp:arg value="wordcount"/>
    <hdp:arg value="/wc/input"/>
    <hdp:arg value="/wc/output"/>
</hdp:jar-runner>

Running Hadoop Tools (WordCount 2.0)

Vanilla Hadoop:
bin/hadoop jar -conf myhadoop-site.xml -D ignorecase=true wordcount.jar org.myorg.WordCount /wc/input /wc/output

SHDP:
<hdp:configuration resources="myhadoop-site.xml"/>
<hdp:tool-runner id="wc" jar="wordcount.jar">
    <hdp:arg value="/wc/input"/>
    <hdp:arg value="/wc/output"/>
    ignorecase=true
</hdp:tool-runner>

Configuring Hadoop

applicationContext.xml:
<context:property-placeholder location="hadoop-dev.properties"/>
<hdp:configuration>
    fs.default.name=${hd.fs}
</hdp:configuration>
<hdp:job id="word-count-job"
    input-path="${input.path}" output-path="${output.path}"
    jar="myjob.jar"
    mapper="org.apache.hadoop.examples.WordCount.TokenizerMapper"
    reducer="org.apache.hadoop.examples.WordCount.IntSumReducer"/>
<hdp:job-runner id="runner" job-ref="word-count-job" run-at-startup="true"/>

hadoop-dev.properties:
input.path=/wc/input/
output.path=/wc/word/
hd.fs=hdfs://localhost:9000

Running a Streaming Job

Vanilla Hadoop:
bin/hadoop jar hadoop-streaming.jar \
    -input /wc/input -output /wc/output \
    -mapper /bin/cat -reducer /bin/wc \
    -files stopwords.txt

SHDP:
<context:property-placeholder location="hadoop-${env}.properties"/>
<hdp:streaming id="wc"
    input-path="${input}" output-path="${output}"
    mapper="${cat}" reducer="${wc}"
    files="classpath:stopwords.txt"/>

hadoop-dev.properties:
input.path=/wc/input/
output.path=/wc/word/
hd.fs=hdfs://localhost:9000

hadoop-qa.properties:
input.path=/gutenberg/input/
output.path=/gutenberg/word/
hd.fs=hdfs://darwin:9000

env=dev java -jar SpringLauncher.jar applicationContext.xml
env=qa java -jar SpringLauncher.jar applicationContext.xml

Word Count - Injecting Jobs

Use Dependency Injection to obtain a reference to the Hadoop Job, then perform additional runtime configuration and submit.

public class WordService {
    @Inject
    private Job mapReduceJob;

    public void processWords() {
        mapReduceJob.submit();
    }
}

HDFS and Hadoop Shell as APIs

Has all bin/hadoop fs commands through FsShell: mkdir, chmod, test, ...

class MyScript {
    @Autowired FsShell fsh;

    @PostConstruct
    void init() {
        String outputDir = "/data/output";
        if (fsh.test(outputDir)) {
            fsh.rmr(outputDir);
        }
    }
}

HDFS and FsShell as APIs

Excellent for JVM scripting

init-files.groovy:
// use the shell (made available under variable fsh)
if (!fsh.test(inputDir)) {
    fsh.mkdir(inputDir)
    fsh.copyFromLocal(sourceFile, inputDir)
    fsh.chmod(700, inputDir)
}
if (fsh.test(outputDir)) {
    fsh.rmr(outputDir)
}

HDFS and FsShell as APIs

appctx.xml:
<hdp:script id="init-script" language="groovy">
    <hdp:property name="inputDir" value="${input}"/>
    <hdp:property name="outputDir" value="${output}"/>
    <hdp:property name="sourceFile" value="${source}"/>
    // use the shell (made available under variable fsh)
    if (!fsh.test(inputDir)) {
        fsh.mkdir(inputDir)
        fsh.copyFromLocal(sourceFile, inputDir)
        fsh.chmod(700, inputDir)
    }
    if (fsh.test(outputDir)) {
        fsh.rmr(outputDir)
    }
</hdp:script>

Counting Words - Pig

input_lines = LOAD '/tmp/books' AS (line:chararray);
-- Extract words from each line and put them into a pig bag
words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- filter out any words that are just white spaces
filtered_words = FILTER words BY word MATCHES '\\w+';
-- create a group for each word
word_groups = GROUP filtered_words BY word;
-- count the entries in each group
word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;
ordered_word_count = ORDER word_count BY count DESC;
STORE ordered_word_count INTO '/tmp/number-of-words';

Pig

Vanilla Pig:
pig -x mapreduce wordcount.pig
pig wordcount.pig -P pig.properties -p pig.exec.nocombiner=true

SHDP - creates a PigServer and executes the script on startup (optional):
<pig-factory job-name="wc" properties-location="pig.properties">
    pig.exec.nocombiner=true
    <script location="wordcount.pig">
        <arguments>ignorecase=true</arguments>
    </script>
</pig-factory>

PigRunner - a small Pig workflow

@Scheduled(cron = "0 0 12 * * ?")
public void process() {
    pigRunner.call();
}
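
The pigRunner bean used above is not shown in the transcript; a minimal sketch of how it might be declared, assuming the SHDP namespace from the earlier examples (the script name is illustrative):

<!-- sketch: a PigRunner bean the @Scheduled method above can invoke;
     run-at-startup is left false so the scheduler controls execution -->
<hdp:pig-runner id="pigRunner" run-at-startup="false">
    <hdp:script location="wordcount.pig"/>
</hdp:pig-runner>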

PigTemplate - Configuration
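
The configuration for this slide did not survive the transcription; a minimal sketch of what it might look like, based on the SHDP namespace used elsewhere in this deck (the properties file name is illustrative):

<!-- sketch: a PigServer factory plus a thread-safe PigTemplate on top of it -->
<hdp:pig-factory properties-location="pig.properties"/>
<hdp:pig-template/>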

PigTemplate - Programmatic Use

public class PigPasswordRepository implements PasswordRepository {
    private PigTemplate pigTemplate;
    private String pigScript = "classpath:password-analysis.pig";

    public void processPasswordFile(String inputFile) {
        String outputDir = baseOutputDir + File.separator + counter.incrementAndGet();
        Properties scriptParameters = new Properties();
        scriptParameters.put("inputDir", inputFile);
        scriptParameters.put("outputDir", outputDir);
        pigTemplate.executeScript(pigScript, scriptParameters);
    }
    // ...
}

Counting Words - Hive

-- import the file as lines
CREATE EXTERNAL TABLE lines(line string);
LOAD DATA INPATH 'books' OVERWRITE INTO TABLE lines;
-- create a virtual view that splits the lines into words and counts them
SELECT word, count(*) FROM lines
LATERAL VIEW explode(split(line, ' ')) lTable AS word
GROUP BY word;

Vanilla Hive Command-line JDBC based

Hive w/ SHDP

Create a Hive JDBC client and use it with Spring's JdbcTemplate:

<bean id="hive-driver" class="org.apache.hadoop.hive.jdbc.HiveDriver"/>
<bean id="hive-ds" class="org.springframework.jdbc.datasource.SimpleDriverDataSource"
    c:driver-ref="hive-driver" c:url="${hive.url}"/>
<bean id="template" class="org.springframework.jdbc.core.JdbcTemplate"
    c:data-source-ref="hive-ds"/>

Reuse Spring's rich ResultSet-to-POJO mapping features:

public long count() {
    return jdbcTemplate.queryForLong("select count(*) from " + tableName);
}

List<Password> result = jdbcTemplate.query("select * from passwords",
    new ResultSetExtractor<List<Password>>() {
        public List<Password> extractData(ResultSet rs) throws SQLException {
            // extract data from result set
        }
    });

Vanilla Hive - Thrift

HiveClient is not thread-safe and throws checked exceptions:

public long count() {
    HiveClient hiveClient = createHiveClient();
    try {
        hiveClient.execute("select count(*) from " + tableName);
        return Long.parseLong(hiveClient.fetchOne());
    // checked exceptions
    } catch (HiveServerException ex) {
        throw translateException(ex);
    } catch (org.apache.thrift.TException tex) {
        throw translateException(tex);
    } finally {
        try {
            hiveClient.shutdown();
        } catch (org.apache.thrift.TException tex) {
            logger.debug("Unexpected exception on shutting down HiveClient", tex);
        }
    }
}

protected HiveClient createHiveClient() {
    TSocket transport = new TSocket(host, port, timeout);
    HiveClient hive = new HiveClient(new TBinaryProtocol(transport));
    try {
        transport.open();
    } catch (TTransportException e) {
        throw translateException(e);
    }
    return hive;
}

SHDP Hive

Easy client configuration:
<hive-client-factory host="${hive.host}" port="${hive.port}"/>
<hive-template id="hiveTemplate"/>

Can create an embedded Hive server instance:
<hive-server auto-startup="true" port="${hive.port}"/>

Declarative usage:
<hive-runner run-at-startup="true">
    <hdp:script>
        DROP TABLE IF EXISTS ${wc.table};
    </hdp:script>
    <hdp:script location="word-count.q"/>
</hive-runner>

SHDP - HiveTemplate (Thrift)

One-liners to execute queries:

@Repository
public class HiveTemplatePasswordRepository implements PasswordRepository {
    private @Value("${hive.table}") String tableName;
    private @Autowired HiveOperations hiveTemplate;

    @Override
    public Long count() {
        return hiveTemplate.queryForLong("select count(*) from " + tableName);
    }
}

One-liners for executing scripts:

Properties scriptParameters = new Properties();
scriptParameters.put("inputDir", inputFile);
scriptParameters.put("outputDir", outputDir);
hiveTemplate.query("classpath:hive-analysis.q", scriptParameters);

Cascading - Counting Words

Scheme sourceScheme = new TextLine(new Fields("line"));
Tap source = new Hfs(sourceScheme, inputPath);
Scheme sinkScheme = new TextLine(new Fields("word", "count"));
Tap sink = new Hfs(sinkScheme, outputPath, SinkMode.REPLACE);
Pipe assembly = new Pipe("wordcount");
String regex = "(?<!\\pL)(?=\\pL)[^ ]*(?<=\\pL)(?!\\pL)";
Function function = new RegexGenerator(new Fields("word"), regex);
assembly = new Each(assembly, new Fields("line"), function);
assembly = new GroupBy(assembly, new Fields("word"));
Aggregator count = new Count(new Fields("count"));
assembly = new Every(assembly, count);

Cascading

Based on Spring's type-safe @Configuration:

<bean class="wordcount.cascading.CascadingConfig"/>
<bean id="cascade" class="org.springframework.data.hadoop.cascading.HadoopFlowFactoryBean"
    p:configuration-ref="hadoopConfiguration" p:tails-ref="countPipe"/>
<hdp:configuration/>

HBase

Bootstrap the HBase configuration from the Hadoop configuration:

<hdp:configuration/>
<hdp:hbase-configuration delete-connection="true"/>
<bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate"
    p:configuration-ref="hbaseConfiguration"/>

Template usage:

public List<User> findAll() {
    return hbaseTemplate.find(tableName, "cfInfo", new RowMapper<User>() {
        @Override
        public User mapRow(Result result, int rowNum) throws Exception {
            return new User(Bytes.toString(result.getValue(CF_INFO, qUser)),
                    Bytes.toString(result.getValue(CF_INFO, qEmail)),
                    Bytes.toString(result.getValue(CF_INFO, qPassword)));
        }
    });
}

Batch Workflows

On Hadoop Workflows
Reuse the same Spring Batch infrastructure for Hadoop-based workflows (HDFS, Pig, MR, Hive, HDFS); a step can be any Hadoop job. SHDP tasklets for these step types are sketched below.
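
SHDP provides Batch tasklet elements for such steps; a minimal sketch of how the steps used in the configuration two slides below might be declared, reusing job and script names from earlier slides (all names are illustrative):

<!-- sketch: Hadoop-aware tasklets usable as Spring Batch steps -->
<hdp:job-tasklet id="mr-tasklet" job-ref="word-count-job" wait-for-completion="true"/>
<hdp:pig-tasklet id="pig-tasklet">
    <hdp:script location="wordcount.pig"/>
</hdp:pig-tasklet>
<hdp:hive-tasklet id="hive-tasklet">
    <hdp:script location="word-count.q"/>
</hdp:hive-tasklet>
<hdp:script-tasklet id="hdfs-tasklet">
    <hdp:script location="init-files.groovy"/>
</hdp:script-tasklet>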

Capabilities: Spring + Hadoop + Batch
Pipeline stages: Collect, Transform, RT Analysis, Ingest, Batch Analysis, Distribute, Use
Spring Batch for file/DB/NoSQL-driven applications
Collect: process local files
Transform: scripting or Java code to transform and enrich
RT Analysis: N/A
Ingest: (batch/aggregate) write to HDFS, or split/filter
Batch Analysis: orchestrate Hadoop steps in a workflow
Distribute: copy data out of HDFS to structured storage
JMX enabled along with a REST interface for job control

Spring Batch Configuration

<job id="job1">
    <step id="import" next="wc">
        <tasklet ref="import-tasklet"/>
    </step>
    <step id="wc" next="pig">
        <tasklet ref="wordcount-tasklet"/>
    </step>
    <step id="pig" next="parallel">
        <tasklet ref="pig-tasklet"/>
    </step>
    <split id="parallel" next="hdfs">
        <flow><step id="mrStep">
            <tasklet ref="mr-tasklet"/>
        </step></flow>
        <flow><step id="hive">
            <tasklet ref="hive-tasklet"/>
        </step></flow>
    </split>
    <step id="hdfs">
        <tasklet ref="hdfs-tasklet"/>
    </step>
</job>

Spring Batch Configuration Additional configuration behind the graph Reuse previous Hadoop job definitions

Spring Batch Admin

Event Driven Applications

Capabilities: Spring + Hadoop + EAI
Pipeline stages: Collect, Transform, RT Analysis, Ingest, Batch Analysis, Distribute, Use
Big data solutions need to integrate with existing data and apps
Share core components of a classic integration workflow
Spring Integration for event-driven applications
Collect: single-node or distributed data collection (TCP/JMS/Rabbit)
Transform: scripting or Java code to transform and enrich
RT Analysis: connectivity to multiple analysis techniques
Ingest: write to HDFS, split/filter the data stream to other stores
JMX enabled + control bus for starting/stopping individual components

Spring Integration - Polling a Log File
Poll a directory for files; files are rolled over every 10 minutes
Copy files to a staging area, then copy them to HDFS
Use an aggregator to wait for 10 files in a 20-minute interval to launch an MR job (a sketch of the flow follows)
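
A minimal sketch of the polling flow, assuming Spring Integration's file namespace; the directory, channel names, and the hdfsWriter bean are illustrative, and the aggregation step is omitted:

<!-- sketch: poll a directory for rolled-over files and hand each one
     to a bean that copies it into HDFS (e.g. via SHDP's FsShell) -->
<int-file:inbound-channel-adapter id="logFiles"
        directory="/var/log/app" channel="staging">
    <int:poller fixed-rate="10000"/>
</int-file:inbound-channel-adapter>

<int:service-activator input-channel="staging"
        ref="hdfsWriter" method="copyToHdfs"/>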

Spring Integration - Syslog to HDFS
Use the TCP/UDP syslog adapter
A transformer categorizes the messages
Route to specific channels based on category
One route leads to an HDFS write, with filtered data stored in Redis (sketched below)
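
One way this flow might be wired, assuming Spring Integration's UDP adapter in the syslog-listener role; the port, channel names, and the categorizing bean are illustrative:

<!-- sketch: receive syslog datagrams, categorize them, then route by category -->
<int-ip:udp-inbound-channel-adapter id="syslogIn" port="1514" channel="raw"/>

<int:transformer input-channel="raw" output-channel="categorized"
        ref="syslogCategorizer" method="categorize"/>

<int:router input-channel="categorized" expression="headers['category']">
    <int:mapping value="error" channel="toHdfs"/>
    <int:mapping value="metric" channel="toRedis"/>
</int:router>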

Spring Integration - Multi-node Syslog
Spread log collection across multiple machines
Use TCP adapters to forward events across machines (other middleware can be used)
Reusable flows: break the flow at a channel boundary and insert an inbound/outbound adapter pair (sketched below)
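
A sketch of breaking a flow at a channel boundary with a TCP adapter pair, assuming Spring Integration's IP module; hosts, ports, and channel names are illustrative:

<!-- sketch: forwarding node pushes messages from a local channel over TCP -->
<int-ip:tcp-connection-factory id="client" type="client"
        host="collector.example.com" port="5555"/>
<int-ip:tcp-outbound-channel-adapter channel="toCollector"
        connection-factory="client"/>

<!-- sketch: collector node receives them into the continuing flow -->
<int-ip:tcp-connection-factory id="server" type="server" port="5555"/>
<int-ip:tcp-inbound-channel-adapter channel="fromAgents"
        connection-factory="server"/>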

Resources
Prepping for GA - feedback welcome
Project page: springsource.org/spring-data/hadoop
Source code: github.com/springsource/spring-hadoop
Books

Q&A @CostinL