A framework for easy development of Big Data applications

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "A framework for easy development of Big Data applications"

Transcription

1 A framework for easy development of Big Data applications Rubén

2 Agenda 1. Big Data processing 2. Lambdoop framework 3. Lambdoop ecosystem 4. Case studies 5. Conclusions

3 About me :-)

4 Ø PhD in Software Engineering Ø MSc in Computer Science Ø BSc in Computer Science Academics Work Experience

5 About Treelogic

6 Treelogic is an R&D intensive company with the mission of creating, boosting, developing and adapting scientific and technological knowledge to improve quality standards in our daily life

7 TREELOGIC Distributor and Sales

8 Interna2onal Projects Na2onal Projects Research Lines Solu6ons Regional Projects R&D Manag. System Computer Vision Big Data Teraherzt technology Data science Social Media Analysis Seman2cs Security & Safety Jus2ce Health Transport Financial services ICT tailored solu2ons Internal Projects R&D

9 7 ongoing FP7 projects ICT, SEC, OCEAN Coordinating 5 of them 3 ongoing Eurostars projects Coordinating all of them

10 7 years experience in R&D projects More than 300 partners in last 3 years More than 40 projects with budget over 120 MEUR Project coordinator in 7 European projects Overall participation in 11 European projects Research & INNOVATION

11

12 Agenda 1. Big Data processing 2. Lambdoop framework 3. Lambdoop ecosystem 4. Case studies 5. Conclusions

13 What is Big Data? A massive volume of both structured and unstructured data that is so large to process with traditional database and software techniques

14 How is Big Data? Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization - Gartner IT Glossary -

15 3 problems Volume Variety Velocity

16 3 solu6ons Batch processing NoSQL Real-time processing

17 3 solu6ons Batch processing NoSQL Real-time processing

18 Batch processing Scalable Large amount of static data Distributed Parallel Fault tolerant High latency Volume

19 Real- 6me processing Velocity Low latency Continuous unbounded streams of data Distributed Parallel Fault-tolerant

20 Hybrid computa6on model Low latency Massive data + Streaming data Scalable Combine batch and real-time results Volume Velocity

21 Hybrid computa6on model All data Batch processing Batch results New data Combination Final results Real-time processing Stream results

22 Processing Paradigms Inception 2003 Batch processing Large amount of statics data Scalable solution Volume 1ª Generation 2006 Real-time processing Computing streaming data Low latency Velocity 2ª Generation 2010 Hybrid computation Lambda Architecture Volume + Velocity 3ª Generation 2014

23 Processing Pipeline DATA ACQUISITION DATA STORAGE DATA ANALYSIS RESULTS

24 Agenda 1. Big Data processing 2. Lambdoop framework 3. Lambdoop ecosystem 4. Case studies 5. Conclusions

25 Open source framework What is Lambdoop? Software abstraction layer over Open Source technologies o Hadoop, HBase, Sqoop, Flume, Kafka, Storm, Trident, Avro, Redis Common patterns and operations (aggregation, filtering, statistics ) already implemented. No MapReduce-like process Same single API for the three processing paradigms o Batch processing similar to Pig / Cascading o Real time processing using built-in functions easier than Trident o Hybrid computation model transparent for the developer

26 Why Lambdoop? Building a batch processing application requires o o o MapReduce developing Use other Hadoop related tools (Sqoop, Zookeper, HCatalog ) Storage systems (Hbase, MongoDB, HDFS, Cassandra ) Real-time processing requires o Streaming computing (S4, Storm, Samza) o Unboundend input (Flume, Scribe) o Temporal data stores (In-memory, Kafka, Kestrel)

27 Why Lambdoop? Building a hybrid computation system (Lambda Architecture) requires o o o Application logic has to be defined in two different systems using different frameworks Data must be serialized consistently and kept in sync between each system Developer is responsible for reading, writing and managing two data storage systems, performing a final combination and serving the final updated results

28

29

30 Why Lambdoop? One of the most interesting areas of future work is high level abstractions that map to a batch processing component and a real-time processing component. There's no reason why you shouldn't have the conciseness of a declarative language with the robustness of the batch/real-time architecture. Nathan Marz Rajat Jain Lambda Architecture is a implementation challenge. In many real-world situations a stumbling block for switching to a Lambda Architecture lies with a scalable batch processing layer. Technologies like Hadoop ( ) are there but there is a shortage of people with the expertise to leverage them.

31 Lambdoop Streaming data Workflow Data Operation Data Static data

32 Lambdoop Batch Hybrid Real-Time

33 Information represented as Data objects o Types: o o StaticData StreamingData Data Input o Every Data object has a Schema to describe the Data fields (types, nulleables, keys ) o A Data object is composed by Datasets.

34 Data Input Dataset o A Data object is formed by one or more Datasets. o All Datasets of a Data object share the same Schema o Datasets are formed by Register objects, o A Register is composed by RegisterFields.

35 Schema Data Input o Very similar to Avro definition schemas. o Allow to define input data s structure, fields, types, nulleables o Json format Station Title Lat. Lon. Date SO2 NO CO PM10 O3 dd vv TMP HR PRB 23 street road { } "type": "csv", "name": "AirQuality records", "fieldseparator": ";", "PK": "", "header": "true", "fields": [ {"name": "Station","type": "string","index": 0}, {"name": "Tittle","type": "string","index": 1,"nullable": "true"}, {"name": "Lat.","type": "double","index": 2,"nullable": "true"}, {"name": "Long.","type": "double","index": 3,"nullable": "true"}, {"name": "PRB","type": "double","index": 20,"nullable": "true"} ]

36 Importing data into Lambdoop o Loaders: Import information from multiple sources and store it into the HDFS as Data objects o Producers: Get streaming data and represent it as Data objects o Heterogeneous sources. Data Input o Serialize information into Avro format

37 Data Input Static Data example: Importing a Air Quality dataset from local logs to HDFS o o Loader Schema s path is files/csv/air_quality_schema //Read schema from a file String schema = readschemafile(schema_file); Loader loader = new CSVLoader("AQ.avro", uri, schema) Data input = new StaticData(loader);

38 Data Input Streaming Data example: Reading streaming sensor data from TCP port o Producer o Weather stations emit messages to port 8080 o Schema s path is files/csv/air_quality_schema int port = 8080; //Read schema String schema = readschemafile (schema_file); Producer producer = new TCPProducer ("AirQualityListener", refresh, port, schema); // Create Data object Data data = new StreamingData(producer)

39 Data Input Extensibility o Users can implement their own data loaders/producers 1) Extend Loader/Producer interface 2) Read data from original source 3) Get and serialize information (Avro format) considering Schemas

40 Opera6ons Unitary actions to process data An Operation takes Data as input, processes the Data and produces another Data as output Types of operations: Aggregation: Produces a single value per DataSet Filter: Output data has the same schema as input data Group: Produces several DataSet, grouping registers together Projection: Changes the Data schema, but preserves the records and their values Join: Combines different Data objects

41 Opera6ons Operations Aggregation(1) Aggregation(2) Filter Group Projection Join Count Skewness Filter Group Select Inner Join Average Z-Test Limit RollUp Frecuency Left Join Sum Stderror TopN Cube Variation Right Join MinValue Variance BottomN N-Til Outer Join MaxValue Covariance Max Mode Min

42 Opera6ons Extensibility (User Defined Operations): New operations can be defined implementing a set of interfaces: OperationFactory: Factory used by the framework in order to get batch, streaming and hybrid operation implementations when needed BatchOperation: Provides MapReduce logic to process the input Data StreamingOperation: Provides Storm/Trident based functions to process streaming registers HybridOperation: Provides merging logic between streaming and batch results

43 Opera6ons User Defined Operation interfaces

44 Workflows Sequence of connected Operations. Manages tasks and resources (check-points) in order to produce an output using input data and a set of Operations o o o BatchWorkflow: Runs a set of operations on StaticData input and produces a new StaticData as output StreamingWorkflow: Operates on a StreamingData to produce another StreamingData HybridWorkflow: Combines Static and Streaming data to produce completed and updated results (StreamingData) Workflow connections Data Workflow Data Data Workflow Workflow Workflow Workflow Data Data Workflow Data

45 // Batch processing example String schema = readschemafile(schema_file); Loader loader = new CSVLoader("AQ.avro",uri, schema) Data input = new StaticData(loader); Workflow wf = new BatchWorkflow(input); //Add a filter operation Filter filter = new Filter(new RegisterField("Title"), ConditionType.EQUAL, new StaticValue(«street 45")); //Calculate SO2 average on filtered input data Avg avg = new Avg(new RegisterField("SO2")); wf.addoperation(filter); wf.addoperation(avg); //Run the workflow wf.run(); //Get the results Data output = wf.getresults(); Workflows

46 Workflows //Real-time processing example Producer producer = new TCPPortProducer("QAtest", schema, config); Data input = new StreamingData(producer); Workflow wf = new StreamingWorkflow(input); //Add a filter operation Filter filter = new Filter(new RegisterField("Title"), ConditionType.EQUAL, new StaticValue("Estación Av. Castilla")); //Calculate SO2 average on filtered input data Avg avg = new Avg(new RegisterField("SO2")); wf.addoperation(filter); wf.addoperation(avg); //Runs the workflow wf.run(); //Gets the results While (!stop){ Data output = wf.getresults(); }

47 // Hybrid computation example Producer producer = new PortProducer("catest", schema1, config); StreamingData streaminput = new StreamingData(producer); Loader loader = new CSVLoader("AQ.avro",uri, schema2) StaticData batchinput = new StaticData(loader); Data input = new HybridData(streamInput, batchinput); Workflow wf = new HybridWorkflow(input); //Add a filter operation Filter filter = new Filter(new RegisterField("Title"), ConditionType.EQUAL, new StaticValue("street 34")); wf.addoperation(filter); //Calculate SO2 average on filtered input data Avg avg = new Avg(new RegisterField("SO2")); wf.addoperation(avg); //Run the workflow wf.run(); Workflows //Get the results While (!stop) { Data output = wf.getresults();}

48 Results exploita6on Filter RollUp StdError VISUALIZATION Avg Select Data EXPORT Cube Variance CSV, JSON, join

49 Results exploita6on Visualization /* Produce from Twitter */ TwitterProducer producer = new TwitterProducer( ); Data data = new StreamingData(producer); StreamingWorkflow wf = new StreamingWorkflow(data); /* Add operations to workflow*/ wf.addoperation(new Count()); /* Get results from workflow*/ Data results = wf.getresults(); /* Show results. Set dashboard refresh*/ Dashboard d = new Dashboard(config); d.addchart(lambdoopchart.createbarchart(results, new RegisterField("count"), Tweetscount");

50 Results exploita6on Visualization

51 Results exploita6on Visualization

52 Results exploita6on Export Data data = new StaticData(loader); Workflow wf = new BatchWorkflow(data); /* Add operations to workflow*/ wf.addoperation(new Count()); /* Get results from workflow*/ Data results = wf.getresults(); CSV, JSON, /* Export results */ Exporter.asCSV(results, File); MongoExport(results, Map<String, String> conf); PostgresExport(results, Map<String, String> conf);

53 Results exploita6on Alarms Data data = new StreamingData(producer); StreamingWorkflow wf = new StreamingWorkflow(data); /* Add operations to workflow*/ wf.addoperation(new Count()); /* Get results from workflow*/ Data results = wf.getresults(); /* Set alarm condition: T/F (e.g time or certain value) action: execution (e.g. show results, send an )*/ AlarmFactory.setAlert(results, condition, action);

54 Agenda 1. Big Data processing 2. Lambdoop framework 3. Lambdoop ecosystem 4. Case studies 5. Conclusions

55 Change configurations and easily manage the cluster Friendly tools for monitoring the health of the cluster Wizard-driven Lambdoop installation of new nodes 55

56 Visual editor for defining workflows and scheduling tasks o Plugin for Eclipse o Visual elements for: Input Sources Loader Operations Operation parameters o RegisterFields o Static values Visualization elements o Generates workflow code o XML Import/Export o Scheduling of workflows

57 Tool for working with messy big data, cleaning it and transforming it. Import data in different formats Explore datasets Apply advanced cell transformations Refine inconsistencies Filter and partition your big data

58 Agenda 1. Big Data processing 2. Lambdoop framework 3. Lambdoop ecosystem 4. Case studies 5. Conclusions

59 Social Awareness Based Emergency Situation Solver Objective: To create event assessment and decision-making supporting tools which improve quickness and efficiency when facing emergency situations making. Exploit the information available in Social Networks to complement data about emergency situations Real-time processing

60 Alert detection Locations Information Attached resources (photo, video, links, )

61 Static stations and mobile sensors in Asturias sending streaming data Historical data of > 10 years Monitoring, trends identification, predictions Batch processing + Real processing+ Hybrid computation

62 Quantum Mechanics Molecular Dynamics Computer simulation of physical movements of microscopic elements Large amount of data as streaming in each time-step Real-time interaction (query, visual exploration) during the simulation Data analytics on the whole dataset Real time processing + Batch processing + Hybrid computation

63 Agenda 1. Big Data processing 2. Lambdoop framework 3. Lambdoop ecosystem 4. Case studies 5. Conclusions

64 Conclusions Big Data is not only batch processing To implement a Lambda Architecture is not trivial Lambdoop: Big Data made easy High abstraction layer for all processing model All steps in the data processing pipeline Same Java API for all programing paradigms Extensible

65 Roadmap Conclusions Now Next Release a early version of Lambdoop Framework as Open Source Get feedback from the community Increase the set of built-in functions o o Move all components to YARN Stable versions of Lambdoop ecosystem o Models (Mahout, Jubatus, Samoa, R) Beyond Configurable processing engines (Spark, S4, Samza ) Configurable data stores (Cassandra, MongoDB, ElephantDB, VoltDB )

66 If you want stay tuned about Lambdoop register in @treelogic

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

Building Scalable Big Data Pipelines

Building Scalable Big Data Pipelines Building Scalable Big Data Pipelines NOSQL SEARCH ROADSHOW ZURICH Christian Gügi, Solution Architect 19.09.2013 AGENDA Opportunities & Challenges Integrating Hadoop Lambda Architecture Lambda in Practice

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

Architectures for massive data management

Architectures for massive data management Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with

More information

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org

The Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora {mbalassi, gyfora}@apache.org The Flink Big Data Analytics Platform Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org What is Apache Flink? Open Source Started in 2009 by the Berlin-based database research groups In the Apache

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

Integrating Big Data into the Computing Curricula

Integrating Big Data into the Computing Curricula Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big

More information

Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH

Real-time Data Analytics mit Elasticsearch. Bernhard Pflugfelder inovex GmbH Real-time Data Analytics mit Elasticsearch Bernhard Pflugfelder inovex GmbH Bernhard Pflugfelder Big Data Engineer @ inovex Fields of interest: search analytics big data bi Working with: Lucene Solr Elasticsearch

More information

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385

brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385 brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 1 Hadoop in a heartbeat 3 2 Introduction to YARN 22 PART 2 DATA LOGISTICS...59 3 Data serialization working with text and beyond 61 4 Organizing and

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Peers Techno log ies Pv t. L td. HADOOP

Peers Techno log ies Pv t. L td. HADOOP Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations Beyond Lambda - how to get from logical to physical Artur Borycki, Director International Technology & Innovations Simplification & Efficiency Teradata believe in the principles of self-service, automation

More information

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate

More information

Apache Flink Next-gen data analysis. Kostas Tzoumas ktzoumas@apache.org @kostas_tzoumas

Apache Flink Next-gen data analysis. Kostas Tzoumas ktzoumas@apache.org @kostas_tzoumas Apache Flink Next-gen data analysis Kostas Tzoumas ktzoumas@apache.org @kostas_tzoumas What is Flink Project undergoing incubation in the Apache Software Foundation Originating from the Stratosphere research

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

Upcoming Announcements

Upcoming Announcements Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within

More information

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015 Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours

More information

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!! Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!! Streaming Analy.cs S S S Scale- up Database Data And Compute Grid

More information

The 4 Pillars of Technosoft s Big Data Practice

The 4 Pillars of Technosoft s Big Data Practice beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed

More information

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools

More information

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview

Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce

More information

Hadoop & Spark Using Amazon EMR

Hadoop & Spark Using Amazon EMR Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

Qsoft Inc www.qsoft-inc.com

Qsoft Inc www.qsoft-inc.com Big Data & Hadoop Qsoft Inc www.qsoft-inc.com Course Topics 1 2 3 4 5 6 Week 1: Introduction to Big Data, Hadoop Architecture and HDFS Week 2: Setting up Hadoop Cluster Week 3: MapReduce Part 1 Week 4:

More information

Real-time Big Data Analytics with Storm

Real-time Big Data Analytics with Storm Ron Bodkin Founder & CEO, Think Big June 2013 Real-time Big Data Analytics with Storm Leading Provider of Data Science and Engineering Services Accelerating Your Time to Value IMAGINE Strategy and Roadmap

More information

Unified Batch & Stream Processing Platform

Unified Batch & Stream Processing Platform Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

A Scalable Data Transformation Framework using the Hadoop Ecosystem

A Scalable Data Transformation Framework using the Hadoop Ecosystem A Scalable Data Transformation Framework using the Hadoop Ecosystem Raj Nair Director Data Platform Kiru Pakkirisamy CTO AGENDA About Penton and Serendio Inc Data Processing at Penton PoC Use Case Functional

More information

Hadoop Job Oriented Training Agenda

Hadoop Job Oriented Training Agenda 1 Hadoop Job Oriented Training Agenda Kapil CK hdpguru@gmail.com Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module

More information

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved. Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!

More information

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

Introduction to Big Data and the Lambda Architecture

Introduction to Big Data and the Lambda Architecture Introduction to Big Data and the Lambda Architecture Marc Schöni Meinrad Weiss April 2014 BASEL BERN BRUGG LAUSANNE ZUERICH DUESSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA 1 What

More information

Dominik Wagenknecht Accenture

Dominik Wagenknecht Accenture Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna

More information

SQL + NOSQL + NEWSQL + REALTIME FOR INVESTMENT BANKS

SQL + NOSQL + NEWSQL + REALTIME FOR INVESTMENT BANKS Enterprise Data Problems in Investment Banks BigData History and Trend Driven by Google CAP Theorem for Distributed Computer System Open Source Building Blocks: Hadoop, Solr, Storm.. 3548 Hypothetical

More information

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Introduction to Big Data! with Apache Spark UC#BERKELEY# Introduction to Big Data! with Apache Spark" UC#BERKELEY# So What is Data Science?" Doing Data Science" Data Preparation" Roles" This Lecture" What is Data Science?" Data Science aims to derive knowledge!

More information

BIG DATA. Using the Lambda Architecture on a Big Data Platform to Improve Mobile Campaign Management. Author: Sandesh Deshmane

BIG DATA. Using the Lambda Architecture on a Big Data Platform to Improve Mobile Campaign Management. Author: Sandesh Deshmane BIG DATA Using the Lambda Architecture on a Big Data Platform to Improve Mobile Campaign Management Author: Sandesh Deshmane Executive Summary Growing data volumes and real time decision making requirements

More information

HADOOP. Revised 10/19/2015

HADOOP. Revised 10/19/2015 HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...

More information

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture. Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Real-time Streaming Analysis for Hadoop and Flume. Aaron Kimball odiago, inc. OSCON Data 2011

Real-time Streaming Analysis for Hadoop and Flume. Aaron Kimball odiago, inc. OSCON Data 2011 Real-time Streaming Analysis for Hadoop and Flume Aaron Kimball odiago, inc. OSCON Data 2011 The plan Background: Flume introduction The need for online analytics Introducing FlumeBase Demo! FlumeBase

More information

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management Big Data and New Paradigms in Information Management Vladimir Videnovic Institute for Information Management 2 "I am certainly not an advocate for frequent and untried changes laws and institutions must

More information

3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS

3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS . 3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS Deliver fast actionable business insights for data scientists, rapid application creation for developers and enterprise-grade

More information

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM

HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM HADOOP ADMINISTATION AND DEVELOPMENT TRAINING CURRICULUM 1. Introduction 1.1 Big Data Introduction What is Big Data Data Analytics Bigdata Challenges Technologies supported by big data 1.2 Hadoop Introduction

More information

The Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang

The Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang The Big Data Ecosystem at LinkedIn Presented by Zhongfang Zhuang Based on the paper The Big Data Ecosystem at LinkedIn, written by Roshan Sumbaly, Jay Kreps, and Sam Shah. The Ecosystems Hadoop Ecosystem

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.

More information

Reference Architecture, Requirements, Gaps, Roles

Reference Architecture, Requirements, Gaps, Roles Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture

More information

Big Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island

Big Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island Big Data Principles and best practices of scalable real-time data systems NATHAN MARZ JAMES WARREN II MANNING Shelter Island contents preface xiii acknowledgments xv about this book xviii ~1 Anew paradigm

More information

Moving From Hadoop to Spark

Moving From Hadoop to Spark + Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee

More information

COURSE CONTENT Big Data and Hadoop Training

COURSE CONTENT Big Data and Hadoop Training COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING

More information

Map Reduce & Hadoop Recommended Text:

Map Reduce & Hadoop Recommended Text: Big Data Map Reduce & Hadoop Recommended Text:! Large datasets are becoming more common The New York Stock Exchange generates about one terabyte of new trade data per day. Facebook hosts approximately

More information

Creating Big Data Applications with Spring XD

Creating Big Data Applications with Spring XD Creating Big Data Applications with Spring XD Thomas Darimont @thomasdarimont THE FASTEST PATH TO NEW BUSINESS VALUE Journey Introduction Concepts Applications Outlook 3 Unless otherwise indicated, these

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Openbus Documentation

Openbus Documentation Openbus Documentation Release 1 Produban February 17, 2014 Contents i ii An open source architecture able to process the massive amount of events that occur in a banking IT Infraestructure. Contents:

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Stream Processing on Demand for Lambda Architectures

Stream Processing on Demand for Lambda Architectures Madrid, 2015-09-01 Stream Processing on Demand for Lambda Architectures European Workshop on Performance Engineering (EPEW) 2015 Johannes Kroß 1, Andreas Brunnert 1, Christian Prehofer 1, Thomas A. Runkler

More information

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08

More information

Predictive Analytics with Storm, Hadoop, R on AWS

Predictive Analytics with Storm, Hadoop, R on AWS Douglas Moore Principal Consultant & Architect February 2013 Predictive Analytics with Storm, Hadoop, R on AWS Leading Provider Data Science and Engineering Services Accelerating Your Time to Value using

More information

Streaming items through a cluster with Spark Streaming

Streaming items through a cluster with Spark Streaming Streaming items through a cluster with Spark Streaming Tathagata TD Das @tathadas CME 323: Distributed Algorithms and Optimization Stanford, May 6, 2015 Who am I? > Project Management Committee (PMC) member

More information

Data Services Advisory

Data Services Advisory Data Services Advisory Modern Datastores An Introduction Created by: Strategy and Transformation Services Modified Date: 8/27/2014 Classification: DRAFT SAFE HARBOR STATEMENT This presentation contains

More information

Prepared By : Manoj Kumar Joshi & Vikas Sawhney

Prepared By : Manoj Kumar Joshi & Vikas Sawhney Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks

More information

BIG DATA TOOLS. Top 10 open source technologies for Big Data

BIG DATA TOOLS. Top 10 open source technologies for Big Data BIG DATA TOOLS Top 10 open source technologies for Big Data We are in an ever expanding marketplace!!! With shorter product lifecycles, evolving customer behavior and an economy that travels at the speed

More information

ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA

ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov

Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian Tzolov @christzolov Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA by Christian Tzolov @christzolov Whoami Christian Tzolov Technical Architect at Pivotal, BigData, Hadoop, SpringXD,

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

A Brief Outline on Bigdata Hadoop

A Brief Outline on Bigdata Hadoop A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is

More information

Search and Real-Time Analytics on Big Data

Search and Real-Time Analytics on Big Data Search and Real-Time Analytics on Big Data Sewook Wee, Ryan Tabora, Jason Rutherglen Accenture & Think Big Analytics Strata New York October, 2012 Big Data: data becomes your core asset. It realizes its

More information

Scalable Network Measurement Analysis with Hadoop. Taghrid Samak and Daniel Gunter Advanced Computing for Sciences, LBNL

Scalable Network Measurement Analysis with Hadoop. Taghrid Samak and Daniel Gunter Advanced Computing for Sciences, LBNL Scalable Network Measurement Analysis with Hadoop Taghrid Samak and Daniel Gunter Advanced Computing for Sciences, LBNL Outline Motivation Hadoop overview Approach doing the right thing, Avro what worked,

More information

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah

Pro Apache Hadoop. Second Edition. Sameer Wadkar. Madhu Siddalingaiah Pro Apache Hadoop Second Edition Sameer Wadkar Madhu Siddalingaiah Contents J About the Authors About the Technical Reviewer Acknowledgments Introduction xix xxi xxiii xxv Chapter 1: Motivation for Big

More information

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined

More information

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon. Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new

More information

1.5 Million Log Lines per Second Building and maintaining Flume flows at Conversant

1.5 Million Log Lines per Second Building and maintaining Flume flows at Conversant 1.5 Million Log Lines per Second Building and maintaining Flume flows at Conversant Big Data Everywhere Chicago 2014 Mike Keane mkeane@conversant.com SLA for Event Driven Logging R with Flume Quicker insight

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

Using distributed technologies to analyze Big Data

Using distributed technologies to analyze Big Data Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/

More information

Hybrid Software Architectures for Big Data. Laurence.Hubert@hurence.com @hurence http://www.hurence.com

Hybrid Software Architectures for Big Data. Laurence.Hubert@hurence.com @hurence http://www.hurence.com Hybrid Software Architectures for Big Data Laurence.Hubert@hurence.com @hurence http://www.hurence.com Headquarters : Grenoble Pure player Expert level consulting Training R&D Big Data X-data hot-line

More information

The Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @

The Top 10 7 Hadoop Patterns and Anti-patterns. Alex Holmes @ The Top 10 7 Hadoop Patterns and Anti-patterns Alex Holmes @ whoami Alex Holmes Software engineer Working on distributed systems for many years Hadoop since 2008 @grep_alex grepalex.com what s hadoop...

More information

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Bringing Big Data to People

Bringing Big Data to People Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process

More information

Towards Smart and Intelligent SDN Controller

Towards Smart and Intelligent SDN Controller Towards Smart and Intelligent SDN Controller - Through the Generic, Extensible, and Elastic Time Series Data Repository (TSDR) YuLing Chen, Dell Inc. Rajesh Narayanan, Dell Inc. Sharon Aicler, Cisco Systems

More information

Complete Java Classes Hadoop Syllabus Contact No: 8888022204

Complete Java Classes Hadoop Syllabus Contact No: 8888022204 1) Introduction to BigData & Hadoop What is Big Data? Why all industries are talking about Big Data? What are the issues in Big Data? Storage What are the challenges for storing big data? Processing What

More information

the missing log collector Treasure Data, Inc. Muga Nishizawa

the missing log collector Treasure Data, Inc. Muga Nishizawa the missing log collector Treasure Data, Inc. Muga Nishizawa Muga Nishizawa (@muga_nishizawa) Chief Software Architect, Treasure Data Treasure Data Overview Founded to deliver big data analytics in days

More information

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti

More information

Integrating Cloudera and SAP HANA

Integrating Cloudera and SAP HANA Integrating Cloudera and SAP HANA Version: 103 Table of Contents Introduction/Executive Summary 4 Overview of Cloudera Enterprise 4 Data Access 5 Apache Hive 5 Data Processing 5 Data Integration 5 Partner

More information

Are You Big Data Ready?

Are You Big Data Ready? ACS 2015 Annual Canberra Conference Are You Big Data Ready? Vladimir Videnovic Business Solutions Director Oracle Big Data and Analytics Introduction Introduction What is Big Data? If you can't explain

More information

Hadoop: The Definitive Guide

Hadoop: The Definitive Guide FOURTH EDITION Hadoop: The Definitive Guide Tom White Beijing Cambridge Famham Koln Sebastopol Tokyo O'REILLY Table of Contents Foreword Preface xvii xix Part I. Hadoop Fundamentals 1. Meet Hadoop 3 Data!

More information

A Brief Introduction to Apache Tez

A Brief Introduction to Apache Tez A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value

More information

Google Bing Daytona Microsoft Research

Google Bing Daytona Microsoft Research Google Bing Daytona Microsoft Research Raise your hand Great, you can help answer questions ;-) Sit with these people during lunch... An increased number and variety of data sources that generate large

More information

Real Time Data Processing using Spark Streaming

Real Time Data Processing using Spark Streaming Real Time Data Processing using Spark Streaming Hari Shreedharan, Software Engineer @ Cloudera Committer/PMC Member, Apache Flume Committer, Apache Sqoop Contributor, Apache Spark Author, Using Flume (O

More information

TRAINING PROGRAM ON BIGDATA/HADOOP

TRAINING PROGRAM ON BIGDATA/HADOOP Course: Training on Bigdata/Hadoop with Hands-on Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30-17:30 Hrs Venue: Eagle Photonics Pvt Ltd First Floor, Plot No 31, Sector 19C, Vashi,

More information

STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA. Processing billions of events every day

STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA. Processing billions of events every day STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA Processing billions of events every day Neha Narkhede Co-founder and Head of Engineering @ Stealth Startup Prior to this Lead, Streams Infrastructure

More information