Real-time Big Data Analytics with Storm
|
|
- Warren Cummings
- 8 years ago
- Views:
Transcription
1 Ron Bodkin Founder & CEO, Think Big June 2013 Real-time Big Data Analytics with Storm
2 Leading Provider of Data Science and Engineering Services Accelerating Your Time to Value IMAGINE Strategy and Roadmap ILLUMINATE Training and Education IMPLEMENT Hands-On Data Science and Data Engineering No SQL Roadshow 2
3 Agenda What is Storm? When is Storm appropriate? API Options Use Cases No SQL Roadshow 3
4 Storm Overview DAG Processing of never ending streams of data - Open Source: - Used at Twitter plus > 24 other companies - Reliable - At Least Once semantics - Think MapReduce for data streams - Java / Clojure based - Bolts in Java and Shell Bolts - Not a queue, but usually reads from a queue. Related: - S4, CEP Compromises - Static topologies & cluster sizing Avoid messy dynamic rebalancing - Nimbus SPOF Strong Community Support, emerging commercial support No SQL Roadshow 4
5 Storm Concepts Cluster - Nimbus - Zookeeper - Supervisor, Workers Topology Streams Tuple Spout Bolt Stream Groupings - Shuffle, Field Trident DRPC No SQL Roadshow 5
6 What is Real-Time? Low latency - Query response - Data refresh - End-to-end response nanoseconds, milliseconds, seconds, or minutes depending on your problem No SQL Roadshow 6
7 Why Real-time? Better end-user experience - Ex: View an ad, see the counter move. Operational intelligence - Low latency analysis - Operational intelligence Event response - Content half life measured at 3 hours (H Mason: - Path to additional real-time capabilities - Scalable analysis - Example: Trend analysis to recommend hot articles. No SQL Roadshow 7
8 Why Storm? Scalable way to manage queue readers and logic pipeline Much better than roll your own Reliable (Message guarantees, fault tolerant) Multi-node scaling (1MM messages / 10 nodes) It works Open source community See also: No SQL Roadshow 8
9 Programming Options Raw Java API (tuples and raw computation) Trident Java DSL (SQL-like primitives) Python adapter Closure DSL RedStorm Ruby Wukong Ruby Trident-ML Storm-Esper No SQL Roadshow 9
10 Use Cases Scale analysis pipeline Lively stats Recommendations Better predictions Realtime analytics Online machine learning Distributed RPC No SQL Roadshow 10
11 Overall Architecture Monitoring & Alerting (Metrics, Alarms, Notifications) Ad Serving Impression View Ad LB Edge Edge Events Queue Memcache (Tuple fail tracking) Storm - Queue Management - Simple Bot Filtering - Real-time Bucketization - Performance Counters - Event Logging Archive Logs S3 S3S3 DFS Management Server Edge NoSQL Hadoop Ad Management Ad Selling LB Edge Performance Counters Impression Buckets Cleansing Model Training Recommendations getprediction Relational Store Model Parameters No SQL Roadshow 11
12 Integration Patterns No SQL Roadshow 12
13 Analytics Architecture Storm Impression Spout Simple Bot Annotator DFS Adapter Time Series BucketBolt NoSQL Bolt Impressions Impressions Impressions Time Series Buckets (Batch) Predictive Model Parameters Time Series Buckets (Realtime) Hadoop Impression Bucketization Predictive Model Training Web Server Impression Prediction Time No SQL Roadshow 13
14 Solution Approach Analyze Massive Historical Data Set Analyze Recent Past Realtime Prediction Historical Data Set = S3 Analyze = Hadoop + Pig + R Recent Past = Storm + NoSQL Analyze = R + Web Service No SQL Roadshow 14
15 Example: Real-Time Recommendation Updates Web Servers Batch Log Feed Read & Update Batch Model Feed NoSQL Database Ad Hoc Queries Requests read single user profile May update that profile record Classic key/value store problem Can split transient state into read/write NoSQL with batch view NoSQL Classic BI Access Hadoop Cluster Batch Export MPP Database No SQL Roadshow 15
16 Real-Time Recommendation Updates w/ Storm Web Servers Batch Log Feed Batch Model Feed Storm NoSQL Database Ad Hoc Queries More complex recommendation logic Parallelize analysis across topology Processing benefits from distributed logic of streaming Classic BI Access Hadoop Cluster Batch Export MPP Database No SQL Roadshow 16
17 Social Updates (real-time news feed) Web Servers Batch Log Feed Batch Model Feed Storm NoSQL Database Ad Hoc Queries Pages now integrate many profile records Fan-out-on-read or Fan-out-on-write Processing benefits from distributed logic of streaming Classic BI Access Hadoop Cluster Batch Export MPP Database No SQL Roadshow 17
18 Finance: Trade Compliance Monitoring Market Data Feed Order Mgmt System Batch Log Feed Hadoop Cluster Batch Model Feed Storm NoSQL Database Ad Hoc Queries Storm reduces market data to usable subset Integrates high, medium and low frequency data Real-time Storm filter triggers deeper search via NoSQL Batch Export Classic BI Access MPP Database No SQL Roadshow 18
19 Operational Intelligence: Stats and Search Web Servers Sys Logs Storm Search & NoSQL DB Cache and microbatch updates to multidimensional stats Support distributed queries Cache updates Batch System Log Feed Batch Summary Views Ad Hoc Queries Hadoop Cluster No SQL Roadshow 19
20 Lessons Easy to develop, hard to debug start locally - Timeouts Storm infinite loop of failures - Use Memcached to count # tuple failures At Least Once processing - Hadoop based read-repair job Performance Counters not getting flushed - Tick Tuples Always ACK Batching to S3 - Run a compaction & event de-duplication job in Hadoop No SQL Roadshow 20
21 Conclusions There s many kinds of real-time problems Use of Hadoop and/or NoSQL can solve - Low latency queries - Event response with localized intelligence - Operational intelligence Storm is valuable for - Ingesting data within seconds - Complex real-time distributed logic - Operational intelligence We re Hiring! 6/12/13 21 No SQL Roadshow 21
Predictive Analytics with Storm, Hadoop, R on AWS
Douglas Moore Principal Consultant & Architect February 2013 Predictive Analytics with Storm, Hadoop, R on AWS Leading Provider Data Science and Engineering Services Accelerating Your Time to Value using
More informationIntroducing Storm 1 Core Storm concepts Topology design
Storm Applied brief contents 1 Introducing Storm 1 2 Core Storm concepts 12 3 Topology design 33 4 Creating robust topologies 76 5 Moving from local to remote topologies 102 6 Tuning in Storm 130 7 Resource
More informationApache Storm vs. Spark Streaming Two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming Two Stream Platforms compared DBTA Workshop on Stream Berne, 3.1.014 Guido Schmutz BASEL BERN BRUGG LAUSANNE ZÜRICH DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH
More informationReal Time Big Data Processing
Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
More informationOpenbus Documentation
Openbus Documentation Release 1 Produban February 17, 2014 Contents i ii An open source architecture able to process the massive amount of events that occur in a banking IT Infraestructure. Contents:
More informationData Stream Algorithms in Storm and R. Radek Maciaszek
Data Stream Algorithms in Storm and R Radek Maciaszek Who Am I? l Radek Maciaszek l l l l l l Consul9ng at DataMine Lab (www.dataminelab.com) - Data mining, business intelligence and data warehouse consultancy.
More informationThe Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
More informationPulsar Realtime Analytics At Scale. Tony Ng April 14, 2015
Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
More informationHADOOP. Revised 10/19/2015
HADOOP Revised 10/19/2015 This Page Intentionally Left Blank Table of Contents Hortonworks HDP Developer: Java... 1 Hortonworks HDP Developer: Apache Pig and Hive... 2 Hortonworks HDP Developer: Windows...
More informationLambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
More informationBig Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island
Big Data Principles and best practices of scalable real-time data systems NATHAN MARZ JAMES WARREN II MANNING Shelter Island contents preface xiii acknowledgments xv about this book xviii ~1 Anew paradigm
More informationCAPTURING & PROCESSING REAL-TIME DATA ON AWS
CAPTURING & PROCESSING REAL-TIME DATA ON AWS @ 2015 Amazon.com, Inc. and Its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent
More informationManaging Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
More informationBuilding Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
More informationArchitectures for massive data management
Architectures for massive data management Apache Kafka, Samza, Storm Albert Bifet albert.bifet@telecom-paristech.fr October 20, 2015 Stream Engine Motivation Digital Universe EMC Digital Universe with
More informationScaling Out With Apache Spark. DTL Meeting 17-04-2015 Slides based on https://www.sics.se/~amir/files/download/dic/spark.pdf
Scaling Out With Apache Spark DTL Meeting 17-04-2015 Slides based on https://www.sics.se/~amir/files/download/dic/spark.pdf Your hosts Mathijs Kattenberg Technical consultant Jeroen Schot Technical consultant
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationReal Time Fraud Detection With Sequence Mining on Big Data Platform. Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA
Real Time Fraud Detection With Sequence Mining on Big Data Platform Pranab Ghosh Big Data Consultant IEEE CNSV meeting, May 6 2014 Santa Clara, CA Open Source Big Data Eco System Query (NOSQL) : Cassandra,
More informationBIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
More informationDeveloping Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control
Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University
More informationHadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
More informationSAP and Hortonworks Reference Architecture
SAP and Hortonworks Reference Architecture Hortonworks. We Do Hadoop. June Page 1 2014 Hortonworks Inc. 2011 2014. All Rights Reserved A Modern Data Architecture With SAP DATA SYSTEMS APPLICATIO NS Statistical
More informationGetting Real Real Time Data Integration Patterns and Architectures
Getting Real Real Time Data Integration Patterns and Architectures Nelson Petracek Senior Director, Enterprise Technology Architecture Informatica Digital Government Institute s Enterprise Architecture
More informationPerformance Management in Big Data Applica6ons. Michael Kopp, Technology Strategist @mikopp
Performance Management in Big Data Applica6ons Michael Kopp, Technology Strategist NoSQL: High Volume/Low Latency DBs Web Java Key Challenges 1) Even Distribu6on 2) Correct Schema and Access paperns 3)
More informationSQLstream Blaze and Apache Storm A BENCHMARK COMPARISON
SQLstream Blaze and Apache Storm A BENCHMARK COMPARISON 2 The V of Big Data Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. Gartner The emergence
More informationBig Data Course Highlights
Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like
More informationBig Data Pipeline and Analytics Platform
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Source Software Sudhir Tonse (@stonse) Danny Yuan (@g9yuayon) Netflix is a log generating company that also happens to stream movies
More informationUnified Batch & Stream Processing Platform
Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built
More informationAccelerating Hadoop MapReduce Using an In-Memory Data Grid
Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for
More informationIntroduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.
Big Data Hadoop Administration and Developer Course This course is designed to understand and implement the concepts of Big data and Hadoop. This will cover right from setting up Hadoop environment in
More informationApache HBase. Crazy dances on the elephant back
Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage
More informationThe Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
More informationBIG DATA. Using the Lambda Architecture on a Big Data Platform to Improve Mobile Campaign Management. Author: Sandesh Deshmane
BIG DATA Using the Lambda Architecture on a Big Data Platform to Improve Mobile Campaign Management Author: Sandesh Deshmane Executive Summary Growing data volumes and real time decision making requirements
More informationSimplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!
Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!! Streaming Analy.cs S S S Scale- up Database Data And Compute Grid
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More informationIntroduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
More informationSystems Engineering II. Pramod Bhatotia TU Dresden pramod.bhatotia@tu- dresden.de
Systems Engineering II Pramod Bhatotia TU Dresden pramod.bhatotia@tu- dresden.de About me! Since May 2015 2015 2012 Research Group Leader cfaed, TU Dresden PhD Student MPI- SWS Research Intern Microsoft
More informationBigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic
BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop
More informationDRIVING INNOVATION THROUGH DATA ACCELERATING BIG DATA APPLICATION DEVELOPMENT WITH CASCADING
DRIVING INNOVATION THROUGH DATA ACCELERATING BIG DATA APPLICATION DEVELOPMENT WITH CASCADING Supreet Oberoi VP Field Engineering, Concurrent Inc GET TO KNOW CONCURRENT Leader in Application Infrastructure
More informationHybrid Software Architectures for Big Data. Laurence.Hubert@hurence.com @hurence http://www.hurence.com
Hybrid Software Architectures for Big Data Laurence.Hubert@hurence.com @hurence http://www.hurence.com Headquarters : Grenoble Pure player Expert level consulting Training R&D Big Data X-data hot-line
More informationWorkshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
More informationOperations and Big Data: Hadoop, Hive and Scribe. Zheng Shao @ 铮 9 12/7/2011 Velocity China 2011
Operations and Big Data: Hadoop, Hive and Scribe Zheng Shao @ 铮 9 12/7/2011 Velocity China 2011 Agenda 1 Operations: Challenges and Opportunities 2 Big Data Overview 3 Operations with Big Data 4 Big Data
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationThe Current State of Big Data Analysis Technology Trends. 3.1 The Current State of Big Data. 3. Technology Trends
3. The Current State of Big Data Analysis In this report we discuss the current state of big data, which is estimated to already exist in the exabyte range. We also examine trends in analysis platform
More informationCSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
More informationSTREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA. Processing billions of events every day
STREAM PROCESSING AT LINKEDIN: APACHE KAFKA & APACHE SAMZA Processing billions of events every day Neha Narkhede Co-founder and Head of Engineering @ Stealth Startup Prior to this Lead, Streams Infrastructure
More informationthe missing log collector Treasure Data, Inc. Muga Nishizawa
the missing log collector Treasure Data, Inc. Muga Nishizawa Muga Nishizawa (@muga_nishizawa) Chief Software Architect, Treasure Data Treasure Data Overview Founded to deliver big data analytics in days
More informationTHE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS
THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS WHITE PAPER Successfully writing Fast Data applications to manage data generated from mobile, smart devices and social interactions, and the
More informationTransforming the Telecoms Business using Big Data and Analytics
Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
More informationDelivering Intelligence to Publishers Through Big Data
Delivering Intelligence to Publishers Through Big Data 2015-05- 21 Jonathan Sharley Team Lead, Data Operations www.sovrn.com Who is Sovrn? Ø An advertising network with direct relationships to 20,000+
More informationSession 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges
Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges James Campbell Corporate Systems Engineer HP Vertica jcampbell@vertica.com Big
More informationBIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
More informationFast Data in the Era of Big Data: Tiwtter s Real-Time Related Query Suggestion Architecture
Fast Data in the Era of Big Data: Tiwtter s Real-Time Related Query Suggestion Architecture Gilad Mishne, Jeff Dalton, Zhenghua Li, Aneesh Sharma, Jimmy Lin Adeniyi Abdul 2522715 Agenda Abstract Introduction
More informationPlay with Big Data on the Shoulders of Open Source
OW2 Open Source Corporate Network Meeting Play with Big Data on the Shoulders of Open Source Liu Jie Technology Center of Software Engineering Institute of Software, Chinese Academy of Sciences 2012-10-19
More informationA Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop Ecosystem Raj Nair Director Data Platform Kiru Pakkirisamy CTO AGENDA About Penton and Serendio Inc Data Processing at Penton PoC Use Case Functional
More informationDistributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
More informationIntegrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013
Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the
More informationReal-time Streaming Analysis for Hadoop and Flume. Aaron Kimball odiago, inc. OSCON Data 2011
Real-time Streaming Analysis for Hadoop and Flume Aaron Kimball odiago, inc. OSCON Data 2011 The plan Background: Flume introduction The need for online analytics Introducing FlumeBase Demo! FlumeBase
More informationHadoop & Spark Using Amazon EMR
Hadoop & Spark Using Amazon EMR Michael Hanisch, AWS Solutions Architecture 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Agenda Why did we build Amazon EMR? What is Amazon EMR?
More informationHadoop: The Definitive Guide
FOURTH EDITION Hadoop: The Definitive Guide Tom White Beijing Cambridge Famham Koln Sebastopol Tokyo O'REILLY Table of Contents Foreword Preface xvii xix Part I. Hadoop Fundamentals 1. Meet Hadoop 3 Data!
More informationBig Data Analysis: Apache Storm Perspective
Big Data Analysis: Apache Storm Perspective Muhammad Hussain Iqbal 1, Tariq Rahim Soomro 2 Faculty of Computing, SZABIST Dubai Abstract the boom in the technology has resulted in emergence of new concepts
More informationA stream computing approach towards scalable NLP
A stream computing approach towards scalable NLP Xabier Artola, Zuhaitz Beloki, Aitor Soroa IXA group. University of the Basque Country. LREC, Reykjavík 2014 Table of contents 1
More informationUnified Big Data Analytics Pipeline. 连 城 lian@databricks.com
Unified Big Data Analytics Pipeline 连 城 lian@databricks.com What is A fast and general engine for large-scale data processing An open source implementation of Resilient Distributed Datasets (RDD) Has an
More informationBig data platform for IoT Cloud Analytics. Chen Admati, Advanced Analytics, Intel
Big data platform for IoT Cloud Analytics Chen Admati, Advanced Analytics, Intel Agenda IoT @ Intel End-to-End offering Analytics vision Big data platform for IoT Cloud Analytics Platform Capabilities
More informationBig Data Advanced Analytics for Game Monetization. Kimberly Chulis
Big Data Advanced Analytics for Game Monetization Kimberly Chulis CEO Core Analytics, LLC Core Analytics / Game Loyalty Bay area and Chicago based digital advanced analytics firm Big Data / NoSQL Advanced
More informationThe Flink Big Data Analytics Platform. Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org
The Flink Big Data Analytics Platform Marton Balassi, Gyula Fora" {mbalassi, gyfora}@apache.org What is Apache Flink? Open Source Started in 2009 by the Berlin-based database research groups In the Apache
More information3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS
. 3 Reasons Enterprises Struggle with Storm & Spark Streaming and Adopt DataTorrent RTS Deliver fast actionable business insights for data scientists, rapid application creation for developers and enterprise-grade
More informationBringing Big Data into the Enterprise
Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?
More informationPrepared By : Manoj Kumar Joshi & Vikas Sawhney
Prepared By : Manoj Kumar Joshi & Vikas Sawhney General Agenda Introduction to Hadoop Architecture Acknowledgement Thanks to all the authors who left their selfexplanatory images on the internet. Thanks
More informationHadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
More informationThe basic data mining algorithms introduced may be enhanced in a number of ways.
DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,
More informationW H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
More informationUsing an In-Memory Data Grid for Near Real-Time Data Analysis
SCALEOUT SOFTWARE Using an In-Memory Data Grid for Near Real-Time Data Analysis by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 IN today s competitive world, businesses
More informationINTRODUCING APACHE IGNITE An Apache Incubator Project
WHITE PAPER BY GRIDGAIN SYSTEMS FEBRUARY 2015 INTRODUCING APACHE IGNITE An Apache Incubator Project COPYRIGHT AND TRADEMARK INFORMATION 2015 GridGain Systems. All rights reserved. This document is provided
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationHybrid Solutions Combining In-Memory & SSD
Hybrid Solutions Combining In-Memory & SSD Author: christos@gigaspaces.com Agenda 1 2 3 4 Overview of the big data technology landscape Building a high-speed SSD-backed data store Complex & compound queries
More informationHow To Use Big Data For Telco (For A Telco)
ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call
More informationBig Data Analytics for Cyber
Big Data Analytics for Cyber AFCEA International Cyber Symposium June 24, 2014 Jon Lau, Vice President and CTO UMBC Training Centers 6/26/2014 umbctraining.com 443-692-6600 1 Agenda About UMBC & UMBC Training
More informationDominik Wagenknecht Accenture
Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna
More informationThe 4 Pillars of Technosoft s Big Data Practice
beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed
More informationUpcoming Announcements
Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within
More informationData Lake In Action: Real-time, Closed Looped Analytics On Hadoop
1 Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 2 Pivotal s Full Approach It s More Than Just Hadoop Pivotal Data Labs 3 Why Pivotal Exists First Movers Solve the Big Data Utility Gap
More informationBig Data Processing with Google s MapReduce. Alexandru Costan
1 Big Data Processing with Google s MapReduce Alexandru Costan Outline Motivation MapReduce programming model Examples MapReduce system architecture Limitations Extensions 2 Motivation Big Data @Google:
More informationBig Data Introduction
Big Data Introduction Ralf Lange Global ISV & OEM Sales 1 Copyright 2012, Oracle and/or its affiliates. All rights Conventional infrastructure 2 Copyright 2012, Oracle and/or its affiliates. All rights
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationEnd to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ
End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,
More informationBIG DATA HADOOP TRAINING
BIG DATA HADOOP TRAINING DURATION 40hrs AVAILABLE BATCHES WEEKDAYS (7.00AM TO 8.30AM) & WEEKENDS (10AM TO 1PM) MODE OF TRAINING AVAILABLE ONLINE INSTRUCTOR LED CLASSROOM TRAINING (MARATHAHALLI, BANGALORE)
More informationHADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics
HADOOP SOLUTION USING EMC ISILON AND CLOUDERA ENTERPRISE Efficient, Flexible In-Place Hadoop Analytics ESSENTIALS EMC ISILON Use the industry's first and only scale-out NAS solution with native Hadoop
More informationYARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing
YARN, the Apache Hadoop Platform for Streaming, Realtime and Batch Processing Eric Charles [http://echarles.net] @echarles Datalayer [http://datalayer.io] @datalayerio FOSDEM 02 Feb 2014 NoSQL DevRoom
More informationHadoop 101. Lars George. NoSQL- Ma4ers, Cologne April 26, 2013
Hadoop 101 Lars George NoSQL- Ma4ers, Cologne April 26, 2013 1 What s Ahead? Overview of Apache Hadoop (and related tools) What it is Why it s relevant How it works No prior experience needed Feel free
More informationAGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW
AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this
More informationBig Data Analytics! Architectures, Algorithms and Applications! Part #3: Analytics Platform
Big Data Analytics! Architectures, Algorithms and Applications! Part #3: Analytics Platform Simon Wu! HTC (Prior: Twitter & Microsoft)! Edward Chang 張 智 威 HTC (prior: Google & U. California)! 1/26/2015
More informationExecutive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...
Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data
More informationBig Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com StreamHorizon & Big Data Integrates into your Data Processing Pipeline Seamlessly integrates at any point of your your data processing pipeline Implements
More informationCapitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes
Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate
More informationCOURSE CONTENT Big Data and Hadoop Training
COURSE CONTENT Big Data and Hadoop Training 1. Meet Hadoop Data! Data Storage and Analysis Comparison with Other Systems RDBMS Grid Computing Volunteer Computing A Brief History of Hadoop Apache Hadoop
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationApache Kafka Your Event Stream Processing Solution
01 0110 0001 01101 Apache Kafka Your Event Stream Processing Solution White Paper www.htcinc.com Contents 1. Introduction... 2 1.1 What are Business Events?... 2 1.2 What is a Business Data Feed?... 2
More information