Case Study: Real-time Analytics With Druid. Salil Kalia, Tech Lead, TO THE NEW Digital

1 Case Study: Real-time Analytics With Druid Salil Kalia, Tech Lead, TO THE NEW Digital

2 Agenda: Understanding the use case; Ad workflow; Our use case; Experiments with technologies; Redis; Cassandra; Introduction to Druid; Architecture; Druid in production; Demo

3 Understanding the use-case

4 What Is Analytics? Processing the HISTORICAL data to: understand potential trends; analyze the effects of certain decisions or events; evaluate the performance of a system; make better business decisions.

5 What Is Real-time Analytics?

6 Understanding The Ad Workflow [Diagram: USER, PUBLISHER SERVER, AD EXCHANGE, AD AGENCY-1, AD AGENCY-2, AD AGENCY-3; labeled flows: Web Page Request, Ad Request, Ad-Content]

7 Examples From Our Use Case: How many times has a video been viewed (a) in a particular time span? (b) in a particular time span at a particular site? (c) in a particular time span at a particular site in a particular country? (d) in a particular time span at a particular site in a particular country on a particular device?

8 Let's play a video ad

9 Video Events For The Analysis: LOAD, START, PLAYING, VIEW, STOP / PAUSE, FINISH

10 Event Data (Sample)

TIMESTAMP     Ad   Site      Advertiser  Event   Action
T01:01:27 Z   123  abc.com   Brand X     Player  Load
T01:01:33 Z   234  abcd.com  Brand Y     Player  Load
T01:01:40 Z   123  abc.com   Brand X     Player  Start
T01:01:45 Z   123  abc.com   Brand X     Player  Playing
T01:01:50 Z   123  abc.com   Brand Y     Player  Playing
T01:01:51 Z   123  abc.com   Brand X     Player  Stop
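The sample rows, together with the questions on slide 7, can be pictured as a filter-and-count over event records. The Python sketch below is only an illustration: the field names, the `VideoEvent` type and the `count_events` helper are assumptions, not the project's actual schema. Because the dates are cut off in the sample, only the time of day is modelled, and timestamps are paired with rows in the order they appear.

```python
from dataclasses import dataclass
from datetime import time
from typing import Iterable, Optional


@dataclass(frozen=True)
class VideoEvent:
    at: time          # time of day only; the dates are missing from the sample
    ad: int
    site: str
    advertiser: str
    action: str       # Load, Start, Playing, Stop, ...


# The six sample rows from the slide, in order.
EVENTS = [
    VideoEvent(time(1, 1, 27), 123, "abc.com", "Brand X", "Load"),
    VideoEvent(time(1, 1, 33), 234, "abcd.com", "Brand Y", "Load"),
    VideoEvent(time(1, 1, 40), 123, "abc.com", "Brand X", "Start"),
    VideoEvent(time(1, 1, 45), 123, "abc.com", "Brand X", "Playing"),
    VideoEvent(time(1, 1, 50), 123, "abc.com", "Brand Y", "Playing"),
    VideoEvent(time(1, 1, 51), 123, "abc.com", "Brand X", "Stop"),
]


def count_events(events: Iterable[VideoEvent], action: str,
                 start: time, end: time, site: Optional[str] = None) -> int:
    """Count events of one action in [start, end), optionally for one site."""
    return sum(
        1
        for e in events
        if e.action == action
        and start <= e.at < end
        and (site is None or e.site == site)
    )
```

Adding country or device would mean one more optional filter per dimension, which is the pattern the four example questions follow.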

11 Why Real-time Analytics? Understand the real-time performance; control the velocity; control the targeting; avoid over-serving; avoid under-serving.

12 Recap: Things We Understood: our use case; how ad tech works (in general); the different video player events. We are expecting a huge amount of data coming in at a very high velocity.

13 Experiments with technologies

14 Experience From Redis: There was a huge variety of keys all over the place. Not a good fit for dealing with (big) time-series data. Persistence is another issue: we can't afford to lose data. Not the right match for our use case.

15 Conclusion From Redis: Never blame Redis. It was too early a decision. We had misunderstood the real use case. Thanks to Redis for helping us understand our requirements very early.

16 Working With Cassandra: Very good support for time-series data. Extremely good at writing data at very high speed. Very easy to scale horizontally. Supports aggregations through counters.
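Counter-based aggregation of this kind can be pictured as one counter per (ad, site, action, time bucket). The sketch below simulates that layout in memory with `collections.Counter`; it does not talk to Cassandra, and the key layout, the hourly bucket and the function names are assumptions made for illustration rather than the schema used in the project.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable

# One counter per (ad, site, action, hour bucket); stands in for a counter table.
counters: Counter = Counter()


def bucket(ts: datetime) -> str:
    """Truncate a timestamp to its hour, e.g. '2016-01-01T01'."""
    return ts.strftime("%Y-%m-%dT%H")


def record(ad: int, site: str, action: str, ts: datetime) -> None:
    """Increment the counter for this event's hour bucket."""
    counters[(ad, site, action, bucket(ts))] += 1


def total(ad: int, site: str, action: str, buckets: Iterable[str]) -> int:
    """Sum the counters over a range of hour buckets."""
    return sum(counters[(ad, site, action, b)] for b in buckets)


# Usage with an arbitrary, made-up date.
record(123, "abc.com", "Playing", datetime(2016, 1, 1, 1, 1, 45))
record(123, "abc.com", "Playing", datetime(2016, 1, 1, 1, 1, 50))
record(123, "abc.com", "Playing", datetime(2016, 1, 1, 2, 0, 5))
```

In a layout like this every queryable dimension has to be part of the key up front, so a question that was not planned for needs a new set of counters.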

17 Writing into Cassandra [Diagram: AD PLAYER, ANALYTICS SERVER, CASSANDRA]

18 Reading from Cassandra [Diagram: ANALYTICS SERVER, CAMPAIGN MANAGER, CASSANDRA]

19 What didn't work with Cassandra: Inconsistent results; unreliable counters; no support for ad-hoc queries; nodes were crashing very frequently.

20 Crossroads: What next? Third-party tools on top of Cassandra for better consistency; DataStax Enterprise edition; taking a deeper dive into Cassandra to reconfigure the whole architecture and setup; switching to a different technology.

21 Understanding Druid

22 About Druid: An open-source analytics data store. Supports streaming data ingestion. Flexible filters for ad-hoc queries. Fast aggregations with sub-second queries. Distributed, shared-nothing architecture. Highly scalable.
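To make "flexible filters for ad-hoc queries" concrete, here is a hedged sketch of how one of the use-case questions might be expressed as a Druid native `timeseries` query, built as a Python dictionary and serialized to JSON. The data source name `video_events`, the dimension names, the `count` metric (which assumes events were rolled up with a count at ingestion) and the interval are all hypothetical; the slides do not show the real schema or queries.

```python
import json
from typing import Optional


def selector(dimension: str, value: str) -> dict:
    """A Druid selector filter: dimension == value."""
    return {"type": "selector", "dimension": dimension, "value": value}


def build_views_query(interval: str, site: Optional[str] = None,
                      country: Optional[str] = None,
                      device: Optional[str] = None) -> dict:
    """Build an hourly view-count query, narrowing by any dimensions given."""
    filters = [selector("action", "View")]  # hypothetical dimension name
    for dim, val in (("site", site), ("country", country), ("device", device)):
        if val is not None:
            filters.append(selector(dim, val))
    flt = filters[0] if len(filters) == 1 else {"type": "and", "fields": filters}
    return {
        "queryType": "timeseries",
        "dataSource": "video_events",       # hypothetical data source
        "granularity": "hour",
        "intervals": [interval],
        "filter": flt,
        # Assumes a 'count' metric was created at ingestion time.
        "aggregations": [
            {"type": "longSum", "name": "views", "fieldName": "count"}
        ],
    }


# Arbitrary example interval and filter values.
query = build_views_query(
    "2016-01-01T00:00:00Z/2016-01-02T00:00:00Z", site="abc.com", country="IN"
)
print(json.dumps(query, indent=2))
```

A query like this would normally be sent as an HTTP POST to a broker node. Narrowing by another dimension only adds one more filter entry, with no change to how the data is stored.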

23 Setting Up Druid In Production [Diagram: AD PLAYER, ANALYTICS SERVER, KAFKA (CLUSTER), DRUID CLUSTER, CASSANDRA]

24 Druid's Reliability Check [Diagram: AD PLAYER, ANALYTICS SERVER, KAFKA (CLUSTER), DRUID CLUSTER, RAW FILE CONSUMER, RAW FILES, Job To Test Druid's Integrity]
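The slide names raw files, a raw file consumer and a job that tests Druid's integrity, but gives no detail on how the job works. One plausible reading is that the job recounts events from the raw files and compares the totals with what Druid reports. The sketch below follows that reading only; the comma-separated raw line format, the `(site, action)` key and both function names are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str]  # (site, action)


def count_raw(lines: Iterable[str]) -> Counter:
    """Count events per (site, action) from raw lines.

    Assumed line format: timestamp,ad,site,advertiser,action
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _ts, _ad, site, _advertiser, action = line.split(",")
        counts[(site, action)] += 1
    return counts


def find_mismatches(raw: Dict[Key, int],
                    druid: Dict[Key, int]) -> Dict[Key, Tuple[int, int]]:
    """Return {key: (raw_count, druid_count)} for every key that disagrees."""
    keys = set(raw) | set(druid)
    return {
        k: (raw.get(k, 0), druid.get(k, 0))
        for k in keys
        if raw.get(k, 0) != druid.get(k, 0)
    }
```

An empty result from `find_mismatches` would mean the two sources agree for the checked period; any entry points at a key where events were lost or double-counted somewhere along the pipeline.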

25 A Quick Demo

26 Druid Architecture [Diagram: Druid nodes: REAL-TIME NODES, COORDINATOR NODES, BROKER NODES, HISTORICAL NODES; external dependencies: MySQL, ZOOKEEPER, DEEP STORAGE; inputs: Streaming Data, Client Queries; legend: Queries, Data/Segments, Metadata]

27 Druid Data Ingestion [Diagram: Druid nodes: REAL-TIME NODES, COORDINATOR NODES, BROKER NODES, HISTORICAL NODES; external dependencies: MySQL, ZOOKEEPER, DEEP STORAGE; inputs: Streaming Data, Client Queries; legend: Queries, Data/Segments, Metadata]

28 Druid Data Ingestion [Diagram: AD PLAYER, ANALYTICS SERVER, KAFKA (CLUSTER), DRUID Realtime Node]

29 Druid Data Retrieval [Diagram: Druid nodes: REAL-TIME NODES, COORDINATOR NODES, BROKER NODES, HISTORICAL NODES; external dependencies: MySQL, ZOOKEEPER, DEEP STORAGE; inputs: Streaming Data, Client Queries; legend: Queries, Data/Segments, Metadata]
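On the retrieval path, the diagram shows client queries arriving at the broker nodes, which then query the real-time and historical nodes. A rough way to picture the broker's role is a scatter-gather merge of partial results. The sketch below is a toy model of that idea in plain Python, not Druid code: the node functions, the hour keys and the counts are invented for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List

# A "node" is modelled as a function: site -> {hour bucket: view count}.
NodeQuery = Callable[[str], Dict[str, int]]


def historical_node(site: str) -> Dict[str, int]:
    """Pretend node serving older, already persisted segments."""
    data = {"abc.com": {"01:00": 40, "02:00": 55}}
    return data.get(site, {})


def realtime_node(site: str) -> Dict[str, int]:
    """Pretend node serving the most recent, still-ingesting data."""
    data = {"abc.com": {"02:00": 5, "03:00": 12}}
    return data.get(site, {})


def broker_query(site: str, nodes: List[NodeQuery]) -> Dict[str, int]:
    """Fan the query out to every node and add up the partial counts."""
    merged: Counter = Counter()
    for node in nodes:
        merged.update(node(site))  # Counter.update adds counts per key
    return dict(merged)


result = broker_query("abc.com", [historical_node, realtime_node])
```

The client sees a single merged answer and does not need to know which node held which part of the time range.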

30 Druid Data Coordination [Diagram: Druid nodes: REAL-TIME NODES, COORDINATOR NODES, HISTORICAL NODES; external dependencies: MySQL, ZOOKEEPER, DEEP STORAGE; input: Streaming Data; legend: Queries, Data/Segments, Metadata]

31 COMPANIES USING DRUID

32 Questions?



More information

Creating Big Data Applications with Spring XD

Creating Big Data Applications with Spring XD Creating Big Data Applications with Spring XD Thomas Darimont @thomasdarimont THE FASTEST PATH TO NEW BUSINESS VALUE Journey Introduction Concepts Applications Outlook 3 Unless otherwise indicated, these

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic

BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop

More information

Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra

Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra Dell Reference Configuration for DataStax Enterprise powered by Apache Cassandra A Quick Reference Configuration Guide Kris Applegate kris_applegate@dell.com Solution Architect Dell Solution Centers Dave

More information

Methods and Technologies for Business Process Monitoring

Methods and Technologies for Business Process Monitoring Methods and Technologies for Business Monitoring Josef Schiefer Vienna, June 2005 Agenda» Motivation/Introduction» Real-World Examples» Technology Perspective» Web-Service Based Business Monitoring» Adaptive

More information

Firebird meets NoSQL (Apache HBase) Case Study

Firebird meets NoSQL (Apache HBase) Case Study Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT

OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT WHITEPAPER OPEN MODERN DATA ARCHITECTURE FOR FINANCIAL SERVICES RISK MANAGEMENT A top-tier global bank s end-of-day risk analysis jobs didn t complete in time for the next start of trading day. To solve

More information

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel

Big Data and Analytics: Getting Started with ArcGIS. Mike Park Erik Hoel Big Data and Analytics: Getting Started with ArcGIS Mike Park Erik Hoel Agenda Overview of big data Distributed computation User experience Data management Big data What is it? Big Data is a loosely defined

More information

Traditional BI vs. Business Data Lake A comparison

Traditional BI vs. Business Data Lake A comparison Traditional BI vs. Business Data Lake A comparison The need for new thinking around data storage and analysis Traditional Business Intelligence (BI) systems provide various levels and kinds of analyses

More information

Cloud Big Data Architectures

Cloud Big Data Architectures Cloud Big Data Architectures Lynn Langit QCon Sao Paulo, Brazil 2016 About this Workshop Real-world Cloud Scenarios w/aws, Azure and GCP 1. Big Data Solution Types 2. Data Pipelines 3. ETL and Visualization

More information

FAST DATA APPLICATION REQUIRMENTS FOR CTOS AND ARCHITECTS

FAST DATA APPLICATION REQUIRMENTS FOR CTOS AND ARCHITECTS WHITE PAPER Fast Data FAST DATA APPLICATION REQUIRMENTS FOR CTOS AND ARCHITECTS CTOs and Enterprise Architects recognize that the consumerization of IT is changing how software is developed, requiring

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Big data platform for IoT Cloud Analytics. Chen Admati, Advanced Analytics, Intel

Big data platform for IoT Cloud Analytics. Chen Admati, Advanced Analytics, Intel Big data platform for IoT Cloud Analytics Chen Admati, Advanced Analytics, Intel Agenda IoT @ Intel End-to-End offering Analytics vision Big data platform for IoT Cloud Analytics Platform Capabilities

More information

Industry 4.0 and Big Data

Industry 4.0 and Big Data Industry 4.0 and Big Data Marek Obitko, mobitko@ra.rockwell.com Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and

More information

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Data Governance in the Hadoop Data Lake. Michael Lang May 2015 Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales

More information

Introduction to Cassandra

Introduction to Cassandra Introduction to Cassandra DuyHai DOAN, Technical Advocate Agenda! Architecture cluster replication Data model last write win (LWW), CQL basics (CRUD, DDL, collections, clustering column) lightweight transactions

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information