Real Time Analytics for Big Data: Lessons Learned from Facebook


1 SINGLE PLATFORM. COMPLETE SCALABILITY.
Real Time Analytics for Big Data: Lessons Learned from Facebook
Head of Product, GigaSpaces

2 About Me
- MTBK Junky
- A proud dad
- Technology addict
- Head of Product, GigaSpaces

3 Real Time Analytics Use Cases
- Ecommerce
- Auction monitoring, AdWords
- Search engines
- Real-time marketing
- Improving conversion rate
- Weather reporting
- Traffic analysis
- Call center management
- Supply-chain optimization
- Quality management in manufacturing
- SLA monitoring and maintenance
- Global shipment & delivery monitoring
- Fraud detection in financial companies

4 Twitter
- Counting: How many requests/day? What's the average latency? How many signups, SMS, tweets?
- Correlating: Desktop vs. mobile users? What devices fail at the same time? What features get users hooked?
- Research: Duplicate detection, sentiment analysis, patterns and trends

5 Note the Time dimension
- Counting: real time (msec/sec)
- Correlating: near real time (min/hours)
- Research: batch (days..)

6 The data resolution & processing models
- Counting: mostly event driven; high resolution, every tweet counts
- Correlating: ad-hoc queries; mid resolution, aggregated counters
- Research: pre-generated reports; coarse-grain resolution, trends,..

7 Traditional analytics applications
- Scale-up database
  - Use a traditional SQL database
  - Use stored procedures for event-driven reports
  - Use flash memory disks to reduce disk I/O
  - Use read-only replicas to scale out read queries
- Limitations
  - Doesn't scale on write
  - Extremely expensive (HW + SW)
Copyright 2011 Gigaspaces Ltd. All Rights Reserved

8 CEP: Complex Event Processing
- Process the data as it comes
- Maintain a window of the data in memory
- Pros
  - Extremely low latency
  - Relatively low cost
- Cons
  - Hard to scale (mostly limited to scale-up)
  - Not agile: queries must be pre-generated
  - Fairly complex
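The "window of the data in memory" idea can be sketched roughly as follows. This is a minimal single-threaded illustration, not how any particular CEP engine works; the class name SlidingWindowCounter and its methods are hypothetical, and event timestamps are assumed to arrive in non-decreasing order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: counts the events seen in the last windowMillis milliseconds. */
class SlidingWindowCounter {
    private final long windowMillis;
    private final Deque<Long> timestamps = new ArrayDeque<>();

    SlidingWindowCounter(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Record an event as it arrives (timestamps assumed non-decreasing). */
    void onEvent(long timestampMillis) {
        timestamps.addLast(timestampMillis);
        evictOlderThan(timestampMillis);
    }

    /** Number of events still inside the window that ends at nowMillis. */
    int countAt(long nowMillis) {
        evictOlderThan(nowMillis);
        return timestamps.size();
    }

    // Drop events that fell out of the window (nowMillis - windowMillis, nowMillis].
    private void evictOlderThan(long nowMillis) {
        while (!timestamps.isEmpty() && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.removeFirst();
        }
    }

    public static void main(String[] args) {
        SlidingWindowCounter counter = new SlidingWindowCounter(1_000);
        counter.onEvent(0);
        counter.onEvent(400);
        counter.onEvent(900);
        System.out.println(counter.countAt(900));   // 3: all events are inside the window
        System.out.println(counter.countAt(1_300)); // 2: the event at t=0 has been evicted
    }
}
```

Only the window is held in memory, which is where the low latency comes from; it is also why the query (here, a count) has to be known before the data arrives.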

9 In Memory Data Grid
- Distributed in-memory database
- Scale out
- Pros
  - Scale on write/read
  - Fits event-driven (CEP style) and ad-hoc query models
- Cons
  - Cost of memory vs disk
  - Memory capacity is limited

10 NoSQL
- Use a distributed database: HBase, Cassandra, MongoDB
- Pros
  - Scale on write/read
  - Elastic
- Cons
  - Read latency
  - Consistency tradeoffs are hard
  - Maturity: fairly young technology

11 Hadoop MapReduce
- Distributed batch processing
- Pros
  - Designed to process massive amounts of data
  - Mature
  - Low cost
- Cons
  - Not real-time

12 Hadoop Map/Reduce Reality check..
- "With the paths that go through Hadoop [at Yahoo!], the latency is about fifteen minutes. [I]t will never be true real-time.." (Yahoo CTO Raymie Stata)
- "Hadoop/Hive..Not realtime. Many dependencies. Lots of points of failure. Complicated system. Not dependable enough to hit realtime goals" (Alex Himel, Engineering Manager at Facebook)
- "MapReduce and other batch-processing systems cannot process small updates individually as they rely on creating large batches for efficiency," (Google senior director of engineering Eisar Lipkovitz)

13 So what's the bottom line?
- One size doesn't fit all..
- The solution has to be a combination of several technologies and patterns..

14 FACEBOOK REAL-TIME ANALYTICS SYSTEM

15 Goals
- Show why plugins are valuable
  - What value is your business deriving from it?
- Make the data more actionable
  - Help users take action to make their content more valuable.
  - How many people see a plugin, how many people take action on it, and how many are converted to traffic back on your site.
- Make the data more timely
  - Went from a 48-hour turnaround to 30 seconds.
  - Multiple points of failure were removed to make this goal.
- Handle massive load
  - 20 billion events per day (200,000 events per second)
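As a quick sanity check on the load figure: spreading 20 billion events evenly over a day gives an average of roughly 231,000 events per second, which the slide rounds to 200,000. The snippet below only reproduces that arithmetic; the class and method names are illustrative, and real traffic would peak above the average.

```java
/** Back-of-the-envelope check of the load figure; not a benchmark. */
class EventRate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60; // 86,400

    /** Average events per second for a daily volume (integer division, rounds down). */
    static long eventsPerSecond(long eventsPerDay) {
        return eventsPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.println(eventsPerSecond(20_000_000_000L)); // 231481
    }
}
```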

16 The actual analytics..
- Like button analytics
- Comments box analytics

17 Technology Evaluation
- MySQL DB counters
- In-memory counters
- MapReduce
- Cassandra
- HBase

18 The solution..
[Diagram labels: Real Time, Long Term; FACEBOOK log sources (x3); Scribe; HDFS; PTail; Puma (batch 1.5 sec); HBase (10,000 write/sec per server)]

19 Checking the assumptions..
- Memory is still core
  - "(We) write extremely lean log lines. The more compact the log lines the more can be stored in memory.."
  - "(We) batch for 1.5 seconds on average. Would like to batch longer but they have so many URLs that they run out of memory when creating a hashtable"
- The NoSQL space is very dynamic..
  - "When Facebook engineers started the project 6 months ago, Cassandra did not have distributed counters which is now committed in trunk.." (Eric Hauser, Senior Software Engineer at ExactTarget)
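The batching described in the quotes (aggregate per-URL counts in an in-memory hashtable, then write the batch out about every 1.5 seconds) can be sketched like this. It is an illustration under assumptions, not Facebook's code: BatchingCounter is a hypothetical name, the sink callback merely stands in for the long-term store, and the timer that would call flush() is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch: aggregate per-URL counts in memory, then flush them as one batch. */
class BatchingCounter {
    private final Map<String, Long> pending = new HashMap<>();
    private final Consumer<Map<String, Long>> sink; // stands in for the long-term store

    BatchingCounter(Consumer<Map<String, Long>> sink) {
        this.sink = sink;
    }

    /** Called once per log line; only the in-memory hashtable is touched. */
    synchronized void record(String url) {
        pending.merge(url, 1L, Long::sum);
    }

    /** Called by a timer (about every 1.5 s in the talk); one increment per distinct URL. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new HashMap<>(pending)); // hand over a copy, then start a new batch
        pending.clear();
    }

    public static void main(String[] args) {
        Map<String, Long> store = new HashMap<>();
        BatchingCounter counter = new BatchingCounter(
                batch -> batch.forEach((url, n) -> store.merge(url, n, Long::sum)));
        counter.record("/a");
        counter.record("/a");
        counter.record("/b");
        counter.flush();
        System.out.println(store.get("/a")); // 2
        System.out.println(store.get("/b")); // 1
    }
}
```

The trade-off quoted above is visible here: a longer batch interval means fewer writes to the store, but the pending map grows with the number of distinct URLs seen in the interval.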

20 Facebook Analytics.Next..
What if..
- We can rely on memory as a reliable store?
- We can't decide on a particular NoSQL database?
- We need to package the solution as a product?

21 Step 1: Use memory..
- We rely on memory anyway to get 10k msg/sec.. Why not use memory to store the events?
- Reliability is achieved through redundancy and replication
[Diagram labels: FACEBOOK event sources (x3), Events, Memory Grid, Data Grid (x3)]

22 Step 1: Use memory..
- We rely on memory anyway to get 10k msg/sec.. Why not use memory to store the events?
- Reliability is achieved through redundancy and replication
[Diagram labels: FACEBOOK event sources (x3), Events, Any API, Data Grid]
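"Reliability through redundancy and replication" can be illustrated with a toy primary/backup pair. This is a deliberately simplified sketch that runs in one JVM: the name ReplicatedPartition is hypothetical, replication is assumed to be synchronous to a single backup, and a real data grid would keep the two copies on different machines.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: every write lands on a primary and a backup copy. */
class ReplicatedPartition {
    private Map<String, String> primary = new HashMap<>();
    private Map<String, String> backup = new HashMap<>();

    /** Synchronous replication: the write returns only after both copies are updated. */
    void write(String key, String value) {
        primary.put(key, value);
        backup.put(key, value);
    }

    String read(String key) {
        return primary.get(key);
    }

    /** Simulates losing the primary: the backup is promoted and a fresh backup is built. */
    void failPrimary() {
        primary = backup;
        backup = new HashMap<>(primary);
    }

    public static void main(String[] args) {
        ReplicatedPartition partition = new ReplicatedPartition();
        partition.write("url:/a", "42");
        partition.failPrimary();
        System.out.println(partition.read("url:/a")); // 42: the data survived the failure
    }
}
```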

23 Step 2 Collocate
- Putting the code together with the data.
[Diagram labels: FACEBOOK event sources (x3), Events, Processing Grid, Data Grid (x3)]

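24 Step 2 Collocate
- Putting the code together with the data.
[Diagram labels: FACEBOOK event sources, Events, Processing Grid, Data Grid (x3)]
Processing Grid listener:
    public class SimpleListener {
        // Template matching the Data objects that have not been processed yet
        Data unprocessedData() {
            Data template = new Data();
            template.setProcessed(false);
            return template;
        }

        // Invoked for each matching Data event
        public Data eventListener(Data event) {
            // process Data here
            return event;
        }
    }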

25 Step 3 Write behind to SQL/NoSQL
- Open long-term persistency
[Diagram labels: FACEBOOK event sources (x3), Events, Processing Grid, Data Grid (x3), Write Behind, Data Source Adaptor, MySQL, HBase, Cassandra]
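Write-behind means the application writes to memory and returns immediately, while the updates are pushed to the long-term store later and off the critical path. A minimal sketch, assuming a simple key/counter model: WriteBehindStore is a hypothetical name, the database callback stands in for whichever store sits behind the data source adaptor, and drain() would normally run on a background thread.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

/** Hypothetical sketch: writes hit memory first and reach the database asynchronously. */
class WriteBehindStore {
    private final Map<String, Long> memory = new HashMap<>();
    private final Set<String> dirtyKeys = new LinkedHashSet<>();
    private final BiConsumer<String, Long> database; // stands in for MySQL, HBase, Cassandra..

    WriteBehindStore(BiConsumer<String, Long> database) {
        this.database = database;
    }

    /** Fast path: update memory and remember that the key still has to be persisted. */
    synchronized void put(String key, long value) {
        memory.put(key, value);
        dirtyKeys.add(key);
    }

    synchronized Long get(String key) {
        return memory.get(key);
    }

    /** Slow path: persist the latest value of each dirty key; returns the number of writes. */
    synchronized int drain() {
        for (String key : dirtyKeys) {
            database.accept(key, memory.get(key));
        }
        int written = dirtyKeys.size();
        dirtyKeys.clear();
        return written;
    }

    public static void main(String[] args) {
        Map<String, Long> db = new HashMap<>();
        WriteBehindStore store = new WriteBehindStore(db::put);
        store.put("likes", 1);
        store.put("likes", 2);
        store.put("comments", 5);
        System.out.println(db.isEmpty());  // true: nothing has been persisted yet
        System.out.println(store.drain()); // 2: the two updates to "likes" were coalesced
        System.out.println(db.get("likes")); // 2
    }
}
```

Because the database only ever sees the drained batches, it can be swapped (MySQL, HBase, Cassandra) without touching the code on the fast path.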

26 Economic Data Scaling
- Combine memory and disk
  - Memory is 10x to 100x lower in cost than disk for high data access rates (Stanford research)
  - Disk is lower in cost for high capacity at lower access rates
  - Solution: memory for short-term data, disk for long-term data
- Only ~16G required to store the log in memory (500b messages at 10k/h) at a cost of ~$32/month per server.
High-memory server pricing (Dell):
- Memory: 192GB
- Cores: 12
- Clock speed: 3.2 GHz
- Price: $367/month ($1.9/GB)
- TB (~960GB)/month: 5x blades = $1835/month
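The figures on this slide can be reproduced with a little arithmetic. One assumption is needed: "10k/h" is read here as 10,000 messages per second sustained for one hour, chosen because that reading reproduces both the ~16G and the ~$32 figures. The class and method names are illustrative only.

```java
/** Back-of-the-envelope reproduction of the slide's memory cost figures. */
class MemoryCost {
    /** Size in GiB of a log of fixed-size messages arriving at a steady rate. */
    static double logSizeGiB(int messageBytes, int messagesPerSecond, int seconds) {
        return (double) messageBytes * messagesPerSecond * seconds / (1024.0 * 1024 * 1024);
    }

    /** Monthly price per GB of memory for a server. */
    static double pricePerGb(double serverPricePerMonth, int serverMemoryGb) {
        return serverPricePerMonth / serverMemoryGb;
    }

    public static void main(String[] args) {
        double sizeGiB = logSizeGiB(500, 10_000, 3_600); // about 16.8 GiB, the slide's "~16G"
        double perGb = pricePerGb(367, 192);             // about 1.91, the slide's "$1.9/GB"
        System.out.println("log size (GiB): " + sizeGiB);
        System.out.println("price per GB:   " + perGb);
        System.out.println("log cost/month: " + sizeGiB * perGb); // about 32, the slide's "~$32"
        System.out.println("5 blades:       " + 5 * 192 + " GB for $" + 5 * 367 + "/month");
    }
}
```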

27 Economic Operations Scaling
- Automation: reduce operational cost
- Elastic scaling: reduce over-provisioning cost
- Cloud portability (JClouds): choose the right cloud for the job
- Cloud bursting: scavenge extra capacity when needed

28 Putting it all together
- Event Sources
- In-Memory Data Grid / RT Processing Grid
  - Light event processing
  - Map-reduce
  - Event driven
  - Execute code with data
  - Transactional
  - Secured
  - Elastic
- Write behind
- NoSQL DB
  - Low cost storage
  - Write/read scalability
  - Dynamic scaling
  - Raw data and aggregated data
- Analytic Application
  - Generate patterns

29 Putting it all together
[Diagram labels: Event Sources; Analytic Application (generate patterns); In-Memory Data Grid / RT Processing Grid (light event processing, map-reduce, event driven, execute code with data, transactional, secured, elastic); Write behind; NoSQL DB (low cost storage, write/read scalability, dynamic scaling, raw data and aggregated data)]
Executing code with the data:
    Script script = new StaticScript("groovy", "println 'hi'; return 0");
    Query q = em.createNativeQuery("execute ?");
    q.setParameter(1, script);
    Integer result = (Integer) q.getSingleResult();

30 5x better performance per server!
- Test setup
  - Event injector: up to 128 threads
  - GigaSpaces / (other msg server)
  - App services: up to 128 threads
- Hardware
  - Linux (Red Hat) HP DL380 G6 servers, each with:
  - 2 Intel quad-core Xeon X5560 processors (2.8 GHz Nehalem)
  - 32 GB RAM (4GB per core)
  - 6 * 146 GB 15K RPM SAS disks
[Chart: throughput per server, axis 0 to 60,000, GigaSpaces (GS) vs. WLS/other; callout: 50,000 write/sec per server; series: event injection throughput, event injection throughput with write multiple, EJB/Remoting service invocation throughput]

31 Putting it all together: Elastic Big Data Platform
- The best of both worlds: supports real time and batch
- Fully managed stack: makes the development and deployment of Big Data applications significantly simpler
- Extremely cost effective: best ratio of disk + memory
- Run on any cloud: TRUE cloud bursting support

32 Other benefits
- Designed for real time event processing
  - Built-in Pub/Sub
  - Built-in CEP
- Open
  - Standard query
  - Any database
- Reliable
  - Transactional, consistent
  - Survive complete database failure
- Simple
  - Can be packaged into a single product
  - Fully automated deployment
  - End to end management and monitoring

33 Further reading..
- natishalom.typepad.com: Real Time Analytics for Big Data: An Alternative Approach
- GigaOM: Big data in real time is no fantasy
- Highscalability.com: Facebook's New Realtime Analytics System: HBase To Process 20 Billion Events Per Day
- GigaSpaces.com

34 THANK YOU! http://blog.gigaspaces.com

35 Economic Scaling
[Diagram: Cloudify architecture. Labels: Application, Cluster, Console, Controller, Cloudify Agent (x2), JClouds Cloud Driver, Worker VM Instance, Role VM Instance, Scale-in, Scale-out, Load Balancer, Storage, Network, Compute, Services]

36

GigaSpaces Real-Time Analytics for Big Data

GigaSpaces Real-Time Analytics for Big Data GigaSpaces Real-Time Analytics for Big Data GigaSpaces makes it easy to build and deploy large-scale real-time analytics systems Rapidly increasing use of large-scale and location-aware social media and

More information

Real Time Analytics for Big Data. NtiSh Nati Shalom @natishalom

Real Time Analytics for Big Data. NtiSh Nati Shalom @natishalom Real Time Analytics for Big Data A Twitter Inspired Case Study NtiSh Nati Shalom @natishalom Big Data Predictions Overthe next few years we'll see the adoption of scalable frameworks and platforms for

More information

SQream Technologies Ltd - Confiden7al

SQream Technologies Ltd - Confiden7al SQream Technologies Ltd - Confiden7al 1 Ge#ng Big Data Done On a GPU- Based Database Ori Netzer VP Product 26- Mar- 14 Analy7cs Performance - 3 TB, 18 Billion records SQream Database 400x More Cost Efficient!

More information

The 3 questions to ask yourself about BIG DATA

The 3 questions to ask yourself about BIG DATA The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

Using RDBMS, NoSQL or Hadoop?

Using RDBMS, NoSQL or Hadoop? Using RDBMS, NoSQL or Hadoop? DOAG Conference 2015 Jean- Pierre Dijcks Big Data Product Management Server Technologies Copyright 2014 Oracle and/or its affiliates. All rights reserved. Data Ingest 2 Ingest

More information

NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB

NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB bankmark UG (haftungsbeschränkt) Bahnhofstraße 1 9432 Passau Germany www.bankmark.de info@bankmark.de T +49 851 25 49 49 F +49 851 25 49 499 NoSQL Performance Test In-Memory Performance Comparison of SequoiaDB,

More information

In Memory Accelerator for MongoDB

In Memory Accelerator for MongoDB In Memory Accelerator for MongoDB Yakov Zhdanov, Director R&D GridGain Systems GridGain: In Memory Computing Leader 5 years in production 100s of customers & users Starts every 10 secs worldwide Over 15,000,000

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Cost-Effective Business Intelligence with Red Hat and Open Source

Cost-Effective Business Intelligence with Red Hat and Open Source Cost-Effective Business Intelligence with Red Hat and Open Source Sherman Wood Director, Business Intelligence, Jaspersoft September 3, 2009 1 Agenda Introductions Quick survey What is BI?: reporting,

More information

Towards Smart and Intelligent SDN Controller

Towards Smart and Intelligent SDN Controller Towards Smart and Intelligent SDN Controller - Through the Generic, Extensible, and Elastic Time Series Data Repository (TSDR) YuLing Chen, Dell Inc. Rajesh Narayanan, Dell Inc. Sharon Aicler, Cisco Systems

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Hadoop: Distributed Data Processing. Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010

Hadoop: Distributed Data Processing. Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010 Hadoop: Distributed Data Processing Amr Awadallah Founder/CTO, Cloudera, Inc. ACM Data Mining SIG Thursday, January 25 th, 2010 Outline Scaling for Large Data Processing What is Hadoop? HDFS and MapReduce

More information

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah Apache Hadoop: The Pla/orm for Big Data Amr Awadallah CTO, Founder, Cloudera, Inc. aaa@cloudera.com, twicer: @awadallah 1 The Problems with Current Data Systems BI Reports + Interac7ve Apps RDBMS (aggregated

More information

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data

More information

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES

HYPER-CONVERGED INFRASTRUCTURE STRATEGIES 1 HYPER-CONVERGED INFRASTRUCTURE STRATEGIES MYTH BUSTING & THE FUTURE OF WEB SCALE IT 2 ROADMAP INFORMATION DISCLAIMER EMC makes no representation and undertakes no obligations with regard to product planning

More information

Open Source Technologies on Microsoft Azure

Open Source Technologies on Microsoft Azure Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

Texas Digital Government Summit. Data Analysis Structured vs. Unstructured Data. Presented By: Dave Larson

Texas Digital Government Summit. Data Analysis Structured vs. Unstructured Data. Presented By: Dave Larson Texas Digital Government Summit Data Analysis Structured vs. Unstructured Data Presented By: Dave Larson Speaker Bio Dave Larson Solu6ons Architect with Freeit Data Solu6ons In the IT industry for over

More information

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

How To Write An Article On An Hp Appsystem For Spera Hana

How To Write An Article On An Hp Appsystem For Spera Hana Technical white paper HP AppSystem for SAP HANA Distributed architecture with 3PAR StoreServ 7400 storage Table of contents Executive summary... 2 Introduction... 2 Appliance components... 3 3PAR StoreServ

More information

Scalable Architecture on Amazon AWS Cloud

Scalable Architecture on Amazon AWS Cloud Scalable Architecture on Amazon AWS Cloud Kalpak Shah Founder & CEO, Clogeny Technologies kalpak@clogeny.com 1 * http://www.rightscale.com/products/cloud-computing-uses/scalable-website.php 2 Architect

More information

Corso di Reti di Calcolatori L-A. Cloud Computing

Corso di Reti di Calcolatori L-A. Cloud Computing Università degli Studi di Bologna Facoltà di Ingegneria Corso di Reti di Calcolatori L-A Cloud Computing Antonio Corradi Luca Foschini Some Clouds 1 What is Cloud computing? The architecture and terminology

More information

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the

More information

Large scale processing using Hadoop. Ján Vaňo

Large scale processing using Hadoop. Ján Vaňo Large scale processing using Hadoop Ján Vaňo What is Hadoop? Software platform that lets one easily write and run applications that process vast amounts of data Includes: MapReduce offline computing engine

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based

More information

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

DNS Big Data Analy@cs

DNS Big Data Analy@cs Klik om de s+jl te bewerken Klik om de models+jlen te bewerken! Tweede niveau! Derde niveau! Vierde niveau DNS Big Data Analy@cs Vijfde niveau DNS- OARC Fall 2015 Workshop October 4th 2015 Maarten Wullink,

More information

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory)

How To Store Data On An Ocora Nosql Database On A Flash Memory Device On A Microsoft Flash Memory 2 (Iomemory) WHITE PAPER Oracle NoSQL Database and SanDisk Offer Cost-Effective Extreme Performance for Big Data 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Abstract... 3 What Is Big Data?...

More information

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business

More information

MADOCA II Data Logging System Using NoSQL Database for SPring-8

MADOCA II Data Logging System Using NoSQL Database for SPring-8 MADOCA II Data Logging System Using NoSQL Database for SPring-8 A.Yamashita and M.Kago SPring-8/JASRI, Japan NoSQL WED3O03 OR: How I Learned to Stop Worrying and Love Cassandra Outline SPring-8 logging

More information

TRAINING PROGRAM ON BIGDATA/HADOOP

TRAINING PROGRAM ON BIGDATA/HADOOP Course: Training on Bigdata/Hadoop with Hands-on Course Duration / Dates / Time: 4 Days / 24th - 27th June 2015 / 9:30-17:30 Hrs Venue: Eagle Photonics Pvt Ltd First Floor, Plot No 31, Sector 19C, Vashi,

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Nutanix Solutions for Private Cloud. Kees Baggerman Performance and Solution Engineer

Nutanix Solutions for Private Cloud. Kees Baggerman Performance and Solution Engineer Nutanix Solutions for Private Cloud Kees Baggerman Performance and Solution Engineer Nutanix: Web-Scale Converged Infrastructure ü Founded in 2009 ü Now on fourth generation ü Core team from industry leaders

More information

Real-Time Analytics for Big Market Data with XAP In-Memory Computing

Real-Time Analytics for Big Market Data with XAP In-Memory Computing Real-Time Analytics for Big Market Data with XAP In-Memory Computing March 2015 Real Time Analytics for Big Market Data Table of Contents Introduction 03 Main Industry Challenges....04 Achieving Real-Time

More information

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens

Realtime Apache Hadoop at Facebook. Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Realtime Apache Hadoop at Facebook Jonathan Gray & Dhruba Borthakur June 14, 2011 at SIGMOD, Athens Agenda 1 Why Apache Hadoop and HBase? 2 Quick Introduction to Apache HBase 3 Applications of HBase at

More information

Accelerating Application Performance on Virtual Machines

Accelerating Application Performance on Virtual Machines Accelerating Application Performance on Virtual Machines...with flash-based caching in the server Published: August 2011 FlashSoft Corporation 155-A W. Moffett Park Dr Sunnyvale, CA 94089 info@flashsoft.com

More information

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & Innovation 04-08-2011 to the EC 8 th February, Luxembourg Your Atos business Research technologists. and Innovation

More information

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage

Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage White Paper Scaling Objectivity Database Performance with Panasas Scale-Out NAS Storage A Benchmark Report August 211 Background Objectivity/DB uses a powerful distributed processing architecture to manage

More information

Using an In-Memory Data Grid for Near Real-Time Data Analysis

Using an In-Memory Data Grid for Near Real-Time Data Analysis SCALEOUT SOFTWARE Using an In-Memory Data Grid for Near Real-Time Data Analysis by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 IN today s competitive world, businesses

More information

IT Infrastructure Management

IT Infrastructure Management IT Infrastructure Management Server-Database Monitoring An Overview XIPHOS TECHNOLOGY SOLUTIONS PVT LIMITED 32/3L, GARIAHAT ROAD (SOUTH) KOLKATA 700 078, WEST BENGAL, INDIA WWW.XIPHOSTEC.COM Xiphos Technology

More information

Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely

More information

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14

Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases. Lecture 14 Department of Computer Science University of Cyprus EPL646 Advanced Topics in Databases Lecture 14 Big Data Management IV: Big-data Infrastructures (Background, IO, From NFS to HFDS) Chapter 14-15: Abideboul

More information

<Insert Picture Here> Big Data

<Insert Picture Here> Big Data Big Data Kevin Kalmbach Principal Sales Consultant, Public Sector Engineered Systems Program Agenda What is Big Data and why it is important? What is your Big

More information

Big data blue print for cloud architecture

Big data blue print for cloud architecture Big data blue print for cloud architecture -COGNIZANT Image Area Prabhu Inbarajan Srinivasan Thiruvengadathan Muralicharan Gurumoorthy Praveen Codur 2012, Cognizant Next 30 minutes Big Data / Cloud challenges

More information

Introducing Oracle Exalytics In-Memory Machine

Introducing Oracle Exalytics In-Memory Machine Introducing Oracle Exalytics In-Memory Machine Jon Ainsworth Director of Business Development Oracle EMEA Business Analytics 1 Copyright 2011, Oracle and/or its affiliates. All rights Agenda Topics Oracle

More information

Cisco IT Hadoop Journey

Cisco IT Hadoop Journey Cisco IT Hadoop Journey Alex Garbarini, IT Engineer, Cisco 2015 MapR Technologies 1 Agenda Hadoop Platform Timeline Key Decisions / Lessons Learnt Data Lake Hadoop s place in IT Data Platforms Use Cases

More information

Cyber Security With Big Data

Cyber Security With Big Data Cyber Security With Big Data Fast. Complete. Cost-Effec1ve. Harry J Foxwell, PhD Principal Consultant Oracle Public Sector Oct 2015 Safe Harbor Statement The following is intended to outline our general

More information

.nl ENTRADA. CENTR-tech 33. November 2015 Marco Davids, SIDN Labs. Klik om de s+jl te bewerken

.nl ENTRADA. CENTR-tech 33. November 2015 Marco Davids, SIDN Labs. Klik om de s+jl te bewerken Klik om de s+jl te bewerken Klik om de models+jlen te bewerken Tweede niveau Derde niveau Vierde niveau.nl ENTRADA Vijfde niveau CENTR-tech 33 November 2015 Marco Davids, SIDN Labs Wie zijn wij? Mijlpalen

More information

Splunk for Networking and SDN

Splunk for Networking and SDN Copyright 2013 Splunk Inc. Splunk for Networking and SDN Stela Udovicic Senior Product Marke?ng Manager, Splunk #splunkconf Legal No?ces During the course of this presenta?on, we may make forward- looking

More information

Description of Application

Description of Application Description of Application Operating Organization: Coeur d Alene Tribe, Plummer, Idaho Community of Interest: U.S. Indian tribes and their governments; rural governments OS and software requirements: Microsoft

More information

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack

Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack Elasticsearch on Cisco Unified Computing System: Optimizing your UCS infrastructure for Elasticsearch s analytics software stack HIGHLIGHTS Real-Time Results Elasticsearch on Cisco UCS enables a deeper

More information

All You Wanted to Know About Big Data Projects Chida Sadayappan @schida. Jan 2014

All You Wanted to Know About Big Data Projects Chida Sadayappan @schida. Jan 2014 All You Wanted to Know About Big Data Projects Chida Sadayappan @schida Jan 2014 1 WHAT WE DISCUSS HERE AGENDA > > > > > > Need History Open Source - Hadoop BigData EcoSystem Use Cases Managing BigData

More information

Can Flash help you ride the Big Data Wave? Steve Fingerhut Vice President, Marketing Enterprise Storage Solutions Corporation



