How To Use Big Data For Telco (For A Telco)



Transcription:

ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium

ANNO 2012: YOUR DATA IS MONEY, BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call detail records, your pictures, your video, your tweets, your holiday experiences, your relationships, your favorite playlist, your agenda, your job activities,

SETTING THE STAGE
Big data technologies applied to a telco use case
Smart analytics applied to your daily interests and activities

AGENDA
ALCATEL-LUCENT'S ONLINE VIDEO ANALYTICS SOLUTION
BIG data challenges
BIG data technologies
BIG data architecture
BIGGER and faster tomorrow

APPGLIDE: ALCATEL-LUCENT'S VIDEO ANALYTICS SOLUTION
End-User Quality of Experience
Content Delivery Network Performance
Cross-Correlation Engine
Content Usage & Viewer Engagement
A unique online video analytics solution that uses data from multiple data sources to feed an analytics collection & correlation engine. It provides unparalleled insights into end-user experience and behavior, all viewed through a rich visual customer portal or sent to a third-party system.

BIG DATA CHALLENGES: APPLIED GARTNER'S DEFINITION
Important big data aspects: volume, variety, velocity

LIMITATIONS: EXISTING DATA INFRASTRUCTURE (ORACLE, MYSQL)
Scaling write throughput, limited options:
1. Switch to more powerful hardware
2. Buy more expensive database solutions
3. Horizontal scaling through sharding/partitioning
Availability concerns when the data set grows:
1. Schema upgrades require downtime for table locking (or expensive copy operations)
2. Failover from master to slave causes downtime

QUIZ QUESTION: WHAT ARE - SOLUTIONS?

COMMON CHARACTERISTICS: NOSQL DATA INFRASTRUCTURES
Non-relational data models
Distributed
Horizontal scalability
Schema-less/schema-free
Trade-off between consistency and high availability
Specialization of data infrastructure for specific use cases and data models:
  Document-oriented data stores
  Key-value stores
  Graph databases
  Table-oriented data stores

MAKING A DISTRIBUTED DATA INFRASTRUCTURE: SYSTEM DESIGN CHOICES
Consistency or availability: fail the request or not?
[Figure: nodes A, B and C, shown once favoring consistency in case of failure and once favoring availability in case of failure]
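The choice on this slide can be sketched as a simple acknowledgement rule. This is a minimal illustration, not how any particular database implements it; the function handle_write, the policy names and the replica counts are hypothetical.

```python
def handle_write(replicas_up: int, replication_factor: int, favor: str) -> bool:
    """Return True if the write is accepted, False if the request is failed."""
    if favor == "consistency":
        # Require a majority (quorum) of replicas to acknowledge the write.
        required = replication_factor // 2 + 1
    elif favor == "availability":
        # Accept the write as long as a single replica is reachable.
        required = 1
    else:
        raise ValueError(f"unknown policy: {favor}")
    return replicas_up >= required


# Three replicas (A, B, C), two of them unreachable.
print(handle_write(1, 3, "consistency"))   # False: the request is failed
print(handle_write(1, 3, "availability"))  # True: the request is served
```

Favoring consistency rejects the request when too few replicas answer; favoring availability serves it and accepts that replicas may temporarily disagree.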

OUR VIDEO ANALYTICS USE CASE: OUR BIG DATA INFRASTRUCTURE SELECTION
Next slides:
What is Cassandra?
What is Cassandra not?
Why did we select Cassandra?
A couple of mechanisms we applied

SELECTION FOR OUR SOLUTION: APACHE CASSANDRA
Free and Open Source Software (FOSS)
Horizontally scalable data infrastructure, up to 100s of nodes
Supports schema-less structure
Highly optimized for write performance, with very good read performance characteristics
Fault-tolerant: advanced replication strategies
Ad-hoc querying support with Hadoop map-reduce overlay
Good middle ground! (Volume, Variety, Velocity)

APACHE CASSANDRA IN A NUTSHELL
Distributed data infrastructure
Peer-to-peer architecture
All nodes identical
Consistent hash ring
Multiple data partitioning strategies:
  Random hash partitioner
  Order-preserving partitioner
Up to 2 billion sorted columns
[Figure: Cassandra ring with nodes A to F. Rows are placed on the ring by Hash(key), for example Hash(key1) with columns cola: a1, colb: b1, colc: c1; Hash(key2) with columns colb: b1, cole: e1, colf: e2; Hash(key3) with column cole: E1. The same rows appear on several nodes.]
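The consistent hash ring can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Cassandra's code: the class HashRing, one token per node, and the node names A to F are hypothetical. MD5 is used as the hash here, which is also what Cassandra's random partitioner is based on.

```python
import bisect
import hashlib


class HashRing:
    def __init__(self, nodes):
        # Each node owns one token (a position on the ring).
        self._ring = sorted((self._token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    @staticmethod
    def _token(value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def replicas(self, key: str, count: int = 3):
        """Walk clockwise from hash(key) and return the next `count` nodes."""
        start = bisect.bisect_right(self._tokens, self._token(key))
        count = min(count, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = HashRing(["A", "B", "C", "D", "E", "F"])
print(ring.replicas("key1"))  # three neighbouring nodes that store this row
```

Because only the key's hash decides placement, every node can compute where a row lives, which is what allows all nodes to be identical peers.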

CASSANDRA VERSUS RDBMS
No relational data model
No joins
Limited support for transactional properties of RDBMS
Only a simple native indexing mechanism, allowing some grouping of data
No transaction support
No rollback mechanism
Ad-hoc queries: only non-real-time, through Hadoop map-reduce overlay

DATA MODELLING PATTERNS: DENORMALISATION
Data for slicing and dicing through graphs is stored in fully denormalized format
Each data view corresponds with one row in Cassandra
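One way to read "each data view corresponds with one row" is that every incoming event is written once per filter combination. The sketch below assumes that reading; the function view_keys, the key format and the dimension names are hypothetical and not taken from the presented solution.

```python
from itertools import combinations


def view_keys(event: dict, dimensions: list) -> list:
    """Return one row key per data view (every subset of the filter dimensions)."""
    keys = []
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            parts = [f"{d}={event[d]}" for d in dims]
            # The empty subset is the unfiltered view.
            keys.append("|".join(parts) or "all")
    return keys


event = {"cdn": "cdn1", "device": "tablet", "bitrate": 1200}
print(view_keys(event, ["cdn", "device"]))
# ['all', 'cdn=cdn1', 'device=tablet', 'cdn=cdn1|device=tablet']
```

The write path does more work per event, but reading any view is a single row lookup with no joins.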

DATA MODELLING PATTERNS: EXPLOIT CASSANDRA WIDE ROW SUPPORT & COUNTER COLUMN FAMILIES
All time-series graphs for all metrics (per data view) are stored in one row
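The wide-row pattern with counters can be imitated with plain dictionaries. This is an in-memory sketch only, with no Cassandra client involved; the names rows, increment and series, the 60-second bucket and the (metric, time bucket) column layout are assumptions made for illustration.

```python
from collections import defaultdict

# row key (data view) -> column name (metric, time bucket) -> counter value
rows = defaultdict(lambda: defaultdict(int))


def increment(view: str, metric: str, timestamp: int, amount: int = 1, bucket: int = 60):
    """Add `amount` to the counter column for this metric and time bucket."""
    column = (metric, timestamp - timestamp % bucket)
    rows[view][column] += amount


def series(view: str, metric: str, start: int, end: int):
    """Slice the sorted columns of one row to get the time series of one metric."""
    return [(ts, value) for (m, ts), value in sorted(rows[view].items())
            if m == metric and start <= ts < end]


increment("cdn=cdn1", "plays", 125)
increment("cdn=cdn1", "plays", 130)
increment("cdn=cdn1", "stalls", 190)
print(series("cdn=cdn1", "plays", 0, 600))  # [(120, 2)]
```

Since columns within a row are kept sorted, a graph for one metric over a time range becomes a contiguous slice of a single row.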

BIG DATA ARCHITECTURE
[Figure: architecture diagram with the following labels]
Multiple data sources: CDN log files; end-user clients (HLS, Dynamic Streaming, Smooth Streaming); QoE agents; other sources (DPI, EMS/NMS, static probes)
Near-real-time data collection & analysis: analytics engine; linearly scalable big data infrastructure; OLAP cubes
Analysis and scoring: industry-leading video analytics algorithms; unique video scoring model (QoE scoring, CDN performance, content trends)
Rich portal with dynamic filters: analytics portal; web-services-based API for raw and processed data

Near-realtime is not realtime enough?

BIG DATA ARCHITECTURE: TELL ME WHAT HAPPENS NOW!
[Figure: the same architecture diagram, extended with a streaming path]
Multiple data sources: CDN log files; end-user clients (HLS, Dynamic Streaming, Smooth Streaming); QoE agents; other sources (DPI, EMS/NMS, static probes)
Near-real-time data collection & analysis: analytics engine; linearly scalable big data infrastructure
Streaming analytics engine: alerts in real time to the Network Operations Center
Analysis and scoring: industry-leading video analytics algorithms; unique video scoring model (QoE scoring, CDN performance, content trends)
Rich portal with dynamic filters: analytics portal; web-services-based API for raw and processed data

BIG DATA INFRASTRUCTURE: TELL ME WHAT HAPPENS NOW!
Streaming analytics, complex event processing engines
Future work: event stream distribution frameworks, e.g. the open source project Storm
  Horizontally scalable
  Fault tolerant
  Hadoop Map-Reduce for realtime cases
  Millions of messages/second
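A streaming rule of the kind such an engine evaluates can be sketched as a sliding-window counter. This is a single-process illustration and does not use Storm or any complex event processing product; the class SlidingWindowAlert, the window length and the threshold are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import deque


class SlidingWindowAlert:
    def __init__(self, window_seconds: float, threshold: int):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events = deque()

    def observe(self, timestamp: float) -> bool:
        """Record one error event; return True if an alert should fire."""
        self._events.append(timestamp)
        # Drop events that have fallen out of the window.
        while self._events and self._events[0] <= timestamp - self.window_seconds:
            self._events.popleft()
        return len(self._events) >= self.threshold


alert = SlidingWindowAlert(window_seconds=10, threshold=3)
print([alert.observe(t) for t in (0, 1, 2, 20, 21)])
# [False, False, True, False, False]
```

The decision is made as each event arrives, instead of after a batch job has run, which is the difference between the realtime and near-realtime paths.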

CONCLUSION: ADVANTAGES OF THE NEW INFRASTRUCTURE
1. Horizontally scalable OLTP and near-realtime analytics infrastructure
2. TCO: the solution can run on commodity hardware and is FOSS-based
3. High availability: no downtime for upgrades
4. New data sources can be integrated without database schema changes
5. Very good fit for cloud environments

CONCLUSION: LESSONS LEARNED
1. Realtime and ad-hoc queries conflict
2. RDBMS with its surrounding tooling is pretty well understood by the development team
3. The learning curve for best practices is longer when using a distributed data infrastructure like Cassandra

BIG DATA STORY CONTINUES: LATEST EVOLUTIONS IN BIG DATA LANDSCAPE
Cloudera released a near-realtime query engine (Impala) on batch-oriented Hadoop HDFS + HBase, adding near-realtime SQL query facilities
New relational database infrastructures and enhancements with improved scalability properties, closing the gap with NoSQL solutions:
  Completely new databases
  New MySQL storage engines
  Application-transparent clustering and sharding

Questions? Feel free to stop by our booth: Big Data Empowered Online Video Analytics