Big Data Analytics Platform @ Nokia



Similar documents
End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

A Scalable Data Transformation Framework using the Hadoop Ecosystem

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Building a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Ganzheitliches Datenmanagement

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Luncheon Webinar Series May 13, 2013

Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop

How To Handle Big Data With A Data Scientist

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Big Data Analytics Roadmap Energy Industry

SAP and Hortonworks Reference Architecture

HDP Hadoop From concept to deployment.

The Future of Data Management

Introducing Oracle Exalytics In-Memory Machine

So What s the Big Deal?

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015

Cisco IT Hadoop Journey

Information Builders Mission & Value Proposition

An Integrated Big Data & Analytics Infrastructure June 14, 2012 Robert Stackowiak, VP Oracle ESG Data Systems Architecture

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

Information Architecture

Native Connectivity to Big Data Sources in MSTR 10

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.

Real Time Big Data Processing

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

How In-Memory Data Grids Can Analyze Fast-Changing Data in Real Time

Implement Hadoop jobs to extract business value from large and varied data sets

Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture

AtScale Intelligence Platform

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Architecting for the Internet of Things & Big Data

Big Data for Investment Research Management

Azure Data Lake Analytics

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013

Oracle BI Roadmap & Visual Analyzer Ljiljana Perica, Oracle Business Solution Leader Ljiljana.perica@oracle.com

Reference Architecture, Requirements, Gaps, Roles

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Agenda. Big Data & Hadoop ViPR HDFS Pivotal Big Data Suite & ViPR HDFS ViON Customer Feedback #EMCVIPR

Making Sense of Big Data in Insurance

Zynga Analytics Leveraging Big Data to Make Games More Fun and Social

Moving From Hadoop to Spark

Oracle Big Data Spatial & Graph Social Network Analysis - Case Study

Comprehensive Analytics on the Hortonworks Data Platform

Oracle Database 12c Plug In. Switch On. Get SMART.

Modernizing Your Data Warehouse for Hadoop

Big Data & Cloud Computing. Faysal Shaarani

This Symposium brought to you by

Hadoop & Spark Using Amazon EMR

Automated Data Ingestion. Bernhard Disselhoff Enterprise Sales Engineer

Databricks. A Primer

Big Data Can Drive the Business and IT to Evolve and Adapt

Why Big Data in the Cloud?

Extend your analytic capabilities with SAP Predictive Analysis

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc.

Harnessing the Power of the Microsoft Cloud for Deep Data Analytics

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

XpoLog Competitive Comparison Sheet

What s New with Informatica Data Services & PowerCenter Data Virtualization Edition

IBM BigInsights for Apache Hadoop

Enterprise GIS Architecture Deployment Options. Andrew Sakowicz

Detecting Anomalous Behavior with the Business Data Lake. Reference Architecture and Enterprise Approaches.

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

How To Use Big Data For Telco (For A Telco)

Data Virtualization A Potential Antidote for Big Data Growing Pains

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

Using MySQL for Big Data Advantage Integrate for Insight Sastry Vedantam

How Companies are! Using Spark

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

Cisco IT Hadoop Journey

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Big Data Integration: A Buyer's Guide

Tap into Hadoop and Other No SQL Sources

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Best Practices for Hadoop Data Analysis with Tableau

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Apache Hadoop: Past, Present, and Future

Transcription:

Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012

Agenda Big Data Analytics Platform @Nokia Who we are Use case data flows Big data platform Big data challenges Selecting the Right Tool for the Right Workload Hadoop VS SQL Which analytical database Why InfiniDB 2

Great Mobile Products That Sense the World CREATE A LEADING WHERE PLATFORM WIN IN SMART DEVICES CONNECT THE NEXT BILLION INVEST IN FUTURE DISRUPTIONS 3 Nokia Internal Use Only

One Platform, Enabling Contextually Rich Mobile Experiences Content Platform Maps Positions Places Directions Guidance Traffic Smart Data Apps 4 Nokia Internal Use Only

Click to edit Master title style Big DATA ANALYTICS Platform @Nokia 5

Business Challenges Data silos, missing semantics Multiple sources - overlapping, conflicting Timely processing of large volumes of data Partial, insufficient, inaccurate, inconsistent.. data Security, privacy and other policies unknown Central Analytics Platform created! 6

Statistics 10 s PB of data all across Nokia Multi-tenant, multi-petabyte analytics cluster 10-20K+ jobs per day 600+ internal users 250M+ KV queries Over a terabyte flowing every day Multiple data centers around the world 7

Places Data Store (POI)- Use Case Authentication Account Management Access Control Places Manager FTP Data Intake Oozie Sched Data Processing Hive BI Pig Places Content Analytics Places Extract Portal Search Suppliers HDFS MR Blend Platform Cloud Infrastructure MR Analyti cal DB SQL Places API Transaction al Data K-V Store Data flow 1 User Logs In 2 Place CRUD 6 3 4 5 Updated Supplier ETL and Blend Places Data Blend Uploads Data places Analytics Record 7 Delivered to OnlineSystems 8 Nokia Internal Use Only

Big Data Analytics Platform Data Flows Data Discovery Web Applications Activity Logs VShards (NoSQL) Data Asset Catalog Device Applications Device Activity Sensor Data Intake ETL, Algorithms Aggregation Analytical DB InfiniDB Reports Reference Data 3rd Party User Profile HDFS Oracle Dashboard s Probes POI, Map Batch Queries Analytics Cluster Interactive Queries 9 Nokia Internal Use Only

Big Data Analytics Platform Data Flows Data Discovery Data Asset Catalog Data Intake ETL, Algorithms Aggregation Analytical DB InfiniDB HDFS Oracle Batch Queries Interactive Queries Analytics Cluster 10 Nokia Internal Use Only

Big Data Analytics Platform Logical Tiers Technology Platform Data Platform End User Layer (not shown) ETL, Algorithms Aggregation Data Asset Catalog Data Data Intake HDFS Analytical DB Technology

Technology Platform Hadoop R VShards (KV) Scribe, FTP Hive, Pig InfiniDB, Oracle Export/ Import Workflow Engine Config./ Deploy Monitor Alerts Archiver Scheduler Security/Kerberos & ACL Cloud Infrastructure 12

Data Platform Workflow Orchestration Self Serve Tools ETL, Agg Algorithms Data Quality Data Asset Catalog Data, Metadata, Operational Data Technology Platform 13

Data Platform Analytics Lifecycle Collect Ingest Organize Analyze Deliver Self Serve Tools ETL, Agg Algorithms Data Quality Data Asset Catalog Data, Metadata, Operational Data Technology Platform 14

Nokia Internal Use Only Data Platform: Managing the Data Asset Data Quality - garbage in, garbage out Rules for validating, cleaning data, other heuristics Trusting your insights Process Quality Light weight governance (semantics, integrity, privacy and quality) Data Asset Catalog describe your data Capture essential metadata and logical domain models for assets physical model, logical model, policies, classifications dependencies with other assets Serves as a entry-point to data browsing and asset discovery Insulates subject matter experts from physical details of data asset

Nokia Internal Use Only Big Data Challenges At every level - capture, curate, storage, process, visualize.. Hadoop or SQL? Performance of analytical database? Batch or Interactive analysis Neither SQL nor MR fits all problems Data & Metadata Fragmentation

Click to edit Master title style Selecting the Right Tool for the Right Workload 17

Hadoop VS SQL/Analytical DB SQL/DW SQL/Analytical DB Discover the question Standard industry tools Interactive/Fast No coding Interactive/Fast (secs) Standard industry tools Mutable (Type 1 SCD) No coding, e.g. builtin functions Schema on Write Analyst Reasonable complex Time to Wisdom Discover the question Hadoop/Hive/MR ETL on steroids, Scale Batch/slow Bunch of coding, arbitrary complex Harvest & load into DW Discover the answer 18 Nokia Internal Use Only

Why InfiniDB? Cloud deployment model Column oriented, MPP, clean architecture Horizontal and vertical partitioning, clever pruning No indexes Efficient joins Impressive benchmarks Stream based MR like processing Works with BI tools (standard JDBC driver) 19

InfiniDB vs Hive Performance Execution Time(secs) 2500 2000 1500 1000 500 Query InfiniDB (sec) Hive (sec) A B 76.32 2155.92 C 25.59 1181.48 D 59.72 1497.22 E 1.8 446.5 F 12.38 1307.38 G 24.32 1886.81 infinidb (sec) Hive (sec) 0 A B C D E F G Analytic Queries Nokia Internal Use Only

21 Click to edit Master title style InfiniDB Under the Hood

What is InfiniDB? Scalable Fast Simple 22

Analytics Data Platform Foundation Analytics Data Platform Columnar Performance Efficiency MapReduce style Query Execution Widely used MySQL Interface 23

InfiniDB Building Blocks or Single Server Purpose built for big data analytics. User Module (UM) Performance Module (PM) 24

InfiniDB Building Blocks or Single Server Purpose built for big data analytics. User Module (UM) Understands SQL Performance Module (PM) Operates on data blocks 25

InfiniDB M/R Style Distribution of Work Map-Reduce Inside InfiniDB DoW Scalability Linear Linear N-squared Problem Avoided Avoided Hadoop M/R Latency Low Medium-High Intermediate Results Handling Stream-based File-based Report Language SQL Erlang M/R, Hive, Pig Tuning Automatic Manual Real-Time Analytics Real-time access to granular data Ad-Hoc Full Ad-Hoc performance None Access to pre-defined aggregates Data Storage Structured Unstructured 26 Nokia Internal Use Only

Independent InfiniDB Benchmark Q4 Series 5 table Joins Q1 Series 2 table Joins Q2 Series 3 table Joins Q3 Series 4 table Joins 27

Takeaways Hadoop is good but. Pay attention to data quality Hadoop or SQL Describe your data 28

THANK YOU Yekesa Kosuru Distinguished Architect, Nokia yekesa.kosuru@nokia.com www.nokia.com @Nokia Jim Tommaney CTO, Calpont jtomanney@calpont.com www.calpont.com @Calpont, @InfiniDB