Turn Big Data to Small Data


Turn Big Data to Small Data
Use Qlik to Utilize Distributed Systems and Document Databases
October 2014
Stig Magne Henriksen

From Big Data to Small Data
(Image: kdnuggets.com)

Agenda
- When do we have a Big Data problem?
- How to use Qlik for analyzing Big Data by breaking it into small chunks of data that can be analyzed
- General strategies for handling Big Data
- Discussion on how to handle distributed systems
- Discussion on how Qlik can read data from the MongoDB database
- Discussion on how to use Qlik to read data from Hortonworks
- Summary

When do we have a Big Data problem?
- What happens when the amount of data is so large that it cannot be stored in a database, nor in the memory of a computer? Too many bytes (volume).
- What happens when the data changes so often (new sources) that a solution created a couple of weeks ago is already out of date today? Too many sources (variety).

When do we have a Big Data problem? (II)
- What happens when data from the Internet of Things and mobile apps increases immensely? Too high a rate (velocity).
- What happens when a company has 200 branches, each with its own variation of a nearly identical Excel spreadsheet? Non-scalable analysis.

Have to Find the Useful Information
(Image: thestoragealchemist.com)

Why Qlik with Big Data?
- Flexible deployment models
  - In-memory, using ODBC or OLE DB
  - Direct Discovery
  - Application (document) chaining
- Combine Big Data and traditional data sources
(Diagram labels: In-Memory; Direct Discovery; Hybrid)

Qlik In-Memory Approach
- Loads compressed data into memory
- Enables associative search and analysis
- Hundreds of millions to billions of rows of data

Qlik Direct Discovery Approach
Combines the associative capabilities of the Qlik in-memory dataset with a query model where:
- The aggregated query result is passed back to a Qlik object without being loaded into the Qlik data model
- The result set is still part of the associative experience
- Capability to drill to detail records
(Diagram labels: Qlik In-Memory Data Model; Batch Load; Qlik Application; Direct Discovery)
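The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the external source, and the `sales` table and the function names are hypothetical. It is not Qlik script and does not show how Qlik implements Direct Discovery internally.

```python
import sqlite3

def build_source():
    """Stand-in for the external source (hypothetical 'sales' table in SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10), ("north", 5), ("south", 7), ("south", 3), ("west", 1)],
    )
    return conn

def in_memory_totals(conn):
    """In-memory style: pull every detail row, then aggregate locally."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0) + amount
    return totals

def direct_query_totals(conn, regions):
    """Direct-query style: push the aggregation and the current selection
    down to the source, so only the aggregated result comes back."""
    placeholders = ",".join("?" for _ in regions)
    sql = (
        "SELECT region, SUM(amount) FROM sales "
        f"WHERE region IN ({placeholders}) GROUP BY region"
    )
    return dict(conn.execute(sql, list(regions)))

conn = build_source()
print(in_memory_totals(conn))
print(direct_query_totals(conn, ["north", "south"]))
```

In the first function every row crosses the wire; in the second only one row per selected region does, which is why the direct-query style suits aggregated views over very large tables.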

Application (Document) Chaining
- Navigate among Qlik applications
- Maintain selections / context
1) User makes selections in Application 1
2) User clicks a button to chain to the next application
3) Application 2 opens, and the selections are transferred and applied
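The three steps can be pictured with a small Python sketch. It assumes that a selection is simply a mapping from field name to chosen values and that only fields known to the second application carry over; the function names and sample data are hypothetical and do not correspond to any Qlik API.

```python
def transfer_selections(selections, target_fields):
    """Step 2: carry over the selections whose field exists in the target application."""
    return {
        field: set(values)
        for field, values in selections.items()
        if field in target_fields
    }

def apply_selections(rows, selections):
    """Step 3: keep only the rows of the target application that match every selection."""
    return [
        row for row in rows
        if all(row[field] in values for field, values in selections.items())
    ]

# Step 1: the user has made selections in Application 1.
app1_selections = {"region": {"north"}, "year": {2014}}

# Application 2 holds detail rows and has no 'year' field.
app2_rows = [
    {"region": "north", "order_id": 1},
    {"region": "south", "order_id": 2},
    {"region": "north", "order_id": 3},
]

carried = transfer_selections(app1_selections, target_fields={"region", "order_id"})
print(apply_selections(app2_rows, carried))
```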

Distributed systems - Why use them?
Advantages of distributed computing platforms
- Parallelize I/O to quickly scan large datasets
Cost efficiency
- Commodity nodes (cheap but unreliable)
- Commodity network (might have low bandwidth)
- Automatic fault tolerance (fewer admins)
- Easier to use (fewer programmers)
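The idea of parallelizing a scan can be sketched on a single machine. In the hypothetical example below, worker threads stand in for cluster nodes and small Python lists stand in for data partitions; a real distributed platform would also handle data placement and node failures, which this sketch does not attempt.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_partition(rows):
    """Scan one partition and return a partial result (count of error rows)."""
    return sum(1 for row in rows if row["status"] == "error")

def parallel_scan(partitions, workers=4):
    """Scan all partitions concurrently and combine the partial counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(scan_partition, partitions))

# Hypothetical data, already split into three partitions.
partitions = [
    [{"status": "ok"}, {"status": "error"}],
    [{"status": "error"}, {"status": "error"}],
    [{"status": "ok"}],
]
print(parallel_scan(partitions))
```

Each partition is scanned independently, so adding workers (or nodes) shortens the scan as long as the partial results are cheap to combine.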

Two different approaches - Hortonworks
Direct Discovery
- Can access data from external sources in Qlik
- Will not load data until it is requested from the app
- Only metadata is loaded
- Real-time load of data
- Can access several tables
ODBC and in-memory
- Access to Hortonworks
- Can read complex objects
- Uses the Hive interface: sends SQL to Hive, which translates it into MapReduce jobs
- Utilizes the power of several servers
- Result is sent back to Qlik
- No need to define a database view
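To make the SQL-to-MapReduce translation more concrete, here is a conceptual Python sketch of how a query such as `SELECT country, SUM(amount) FROM orders GROUP BY country` breaks down into map, shuffle and reduce phases. The `orders` rows and function names are hypothetical, and this is not the plan Hive actually generates; it only shows the shape of the computation.

```python
from collections import defaultdict

def map_phase(rows):
    """Map: emit one (key, value) pair per input row."""
    for row in rows:
        yield row["country"], row["amount"]

def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse each group into a single aggregated value."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical 'orders' rows.
orders = [
    {"country": "NO", "amount": 100},
    {"country": "SE", "amount": 40},
    {"country": "NO", "amount": 25},
]
print(reduce_phase(shuffle(map_phase(orders))))
```

On a cluster the map and reduce phases run on many servers at once, which is where "the power of several servers" comes in, while the job start-up cost is one reason such queries tend to be slow for interactive use.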

Qlik and Hortonworks
- Hundreds of millions of rows into memory
  - Broad application to discover new trends
  - Aggregates / detail
  - Deep application to confirm and take action
- Billions of rows via Direct Discovery
  - Broad application to discover new trends
  - Deep application to confirm and take action

Result from working with Hortonworks
- Easy to set up
- ODBC connects fine; reading data is straightforward
- Can make qualified calls via ODBC (SQL-based calls)
- Direct Discovery works best when used at an aggregated level
- Hive is by definition not suited for interactive loads with many queries, so be careful with frequent Direct Discovery calls

MongoDB - New programming model
Object-oriented programming
- A document database
- Simple and fast to implement
- No complicated SQL (NoSQL)
- Can be much faster than traditional SQL databases

MongoDB II
- Can spread the database across multiple machines
- Limited multi-record transactional consistency, hence easier to implement across different machines
- Often used in web applications
- Back end for mobile apps

Two different approaches - MongoDB
Direct Discovery
- Can access data from external sources in Qlik
- Will not load data until it is requested from the app
- Only metadata is loaded
- Real-time load of data
Simba ODBC
- Access to MongoDB
- Can read documents from the database
- Can read complex objects from a document
- Can read sub-levels of each instance in the collections
- Can use the SQL language even though this is a NoSQL database
- Result is sent back to Qlik
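Reading nested documents through a SQL interface requires mapping them onto flat rows. The Python sketch below shows one plausible way to do that, assuming nested objects become dotted column names and an embedded array becomes one row per element. The document, field names and functions are hypothetical, and this is not a description of how the Simba driver actually maps collections.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into a single level of dotted column names."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat

def explode(doc, array_field):
    """Produce one relational row per element of an embedded array."""
    parent = {k: v for k, v in doc.items() if k != array_field}
    rows = []
    for item in doc.get(array_field, []):
        row = flatten(parent)
        row.update(flatten(item, prefix=array_field + "."))
        rows.append(row)
    return rows

# Hypothetical order document with a sub-document and an embedded array.
order = {
    "_id": 1,
    "customer": {"name": "Ada", "city": "Oslo"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}
for row in explode(order, "items"):
    print(row)
```

Once documents look like rows, ordinary SQL clauses such as `WHERE` and `GROUP BY` can be applied to them, which is what makes a NoSQL store reachable from a SQL-based tool.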

Qlik and MongoDB
(Diagram labels: Simba ODBC; Broad application to discover new trends; MongoDB direct reads; Deep application to confirm and take action)

Result from working with MongoDB
- Easy to set up
- ODBC connects fine; reading data is straightforward
- Can make qualified calls via ODBC (SQL-based calls, even though this is a NoSQL database)
- Can read complex documents and read data at different levels
- Data retrieval is fast

Summary
- Qlik is well suited for tapping into the document database MongoDB, reading its data and integrating it into existing analyses
- It is recommended to use different strategies according to your needs
  - Direct Discovery when reading aggregated data
  - ODBC to read data at a more detailed level
  - Application chaining to switch between different levels of data
- The ODBC and in-memory approach works best with Hortonworks; Hive is too slow for interactive access approaches
- Big Data needs strong visualization tools, and in this context Qlik is well suited for the task

Small or Big Data - Result
(Image: beautifulinsanity.com)

Thank You