Big Data & QlikView: Democratizing Big Data Analytics
David Freriks, Principal Solution Architect
TDWI Vancouver
Agenda
- What really is Big Data?
- How do we separate hype from reality?
- How does that relate to actually finding useful business information?
- Why is Qlik unique in leading the industry in delivering Big Data solutions?
- Demo
Agenda: What really is Big Data? Most people think of Hadoop.
A Brief History of Hadoop (2005 to 2013)
- Google publishes its paper on GFS (Google File System); Doug Cutting applies its ideas in Nutch, a distributed search platform
- Cutting joins Yahoo and estimates that a billion-page index will cost $500k to build and $30k/month to support
- Hadoop is promoted to a top-level Apache project; predictive search index creation time drops from 12 days to 8 hours
- A 1,400-node Yahoo cluster sorts 500 GB in 59 s
- Cloudera launches
- Yahoo spins its remaining Hadoop staff out into Hortonworks
- Cloudera adds real-time search, based on Lucene (also created by Cutting)
- The 3rd Hadoop World conference attracts 2,300 developers, up from 275 in 2010
Example Apache Hadoop or Next-Gen Components
- HDFS: Hadoop Distributed File System
- MapReduce: processing framework for writing scalable data applications
- Pig: procedural language that abstracts lower-level MapReduce
- ZooKeeper: highly reliable distributed coordination
- Hive: system for querying data on top of HDFS (SQL-like queries)
- HBase: database for random, real-time read/write access
- Mahout: scalable machine learning libraries
- Spark: in-memory large-scale data processing, up to 100x faster than Hadoop MapReduce
- Shark: SQL engine on top of Spark
- Cassandra: scalable multi-master database with no single point of failure
- And on, and on
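To make the MapReduce entry above concrete, here is a toy, single-process word count that walks through the map, shuffle and reduce phases. It is only a sketch of the programming model under the assumption that everything fits in one Python process; it does not use Hadoop's Java API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word of every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key (a real cluster does this across nodes)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["Big Data is big", "Fast data is not big data"]))
```

On a cluster, map and reduce tasks run in parallel on different nodes and the shuffle moves data over the network; the logic per record stays the same.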
Big Data: Expanding on 3 Fronts
- Data volume: MB, GB, TB, PB
- Data velocity: batch, periodic, near real time, real time
- Data variety: table, database, web, XML, audio, video, social
What is Big Data? Big Data is:
- Nebulous
- Really big, or not
- Mostly useless noise
- Slow
- Difficult
Big Data Ecosystem: Much More Than Just Hadoop
Categories shown: data visualization, statistical & in-memory analytics; Big Data analytic appliances; packaged MapReduce platforms; massively parallel processing platforms; open source distributed processing frameworks; Big Data integration.
Example products shown: BigInsights & Streams, Big Data Appliance, HANA, Splunk.
Some uses of Big Data today (who / what / why)
- Telecom: usage and location analysis, call detail records (CDRs), next product to buy (NPTB), real-time bandwidth allocation. Why: operational excellence, customer retention, profitability.
- Financial services: new account risk screens, fraud detection, trading risk, real-time P&L, portfolio analysis. Why: improve profit, minimize risk.
- Utilities: smart metering analysis. Why: operational excellence.
- Retail: 360° customer view, brand sentiment analysis, up-sell/cross-sell, clickstream analysis. Why: increase revenues, customer loyalty, brand awareness.
- Manufacturing: supply chain & logistics, assembly line QA, proactive maintenance. Why: operational excellence, profitability.
Source: Gartner, 50 Real World Examples of Big Data and Analytics, 2013
Agenda: How do we separate hype from reality?
Popular Big Data Myths
- Myth: You need ga-zinga-bytes of data to deploy a Big Data solution. Reality: a typical Cloudera cluster is 15-20 nodes with < 10 TB of data, and Hadoop storage is roughly 3-4x cheaper than an EDW.
- Myth: Hadoop is all you need. Reality: Hadoop is an enabling technology that provides the foundation for Big Data solutions; the focus today is on data management.
- Myth: The RDBMS is dead. Reality: the RDBMS is still critical, but not for high-volume, low-quality analytics.
- Myth: QlikView can't handle Big Data. Reality: a human can't handle Big Data. It's all about the use case.
Big Data vs. Fast Data vs. Right Data
Big Data is rapidly shifting from how much data you can handle to how quickly you can deliver value.
- Volume of data is just one, increasingly less critical, factor; context is key and difficult to pinpoint.
- Big Data: Hadoop is designed to support petabytes and beyond.
- Fast Data: Teradata, SAP HANA, Netezza, HBase, MongoDB, ParStream, etc.
- Big Data is slow and cheap; Fast Data is neither.
- A Big Data solution requires components that address both: Hadoop is the data system, and QlikView is the platform that combines Fast and Big, supporting both scenarios simultaneously.
Where Big Data fits today: the new BI architecture
Figure: unstructured/semi-structured data (web data, docs and text data, audio/video data, machine data) and structured data (operational systems) feed a Big Data repository, with a data warehouse and a data accelerator marked as open questions (???).
Big Data comes with big challenges: the Big Data bottleneck
Figure: Big Data flows through a small group of data scientists, who produce reports for business users.
Many organizations lack the skills required to exploit big data; most of these skills are in short supply and rare in the market at large, and data science encompasses hard skills. (Source: Gartner, Big Data Hype Cycle Report, 2013)
Big Data comes with big challenges: obstacles to Big Data analytics
Organizations are challenged in staffing and training. Obstacles cited:
- Staffing: 79%
- Training: 77%
- Real-time: 67%
- License cost: 64%
- Integration: 64%
Organizations have trouble finding qualified professionals to manage big data and providing training to those already on board. (Source: Ventana Research, The Challenge of Big Data Benchmark Research, November 2013)
Agenda: How does that relate to actually finding useful business information?
Insight Comes from Data, in Context
Sources: data warehouse; machine data, web data and cloud data; Hadoop cluster; Google BigQuery; operational systems.
Big Data Business Needs (data: clinical, claims, monitoring, others)
- Descriptive analytics: How are we doing? Example: How many claims did we pay today?
- Predictive analytics: What might happen in the future? Example: Which of tomorrow's claims might be requesting an emergency room (ER) admission?
- Prescriptive analytics: What is the best course of action given objectives, requirements and constraints? Example: What would be effective steps to reduce the probability of ER admission?
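The descriptive question above ("How many claims did we pay today?") reduces to a filter followed by an aggregate. The sketch below assumes a hypothetical `Claim` record with an amount and a paid-on date; the field names and sample values are illustrative, not drawn from any real claims system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Claim:
    claim_id: int
    amount: float
    paid_on: Optional[date]  # None while the claim is still open


def claims_paid_on(claims: List[Claim], day: date) -> Tuple[int, float]:
    """Descriptive analytics: count and total value of claims paid on a given day."""
    paid = [c for c in claims if c.paid_on == day]
    return len(paid), sum(c.amount for c in paid)


if __name__ == "__main__":
    sample = [  # hypothetical records, not real claims data
        Claim(1, 120.0, date(2014, 3, 3)),
        Claim(2, 80.0, date(2014, 3, 3)),
        Claim(3, 200.0, None),
    ]
    print(claims_paid_on(sample, date(2014, 3, 3)))
```

Predictive and prescriptive questions need a model on top of this kind of summary, which is why the slide orders the three tiers by difficulty.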
Agenda: Why is Qlik unique in leading the industry in delivering Big Data solutions?
Who We Are: QlikView
What is QlikView?
- QlikView is a Business Discovery platform: user-driven BI supporting the creation and consumption of dynamic apps for analyzing information.
- QlikView apps allow non-technical users to explore visual views of information and ask streams of questions through simple interactions such as clicks and taps.
- QlikView's patented software engine dynamically calculates new views of information, instantly, based on user selections.
QlikView: A New Kind of Software Company
- Leader in Business Discovery (user-driven BI)
- Broad base of 28,000+ customers in 100 countries
- 1,500 global partners
- 1,500 employees across 28 offices in 23 countries
- No. 1 fastest-growing enterprise technology company (ZDNet)
- Gartner Magic Quadrant Leader for 3 consecutive years
These are tools, and this is how BI has been done.
This is a Platform
The Evolution of Business Intelligence
Figure: BI approaches plotted by analytical quotient against usefulness: managed reporting, ad-hoc reporting, dashboards/visualization, OLAP/analysis, associative/statistical exploration, predictive. The slide marks QlikView's sweet spot on this curve.
What Makes QlikView Unique?
1) Associative query language + full search (not just another query tool)
2) Core technology: a true in-memory, columnar database with built-in visualization, analytics and ELT in a single product
3) Designed for heterogeneous and complex data (again, not just another query tool)
4) Application/mobile design first: design once, consume anywhere (mobile, desktop, tablet)
QlikView's Natural Analytics makes data analysis a natural part of every business process, for everyone.
How traditional BI and visualization tools work:
- Limited view of and access to data
- Forced down linear drill paths
- Need to involve IT to modify
- What-if and on-the-fly analysis is limited
QlikView Natural Analytics:
- Freedom to explore data from any point in the analysis, in a dynamic, interactive interface
- Answer any question on the fly, in real time
- Easily see connections, and disconnects, in data
The Green, The White and The Gray
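In QlikView's associative model, green marks selected values, white marks values still associated with the selection, and gray marks values the selection excludes. The sketch below reproduces that classification for a single table of rows; it is a simplified illustration, not Qlik's engine, which works across many associated tables and supports further states. The field names and sample rows are hypothetical.

```python
from typing import Dict, List, Set

Row = Dict[str, str]


def field_states(rows: List[Row], selections: Dict[str, Set[str]]) -> Dict[str, Dict[str, str]]:
    """Classify every value of every field as green (selected), white (possible) or gray (excluded)."""
    # Rows consistent with all current selections.
    possible_rows = [r for r in rows if all(r[f] in vals for f, vals in selections.items())]
    states: Dict[str, Dict[str, str]] = {}
    for field in (rows[0] if rows else {}):
        possible_values = {r[field] for r in possible_rows}
        per_value: Dict[str, str] = {}
        for value in sorted({r[field] for r in rows}):
            if value in selections.get(field, set()):
                per_value[value] = "green"  # explicitly selected
            elif value in possible_values:
                per_value[value] = "white"  # associated with the selection
            else:
                per_value[value] = "gray"  # excluded by the selection
        states[field] = per_value
    return states


if __name__ == "__main__":
    sales = [  # hypothetical sample rows
        {"Region": "North", "Product": "Bikes", "Customer": "Acme"},
        {"Region": "North", "Product": "Helmets", "Customer": "Bolt"},
        {"Region": "South", "Product": "Bikes", "Customer": "Cycle"},
    ]
    print(field_states(sales, {"Region": {"North"}}))
```

Selecting North turns South gray, leaves both products white, and grays out the customer who only buys in the South; no drill path had to be defined in advance.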
The Visualization Bottleneck
Figure: Tableau, Spotfire, Datameer and MSTR Analytics Desktop plotted by query size against response time, relative to Big Data.
Connectivity to Every Big Data Source
SAP HANA, MPP warehouses, NoSQL databases, Hadoop, advanced analytics and Google BigQuery, in both batch and real-time modes.
The Big Data Value Chain
- Hard disk drives (HDD): about 3,300 s per TB; $50 per TB
- Solid state storage (SSD): about 300-1,000 s per TB; $500 per TB
- Random access memory (RAM): about 1 s per TB; $4,500 per TB
Keep data in memory when the value obtained from processing it is high. Leave data on disk when it is inactive or the value from processing it is low.
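The trade-off above is simple arithmetic: cost and processing time both scale linearly with data size, at very different rates per tier. The sketch uses the slide's per-TB figures and assumes its "Speed (t/tb)" column means seconds to process one TB; the placement rule encodes the slide's two sentences and nothing more. The names are hypothetical.

```python
from typing import Dict, Tuple

# Slide figures: (price in USD per TB, (min, max) seconds to process one TB).
TIERS: Dict[str, Tuple[float, Tuple[float, float]]] = {
    "HDD": (50, (3300, 3300)),
    "SSD": (500, (300, 1000)),
    "RAM": (4500, (1, 1)),
}


def tier_profile(size_tb: float) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Cost and processing-time range for holding size_tb terabytes in each tier."""
    return {
        name: (usd * size_tb, (lo * size_tb, hi * size_tb))
        for name, (usd, (lo, hi)) in TIERS.items()
    }


def placement(active: bool, high_value: bool) -> str:
    """Slide rule: memory only when the data is active and processing it is high value."""
    return "RAM" if active and high_value else "disk"


if __name__ == "__main__":
    for tier, (cost, seconds) in tier_profile(10).items():
        print(tier, cost, seconds)
```

For 10 TB this gives $500 and 33,000 s on HDD versus $45,000 and 10 s in RAM, which is why the slide reserves memory for high-value, active data.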
Flexible Big Data deployment models
- Hundreds of millions of rows in memory (aggregates/detail)
- Billions of rows via Direct Discovery
Combine Big Data and traditional data sources
Combine data sources using pure in-memory loading: Big Data aggregates/detail alongside EDW data from the data warehouse.
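One way to read the slide: roll detailed Big Data rows up to aggregates, then load them in memory next to warehouse data that shares a key. QlikView associates tables that share a field name; the sketch imitates that with a plain lookup on a hypothetical `customer_id` field. The event tuples, segment mapping and function names are assumptions for illustration only.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, float]  # hypothetical detail row: (customer_id, day, amount)


def aggregate_events(events: Iterable[Event]) -> Dict[Tuple[str, str], float]:
    """Roll detail rows up to one total per (customer_id, day) before loading them into memory."""
    totals: Counter = Counter()
    for customer_id, day, amount in events:
        totals[(customer_id, day)] += amount
    return dict(totals)


def combine(
    totals: Dict[Tuple[str, str], float], segments: Dict[str, str]
) -> List[Tuple[str, str, str, float]]:
    """Associate aggregates with EDW customer segments through the shared customer_id key."""
    return [
        (customer_id, segments.get(customer_id, "Unknown"), day, amount)
        for (customer_id, day), amount in sorted(totals.items())
    ]


if __name__ == "__main__":
    events = [("c1", "2014-01-01", 10.0), ("c1", "2014-01-01", 5.0), ("c2", "2014-01-01", 7.0)]
    print(combine(aggregate_events(events), {"c1": "Retail"}))
```

Aggregating first keeps the in-memory footprint proportional to customers times days rather than to raw events.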
QlikView as a catalyst for implementing Big Data
Today's challenge: what to do with Big Data, and who should do it?
- IT asks: what do we do with this?
- The business asks: how do we define requirements?
QlikView as a catalyst for implementing Big Data
QlikView gives business users, not just data scientists, the ability to discover with Big Data.
IT & business: more access > more questions > more use > higher ROI of Big Data
QlikView In-Memory Approach
- Loads compressed data into memory
- Enables associative search and analysis
- Supports hundreds of millions to billions of rows of data
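Columnar in-memory engines commonly compress data by storing each distinct value once and replacing the column with small integer codes (dictionary encoding). The sketch below shows that general technique; it is not a description of QlikView's internal storage format, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(column: List[str]) -> Tuple[List[str], List[int]]:
    """Store each distinct value once (symbols) and the column as integer codes."""
    symbols: List[str] = []
    index_of: Dict[str, int] = {}
    codes: List[int] = []
    for value in column:
        if value not in index_of:
            index_of[value] = len(symbols)
            symbols.append(value)
        codes.append(index_of[value])
    return symbols, codes


def bits_per_code(symbols: List[str]) -> int:
    """Minimum bits needed to address every symbol (at least 1)."""
    return max(1, (len(symbols) - 1).bit_length())


def decode(symbols: List[str], codes: List[int]) -> List[str]:
    return [symbols[c] for c in codes]


if __name__ == "__main__":
    region = ["North", "South", "North", "North", "East"]
    symbols, codes = dictionary_encode(region)
    print(symbols, codes, bits_per_code(symbols))
```

A low-cardinality column such as region shrinks from repeated strings to a few bits per row, which is what lets hundreds of millions of rows fit in RAM.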
QlikView Direct Discovery Approach
Direct Discovery combines the associative capabilities of the QlikView in-memory dataset with a query model where:
- The aggregated query result is passed back to a QlikView object without being loaded into the QlikView data model
- The result set is still part of the associative experience
- Users can drill to detail records
Figure: the QlikView application draws on both a batch-loaded in-memory data model and Direct Discovery queries.
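The query pattern above can be illustrated outside QlikView: a selection made in memory becomes a WHERE clause, the source computes the aggregate, and only the summarized rows come back, with a capped query for drill-to-detail. This is a sketch under stated assumptions: an in-memory SQLite database stands in for the external source, and the `sales` table, field whitelist and function names are hypothetical. It does not use any QlikView API.

```python
import sqlite3
from typing import Dict, List, Set, Tuple

ALLOWED_FIELDS = {"region", "product"}  # field names are interpolated into SQL, so whitelist them


def make_external_source() -> sqlite3.Connection:
    """Hypothetical stand-in for a large external fact table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("North", "Bikes", 100.0), ("North", "Helmets", 20.0), ("South", "Bikes", 80.0)],
    )
    return conn


def _where(selections: Dict[str, Set[str]]) -> Tuple[str, List[str]]:
    """Translate the current in-memory selection into a parameterized WHERE clause."""
    clauses: List[str] = []
    params: List[str] = []
    for field, values in sorted(selections.items()):
        if field not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {field}")
        ordered = sorted(values)
        clauses.append(f"{field} IN ({', '.join('?' for _ in ordered)})")
        params.extend(ordered)
    return (" WHERE " + " AND ".join(clauses) if clauses else ""), params


def direct_aggregate(conn: sqlite3.Connection, dimension: str, selections: Dict[str, Set[str]]):
    """Push the aggregation to the source; only one row per dimension value comes back."""
    if dimension not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field: {dimension}")
    where, params = _where(selections)
    sql = f"SELECT {dimension}, SUM(amount) FROM sales{where} GROUP BY {dimension} ORDER BY {dimension}"
    return conn.execute(sql, params).fetchall()


def drill_to_detail(conn: sqlite3.Connection, selections: Dict[str, Set[str]], limit: int = 100):
    """Fetch record-level rows for the current selection, capped by a row limit."""
    where, params = _where(selections)
    sql = f"SELECT region, product, amount FROM sales{where} ORDER BY rowid LIMIT ?"
    return conn.execute(sql, [*params, limit]).fetchall()


if __name__ == "__main__":
    conn = make_external_source()
    print(direct_aggregate(conn, "region", {"product": {"Bikes"}}))
    print(drill_to_detail(conn, {"region": {"North"}}))
```

The detail rows never enter the in-memory model; only the grouped result (here two rows) crosses the network, which is the property the slide describes.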
A Hybrid Approach for Tackling Big Data
Use 100% in-memory when:
- All the necessary (i.e. relevant and contextual) data can fit in memory
- Users require only aggregated or summary data (e.g. hourly or daily averages), or record-level detail over a limited time period
- Query performance of the external source is not satisfactory
Use Direct Discovery when:
- Data cannot fit in memory and document chaining is not sufficient
- Users require access to record-level detail stored in a large fact table that will not fit in memory
- Network bandwidth limits the ability to copy data to the QlikView server
The design of Direct Discovery lets you alternate between these approaches with no change to the application itself.
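The criteria above can be written down as a small decision function, which makes the trade-offs explicit when planning an app. This is a hypothetical sketch: the `Workload` fields paraphrase the slide's bullets, and the precedence (any Direct Discovery criterion wins, otherwise load into memory, which also covers a slow external source) is our own assumption rather than Qlik guidance.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    fits_in_memory: bool  # all relevant, contextual data fits in RAM
    chaining_sufficient: bool  # document chaining could split the data across apps
    needs_large_fact_detail: bool  # record-level detail from a fact table too big for memory
    bandwidth_limited: bool  # network limits copying data to the QlikView server


def choose_mode(w: Workload) -> str:
    """Return "direct_discovery" if any of the slide's Direct Discovery criteria hold, else "in_memory"."""
    direct_reasons = [
        not w.fits_in_memory and not w.chaining_sufficient,
        w.needs_large_fact_detail,
        w.bandwidth_limited,
    ]
    if any(direct_reasons):
        return "direct_discovery"
    # Summary-level needs, or a source too slow to query live, favor loading into memory.
    return "in_memory"


if __name__ == "__main__":
    print(choose_mode(Workload(fits_in_memory=False, chaining_sufficient=False,
                               needs_large_fact_detail=False, bandwidth_limited=False)))
```

Because the slide says an app can switch modes without changes, a rule like this is a planning aid, not a one-time architectural commitment.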
DEMO