A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect



Similar documents
Apache Kylin Introduction Dec 8,

Native Connectivity to Big Data Sources in MSTR 10

Trafodion Operational SQL-on-Hadoop

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Apache Kylin. Open Source Journey

How Companies are! Using Spark

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

Federated SQL on Hadoop and Beyond: Leveraging Apache Geode to Build a Poor Man's SAP HANA. by Christian

Analytics on Spark &

Introducing Oracle Exalytics In-Memory Machine

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

Using distributed technologies to analyze Big Data

LEARNING SOLUTIONS website milner.com/learning phone

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

AtScale Intelligence Platform

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

A Scalable Data Transformation Framework using the Hadoop Ecosystem

<Insert Picture Here> Enhancing the Performance and Analytic Content of the Data Warehouse Using Oracle OLAP Option

Dashboard Engine for Hadoop

Roadmap Talend : découvrez les futures fonctionnalités de Talend

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Splice Machine: SQL-on-Hadoop Evaluation Guide

Big Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016

MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

BIG DATA TRENDS AND TECHNOLOGIES

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William

Upcoming Announcements

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Big Data Analytics Nokia

How to Enhance Traditional BI Architecture to Leverage Big Data

How To Handle Big Data With A Data Scientist

the missing log collector Treasure Data, Inc. Muga Nishizawa

Session# - AaS 2.1 Title SQL On Big Data - Technology, Architecture and Roadmap

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

The Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

}w!"#$%&'()+,-./012345<ya

1. OLAP is an acronym for a. Online Analytical Processing b. Online Analysis Process c. Online Arithmetic Processing d. Object Linking and Processing

Creating a universe on Hive with Hortonworks HDP 2.0

Integrating Apache Spark with an Enterprise Data Warehouse

Lost in Space? Methodology for a Guided Drill-Through Analysis Out of the Wormhole

Oracle Big Data SQL Technical Update

Business Intelligence & Product Analytics

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Apache Trafodion. Table of contents ENTERPRISE CLASS OPERATIONAL SQL-ON-HADOOP

Apache Hadoop: Past, Present, and Future

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Databricks. A Primer

Hortonworks Architecting the Future of Big Data

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

Enterprise Operational SQL on Hadoop Trafodion Overview

In-memory data pipeline and warehouse at scale using Spark, Spark SQL, Tachyon and Parquet

PLATFORA INTERACTIVE, IN-MEMORY BUSINESS INTELLIGENCE FOR HADOOP

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

On a Hadoop-based Analytics Service System

Creating Connection with Hive

Business Intelligence in Microservice Architecture. Debarshi bol.com

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Zynga Analytics Leveraging Big Data to Make Games More Fun and Social

Databricks. A Primer

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

IST722 Data Warehousing

MapR: Best Solution for Customer Success

Big Data Use Case. How Rackspace is using Private Cloud for Big Data. Bryan Thompson. May 8th, 2013

Bringing Big Data to People

Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Course 20467A; 5 Days

Self-service BI for big data applications using Apache Drill

Spring,2015. Apache Hive BY NATIA MAMAIASHVILI, LASHA AMASHUKELI & ALEKO CHAKHVASHVILI SUPERVAIZOR: PROF. NODAR MOMTSELIDZE

Using Tableau Software with Hortonworks Data Platform

Discovering Business Insights in Big Data Using SQL-MapReduce

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc.

Data Warehouse: Introduction

ORACLE BUSINESS INTELLIGENCE FOUNDATION SUITE 11g WHAT S NEW

Building a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering

IMPLEMENTING A BUSINESS INTELLIGENCE (BI) PROJECT FOR STRATEGIC PLANNING AND DECISION MAKING SUPPORT

The Future of Data Management

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

SQL on NoSQL (and all of the data) With Apache Drill

Achieving Business Value through Big Data Analytics Philip Russom

INDUS / AXIOMINE. Adopting Hadoop In the Enterprise Typical Enterprise Use Cases

Introduction to Big data. Why Big data? Case Studies. Introduction to Hadoop. Understanding Features of Hadoop. Hadoop Architecture.

Implementing Data Models and Reports with Microsoft SQL Server 2012 MOC 10778

OLAP Theory-English version

Data Integration Checklist

Big Data on Google Cloud

KNIME & Avira, or how I ve learned to love Big Data

CRITEO INTERNSHIP PROGRAM 2015/2016

Cost-Effective Business Intelligence with Red Hat and Open Source

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Real-Time Data Analytics and Visualization

Big Data Web Analytics Platform on AWS for Yottaa

Stinger Initiative: Introduction

Large scale processing using Hadoop. Ján Vaňo

Best Practices for Hadoop Data Analysis with Tableau

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

This Symposium brought to you by

Transcription:

A very short talk about Apache Kylin Business Intelligence meets Big Data Fabian Wilckens EMEA Solutions Architect 1

The challenge today 2

Very quickly: OLAP Online Analytical Processing How many beers were ordered in Germany on a yearly basis? How many beers were ordered in Germany on a yearly basis, broken down by months? 3

OLAP Cubes Beer Coffee Water Berlin 30 60 55 Hamburg 40 60 45 Munich 65 85 75 March April May 4

Cubes: Measures and Dimensions Measures (also called Facts) are numeric! Product Name Category Color Sales Amount Name Country Region Sales Target Dimensions are Viewpoints (Perspectives) Date Date Month Year 5

OLAP Cubes Beer Coffee Water Berlin 30 60 55 Hamburg 40 60 45 Munich 65 85 75 March April May 6

That s all great but How about Big Data 7

What is Kylin? kylin / ˈkiːˈlɪn / 麒 麟 --n. (in Chinese art) a mythical animal of composite form Extreme OLAP Engine for Big Data Kylin is an open source Distributed Analytics Engine from (originally from ebay) that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets Open Sourced on Oct 1st, 2014 Accepted into incubation November, 2014 Preparing for first Apache release 8

Goals Sub-second query latency on billions of rows ANSI SQL for both analysts and engineers Full OLAP capability to offer advanced functionality Seamless Integration with BI Tools Support for high cardinality and dimensionality High concurrency thousands of end users Distributed and scale out architecture for large data volume 9

Kylin Depends on Hadoop Eco-system Hive Input source, pre-join star schema during cube building MapReduce Aggregate metrics during cube building HDFS Store intermediate files during cube building HBase Store and query data cubes Calcite SQL parsing, code generation, optimization 10

Kylin Architecture Overview Mid Latency - Minutes 3rd Party App (Web App, Mobile ) REST API SQL REST Server Query Engine Rou-ng SQL Tools (BI Tools: Tableau ) JDBC/ODBC SQL Ø Online Analysis Data Flow Ø Offline Data Flow Ø Clients/Users interactive with Kylin via SQL Ø OLAP Cube is transparent to users Low Latency - Seconds Hadoop Hive Star Schema Data Metadata Cube Build Engine (MapReduce ) Data OLAP Cube Cube (HBase) Key Value Data 11 11

Kylin Highlights Extremely Fast OLAP Engine at Scale Kylin is designed to reduce query latency on Hadoop for 10+ billions of rows of data to seconds ANSI SQL Interface on Hadoop Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions Seamless Integration with BI Tools Kylin currently offers integration capability with BI Tools like Tableau. Interactive Query Capability Users can interact with Hadoop data via Kylin at sub-second latency MOLAP Cube User can define a data model and pre-build in Kylin with more than 10+ billions of raw data records 12

More Highlights Compression and Encoding Support Incremental Refresh of Cubes Approximate Query Capability for distinct Count (HyperLogLog) Leverage HBase Coprocessor for query latency Job Management and Monitoring Easy Web interface to manage, build, monitor and query cubes Security capability to set ACL at Cube/Project Level Support LDAP Integration 13

Cube Designer 14

Job Management 15

Query and Visualization 16

Kylin History and Roadmap 2013 2014 2015 Future Q1, 2015 Sep, 2014 MOLAP Incremental Refresh HOLAP Streaming OLAP JDBC Driver New UI Next Gen Automation Capacity Management In-Memory Analysis (TBD) Spark (TBD) TBD Sep, 2013 Jan, 2014 Initial Prototype for MOLAP Basic end to end POC ANSI SQL ODBC Driver Web GUI ACL Open Source Excel Support more more 17

Open Source n Kylin Site: n http://kylin.io n Twitter: n @ApacheKylin n Source Code Repo: n https://github.com/kylinolap n Google Group: n n 微 信 n Kylin OLAP ApacheKylin 18

$50M $50M in Free Training Fabian Wilckens fwilckens@mapr.com 19