Business Intelligence in Microservice Architecture. Debarshi Basak @ bol.com



Similar documents
Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

LEARNING SOLUTIONS website milner.com/learning phone

Apache Kylin Introduction Dec 8,

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect

MS 20467: Designing Business Intelligence Solutions with Microsoft SQL Server 2012

Establish and maintain Center of Excellence (CoE) around Data Architecture

Pulsar Realtime Analytics At Scale. Tony Ng April 14, 2015

East Asia Network Sdn Bhd

Big Data Pipeline and Analytics Platform

Designing Business Intelligence Solutions with Microsoft SQL Server 2012 Course 20467A; 5 Days

Getting Real Real Time Data Integration Patterns and Architectures

Data Integration Checklist

Luncheon Webinar Series May 13, 2013

Business Intelligence for Big Data

SAS Business Intelligence Online Training

COURSE 20463C: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

How to Enhance Traditional BI Architecture to Leverage Big Data

More Data in Less Time

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

Implementing a Data Warehouse with Microsoft SQL Server MOC 20463

COURSE OUTLINE MOC 20463: IMPLEMENTING A DATA WAREHOUSE WITH MICROSOFT SQL SERVER

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April

Real Time Big Data Processing

Managing Data in Motion

Republic Polytechnic School of Information and Communications Technology C355 Business Intelligence. Module Curriculum

Implement a Data Warehouse with Microsoft SQL Server 20463C; 5 days

Cisco IT Hadoop Journey

Implementing a Data Warehouse with Microsoft SQL Server

European Archival Records and Knowledge Preservation Database Archiving in the E-ARK Project

Orchestrating Distributed Deployments with Docker and Containers 1 / 30

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

IST722 Data Warehousing

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Ganzheitliches Datenmanagement

SQL Server 2005 Features Comparison

Course Outline. Module 1: Introduction to Data Warehousing

Implementing a Data Warehouse with Microsoft SQL Server

DATA INTEGRATION. in the world of microservices

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Microsoft. Course 20463C: Implementing a Data Warehouse with Microsoft SQL Server

Designing Business Intelligence Solutions with Microsoft SQL Server 2012

Big Data for Investment Research Management

Implementing a Data Warehouse with Microsoft SQL Server

MDM and Data Warehousing Complement Each Other

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

Implementing a Data Warehouse with Microsoft SQL Server 2012 (70-463)

Enterprise Data Integration for Microsoft Dynamics CRM

Course Outline: Course: Implementing a Data Warehouse with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning

BIG DATA TRENDS AND TECHNOLOGIES

Saving Millions through Data Warehouse Offloading to Hadoop. Jack Norris, CMO MapR Technologies. MapR Technologies. All rights reserved.

Apache Hadoop: Past, Present, and Future

Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777

From Spark to Ignition:

The Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang

SQL Server 2012 Business Intelligence Boot Camp

Big Data Analytics Nokia

Testing Big data is one of the biggest

Designing Self-Service Business Intelligence and Big Data Solutions

EII - ETL - EAI What, Why, and How!

Three Open Blueprints For Big Data Success

SAS BI Course Content; Introduction to DWH / BI Concepts

Implementing a Data Warehouse with Microsoft SQL Server

Monitor Your Key Performance Indicators using WSO2 Business Activity Monitor

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015

Course 20463:Implementing a Data Warehouse with Microsoft SQL Server

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Building Scalable Big Data Infrastructure Using Open Source Software. Sam William

Linux A first-class citizen in Windows Azure. Bruno Terkaly bterkaly@microsoft.com Principal Software Engineer Mobile/Cloud/Startup/Enterprise

Big Data Success Step 1: Get the Technology Right

Virtualizing Apache Hadoop. June, 2012

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

CAPTURING & PROCESSING REAL-TIME DATA ON AWS

Unlock your data for fast insights: dimensionless modeling with in-memory column store. By Vadim Orlov

Real-time Analytics at Facebook: Data Freeway and Puma. Zheng Shao 12/2/2011

Hadoop and Map-Reduce. Swati Gore

The Inside Scoop on Hadoop

Putting Apache Kafka to Use!

I/O Considerations in Big Data Analytics

70-467: Designing Business Intelligence Solutions with Microsoft SQL Server

GigaSpaces Real-Time Analytics for Big Data

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Big Data & Analytics Reference Architecture

Big Data Analytics - Accelerated. stream-horizon.com

Time-Series Databases and Machine Learning

Ten Things You Need to Know About Data Virtualization

Big Data and Your Data Warehouse Philip Russom

MODERN ENTERPRISE APPS OPERATIONS WITH DC/OS

Developing Business Intelligence and Data Visualization Applications with Web Maps

Providing real-time, built-in analytics with S/4HANA. Jürgen Thielemans, SAP Enterprise Architect SAP Belgium&Luxembourg

<Insert Picture Here> Oracle Retail Data Model Overview

Using distributed technologies to analyze Big Data

Data Warehouse Optimization

Information Builders Mission & Value Proposition

Transcription:

Business Intelligence in Microservice Architecture Debarshi Basak @ bol.com

What can you expect? - Introduction Monolithic days Mapreduce Era Flink Era Operational Aspect

Who am I? Debarshi Basak Software engineer at bol.com Part of Bigdata platform team and online marketing

About bol.com - Leader in Dutch ecommerce Scrum 1000+ employees 40+ scrum teams Young and relaxed You build it. You run it. You love it.

How big is big data at bol.com? - > 11.000.000 products for sale Catalog > 38.000.000 400.000.000 newsletter responses 15.000.000 new clicks every day - 26 node cluster More than 300 jobs a month in production

What is Microservice?

What is Microservice? App Database

What is Microservice? Services Services App Services Services Services Services Database Services

Business Intelligence 101

Business Intelligence 101

Business Intelligence 101 - Analyzing data and presenting actionable items

Business Intelligence 101 - Analyzing data and presenting actionable items - Automated and Continuous Integration with internal and external data sources

Business Intelligence 101 - Analyzing data and presenting actionable items - Automated and Continuous Integration with internal and external data sources - Flexible Analytics

Business Intelligence 101 - ETL - Extract from source Transform the data Load into target data models

Business Intelligence 101 - ETL - - Extract from source Transform the data Load into target data models Business data modelling - Kimball s dimensional modeling technique OLAP cubes

http://i.cmpnet.com/intelligententerprise/images/0809/kimball-u-model.gif https://web.stanford.edu/dept/itss/docs/oracle/10g/olap.101/b10333/cube2.gif

Monolithic days

Evolution of BI Systems (at bol.com) Online systems

Evolution of BI Systems (at bol.com) Online systems Data hub

Evolution of BI Systems (at bol.com) Online systems Replication Data hub

Evolution of BI Systems (at bol.com) Online systems Data hub Replication Data warehouse

Evolution of BI Systems (at bol.com) Online systems Data hub Data warehouse ETL

Evolution of BI Systems (at bol.com) - Easy to implement Complexities are abstracted Data Overheads Latency

Evolution of BI Systems (at bol.com) Online systems

Evolution of BI Systems (at bol.com) Online systems Publish Message Broker

Evolution of BI Systems (at bol.com) Online systems Message Broker Listener Data warehouse

Evolution of BI Systems (at bol.com) Online systems Message Broker Listener Data warehouse

Evolution of BI Systems (at bol.com) - Loss of Messages and Consistency guarantees Database are kind of not made for this Complex implementation Nightmare for operations

Challenges in Microservice Architecture

Challenges in Microservice Architecture - Too many sources Can affect scalability and stability of reports BI cannot scale Extraction logic, transformation operations for each Service Joins.

Hadoop era

History of Hadoop at bol.com Operational experience in hbase, hadoop based tooling - Supplier connector - Recommendation system

Service Definition RPC over HTTP Message Queues Service Bulk Interfaces

Bulk Interfaces t1-productid1 f:price::15,50 t2-productid2 f:price::15,35 t3-productid1 f:price::15,25 t4-productid1 f:price::15,75 t5-productid3 f:price::15,50

Bulk Interfaces t1-productid1 f:price::15,50 t2-productid2 f:price::15,35 t3-productid1 f:price::15,25 t4-productid1 f:price::15,75 t5-productid3 f:price::15,50 We can replay event to get the latest state of the event. This is also known as Event Sourcing Pattern. Similar key design can be found in OpenTSDB

Re-imagine traditional BI on hadoop Source systems Staging-Transformation Loading in reporting tooling

Re-imagine traditional BI on hadoop Source systems Staging-Transformation Loading in reporting tooling

Supplier Service Offers Services Pricing Services Data warehouse

From Queues Supplier Service Offers Services Pricing Services Data warehouse

Supplier Service Offers Services Pricing Services Data warehouse

Supplier Service Offers Services Pricing Services Data warehouse

Supplier Service Offers Services Pricing Services Data warehouse

Supplier Service Offers Services Pricing Services Data warehouse

Automation Supplier Service Integration Unit Offers Services Pricing Services Data warehouse

What kind of jobs we build - Aggregation jobs - - Single service, aggregate on a function key Interface concatenation - Multiple services combined on one/many functional keys.

Automation { "bulk_interface" : "transport_acc_public_v1_transporter_versions", "table_name" : "transporter", "primary_key" : "f:transporter.transporterid", "ctl_name" : "transporter", "service_version" : "v2", "col_map":[ { "hbase_col_name": "f:transporter.transporterid", "ora_col_name":"id", "function_name":"transporter id", "data_type" : "NUMBER" }, { "hbase_col_name": "f:transporter.transportercode", "ora_col_name":"code", "function_name":"transporter code", "data_type" : "VARCHAR2(40)" }, { "hbase_col_name": "f:transporter.transportername", "ora_col_name":"transportername", "function_name":"transporter Name", "data_type" : "VARCHAR2(40)" } ] }

Problem Source systems Staging-Transformation Loading in reporting tooling

Problem Source systems Staging-Transformation Loading in reporting tooling

Problem Source systems Staging-Transformation Loading in reporting tooling

But everything is stream. Nature of data in most of use cases is asynchronous. Clicks are asynchronous Orders are asynchronous Updates are asynchronous In fact, Batch is a bounded stream.

Streaming era

Enter Flink Low entry barrier Java/Scala functional apis. Operational expertise.

Emulating Stream You don t always need queues for stream Streaming HBase tables.

Give a starting point in stream t1-productid1 f:price::15,50 t2-productid2 f:price::15,35 t3-productid1 f:price::15,25 t4-productid1 f:price::15,75 t5-productid3 f:price::15,50

Give next x records t1-productid1 f:price::15,50 t2-productid2 f:price::15,35 t3-productid1 f:price::15,25 t4-productid1 f:price::15,75 t5-productid3 f:price::15,50

Give next x records t1-productid1 f:price::15,50 t2-productid2 f:price::15,35 t3-productid1 f:price::15,25 t4-productid1 f:price::15,75 t5-productid3 f:price::15,50

while(true){ Give next x records } t1-productid1 f:price::15,50 t2-productid2 f:price::15,35 t3-productid1 f:price::15,25 t4-productid1 f:price::15,75 t5-productid3 f:price::15,50

Offers Product Catalog

ProductId Offers ProductId Product Catalog Product-Offer-Join

Offers Product Catalog Product-Offer-Join

Offers Product Catalog Product-Offer-Join

Offers Product Catalog Product-Offer-Join

Offers Product Catalog Product-Offer-Join

Offers Product Catalog Product-Offer-Join

Offers Eventual Consistency ProductId Product Catalog Product-Offer-Join

Offers Product Catalog Sources Product-Offer-Other Sources-Join

Offers Product Catalog Sources Cube or star

Offers Product Catalog Sources Star

Can we automate this?

Can we automate this? Yes, We can.

Can we automate this? cube_builder.from( table("productoffer_tst_public_v1.0_sellingoffers_versions") ).on( key("f:globalid", key -> new StringBuilder(key).reverse().toString()) ).lookup( key("f:globalid"), table("financecategory_tst_public_v1_productfinancecategorycurrents") ).to( table("final_join_version"), table("reverse_index_lookup", key("f:globalid"), columns("f:offerid")), table("final_join_version1", columns("f:sellingofferdata.listprice")) ).build().execute();

Operational Aspect Build

Operational Aspect Build

Operational Aspect Build Docker Registry

Operational Aspect Build Docker Registry Deploy

Operational Aspect Build Docker Registry Deploy

Operational Aspect Build Docker Registry Deploy

Lessons learned - Dedicated team for hadoop Think not tools but how to solve problems Flink can be flinky Frameworks are out there Kylin Think infrastructure too