Production ready hadoop. By Deepak Rao Na,onal Head Datawarehousing Bajaj Finserv



Similar documents
Cisco IT Hadoop Journey

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

Using Tableau Software with Hortonworks Data Platform

Apache Hadoop in the Enterprise. Dr. Amr Awadallah,

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

Cost-Effective Business Intelligence with Red Hat and Open Source

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

The Enterprise Data Hub and The Modern Information Architecture

Real-Time Data Analytics and Visualization

Cisco IT Hadoop Journey

HDP Hadoop From concept to deployment.

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

Native Connectivity to Big Data Sources in MSTR 10

SAP and Hortonworks Reference Architecture

Hadoop s Advantages for! Machine! Learning and. Predictive! Analytics. Webinar will begin shortly. Presented by Hortonworks & Zementis

Well packaged sets of preinstalled, integrated, and optimized software on select hardware in the form of engineered systems and appliances

Il mondo dei DB Cambia : Tecnologie e opportunita`

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

Traditional BI vs. Business Data Lake A comparison

The Future of Data Management

The Future of Data Management with Hadoop and the Enterprise Data Hub

MapR: Best Solution for Customer Success

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Reference Architecture, Requirements, Gaps, Roles

HDP Enabling the Modern Data Architecture

A Whole New World. Big Data Technologies Big Discovery Big Insights Endless Possibilities

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

Oracle Database 12c Plug In. Switch On. Get SMART.

HPE Vertica & Hadoop. Tapping Innovation to Turbocharge Your Big Data. #SeizeTheData

More Data in Less Time

Offload Enterprise Data Warehouse (EDW) to Big Data Lake. Ample White Paper

Parallel Data Warehouse

Constructing a Data Lake: Hadoop and Oracle Database United!

Big Data Can Drive the Business and IT to Evolve and Adapt

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

Big Data and Hadoop for the Executive A Reference Guide

Simplifying Data Governance and Accelerating Real-time Big Data Analysis in Financial Services with MarkLogic Server and Intel

Successfully Deploying Alternative Storage Architectures for Hadoop Gus Horn Iyer Venkatesan NetApp

Simplifying Data Governance and Accelerating Real-time Big Data Analysis for Healthcare with MarkLogic Server and Intel

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Cloudera Impala: A Modern SQL Engine for Hadoop Headline Goes Here

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Dashboard Engine for Hadoop

BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP

Data Warehouse Overview. Srini Rengarajan

IST722 Data Warehousing

Tap into Hadoop and Other No SQL Sources

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015

Trafodion Operational SQL-on-Hadoop

Information Builders Mission & Value Proposition

SQream Technologies Ltd - Confiden7al

What s New with Informatica Data Services & PowerCenter Data Virtualization Edition

Ten Cornerstones of a Modern Data Warehouse Environment

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Analytics on Spark &

The Principles of the Business Data Lake

Next Gen Hadoop Gather around the campfire and I will tell you a good YARN

The BIg Picture. Dinsdag 17 september 2013

HP Vertica. Echtzeit-Analyse extremer Datenmengen und Einbindung von Hadoop. Helmut Schmitt Sales Manager DACH

Inge Os Sales Consulting Manager Oracle Norway

Actian SQL in Hadoop Buyer s Guide

QlikView Business Discovery Platform. Algol Consulting Srl

Presented by: Jose Chinchilla, MCITP

Oracle Big Data Discovery Unlock Potential in Big Data Reservoir

Integrating a Big Data Platform into Government:

Saving Millions through Data Warehouse Offloading to Hadoop. Jack Norris, CMO MapR Technologies. MapR Technologies. All rights reserved.

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

The Next Wave of Data Management. Is Big Data The New Normal?

Apache Hadoop: The Pla/orm for Big Data. Amr Awadallah CTO, Founder, Cloudera, Inc.

Please give me your feedback

Making Sense of the Madness

Real-Time Enterprise Management with SAP Business Suite on the SAP HANA Platform

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

2009 Oracle Corporation 1

The Potential of Big Data in the Cloud. Juan Madera Technology Consultant

Best Practices for Hadoop Data Analysis with Tableau

The big data revolution

IBM Data Retrieval Technologies: RDBMS, BLU, IBM Netezza, and Hadoop

Architectures for Big Data Analytics A database perspective

BITKOM& NIK - Big Data Wo liegen die Chancen für den Mittelstand?

Large scale processing using Hadoop. Ján Vaňo

How To Handle Big Data With A Data Scientist

Self-service BI for big data applications using Apache Drill

Big Data and Apache Hadoop Adoption:

Open source Google-style large scale data analysis with Hadoop

SAP HANA. SAP HANA Performance Efficient Speed and Scale-Out for Real-Time Business Intelligence

Understanding the Value of In-Memory in the IT Landscape

Transcription:

Production ready hadoop By Deepak Rao Na,onal Head Datawarehousing Bajaj Finserv

Agenda! Data in today s BFSI world! Modern Data Lake! Use cases & prototyping! Big data impact in BFSI! Thank you!!

Defini8on of data Today *all* types of data should be available for analysis Structured Semi structured Unstructured Ability to analyze transactional data with semi & unstructured data is key to modern analytics Big data can help bring customer & market data from various formats to one place, data lake

What is a data lake? Storage repository that holds a vast amount of data in its native form. Most commonly, Hadoop based solution Large data pool of schema on read or schema less data A place to store unlimited amounts of data in any format inexpensively Allows collection of data needed by multiple departments in one place

Tradi8onal data warehouse Current state of a data warehouse DATA SOURCES ETL DATA WAREHOUSE OLTP ERP CRM LO B Star schemas, views other read- optimized structures INCREASING DATA VOLUME NO N- RELATIONAL DATA INCREASE IN TIME STALE REPORTING Increase in variety of data sources Increase in data volume Increase in types of data Rigid transformations can no longer keep pace with changing data Ensuring availability with costly appliances is a myth Inability to transform large volumes Always on EDW redesign mode Reports become invalid or unusable Delay in preserved reports increases Users begin to innovate to relieve starvation

Modern Data lake DATA SOURCES EXTRACT AND LOAD DATA LAKE OLTP ERP CRM LO B Low latency data access NON- RELATIONAL DATA FUTURE DATA SOURCES All data sources (internal & external) are valid source for data lake Extract and load data No need to archive for performance!! Data always stored in native format & always online Performance & availability based on Large scale commodity infrastructure Schema on Read or Schema less systems are the future on large scale analysis Users discover published data sets/ services using familiar tools

BFL data lake framework Diverse data source Data layer External data Real Hme Data acquisihon Sourcing & transformahon layer Low latency queries Query Engine SQL/NoSQL Storage layer Rich analyhcs on diverse data sets PresentaHon layer REST Services BI Tools (Tableau) SAS, R, ML ODBC/JDBC access Large scale data consumers Access layer Control Cockpit Parquet/ Orc Large, historical & diverse data Tooling Layer AuthorizaHon AuthenHcaHon ConfiguraHon Scheduling Monitoring AdministraHon

Future of Analy8cs is not just SQL queries

Increasing complexity & value of analy8cs

Maturity model of Analy8cs Technology: + Algorithms +Machine Learning How can we make it happen? SimulaHons, RecommendaHons Prescrip,ve Analy,cs Technology: + StaHsHcs What can happen? Predict Sales, demands, senhments & paxerns Technology: + hadoop Why did it happen? Historical trends, Causality, Mining Predic,ve Analy,cs Diagnos,c Analy,cs Technology: TradiHonal EDW Database & Reports What happened? Ad- hoc & Cube reports Descrip,ve Analy,cs

How about some performance numbers? During SQL test, Apache drill was ~2x faster than commercial RDBMS. During ETL test, we were able to get a network throughput of 600GB per hour consistently with MapR NFS. 20X Compression (Raw data vs ORC) in storage. Note: Above numbers are from our internal tests & the same is not in Production yet. Cluster config tested: MapR M5, 6 node cluster, each with (12 cores, 64 GB RAM, 4TB SATA, 2*10GBPS network)

Statutory warning fools rush in where angels fear to tread - From An essay on criticism by Alexander Pope, 1709 Big data is exciting, but (as always) its also a minefield of failures if not thought well. To avoid big data adoption failure: 1) Start small, if you fail, fail quick & get back on track wiser 2) Choose partners carefully 3) Be agile in building projects, no single projects beyond 2 months delivery 4) Build roadmap to add use case incrementally

Prototype based Big data adop8on Clearly defined use cases with expected outcome & timeline Prototype before going on a journey Build small teams to solve problems Example use case: Problem statement: Reduce computation time for customer segmentation Approach: Build distributed in-memory based solution on hadoop. Results: Customer segmentation execution time was reduced from 8 hours to 15 sec

What s happening globally in BFSI? Trends:

We re witnessing big data everywhere, literally What did to commercial emails & what did to proprietary Unix Today, we are at the tipping point where RDBMS in Analytics & Warehouse solutions is challenging the domination of Today: There are no major big data driven Financial firms in India. Tomorrow: Big data driven Financial services is an eventuality in India, (It s when, not if )

Thank you!!! Change, before you have to. - Jack Welch