BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES



Similar documents
Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Information Builders Mission & Value Proposition

Big Analytics: A Next Generation Roadmap

The Future of Data Management

Big Data Architectures. Tom Cahill, Vice President Worldwide Channels, Jaspersoft

BIG DATA TRENDS AND TECHNOLOGIES

HDP Hadoop From concept to deployment.

Native Connectivity to Big Data Sources in MSTR 10

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

The Enterprise Data Hub and The Modern Information Architecture

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

Big Data Big Data/Data Analytics & Software Development

How To Use Big Data For Business

Are You Ready for Big Data?

Using Tableau Software with Hortonworks Data Platform

Real Time Big Data Processing

Data Integration Checklist

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

Analyzing Big Data with AWS

Introducing Oracle Exalytics In-Memory Machine

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Big Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

Luncheon Webinar Series May 13, 2013

Business Intelligence for Big Data

Ganzheitliches Datenmanagement

Big Data: What You Should Know. Mark Child Research Manager - Software IDC CEMA

Microsoft SQL Server 2012 with Hadoop

The Future of Data Management with Hadoop and the Enterprise Data Hub

Native Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture

White Paper: Datameer s User-Focused Big Data Solutions

Tap into Hadoop and Other No SQL Sources

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

BEYOND BI: Big Data Analytic Use Cases

INTELLIGENT BUSINESS STRATEGIES WHITE PAPER

Getting Started Practical Input For Your Roadmap

Are You Ready for Big Data?

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Sunnie Chung. Cleveland State University

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.

Big Data at Cloud Scale

How To Handle Big Data With A Data Scientist

Microsoft Big Data. Solution Brief

Copyright 2014, Neudesic. All rights reserved.

DATAMEER WHITE PAPER. Beyond BI. Big Data Analytic Use Cases

The BIg Picture. Dinsdag 17 september 2013

Modernizing Your Data Warehouse for Hadoop

BIG DATA SYSTEM DEVELOPMENT: AN EMBEDDED CASE STUDY WITH A GLOBAL OUTSOURCING FIRM

Big Data and Your Data Warehouse Philip Russom

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

MapR: Best Solution for Customer Success

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

Integrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April

Azure Data Lake Analytics

CAPTURING & PROCESSING REAL-TIME DATA ON AWS

This Symposium brought to you by

Big Data Can Drive the Business and IT to Evolve and Adapt

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

How To Use Big Data For Telco (For A Telco)

Hadoop. for Oracle database professionals. Alex Gorbachev Calgary, AB September 2013

The 4 Pillars of Technosoft s Big Data Practice

HDP Enabling the Modern Data Architecture

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

Big Data Web Analytics Platform on AWS for Yottaa

Cloud Integration and the Big Data Journey - Common Use-Case Patterns

Session 0202: Big Data in action with SAP HANA and Hadoop Platforms Prasad Illapani Product Management & Strategy (SAP HANA & Big Data) SAP Labs LLC,

BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP

Big Data. Value, use cases and architectures. Petar Torre Lead Architect Service Provider Group. Dubrovnik, Croatia, South East Europe May, 2013

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse

So What s the Big Deal?

Business Analytics In a Big Data World Ted Malone Solutions Architect Data Platform and Cloud Microsoft Federal

Next presentation starting soon Business Analytics using Big Data to gain competitive advantage

BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

Open source Google-style large scale data analysis with Hadoop

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Achieving Business Value through Big Data Analytics Philip Russom

IBM Big Data Platform

Artur Borycki. Director International Solutions Marketing

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

Big Data Analytics Nokia

P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland

Big Data for everyone Democratizing big data with the cloud. Steffen Krause Technical

SAP and Hortonworks Reference Architecture

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015

Dell Information Management solutions

Transcription:

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2

Agenda Big Data Challenges Big Data Reference Architectures Case Studies Tips for Designing Big Data Solutions 3

Big Data Challenges UNSTRUCTURED STRUCTURED HIGH MEDIUM LOW Archives Docs Business Apps Media Social Networks Public Web Data Storages Machine Log Data Sensor Data Complexity Velocity Variety Volume Archives Scanned documents, statements, medical records, e-mails etc.. Docs XLS, PDF, CSV, HTML, JSON etc. Business Apps CRM, ERP systems, HR, project management etc. Media Images, video, audio etc. Social Networks Twitter, Facebook, Google+, LinkedIn etc. Public Web Wikipedia, news, weather, public finance etc Data Storages RDBMS, NoSQL, Hadoop, file systems etc. Machine Log Data Application logs, event logs, server data, CDRs, clickstream data etc. Sensor Data Smart electric meters, medical devices, car sensors, road cameras etc. 4

Big Data Analytics Traditional Analytics (BI) vs Big Data Analytics Focus on Descriptive analytics Diagnosis analytics Predictive analytics Data Science Data Sets Supports Limited data sets Cleansed data Simple models Causation: what happened, and why? Large scale data sets More types of data Raw data Complex data models Correlation: new insight More accurate answers 5

Big Data Analytics Use Cases Low Latency Reliability Real Time Intelligence Consumers Intelligent Agents Volume Performance Data Discovery Business Reporting Data Quality Self Service Data Scientists/ Analysts Business Users 6

Big Data Analytics Reference Architectures Architecture Drivers: Volume Sources Throughput Latency Extensibility Data Quality Reliability Security Self-Service Cost Reference Architectures: Extended Relational Non-Relational Hybrid 7

Relational Reference Architecture Data Sources Integration Data Storages Analytics Presentation Structured ETL Data Warehouses Query & Reporting Web Browsers Semi- Structured Messaging Data Marts OLAP Cubes Native Desktop Unstructured API/ODBC Operational Data Stores Advanced Analytics Mobile Devices Replication Web Services 8

Extended Relational Reference Architecture Data Sources Integration Data Storages Analytics Presentation Structured ETL Data Warehouses Query & Reporting Web Browsers Semi- Structured Messaging Data Marts OLAP Cubes Native Desktop Unstructured API/ODBC Operational Data Stores Advanced Analytics Mobile Devices Replication Web Services Key components affected with Big Data challenges 9

Non-Relational Reference Architecture Data Sources Integration Data Storages Analytics Presentation Structured ETL NoSQL Databases Query & Reporting Web Browsers Semi- Structured Messaging Distributed File Systems Map Reduce Native Desktop Unstructured API Search Engines Mobile Devices Advanced Analytics Web Services Key components introduced with non-relational movement 10

Extended Relational vs. Non-Relational Architecture Architecture Drivers Large data volume Self service (ad hoc reporting) Unstructured data processing High data model extensibility High data quality and consistency Extensive security Reliability and fault tolerance Low latency (near real time) Low cost Skills availability Extended Relational Non Relational 11

Extended Relational vs. Non-Relational Architecture Architecture Drivers Large data volume Self service (ad hoc reporting) Unstructured data processing High data model extensibility High data quality and consistency Extensive security Reliability and fault tolerance Low latency (near real time) Low cost Skills availability Extended Relational Non Relational 12

Extended Relational vs. Non-Relational Architecture Architecture Drivers Large data volume Self service (ad hoc reporting) Unstructured data processing High data model extensibility High data quality and consistency Extensive security Reliability and fault tolerance Low latency (near real time) Low cost Skills availability Extended Relational Non Relational 13

Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 14

Big Data Analytics Use Cases Real Time Intelligence Consumers Intelligent Agents Performance Volume Data Discovery Business Reporting Data Scientists Business Users 15

Data Discovery: Non-Relational Architecture Data Sources Integration Data Storages Analytics Presentation Structured ETL NoSQL Databases Query & Reporting Web Browsers Semi- Structured Messaging Distributed File Systems Map Reduce Native Desktop Unstructured API Search Engines Mobile Devices Advanced Analytics Web Services 16

Big Data Analytics Use Cases Real Time Intelligence Consumers Intelligent Agents Data Discovery Business Reporting Data Quality Self Service Data Scientists Business Users 17

Business Reporting: Hybrid Architecture Data Sources Integration Data Storages Analytics Presentation Structured ETL Relational DWH/DM SQL Query & Reporting Web Browsers Semi- Structured Messaging Distributed File Systems Map Reduce Native Desktop Unstructured API Search Engines Mobile Devices Advanced Analytics Web Services Extended Relational components Non-relational components 18

Big Data Analytics Use Cases Low Latency Reliability Real Time Intelligence Consumers Intelligent Agents Data Discovery Business Reporting Data Scientists Business Users 19

Lambda Architecture Source: 20

Case Study #1: Usage & Billing Analysis Business Goals: Provide visual environment for building custom mobile application Charge customers based on the platform they are using, number of consumers applications etc. Business Area: Cloud based platform for building, deploying, hosting and managing of mobile applications 21

Architectural Decisions Architecture Drivers: Volume (> 10 TB) Sources (Semi-structured - JSON) Throughput (> 10K/sec) Latency (2 min) Extensibility (Custom metrics) Data Quality (Consistency) Reliability (24/7) Security (Multitenancy) Self-Service (Ad-Hoc reports) Cost (The less the better ) Constraints (Public Cloud) Trade-off : // Extended Relational Non-Relational Extensibility + Data Quality + Self-Service + Extended Relational Architecture Extensibility via Pre allocated Fields pattern 22

Solution Architecture Technologies: Amazon Redshift Amazon SQS Amazon S3 Elastic Beanstalk Jaspersoft BI Professional Python 23

Case Study #2: Clickstream for retail website Business Goals: Build in-house Analytics Platform for ROI measurement and performance analysis of every product and feature delivered by the e-commerce platform; Provide the ability to understand how end-users are interacting with service content, products, and features on sites; Do clickstream analysis; Perform A/B Testing Business Area: Retail. A platform for e-commerce and collecting feedbacks from customers 24

Architectural Decisions Architecture Drivers: Volume (45 TB) Sources (Semi-structured - JSON) Throughput (> 20K/sec) Latency (1 hour) Extensibility (Custom tags) Data Quality (Not critical) Reliability (24/7) Security (Multitenancy) Self-Service (Canned reports, Data science) Cost (The less the better ) Constraints (Public Cloud) Trade-off : // Extended Relational Non- Relational Volume/Scalability +/ + Throughput + + Self-Service + +/ Non Relational Architecture Reporting via Materialized View pattern Extensibility + 25

Solution Architecture Technologies: Amazon S3 Flume Hadoop/HDFS, MapReduce HBase Oozie Hive Node 1 Node 2 Node N 26

Tips for Designing Big Data Solutions Understand data users and sources Discover architecture drivers Select proper reference architecture Do trade-off analysis, address cons Map reference architecture to technology stack Prototype, re-evaluate architecture Estimate implementation efforts Set up devops practices from the very beginning Advance in solution development through small wins Be ready for changes, big data technologies are evolving rapidly 27

Clients include: Leading global Product and Application Development partner founded in 1993 3,300+ employees across North America, Ukraine and Western Europe Thousands of successful outsourcing projects! SaaS/Cloud Solutions. Mobility Solutions. UX/UI BI/Analytics/Big Data. Software Architecture. Security 28

Thank You! SoftServe US Office One Congress Plaza, 111 Congress Avenue, Suite 2700 Austin, TX 78701 Tel: 512.516.8880 Contacts Serhiy Haziyev: shaziyev@softserveinc.com Olha Hrytsay: ohrytsay@softserveinc.com 29