Big Data and Analytics: A Technical Perspective
Abhishek Bhattacharya, Aditya Gandhi and Pankaj Jain
November 2012




Between the dawn of civilization and 2003, the human race created 5 exabytes of data. Now we generate that much every 2 days.
The total amount of global data is expected to grow to 2,700 exabytes during 2012, up 48% from 2011.
1 exabyte = 1,000,000 TB
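The figures above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the 2011 total is derived from the quoted 48% growth and is not stated on the slide, and the variable names are made up for this example.

```python
# Figures quoted on the slide; the 2011 total is derived here, not quoted.
TB_PER_EB = 1_000_000      # 1 exabyte = 1,000,000 terabytes (decimal units)
total_2012_eb = 2700       # expected global data volume during 2012
growth_rate = 0.48         # "up 48% from 2011"

# If 2012 = 2011 * (1 + growth), then 2011 = 2012 / (1 + growth).
implied_2011_eb = total_2012_eb / (1 + growth_rate)

# "5 exabytes every 2 days" expressed as terabytes per day.
tb_per_day = 5 * TB_PER_EB / 2

print(f"Implied 2011 volume: {implied_2011_eb:.0f} EB")
print(f"5 EB every 2 days = {tb_per_day:,.0f} TB per day")
```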

Big Data Defined
Techniques and technologies that make handling data at extreme scale affordable.
VARIETY: Structured -> Semi-structured -> Unstructured
VOLUME: Terabytes -> Exabytes
VELOCITY: Batch -> Streaming Data
Source: Forrester Research, ctoforum.org

Evolution of Analytics
Descriptive -> Predictive -> Prescriptive
1990s: What happened? (Standard Reporting)
2000s: Why did it happen? (Query / Drill down)
Late 2000s: What could happen? (Simulation)
2010s: What should I be doing? (Optimization)

How is Big Data Analytics Different?
Dimension: Traditional BI | Big Data Analytics
Mathematics: Addition (Aggregation) | Complex Algorithms / Linear Programming
Workload: Repetitive | Experimental, Ad Hoc
Variety: Structured | Mostly Semi-Structured
Sources: Operational | External + Operational
Volumes: GBs to 10s of TBs | 10s of TBs to 100s of PBs

The Big Data Lifecycle
Manage, Enrich, Insight
Source: hadoop.apache.org; Microsoft.com; ibm.com

Manage Data
ANY DATA, ANYWHERE, ANY SIZE
Relational, Non-Relational, Streaming, Data Movement
Source: hadoop.apache.org; Microsoft.com; ibm.com

ENRICH by Combining and Refining!
Discover, Combine, Refine
Source: Microsoft.com, oracle.com, ibm.com

Insight Anywhere, Any Device, Any User
ANY DATA, ANYWHERE (DEVICES), ALL USERS
Source: Microsoft.com, oracle.com, ibm.com

BIG DATA REQUIRES AN END-TO-END APPROACH
INSIGHT: Self-Service, Collaboration, Corporate Apps, Devices
ENRICH: Discover, Combine, Refine
MANAGE: Relational, Non-relational, Analytical, Streaming
Source: Microsoft.com, ibm.com

Product Proliferation
We are spoilt for choice in the marketplace.

Source: Product Logos of Big Data Companies

Aggregate-Oriented DB, Enterprise Data Warehouse, In-Memory Stores, Hadoop
Source: Product Logos of Big Data Companies

ENTERPRISE DATA WAREHOUSES
Rating dimensions: Volume, Variety, Ingestion Velocity, Processing Velocity, Analytics Complexity
Requires referential integrity and structured data, so it lacks flexibility and agility
Analytics and aggregation using OLAP
Shared-nothing MPP architecture enables massive scale-out
Best suited for analytics using structured data
Key considerations: data quality/governance, structuring data, segmenting analytics workloads

HADOOP
Rating dimensions: Volume, Variety, Ingestion Velocity, Processing Velocity, Analytics Complexity
Java-based open-source framework
Hadoop Core: MapReduce and HDFS
Structuring is delayed until analytics are performed
Flexibility as the business grows/evolves
Flexibility to build complex algorithms/models for analytics purposes
Only option for the petabyte range
Best suited for batch-oriented analytics
Works best when it's possible to design analytics algorithms as scatter-gather
Key considerations: HDFS file size, MapReduce algorithm, sequential file processing, data distribution
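The scatter-gather shape that MapReduce expects can be sketched in plain Python. This is a single-process illustration of the pattern only, not Hadoop's API: the function names, the sample records and the "sales per store" job are all assumptions made for this example.

```python
from collections import defaultdict

def map_phase(record):
    """Scatter step: emit (key, value) pairs from one input record."""
    store, amount = record
    yield store, amount

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Gather step: combine all values for one key into a single result."""
    return key, sum(values)

def run_job(splits):
    # Each split stands in for one block of an input file processed by one mapper.
    mapped = [pair for split in splits for record in split for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical sales records, already divided into two input splits.
splits = [
    [("store_a", 10.0), ("store_b", 5.0)],
    [("store_a", 2.5), ("store_c", 7.0)],
]
totals = run_job(splits)
print(totals)
```

Because each map call sees only one record and each reduce call sees only one key, the work can be spread across machines, which is why algorithms that fit this shape suit Hadoop's batch model.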

IN-MEMORY STORES
Rating dimensions: Volume, Variety, Ingestion Velocity, Processing Velocity, Analytics Complexity
Maintains data in memory and on SSD
Leverages shared-nothing architecture to provide scalability
In-memory databases (IMDB): row- or column-oriented schema
In-memory data grids (IMDG): key-value and de-normalized
IMDB: Best suited for real-time analytics on structured data. Used for specialized data marts as well as for OLTP needs. Key considerations: data organization, parallel query
IMDG: Suited for fast key-based data access patterns or processing. Key considerations: data distribution, key definition, data-process co-location
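Data distribution, key definition and data-process co-location in a data grid can be illustrated with a toy, single-process model. The class `PartitionedGrid` and its methods are hypothetical names invented for this sketch and do not correspond to any product's API; a real grid would place each partition on a different node.

```python
import zlib

class PartitionedGrid:
    """Toy key-value grid: each dict stands in for one node's memory."""

    def __init__(self, partition_count):
        self.partitions = [dict() for _ in range(partition_count)]

    def _owner(self, key):
        # A stable hash of the key decides which partition owns the entry.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._owner(key)][key] = value

    def get(self, key):
        return self.partitions[self._owner(key)].get(key)

    def execute_on_owner(self, key, func):
        # Co-location: run the function where the data lives instead of
        # pulling the value to the caller and pushing it back.
        partition = self.partitions[self._owner(key)]
        partition[key] = func(partition.get(key))
        return partition[key]

grid = PartitionedGrid(partition_count=4)
grid.put("customer:42", {"visits": 1})
grid.execute_on_owner("customer:42", lambda v: {**v, "visits": v["visits"] + 1})
print(grid.get("customer:42"))
```

The choice of key matters because it fixes both where an entry lives and which entries can be processed together without crossing partitions.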

AGGREGATE ORIENTED DB
Rating dimensions: Volume, Variety, Ingestion Velocity, Processing Velocity, Analytics Complexity
Highly scalable and available distributed data stores
De-normalized data structures, with data organised as aggregates
Data saved as key-value pairs, documents or columns
Enables faster reads/writes on aggregates
Best suited for analytics on semi-structured data where access patterns can be bound to a key
Key considerations: data distribution, aggregate structure, key definition, data-process co-location
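The idea of storing data as aggregates can be sketched as follows. The order and order-line rows, the key format `order:<id>` and the dict used as a store are all assumptions for illustration; a real aggregate-oriented database would provide its own storage and API.

```python
import json

# Hypothetical normalized rows, as they might sit in two relational tables.
orders = [{"order_id": 1, "customer": "c-100"}]
order_lines = [
    {"order_id": 1, "sku": "cola", "qty": 2},
    {"order_id": 1, "sku": "chips", "qty": 1},
]

def build_aggregate(order, lines):
    """De-normalize: fold an order and its lines into one self-contained document."""
    return {
        "order_id": order["order_id"],
        "customer": order["customer"],
        "lines": [
            {"sku": line["sku"], "qty": line["qty"]}
            for line in lines
            if line["order_id"] == order["order_id"]
        ],
    }

# A plain dict stands in for the distributed key-value store.
store = {}
for order in orders:
    store[f"order:{order['order_id']}"] = json.dumps(build_aggregate(order, order_lines))

# One key lookup returns the whole aggregate; no join is needed at read time.
doc = json.loads(store["order:1"])
total_qty = sum(line["qty"] for line in doc["lines"])
print(total_qty)
```

The trade-off is that queries which cut across aggregates (for example, all orders containing a given SKU) are no longer bound to a single key and need a different access path.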

Product Category Comparison
Criteria: Volume, Variety, Ingestion Velocity, Processing Velocity, Analytics Complexity
Categories: Enterprise Data Warehouse, Hadoop, In-Memory Stores, Aggregate-Oriented DB
Specific product selection will depend on an assessment of data and analytics requirements.

ADVANCED PHYSICAL PORTFOLIO OPTIMIZATION
integration, coverage, previsibility
Aditya Gandhi
COPYRIGHT 2011 SAPIENT CORPORATION CONFIDENTIAL

CHALLENGE
Making the next buck is harder
Constantly changing environment
Decisions are narrow or historical

CHALLENGE
Vast but un-captured information
Increasing volume / complexity
Coarse-grained operations

CONCEPT
Toolset like a chess simulator
Takes in current state of the board
Provides best actions to take

Framework
Markets: Price forecasts, Forward Curves, Volatilities, Costs and Tariffs
Asset Characteristics: Commodity In, Commodity Out, Transport, Storage, Processing Plants
Beginning positions: Storage Inv, In-transit Inventory, Exch Imbalance
Optimization
User Actions
TARGET TRANSACTIONS: Mkt Optimization formulates the optimal shape of transactions based on the target portfolio and beginning positions
EXECUTED TRANSACTIONS: Exogenous and endogenous constraints and factors cause deviation from plan

Retail Analytics
Pankaj Jain

Aspects of Retail Analytics
Market Basket Analytics, Credit and Loyalty Card Analytics, Shopper Insight, Store Location Data, Geo Demographics, Category Segmentation, Product Affinity, Brand Knowledge, Lifestyle and Life Stage Segmentation, Brand Awareness, Impulse Shopping, Sociology, Income/Education, Infrastructure, Customer Segmentation, Loyalty, Store Location, Store Size, Store Format, Competitive Analysis
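Market basket analytics and product affinity are commonly measured with support, confidence and lift. The sketch below computes these standard measures on a handful of made-up baskets; the data and the helper `affinity` are assumptions for illustration and not the method used in the presentation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical point-of-sale baskets; each set is one transaction.
baskets = [
    {"cola", "chips"},
    {"cola", "chips", "salsa"},
    {"cola", "bread"},
    {"bread", "milk"},
]

# Count how many baskets contain each item and each unordered item pair.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def affinity(a, b):
    """Return (support, confidence, lift) for the rule 'a -> b'."""
    n = len(baskets)
    together = pair_counts[tuple(sorted((a, b)))]
    support = together / n                  # share of baskets with both items
    confidence = together / item_counts[a]  # share of a-baskets that also hold b
    lift = confidence / (item_counts[b] / n)  # above 1 means bought together more than chance
    return support, confidence, lift

support, confidence, lift = affinity("cola", "chips")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```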

Retail Analytics Business Problems
When will the customer visit the store next?
How much money will the customer spend during the next visit?
How many customers are price sensitive?
How can I find gaps in the product range?
How do I balance my product range across store formats?
Do my shoppers buy across the range?
What should be delisted to introduce a new product?

Analytics Lifecycle
Insight
Summarized Data: Template Reports, Rapid Analysis
POS & Other Data: Poor Structure, Volume, Inconsistent
Organized Data: Volume, Segmented, Continuously Improved
Processed Data: Segmentation, Complex Algorithm, Attributes
Enrich

Business Outcome
Big Data Small Insights
Effective Promotions and Communication
o Over 8% increase in Steadfast customers and 5% more sales
o Over 80% acceptance of offers
o Over a million $ growth in the category
Over 60% growth in the range, with higher repeat sales and new customers, due to range analysis.
Addition of three new aerated drinks increased the sales of that category by 12%.
Overall higher, more consistent business growth.

Conclusion
Big data has more dimensions than just "Big"
Lifecycle is critical
Choose your product and platform wisely
Big data analytics is a lot more insightful than just analytics
o Big Data Small Insights
o Ask the right question
Ramp up your college statistics and mathematics!

Thank You!