DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY




Big Data Analytics
DAMA NY DAMA Day, October 17, 2013
IBM, 590 Madison Avenue, 12th floor, New York, NY
Tom Haughey, InfoModel, LLC
868 Woodfield Road, Franklin Lakes, NJ 07417
201 755 3350
tom.haughey@infomodelusa.com

Agenda
Definition
Types of data
    Structured
    Semi-structured
    Unstructured
Why Big Data is important
Sources of Big Data
Levels of Big Data
Use cases for Big Data
Big Data analytics
    Data mining
    Predictive analysis
NOSQL and Big Data Landscape
The new business intelligence architecture
How to prepare for Big Data
Pitfalls
Conclusions
InfoModel, LLC. 2013

Big Data Definition
Big data consists of high-volume, high-velocity, high-variety and high-value data and processes that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
Source: modified from Gartner Glossary

Big Data in the Past
A decade ago, Big Data was:
    A scalability problem
    A performance problem
Added to that was the difficulty of making sense of it
That is where today's Big Data and Big Data Analytics come into play

Why Now?
Why can we achieve this now?
The four-minute mile syndrome
    Nobody could do it till Roger Bannister did it
    Now lots of us can do it (!)
Before we didn't have:
    The hardware technology
    The software systems
    The data management systems
    The thought processes

Sources of Big Data
Streaming data (e.g., stock market)
Video archives
Large-scale e-commerce
Social and professional networks
Internet text and documents
Internet search indexing
Call detail records
Web logs
RFID
Medical records
Sensor networks
Social networks
Military surveillance
Astronomy
Video and music archives
Atmospheric science
Genomics, biogeochemical
Biological & other complex data
Interdisciplinary scientific research

Big Data Support Technologies
The emergence of commodity servers
NOSQL file management systems and Hadoop
Inverted column databases
In-memory databases and analytics
Convergence of machine learning and data mining
Management of structured and unstructured content
Support of Hadoop and MapReduce by major RDBMS vendors

Types of Data
Structured: having a fixed and external structure (external to the data structure itself)
Semi-structured: having a structure embedded in the data. Instances contain data values and metadata. The structure may vary but still needs to be planned and modeled
Unstructured: having no known structure. Often transformed to structured or semi-structured data for processing. The structure may vary but still needs to be planned and modeled
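The three types of data can be illustrated with a minimal sketch. The sample records, field names, and the regular expression are hypothetical and chosen only for illustration; they are not from the presentation.

```python
import csv
import io
import json
import re

# Structured: the schema (column names) is fixed and sits outside the data rows.
# Here the header line plays the role of that external schema.
structured = io.StringIO("customer_id,amount\n17,25.50\n42,10.00\n")
rows = list(csv.DictReader(structured))

# Semi-structured: each instance carries its own field names (metadata),
# and the set of fields may vary from one instance to the next.
semi_structured = [
    json.loads('{"customer_id": 17, "amount": 25.5, "channel": "web"}'),
    json.loads('{"customer_id": 42, "amount": 10.0}'),
]

# Unstructured: free text with no known structure. It is transformed
# into a structured record before processing.
unstructured = "Customer 17 called to say the 25.50 charge was wrong."
match = re.search(r"Customer (\d+) .*?(\d+\.\d+)", unstructured)
extracted = {
    "customer_id": int(match.group(1)),
    "amount": float(match.group(2)),
}
```

Even in the unstructured case the target record has to be decided in advance, which is one way to read the point that structure still needs to be planned and modeled.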

Big Data and Business Analytics
These levels affect data structure, data access and scale
Level 1: Known, structured. Processed at small to large scale.
Levels 2 and 3: Unknown, unstructured or semi-structured, processed directly or at small to large scale; unknown, unstructured, transformed to structured, processed at small to large scale.
Level 4: Roll Your Own
Adapted from McKinsey

Sample Use Cases by Data Level
Level 1:
    Pricing: targeted price setting
    Campaign lead generation
    Customer experience
    Pricing based on customer value
Level 2:
    Impact of marketing on sales
    Market basket to determine risk
    Next product to buy (NPTB)
    Cross channel integration
Level 3:
    Fraud prevention
    Discount targeting based on location, likelihood-to-leave, web analytics
Level 4:
    Targeted advertising, right landing page
    Pricing and targeted advertising, right price and landing page
    Credit line management
Adapted from McKinsey

Business Analytics
Solutions used to build analytical, historical models and simulations to create scenarios, understand current status and predict future states
Business analytics includes data mining, predictive analytics, applied analytics and statistics, and is delivered as an application suitable for a business user
Big Data Analytics is the convergence of Big Data and Business Analytics
Without Big Data Analytics, big data is just a lot of data

Should You Kill Your Data Warehouse?
See Forbes, 8/24/2011 [Maybe don't see!]

Try This Query
Try this query on semi-structured or unstructured data on NOSQL or other multi-structured data environment:
    Give me a breakdown of sales revenue and volume by household by month, order it by the org unit that sold the product and the org unit that owned the product, summarize it from product type, to product subgroup and product group, for the last 5 years
In a DW containing this data, this query can be run efficiently and fairly easily coded in SQL
How do you do this on enormous quantities of semi-structured or unstructured data using existing technologies?
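The claim that this query is fairly easy to code in SQL can be illustrated with a minimal sketch. It assumes a single denormalized `sales` table in an in-memory SQLite database; the table, its column names, the sample rows, and the fixed cutoff month standing in for "the last 5 years" are all hypothetical and not from the presentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table: one row per sale, already carrying the
# household, both org units, and the product hierarchy.
conn.executescript("""
CREATE TABLE sales (
    household_id INTEGER, sale_month TEXT,
    selling_org TEXT, owning_org TEXT,
    product_group TEXT, product_subgroup TEXT, product_type TEXT,
    revenue REAL, volume INTEGER
);
INSERT INTO sales VALUES
 (1, '2013-01', 'East', 'Retail', 'Loans', 'Mortgage', 'Fixed', 1000.0, 1),
 (1, '2013-01', 'East', 'Retail', 'Loans', 'Mortgage', 'Fixed',  500.0, 2),
 (2, '2013-02', 'West', 'Retail', 'Loans', 'Auto',     'New',    300.0, 1);
""")

# Revenue and volume by household and month, ordered by selling and
# owning org unit, at the product-type level of the hierarchy.
query = """
SELECT household_id, sale_month, selling_org, owning_org,
       product_group, product_subgroup, product_type,
       SUM(revenue) AS revenue, SUM(volume) AS volume
FROM sales
WHERE sale_month >= '2009-01'   -- assumed cutoff for "the last 5 years"
GROUP BY household_id, sale_month, selling_org, owning_org,
         product_group, product_subgroup, product_type
ORDER BY selling_org, owning_org, household_id, sale_month
"""
result = conn.execute(query).fetchall()
```

This sketch aggregates at the product-type level only. Under the same assumptions, the subgroup and group summaries would come from repeating the query with fewer product columns in the SELECT and GROUP BY lists.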

Forms of Analytics
Traditional BI and OLAP
    Will stay
    Consumers already use these
    Consumers will add to them
    Well known and required. Works well with most EDWs. Many levels and styles of BI
Big Data Analytics
    Discovery oriented
    Shows value in Big Data
    Can leverage new platforms: e.g., Analytics DB
    Undergoing strong acceptance by consumers
Advanced SQL
    Well-known SQL-based tools/techniques. Can result in long, complex SQL statements to gather, aggregate and model data
Predictive Analytics
    Data mining/statistics to understand the past and predict future events. Requires special tools and rock stars.
New Analytic Methods/Tools
    Visualization, artificial intelligence, natural language processing. Analytical DB functions: inverted column DBs, DW appliances, MapReduce, etc.

Data Mining
The use of mathematical algorithms to find hidden relationships in the data
It can be used to:
    Find rules or approaches that worked well in the past
    Identify dependencies or relationships between things
    Segment or classify customers based on how well they match something you care about
    Group and cluster things that are similar to each other
    Spot and identify anomalies buried in the data
Source: James Taylor

Techniques Used to Mine the Data
Just as the popularity of new tools is exploding, so are the capabilities in data mining
Data-mining techniques fall into four major categories:
    Classification, such as targeted marketing
    Association, such as market basket analysis
    Sequencing, such as "those who bought this bought that"
    Clustering, developing conclusions using space and distance
NOTE: In Hadoop, querying and mining can be done through Hive, Mahout and Pig
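Of the four categories, association is the easiest to show in a few lines. The following is a minimal sketch of market basket analysis using plain co-occurrence counts; the baskets and the `confidence` helper are hypothetical, and real tools (for example, Mahout on Hadoop) use more scalable algorithms than this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of items per transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

# How many baskets contain each item, and each pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)


def confidence(antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]
```

With these baskets, `confidence("butter", "bread")` is 1.0 because every basket with butter also contains bread, while `confidence("bread", "milk")` is 2/3.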

Predictive Analytics
Applies mathematical techniques to historical data to build a future analytic model. It predicts:
    How likely something is to be true
    Its likely value
    The likely sequence
For instance, instead of:
    Finding dependencies true in historical data, find dependencies likely to be true in the future
    Grouping customers based on historical similarities, group them on likelihood that they will behave similarly in the future
Some regard data mining as the first step in predictive analytics
Some use the terms synonymously
Source: James Taylor
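The idea of predicting "how likely something is to be true" from historical data can be sketched in its simplest form as a frequency estimate per segment. The segments, the history, and the `fit_leave_rates` helper are hypothetical; a real predictive model would use far more data and a proper statistical technique.

```python
from collections import defaultdict

# Hypothetical history: (customer segment, whether the customer left).
history = [
    ("free_checking", True), ("free_checking", True), ("free_checking", False),
    ("premium", False), ("premium", False), ("premium", True), ("premium", False),
]


def fit_leave_rates(records):
    """Estimate, per segment, the share of past customers who left."""
    totals = defaultdict(int)
    leavers = defaultdict(int)
    for segment, left in records:
        totals[segment] += 1
        leavers[segment] += int(left)
    return {segment: leavers[segment] / totals[segment] for segment in totals}


# The fitted rates act as the model: a new customer is scored by looking
# up the estimated likelihood for that customer's segment.
model = fit_leave_rates(history)
```

Here the historical grouping (segment) is turned into a forward-looking score: `model["premium"]` is 0.25, read as the estimated likelihood that a new premium customer will leave.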

Uses of Predictive Analysis
Its major uses are to:
    Improve efficiency
    Reduce risk
    Increase profitability
Examples:
First case: professional sports (Moneyball)
    Who should guard LeBron?
    What are individual players really worth to the team?
Second case: banking
    Customers are using a new free business checking system for personal checks as well, increasing the cost of those accounts
    Will it be more profitable to pay them to leave?
Third case: 7% of customers account for 43% of revenue
    What should we offer them?

Who Uses Big Data
Data Scientists / Data Teams
Knowledge workers
BI consumers
Decision makers at all levels of the business


The Big Data Analytics Environment
[Architecture diagram; labels as they appear:]
Big Data Analytics: Complex analysis of structured data; Analysis of irregularly structured data in Hadoop; Social sentiment and social network analysis
Big Data Enterprise Data Warehouse Environment
Traditional Reporting and Analysis
Traditional Data Warehouse Environment
RDBMSs, Appliance, NOSQL, Hadoop, DW, Mart
Data Integration
Real-time Analytics
RYO data, Streams, Sensors, Events, Docs, XML/JSON, Files, Cloud, Tables, OLAP, Web Logs
Consumers

Means to Achieving Value in Big Data
Create integrated, analytic sandboxes
Use Hadoop as a complement to previous systems, not a replacement
Derive data from Big Data as it is needed
    Less emphasis on pre-aggregation and pre-summarization
As has been said since the opening days of client-server, send the function to the data
    Not the data to the function (as in some vertical DBMSs today)
Learn to use parallel, distributed, commodity servers
Use Big Data for staging as well as a live archive
Virtualize Big Data
    For reuse across multiple analytical applications
    For easy access to the data when it is needed
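"Send the function to the data" can be sketched in the MapReduce style: a small function runs where each partition of data lives, and only its compact result travels back to be combined. The partitions and the `count_channels` function below are hypothetical, and the list comprehension merely stands in for work that would run in parallel on separate commodity servers.

```python
from collections import Counter
from functools import reduce

# Hypothetical data partitions, each imagined to live on a different server.
partitions = [
    ["web", "store", "web"],
    ["store", "web"],
    ["phone"],
]


def count_channels(partition):
    """Function shipped to each partition; returns a small local summary."""
    return Counter(partition)


# "Map" step: run the function next to each partition of data.
local_summaries = [count_channels(p) for p in partitions]

# "Reduce" step: only the small summaries are moved and merged.
totals = reduce(lambda a, b: a + b, local_summaries, Counter())
```

The design point is that the raw rows never leave their partition; what crosses the network is one small counter per partition rather than the full data set.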

Preparing for Big Data (BD)
Define the business objectives
    Big Data (BD) will yield business advantages
    But not without business involvement
Understand and prepare the data for BD as in any environment
    It is not just about slamming the data to some humongous staging area
    Data modeling is here to stay, but new methods are needed
    Costs and technology frustrations will increase
    But business advantages will go up as well
Get the right staff
    Both BD and Analytics are new skills
    Organizations will need to hire, train, and learn accordingly
Source the right suppliers and technology
    BD Analytics will be mainstream; not just for giant web firms
    Tools and platforms will improve so there will be less coding
    Plus improvement in scalability, performance, real-time availability
Expect Hadoop and other Big Data infrastructure to become common
    Hadoop will not replace anything
    Data Warehousing and BI will continue

Pitfalls
Potential pitfalls that can trip up organizations on big data analytics initiatives include:
    Absence of clear business purpose
    Jettisoning data management principles and practices
    Absence of internal analytics skills (you need rock stars)
    The high cost of hiring experienced analytics professionals
    High costs of the new infrastructure (hardware and software)
    Challenges in integrating Hadoop systems and data warehouses
    Selecting the right vendors who offer software connectors across and to Big Data technologies

Conclusions
Big Data must deliver Business Value
    That is the sine qua non of Big Data Analytics
Reporting, analysis and OLAP will stay
    You also need discovery analytics, predictive analysis and data mining
Plan your entry into big data and implement it in sensible increments
    Be clear up front on the business goals
    Select key sources (data from Web, other systems, social networks)
You will have to make some upgrades:
    Add new BI/DW technologies
    Train your staff
Change is inevitable
Give the business what it needs
    Discovery analytics to understand change, find opportunities
    Broader, more complete views of all relevant entities (e.g., customer)
    Analytics targeting your industry and your organization's specific needs and unique collection of big data