Ramesh Bhashyam Teradata Fellow Teradata Corporation bhashyam.ramesh@teradata.com



Similar documents
UNIFY YOUR (BIG) DATA

Investor Presentation. Second Quarter 2015

ADVANCED ANALYTICS AND FRAUD DETECTION THE RIGHT TECHNOLOGY FOR NOW AND THE FUTURE

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Understanding Data Warehouse Needs Session #1568 Trends, Issues and Capabilities

INVESTOR PRESENTATION. Third Quarter 2014

INVESTOR PRESENTATION. First Quarter 2014

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

IBM Data Warehousing and Analytics Portfolio Summary

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

IST722 Data Warehousing

Data Refinery with Big Data Aspects

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA What it is and how to use?

Data Warehouse: Introduction

Advanced In-Database Analytics

The Future of Data Management

Executive Summary... 2 Introduction Defining Big Data The Importance of Big Data... 4 Building a Big Data Platform...

Teradata Unified Big Data Architecture

Data Warehousing Systems: Foundations and Architectures

Evolution to Revolution: Big Data 2.0

Big Data and Your Data Warehouse Philip Russom

Integrated Big Data: Hadoop + DBMS + Discovery for SAS High Performance Analytics

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

An Oracle White Paper October Oracle: Big Data for the Enterprise

2015 Analyst and Advisor Summit. Advanced Data Analytics Dr. Rod Fontecilla Vice President, Application Services, Chief Data Scientist

Introducing Oracle Exalytics In-Memory Machine

Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata

Teradata s Big Data Technology Strategy & Roadmap

Using Tableau Software with Hortonworks Data Platform

How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW

UNLEASHING THE VALUE OF THE TERADATA UNIFIED DATA ARCHITECTURE WITH ALTERYX

Harnessing the power of advanced analytics with IBM Netezza

W H I T E P A P E R B u s i n e s s I n t e l l i g e n c e S o lutions from the Microsoft and Teradata Partnership

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

A Study on Integrating Business Intelligence into E-Business

BIG DATA TECHNOLOGY. Hadoop Ecosystem

Part 22. Data Warehousing

Transforming the Telecoms Business using Big Data and Analytics

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

Challenges for Data Driven Systems

Il mondo dei DB Cambia : Tecnologie e opportunita`

Strategies For Setting Up Your Organisation For Success With Big Data. Kevin Long Business Development Director Teradata

OLAP & DATA MINING CS561-SPRING 2012 WPI, MOHAMED ELTABAKH

Big Data Can Drive the Business and IT to Evolve and Adapt

NEWLY EMERGING BEST PRACTICES FOR BIG DATA

An Oracle White Paper June Oracle: Big Data for the Enterprise

A very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect

SAP Database Strategy Overview. Uwe Grigoleit September 2013

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM

Analytic Platforms: Beyond the Traditional Data Warehouse

Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics

The Big Data Paradigm Shift. Insight Through Automation

DATA WAREHOUSING AND OLAP TECHNOLOGY

Big Data Explained. An introduction to Big Data Science.

EVERYTHING THAT MATTERS IN ADVANCED ANALYTICS

Are You Ready for Big Data?

Big Data: What You Should Know. Mark Child Research Manager - Software IDC CEMA

Big Data Integration: A Buyer's Guide

Business Analytics In a Big Data World Ted Malone Solutions Architect Data Platform and Cloud Microsoft Federal

Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect

Course DSS. Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

The Future of Data Management with Hadoop and the Enterprise Data Hub

Hadoop and Relational Database The Best of Both Worlds for Analytics Greg Battas Hewlett Packard

Data Warehousing and OLAP Technology for Knowledge Discovery

How To Handle Big Data With A Data Scientist

An Oracle White Paper September Oracle: Big Data for the Enterprise

Hur hanterar vi utmaningar inom området - Big Data. Jan Östling Enterprise Technologies Intel Corporation, NER

HDP Hadoop From concept to deployment.

Week 13: Data Warehousing. Warehousing

MDM and Data Warehousing Complement Each Other

Big Data and Analytics 21 A Technical Perspective Abhishek Bhattacharya, Aditya Gandhi and Pankaj Jain November 2012

Parallel Data Warehouse

ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION

Intro to Big Data and Business Intelligence

NoSQL for SQL Professionals William McKnight

Methods and Technologies for Business Process Monitoring

USING BIG DATA FOR INTELLIGENT BUSINESSES

Harnessing the Value of Big Data Analytics

Data Warehouse Optimization

Applied Business Intelligence. Iakovos Motakis, Ph.D. Director, DW & Decision Support Systems Intrasoft SA

Processing and Analyzing Streams. CDRs in Real Time

Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization

The Potential of Big Data in the Cloud. Juan Madera Technology Consultant

Sterling Business Intelligence

Hortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc All Rights Reserved

Big Data and Your Data Warehouse Philip Russom

The Internet of Things and Big Data: Intro

The 3 questions to ask yourself about BIG DATA

Big Data Technologies Compared June 2014

VIEWPOINT. High Performance Analytics. Industry Context and Trends

OLAP Theory-English version

Big Data? Definition # 1: Big Data Definition Forrester Research

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

How To Scale Out Of A Nosql Database

A New Era Of Analytic

Transcription:

Challenges of Handling Big Data Ramesh Bhashyam Teradata Fellow Teradata Corporation bhashyam.ramesh@teradata.com

Trend Too much information is a storage issue, certainly, but too much information is also a massive analysis issue. Source: Gartner s Report Volume of Data Complexity Of Analysis Velocity of Data - Real-Time Analytics Variety of Data - Cross-Analytics

Reality: Massive Data Growth Complex, Unstructured Relational 2,500 exabytes of new information in 2012 with digital content as the primary driver Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year Source: An IDC White Paper. As the Economy Contracts, the Digital Universe Expands. May 2009.

Big Data Structured and Unstructured > Data with structure and data with application imposed structure > SQL and static ERD; non SQL and dynamic ERD 10x 100x of today s data warehousing Projects to 5-50 EB (10 19 ) by 2015 Need to separate the useful from the useless Shared nothing parallel analysis

Strategic Opportunity Data is widely available; what is scarce is the ability to extract wisdom from it. Hal Varian, Chief Economist, Google The Unmet Need!

Complexity Model Complexity Query Complexity Concurrency

Data Silos Limit Business Value Limited Business Value Subject-specific questions Simple data model. Star schema, OLAP Many common tables necessary in each mart such as products, store, transactions Sales What is the sales by product Marketing What is the inventory for product X? 31 Sales 29 Inventory Business Value IT Development Duplicated Data

Integrated Data Enables Superior Value Differentiated Business Value Combining the environments requires only incremental work for each new subject area Complex data model. 100s of entities and relations - snowflake Enables new cross-functional insights that can t be achieved with separate data marts; new differentiated decisions 70 Combined Sales and Inventory Which product sales can be increased by 20% in what stores? Cannot answer questions about supply chain capability or about Marketing s projections 31 Sales 29 Inventory Business Value IT Development Shared Data

Integrated Data Enables Superior Value Differentiated Business Value Very complex data model. Thousands of entities and relations. Spans across all subject areas. Very large tables. Needed for complex questions Combined Sales, Inventory, Manufacturing, Supply chain Can manufacturing support sales projections 189 70 31 29 43 43 64 Business Value IT Development Sales InventoryMarketingMarketing Financial

Integrated Data Enables Superior Value Differentiated Business Value Combined Sales, Inventory, Manufacturing, Supply chain, Marketing, Financial What is the profitability of a customer? 277 189 31 29 43 64 64 Business Value IT Development Sales InventoryCustomerFinancial Financial

Complexity - Query and Analytics Complex query plans > 128 way joins > Enormously large set of solution possibilities Comprehensive query operations > Joins > Aggregations > OLAP (rank, window analytics) > Time Series Analysis Scalable No Fat-Node bottleneck Millions of queries Data loading throughput and latency

Workload Management Workload Priorities Actual Server Utilization H R H R M L L M R Real time 47% H Tactical 45% M Loads 5% L DSS Queries 1%

Five Stages of Analytic Evolution ACTIVATING MAKE it happen! Data Volume OPERATIONALIZING WHAT IS happening? PREDICTING WHAT WILL happen? Event-based triggering REPORTING WHAT happened? ANALYZING WHY did it happen? Analytical modeling grows Continuous update and time-sensitive queries become important Increase in ad hoc analysis Primarily batch and some ad hoc reports Batch Ad Hoc Analytics Event driven Continuous updates, tactical queries Workload Sophistication Complexity (schema, workload), concurrency, availability, scope

Extreme Problem Segmentation Data Volumes 100x Scalable Data Mart Low Low Source: IDC; Gartner; Teradata Analysis General Purpose Data Mart Enterprise Data Warehouse Complexity Number of users Data Integration Mixed Workload Availability AEDW Extreme

Debt<10% of Income Yes Good Credit Risks Yes NO Income>$40K Bad Credit Risks NO NO Debt=0% Yes Good Credit Risks In-Database Data Mining Optimization More Models and More Business Value Desktop and Server Analytic Architecture Sample Data Results In-Teradata Analytic Architecture Results Enabling Mining In-database Database Processing from Hours to Minutes Data Mining Process from Days to Hours

OLAP Architectures Various Amounts of Data Being Moved Exec utive Power Users Mark eting Multi-Dimensional View Analyst Physical Cube (MOLAP) 10110111011 01010011101 10110111011 10110111011 01010011101 10110111011 01010011101 1101 1001 0101 1011 1 0 0 1 Hybrid Approach (HOLAP) 10110111011 01010011101 10110111011 Data Warehouse OLAP Engine (ROLAP) CUSTOMER CUSTOMER NUMBER CUSTOMER NAME CUSTOMER CITY CUSTOMER POST CUSTOMER ST CUSTOMER ADDR CUSTOMER PHONE CUSTOMER FAX ORDER ORDER NUMBER ORDER DATE STATUS ORDER ITEM SHIPPED QUANTITY SHIP DATE ORDER ITEM BACKORDERED QUANTITY ITEM ITEM NUMBER QUANTITY DESCRIPTION

OLAP Optimization Results Landline Communications Provider 38 dimensions/24 measures with 5 years of history Maintenance: 13 hours to 3 minutes Cube size: 22.4 GB to <10GB Detail: Month to Daily 300 > Add 39 th dimension: Wire Center Response Comparison: OLAP Server In-Database Processing 250 200 150 100 50 0 5 Canned 5 Canned with Wire Center 6 Interactive 6 Interactive with Wire Center

Reality: New Machine-Generated Data Non-relational and relational data outside of the EDW Data Types Outside of the Enterprise Data Warehouse 53% of Companies Struggle Analyzing Data Types Not in the Traditional Data Warehouse Source: Analytics Platforms Beyond the Traditional Data Warehouse, Survey of 223 companies. BeyeNetwork 2010

Reality: Advanced Iterative Analytics New investigations require both standard SQL and MapReduce Analytics on non-relational, multi-structured, machinegenerated data Analytics that need to scale to big data sizes Analytics that require reorganization of data into new data structures graph, time & path analysis Analytics that require fast, adaptive iteration A new generation of data scientists require support for new analytic processes including Python, R, C, C++, Java & SQL. In our survey 53% of respondents said they perform business analysis on data not contained within an RDBMS. Nearly two-thirds of them were using hand coded programs. - Colin White & Merv Adrian, Analytics Platforms: Beyond the Traditional Data Warehouse. BeyeNetwork 2010

LinkedIn World s Largest Professional Networking Website 100+ million members across 200 countries A new member joins LinkedIn every second and 50% of members are outside the U.S. Executives from all Fortune 500 companies are LinkedIn members. LinkedIn s products critically dependent on analytic-intensive algorithms for traversing the social graph, user-profile analysis

Unstructured Data Unstructured Schema Multi-Structured Data Store as objects of any kind: > Key-value pair (hash table) > Serialized objects > Graph databases > Document files > Blobs > BigTable (GFS)

Need for SQL-MapReduce Combination Business Question Determine the product, user and amount of time in which individuals 1.View an advertisement 2.Possibly view other pages or advertisements before buying the product advertised 3.Purchase that original product for which they saw the advertisement Analytics Question Events exist in multiple rows in the database, for each user How do we attribute a purchase back to a specific ad within a 30 day period?

Manage & Analyze Multi-structured Data Bind the structure to the data at runtime Long strings of encoded page clicks, sessions, and actions Entry points to a website tracked by cookie strings Aster Data e.g. One stock order split into 100s of transactions over days/weeks Social connections and influencers indicated by communication flow New Big Data Raw formats: Lengthy text strings, binary, blobs, social graphs Rapid updates, data refreshes: Online click stream, stock orders, social connections/friends Wide tables with highly descriptive textual strings e.g. ACH transactions, Service/Customer Support records, insurance claims High volume: Embedded processing to eliminate data movement

Multi-structured, Big Data Analytics Example: Pattern Matching Analysis Discover patterns in rows of sequential data Weblogs Smart Meters {user, page, time} Click 1 Click 2 Click 3 Click 4 {device, value, time} Reading 1 Reading 2 Reading 3 Reading 4 {user, product, time} Sales Transactions Purchase 1 Purchase 2 Purchase 3 Purchase 4 {stock, price, time} Stock Tick Data Tick 1 Tick 2 Tick 3 Tick 4 {user, number, time} Call Data Records Call 1 Call 2 Call 3 Call 4 Call 4 SQL and MapReduce Approach Single-pass of data Linked list sequential analysis Gap recognition Traditional SQL Approach Full Table Scans Self-Joins for sequencing Limited operators for ordered data ebiz, Media/Ent Telecomm Financial Government >Click stream Analysis >Lifecycle Marketing >Revenue Attribution >Calling Patterns >Signal Processing >Forecasting >Trade Sequences >Pairs Trading >Fraud Detection >Pattern Detection >Fuzzy Matching >Inference Analysis Example: Graph Analysis Discover links and degree of influence between objects SQL and MapReduce Approach Single-pass of data Looping through all nodes Direct Relationship Social Link Person Indirect Relationship Traditional SQL Approach Full Table Scans Self-Joins for all possible paths ebiz, Media/Ent Telecomm Financial Government >Social Media >Crowd Sourcing >Viral Delivery >Ad optimization >Influencers >Calling Groups >Churn Detection >Predictive Modeling >Social Pairing >Fund Movement >Stress Triggers >Churn Detection >Follow the Money >Collusion Detection >Pattern Matching >Network Analysis

Other Applications Dependency Analysis Traffic Analysis and Optimization Task Optimization Clustering Graph Mining Scheduling Routing Logistics Shortest Path Location Based Services Semantic Web.

Different Analytics For Different Types of Data Strategic & Operational Intelligence Big Data Insight Ad Hoc /OLAP Predictive Analytics Spatial/ Temporal Active Execution Pattern Analysis Path Analysis Graph Analysis SQL Analytics SQL-MapReduce Analytics Structure Active Enterprise Data Warehouse Data Analytic Platform Multi-Structure CRM SCM ERP Trans 3 rd Party Web logs Text Social media Machine data

Teradata Analytical Ecosystem Overview Flexible, Integrated Analytics 2650 1650 6XXX/5XXX 560 2650 6XXX/5XXX Aster Data

Summary Analytics is a competitive differentiator Big Issue Systems must be able to manage different workloads Mine relationship that exist in multi-structured data