The Lab and The Factory



Transcription:

The Lab and The Factory
Architecting for Big Data Management
April Reeve
DAMA Wisconsin
March 11, 2014

"A good speech should be like a woman's skirt: long enough to cover the subject and short enough to create interest."
Winston Churchill

April Reeve

Twenty-five years doing data-oriented stuff

Data Management disciplines: Data Integration, Data Governance, Data Modeling, Data Quality, Business Intelligence, Master Data Management, Data Conversion, Data Warehousing, Enterprise Content Management, Big Data Management

Currently implementing Data Governance programs and developing Big Data strategies for Life Sciences and Financial Services organizations

Certifications:
  Certified Data Management Professional (DAMA)
  Certified Data Governance and Stewardship Professional (DGSP)
  Certified Business Intelligence Professional (CBIP)
  Certified in Enterprise Governance of IT (ISACA)
  Certified Information Systems Auditor (ISACA)

Master's degree in Financial Management (financial risk management, derivatives pricing, corporate finance)

Book: Managing Data in Motion: Data Integration Best Practice Techniques and Technologies

Agenda
  Big Data
  The Data Scientist environment for predictive analytics: the Lab
  Operationalizing predictions: the Factory
  How does it fit with legacy data management architecture?

Analytics Maturity: From Data to Information on Demand

Big data is about more than data volume: smart big data strategies also consider the velocity, variety, and complexity of information.

Outcomes: new insights on customers, products, and operations; contextual and location-aware delivery to any device.

Data types: Documents, Transactional Data, Smart Grid, Images, Audio, Text, Video

  Volume: data volumes approaching multiple petabytes
  Velocity: data being generated and ingested for analysis in real time
  Variety: tabular, documents, e-mail, metering, network, video, image, audio
  Complexity: different standards, domain rules, and storage formats per data type

Source: Gartner, March 2011

Big Data Goal: More, Faster, Better Data for Purpose

Area         Revolution
Latency      No time to read. In-memory is the new DB
Enrichment   Tagging is the new Transformation
Query        Federated Query is the new ETL
Purpose      Purposeful View is the new Master
Analytics    Predictive is the new Reactive
Result       Trigger Action is the new Decision Support

Predictive Analytics

The Data Scientist chooses internal and external data (lots of it!) and throws it into an Analytical Sandbox.

The Data Scientist identifies patterns in the data and develops predictive models of behavior that combine historical information about a customer with real-time data flows.

What is Data Science?

Data Science refers to the scientific method:
  The scientist (Data Scientist) develops a hypothesis (a model of behavior).
  Using a large amount of historical data and statistical analysis, the Data Scientist attempts to prove that the model is accurate for predicting behavior.
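As a minimal sketch of that loop, the example below states a hypothesis as a simple rule and checks it against historical records that were held back from model development. The churn rule, the 30-day threshold, and the sample records are all hypothetical illustrations, not taken from the talk.

```python
from typing import Callable, List, Tuple

# Hypothetical historical records: (days since last purchase, customer churned?)
Record = Tuple[int, bool]

def predicts_churn(days_inactive: int, threshold: int = 30) -> bool:
    """Hypothesis (model of behavior): long inactivity predicts churn."""
    return days_inactive > threshold

def accuracy(records: List[Record], model: Callable[[int], bool]) -> float:
    """Fraction of records where the model's prediction matches the outcome."""
    correct = sum(1 for days, churned in records if model(days) == churned)
    return correct / len(records)

# Made-up historical data; the second half is held out for testing.
historical: List[Record] = [
    (5, False), (45, True), (60, True), (10, False),
    (35, False), (90, True), (2, False), (40, True),
]
holdout = historical[4:]

holdout_accuracy = accuracy(holdout, predicts_churn)
print(f"Holdout accuracy: {holdout_accuracy:.2f}")
```

A high holdout accuracy supports the hypothesis; a low one sends the Data Scientist back to revise the model.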

Leveraging Big Data for Action: Predictive Analytics

Leveraging Big Data for Action

The organization develops software that populates the models using historical customer information and installs it into the operational reporting environment.

Real-time processing combines customer information with a real-time data stream, which can trigger automatic processes and alerts.
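The sketch below illustrates this pattern under simple assumptions: a toy "model" compares each incoming event with a customer's historical average, and an alert callback fires when the ratio crosses a threshold. The `Event` type, the `HISTORY` lookup, the scoring rule, and the threshold are hypothetical names and values chosen for illustration; a production Factory would use a real streaming platform and a trained model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

@dataclass
class Event:
    customer_id: str
    amount: float

# Hypothetical historical profile: average past purchase amount per customer.
HISTORY: Dict[str, float] = {"c1": 40.0, "c2": 500.0}

def score(event: Event, history: Dict[str, float]) -> float:
    """Toy model: ratio of the event amount to the customer's historical average."""
    baseline = history.get(event.customer_id)
    if baseline is None or baseline <= 0:
        return 0.0  # no usable history, so no prediction
    return event.amount / baseline

def process_stream(
    events: Iterable[Event],
    history: Dict[str, float],
    threshold: float,
    on_alert: Callable[[Event, float], None],
) -> int:
    """Score each event as it arrives and trigger the action when needed."""
    triggered = 0
    for event in events:
        s = score(event, history)
        if s >= threshold:
            on_alert(event, s)
            triggered += 1
    return triggered

alerts: List[str] = []
stream = [Event("c1", 35.0), Event("c1", 400.0), Event("c2", 450.0), Event("c3", 999.0)]
count = process_stream(stream, HISTORY, 5.0, lambda e, s: alerts.append(e.customer_id))
print(count, alerts)
```

Here only the second event triggers an alert, because it is ten times the customer's historical average; the unknown customer `c3` is skipped because there is no history to combine with the stream.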

Leveraging Big Data for Action: Streaming Data / Extreme Transaction Processing

Big Data Analytics Architecture

In Big Data management we need:

  A Lab or Sandbox environment that is very dynamic, where Data Scientists can throw in or throw away massive amounts of structured and unstructured data, analyze it, find patterns and insights, and develop models.

  An operational Information Factory with all the good production processes we've learned around data access security and high-volume efficiency, to produce insight and trigger action on an ongoing basis. This Factory also needs to process structured data, unstructured data, and data streams. It therefore requires a Big Data architecture that includes, among other things, relational and NoSQL databases, unstructured data stores, and in-memory databases, as well as the ability to process and trigger action.

New Data Hubs: The Analytical Sandbox & NoSQL Data Stores

[Diagram labels: Structured BI Reporting Environment; ETL; DW; ALL data fed into Hadoop Data Store; Hadoop Data Store; Data Preparation and Enrichment; Exploratory Analytic Environment; Analytic Sandbox]

Data Latency Spectrum

Use Case                                                 Time Interval
Ultra low latency messaging                              < 100 microseconds
Extreme transaction processing                           < 1 millisecond
Streaming data analysis; no intermediate persistence     < 100 milliseconds
Real time event characterization                         < 1 second
Complex event processing; near real-time dashboards      < 30 seconds
Operational dashboard                                    < 5 minutes
Intraday analysis                                        < 2 hours
Daily rollup                                             24 hours
Recent historical analysis                               8 days
Medium-term historical analysis                          13 months
Long-term historical analysis                            5 years
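One way to put the latency spectrum above to work is as a lookup that maps a latency requirement to the matching use-case tier. The sketch below is an illustration, not part of the talk. It assumes every interval is an exclusive upper bound (the slide marks only the first seven with "<"), and it approximates a month as 30 days and a year as 365 days.

```python
from bisect import bisect_right
from typing import Optional

DAY = 86_400.0  # seconds in a day

# (upper bound in seconds, use case), in ascending order, taken from the slide.
# Assumption: months are approximated as 30 days and years as 365 days.
TIERS = [
    (100e-6, "Ultra low latency messaging"),
    (1e-3, "Extreme transaction processing"),
    (100e-3, "Streaming data analysis; no intermediate persistence"),
    (1.0, "Real time event characterization"),
    (30.0, "Complex event processing; near real-time dashboards"),
    (300.0, "Operational dashboard"),
    (7_200.0, "Intraday analysis"),
    (DAY, "Daily rollup"),
    (8 * DAY, "Recent historical analysis"),
    (13 * 30 * DAY, "Medium-term historical analysis"),
    (5 * 365 * DAY, "Long-term historical analysis"),
]
BOUNDS = [bound for bound, _ in TIERS]

def use_case_for(latency_seconds: float) -> Optional[str]:
    """Return the first tier whose bound is strictly greater than the latency."""
    if latency_seconds < 0:
        raise ValueError("latency must be non-negative")
    idx = bisect_right(BOUNDS, latency_seconds)
    return TIERS[idx][1] if idx < len(TIERS) else None

print(use_case_for(0.5))   # a half-second requirement
print(use_case_for(60.0))  # a one-minute requirement
```

A requirement beyond the longest tier returns `None`, which signals that the spectrum does not cover it.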

Considerations in Organizing People

The Lab
In their search for new insights, data scientists write enormous quantities of code. But it is not designed to meet commercial standards for scalability, security, and stability. You create and support commercial-grade code in the factory.

The Factory
The [Factory] requires many more people with a wider variety of skill sets, a more rigid environment, and different sorts of metrics. To be clear, creativity and experimentation are important in the factory, but you must not expect more than incremental thinking and production-oriented solutions.

From an article by Thomas C. Redman and Bill Sweeney in Harvard Business Review

Big Data Analytics Architecture

Contact Information

April Reeve
EMC Consulting, Enterprise Information Management Practice
April.Reeve@emc.com
+1 (201) 396-1831
@Datagrrl on Twitter
Blog: http://infocus.emc.com/april_reeve/
Book: Managing Data in Motion: Data Integration Best Practice Techniques and Technologies

THANK YOU