The Data Reservoir: Architecture, Best Practices and Governance

Similar documents
The Data Reservoir as an enabler of differentiating Analytics initiatives

The Data Reservoir. 10 th September Mandy Chessell FREng CEng FBCS Dis4nguished Engineer, Master Inventor Chief Architect, Informa4on Solu4ons

Bringing Strategy to Life Using an Intelligent Data Platform to Become Data Ready. Informatica Government Summit April 23, 2015

Ganzheitliches Datenmanagement

Exploiting Data at Rest and Data in Motion with a Big Data Platform

The Future of Business Analytics is Now! 2013 IBM Corporation

Optimized for the Industrial Internet: GE s Industrial Data Lake Platform

Industry Impact of Big Data in the Cloud: An IBM Perspective

How the oil and gas industry can gain value from Big Data?

Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes

Data Governance for Regulated Industries

Building Confidence in Big Data Innovations in Information Integration & Governance for Big Data

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015

Big Data & Analytics. The. Deal. About. Jacob Büchler jbuechler@dk.ibm.com Cand. Polit. IBM Denmark, Solution Exec IBM Corporation

IBM AND NEXT GENERATION ARCHITECTURE FOR BIG DATA & ANALYTICS!

Customer Cloud Architecture for Big Data and Analytics, Version 1.1

The Enterprise Data Hub and The Modern Information Architecture

Integrating a Big Data Platform into Government:

Data Wrangling: From the Wild to the Lake

Customer Cloud Architecture for Big Data and Analytics

BIG DATA STRATEGY. Rama Kattunga Chair at American institute of Big Data Professionals. Building Big Data Strategy For Your Organization

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Demystifying Big Data Government Agencies & The Big Data Phenomenon

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

Smarter Analytics. Barbara Cain. Driving Value from Big Data

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Are You Big Data Ready?

Deploying Big Data to the Cloud: Roadmap for Success

Optimized for the Industrial Internet: GE s Industrial Data Lake Platform

How to avoid building a data swamp

IRMAC SAS INFORMATION MANAGEMENT, TRANSFORMING AN ANALYTICS CULTURE. Copyright 2012, SAS Institute Inc. All rights reserved.

How To Create A Data Science System

Accelerate your Big Data Strategy. Execute faster with Capgemini and Cloudera s Enterprise Data Hub Accelerator

Big Data Analytics Nokia

Data Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015

Transforming Industries with Data & Analytics

Transitioning to a Data Driven Enterprise - What is A Data Strategy and Why Do You Need One?

How to Run a Successful Big Data POC in 6 Weeks

Dansk IT Big Data i de største danske banker

IBM Software Enabling business agility through real-time process visibility

IBM 2010 校 园 蓝 色 加 油 站 之. 商 业 流 程 分 析 与 优 化 - Business Process Management and Optimization. Please input BU name. Hua Cheng chenghua@cn.ibm.

Turn your information into a competitive advantage

IBM Big Data in Government

Getting Started Practical Input For Your Roadmap

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

Big Data Use Case Deep Dive 5 Game Changing Use Cases for Big Data

IBM Data Warehousing and Analytics Portfolio Summary

Unleashing the Power of Business Intelligence

Klarna Tech Talk: Mind the Data! Jeff Pollock InfoSphere Information Integration & Governance

How to Leverage Big Data in the Cloud to Gain Competitive Advantage

VIEWPOINT. High Performance Analytics. Industry Context and Trends

Hadoop Data Hubs and BI. Supporting the migration from siloed reporting and BI to centralized services with Hadoop

Luncheon Webinar Series May 13, 2013

How To Use Big Data For Business

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Business Data Authority: A data organization for strategic advantage

IBM Software Delivering trusted information for the modern data warehouse

HDP Hadoop From concept to deployment.

This Symposium brought to you by

JOURNAL OF OBJECT TECHNOLOGY

BIG DATA WITHIN THE LARGE ENTERPRISE 9/19/2013. Navigating Implementation and Governance

End Small Thinking about Big Data

IBM Software IBM Business Process Management Suite. Increase business agility with the IBM Business Process Management Suite

Business Intelligence and Analytics: Leveraging Information for Value Creation and Competitive Advantage

Data Integration Checklist

INTELLIGENT BUSINESS STRATEGIES WHITE PAPER

5 Big Data Use Cases to Understand Your Customer Journey CUSTOMER ANALYTICS EBOOK

Patterns of Information Management

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Big Data lisää älyä tiedosta

Building a Data Quality Scorecard for Operational Data Governance

Making Sense of Big Data in Insurance

IBM BPM Solutions Addressing the Enterprise Business Process Management

MDM and Data Warehousing Complement Each Other

IT Workload Automation: Control Big Data Management Costs with Cisco Tidal Enterprise Scheduler

Big Data for Banking. Kaleem Chaudhry Senior Director, Sales Consulting, ASEAN. Copyright 2013, Oracle and/or its affiliates. All rights reserved.

Business Intelligence for the Chief Data Officer

Safe Harbor Statement

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

The Future of Data Management

DATA VISUALIZATION: CONVERTING INFORMATION TO DECISIONS DAVID FRONING, PRINCIPAL PRODUCT MANAGER

The Big Data Revolution: welcome to the Cognitive Era.

ENTERPRISE BI AND DATA DISCOVERY, FINALLY

Bruhati Technologies. About us. ISO 9001:2008 certified. Technology fit for Business

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management

Implementation of Big Data and Analytics Projects with Big Data Discovery and BICS March 2015

Get Ready for Big Data with IBM System z

Traditional BI vs. Business Data Lake A comparison

Three Open Blueprints For Big Data Success

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Data Refinery with Big Data Aspects

Big Data and Big Data Governance

IBM Big Data Platform

April 2016 JPoint Moscow, Russia. How to Apply Big Data Analytics and Machine Learning to Real Time Processing. Kai Wähner.

The Future of Data Management with Hadoop and the Enterprise Data Hub

Whitepaper Data Governance Roadmap for IT Executives Valeh Nazemoff

Cisco IT Hadoop Journey

Transcription:

The Reservoir: Architecture, Best Practices and Governance Jo Ramos, Distinguished Engineer, Chief Architect for Information Integration & Governance 2015 IBM Corporation

Agenda for Today How to become a data driven analytics organization Modern Architecture Considerations The Reservoir & Governance 1

Three major shifts in our industry is becoming the world s new natural resource Social, mobile and access to data are changing how individuals are understood and engaged The emergence of cloud is transforming IT and business processes into digital services 500 million DVDs worth of data is generated daily 80% of individuals are willing to trade their information for a personalized offering 85% of new software is being built for cloud 1 trillion connected objects and devices by 2015 84% of millennials say social and user-generated content has an influence on what they buy 25% of the world's applications will be available in the cloud by 2016 80% of the world s data is unstructured 5 minutes: response time users expect once they have contacted a company via social media 72% of developers say cloud-based services or APIs are central to the applications they are designing

Monetization Drivers volumes variety Real-time execution Cost of data storage BI & Analytics visibility Increasing value of data Underutilized resource

driven analytics platform capabilities move from information to insights and from reporting to action Cognitive: What did I learn, what s best? Real-Time Decision Management: What action should I take? What is the Next Best Action? High Prediction / Simulation: What will happen? Uses data to forecast based on complex algorithms or rules; forecasting of customers buying behavior. Technological complexity Evaluation: Why did it happen? Thoroughly analyzes data to support important decisions / understand root causes for unusual observations; evaluation of campaign responses to understand changes in customer behavior. Mining: Why did it happen? Looks for patterns in the data to explain a not yet understood observation; understand why certain customers show a certain behavioral pattern. Low Low Business value / impact High Monitoring: What is happening now?looks for trigger information in the data indicating need for action; monitoring live campaigns and making optimization decisions as necessary. Reporting: What happened? Slices and dices data to create transparency on campaign performance and financial or quality outcomes.

Multi-structured Mashups provide the Greatest Enterprise Value Systems of Record Structured data from operational systems 20% of all data generated Systems of Insight Diverse data types that combine structured and unstructured data for business insight Systems of Engagement that connects companies with their customers, partners and employees 80% of all data generated Structured Small Clearly formatted Quantitative Objective Logical Puzzle Repeatable linear Warehouses Transaction data ERP Electronic Health Records Mainframe OLTP System Advanced Analytics Context Accumulation Enterprise Integration Hadoop, Streams, Spark Audio Documents Images RFID Emails Sensors Social Video Unstructured Big Language based Qualitative Subjective Intuitive Mystery Exploratory dynamic Web Logs Traditional Sources New Sources

DATA DRIVEN ARCHITECTURE

A growing data demand and organizational tensions Lines of Business IT Organization Agility Access Freedom Any kinds of data Powerful Analysis & Visualization Knowledge Worker CDO Security Privacy Regulatory Compliance Standards Application Developer Scientists seeking data for new analytics models. Marketer seeking data for new campaigns. Fraud investigator seeking data to understand the details of suspicious activity.

Modern Architectures: What to aim for We need IT solutions that are like Legos Agile, flexible, adaptive, efficient Able to give fast responses to business needs Taking advantage of cloud, mobile, social, localization, Big IT should not be like pouring cement Rigid, hardcoded business processes Monolithic Expensive to change Out of sync with business needs

Modern Architecture What to aim for Simplification of the IT environment Eliminate redundancies Reduce cost Decouple systems Decouple transactions Fast, easy reuse Faster timeto-market Explore opportunities React to problems Generate insights from information in real time Use insights to improve customer experience, anticipate facts 9

Modern Architecture Enables better Customer Engagement Social Profiles Preferences Activities Sensors Geolocation Movement Events Internet of Things Transactional Customer info Transaction history Product rules Analytics Next Best Action Expert systems Future trends Big Processes Tasks & milestones Integration Monitoring & SLAs Smart Channels Mobile & Web SoLoMo Context-enriched Context-Aware Experiences Personalized services Real-time Intelligence Moments of Truth Opportunistic offers 10

Analytics Lifecycle Search & Survey (understand what data is available) Online Collaboration Workspace Analytics Discovery (application development) Deploy & Consume (application & workflow) IT - Ope rati ons

The Reservoir Built to extract value from the data. Managed, Trusted and Governed

Big Lakes or Swamps? As we bring data together, are we creating a data swamp? No one is sure of the origin or purity of data. No one can find the data they need. No one knows what data is present and if it is being adequately protected. How do we build trust in big data? Need trust both to share and to consume data. Need understanding of quality, origin and ownership of data. Need classification of data to govern and protect it. Need timely, reliable data feeds and results. All built on secure and reliable infrastructure.

IBM s Lake Lake Services Lake Repositories Information Management and Governance Fabric Lake IBM s Lake = Efficient Management, Governance, Protection and Access.

Users supported by the Lake Analytics Teams Information Curator Governance, Risk and Compliance Team Enterprise IT Systems of Record Systems of Engagement Lake Services Lake Repositories Line of Business Teams New Sources Systems of Automation Other Lakes Information Management and Governance Fabric Lake (System of Insight) Lake Operations

The Lake Subsystems (Services) Analytics Teams Information Curator Governance, Risk and Compliance Team Enterprise IT Systems of Record Self-Service Access Catalogue Systems of Engagement New Sources Enterprise IT Exchange Lake Repositories Self- Service Access Line of Business Teams Systems of Automation Other Lakes Information Management and Governance Fabric Lake (System of Insight) Lake Operations

Considerations for a well-managed and governed data lake 9 Access to raw data to develop new production analytics. 5 Curation of all data to define meaning and classifications Business-led 6 information governance and management 4 Effective interchange of data and insight with other systems. 2 Catalog of data, ownership, meaning and permitted usage 10 Moderated, viewbased self-service access to data and analytics for line of business. 3 No direct access to repositories 8 -centric Security 7 Active monitoring and management of data 1 Multiple repositories organized based on source and usage; hosted on appropriate data platforms for workload.

View from the user community - fraud Detect and prevent fraud Develop new fraud models Conform to regulations Investigate Fraud Case

repositories support multiple zones Raw Interaction Catalog Interfaces Descriptive Enterprise IT Interaction Deposited Historical Harvested Context Lake Repositories View-based Interaction Published Information Integration & Governance Lake

Big data needs a variety of repositories for cost, access and performance reasons Raw Interaction Catalog Interfaces Descriptive CATALOG SEARCH INDEX INFORMATION VIEWS Enterprise IT Interaction Deposited Historical Harvested Context OPERATIONAL LOG AUDIT HISTORY DATA DATA INFORMATION WAREHOUSE CONTENT ASSET HUB HUB DEEP DATA ACTIVITY CODE HUB HUB Lake Repositories View-based Interaction Published EXPORT AREA OBJECT CACHE DATA MARTS Information Integration & Governance Lake

lake logical architecture Decision Model Analytics Tools Management Information Curator Governance, Risk and Compliance Team Enterprise IT System of Record Applications Systems of Engagement New Sources Third Party Feeds Third Party APIs Internal Sources Systems of Automation Other Systems Of Other Insight Lakes Enterprise Service Bus Deploy Real-time Decision Models Events to Evaluate Notifications Information Service Calls Out In Deploy Real-time Decision Models Enterprise IT Interaction APIs Continuous Analytics STREAMING ANALYTICS EVENT CORRELATION Publishing Feeds Ingestion INFORMATION BROKER BROKER CODE HUB Access SAND BOXES Information Service Calls Descriptive Deposited Historical Harvested Context Published STAGINGAREAS Deposit Secure Access CATALOG OPERATIONAL HISTORY INFORMATION WAREHOUSE CONTENT HUB OPERATIONAL GOVERNANCE HUB EXPORT AREA Deploy Decision Models Raw Interaction SEARCH INDEX LOG DATA ASSET HUB OBJECT CACHE Lake INFORMATION VIEWS DEEP DATA ACTIVITY OFFLINE ARCHIVE HUB AUDIT DATA DATA MARTS Information Integration & Governance Understand Information Sources Secure Access CODE HUB MONITOR Lake Repositories Advertise Information Source Catalog Interfaces View-based Interaction Secure Access Search SAND WORKFLOW Feedback BOXES Access Refine GUARDS Understand Compliance Understand Information Sources Search Requests Curation Interaction Information Service Calls Deposit Access Report Requests Management Report Compliance Line of Business Applications Analytical Insight Applications Simple, ad hoc Discovery and Analysis Reporting Consumers of Insight Lake Operations

Differing user perspectives Provision Sand Boxes. Sand Box Search for, locate and download data and related artifacts. Define governance policies, rules and classifications. Monitor compliance. Information Governance Catalogue View lineage (business and technical) and perform impact analysis. Add additional insight into data sources through automated analysis. Develop data management models and implementations. Curation of Metadata about Stores, Models, Definitions Stores Stores Stores

Governance Rules Defined for each classification for each situation Sensitive information masked here Personal information masked here

Integrated Metadata Lineage (Traceability) Where does this data come from? Why is this data incorrect? Why is this data incomplete? Can I trust this value? Impact Analysis Where is this element used? What happens if I change this? Optimization Where is the redundancy? How can I make this run more efficiently? Understanding What does this mean? How is this used? Control Why is this parameter set to this value? Who made this change? I can change this to meet new business requirements

Secure access to the data lake s data The data lake s security is assured with this combination of business processes and technical mechanisms. curation for security and protection access approval by subject area owner Well defined access points centric security access Isolated repositories Access monitoring and logging Security analytics and investigation Security audit and review ADMINISTRATION AT RUNTIME REVIEW

Building a data lake The first step in creating the lake is to establish the following: The information integration and governance components, The staging areas for integration, The catalog, The common data standards. The build out of the lake then proceeds iteratively based on the following processes: Governance of a data lake subject area. Managing an information source. Managing an information view. Enabling analytics. Maintaining the data lake infrastructure.

z zz z z z z Questions?

Thank You