The Data Reservoir as an enabler of differentiating Analytics initiatives




Mandy Chessell CBE FREng CEng FBCS
Distinguished Engineer, Master Inventor
Chief Architect, Information Solutions
The Data Reservoir as an enabler of differentiating Analytics initiatives
3rd March 2015

Agenda
- Changing information landscapes
- Analytics lifecycles
- Data reservoir overview
- Questions
Looks like you've got all the data, what's the holdup?

Architecture for the Future: CHANGING INFORMATION LANDSCAPES

Knowing your customers enables you to serve them better
High-value, dynamic: a source of competitive differentiation
How? Interaction data
- Email / chat transcripts
- Call center notes
- Web click-streams
- In-person dialogues
Why? Attitudinal data
- Opinions
- Preferences
- Needs and desires
Traditional:
Who? Descriptive data
- Attributes
- Characteristics
- Relationships
- Self-declared info
- (Geo)demographics
What? Behavioral data
- Orders
- Transactions
- Payment history
- Usage history

Knowing your customers enables you to serve them better
High-value, dynamic: a source of competitive differentiation
- How? Interaction data: analysis of channel interaction methods
- Why? Attitudinal data: analysis of feedback and interaction content
- Who? Descriptive data: Master Data Management
- What? Behavioral data: from Systems of Record

The broadening scope of analytics
[Diagram: Operational Store, Warehouse, Pattern Discovery for Analytics, Reporting Data Marts]
Flow is in one direction: analytics operates on data extracted from real-time operations.

The broadening scope of analytics
[Diagram adds: SOA, Master Data Management Hub]
Master data management creates a synchronization point for customer identity and key demographic information. SOA simplifies distribution of data between operational systems.

The broadening scope of analytics
[Diagram adds: Hadoop; new inputs: customer conversations, web, social media, log files, sensors and real-time events]
Hadoop provides cheap storage and processing, increasing the amount and the types of data that can be processed cost-effectively. Streaming analytics enables high-speed in-memory analytics.

The broadening scope of analytics
[Diagram adds: Search, SAND BOXES, Analyze Values, Reservoir, Streaming Analytics]
Add a business desire for real-time analytics and self-service data, together with increasing regulation of individual privacy, and it becomes necessary to have a well-defined, managed and governed approach to information architecture. We call this the data reservoir.

Data blues & skills issues
- A disproportionate share of the time spent in analytics projects goes to data preparation: acquiring, preparing, formatting and normalizing the data.
- In addition to raw data, augmented data and analytical assets can significantly speed up the analytics process and partially bridge the talent gap.

Business scenarios we see
Subject matter experts want access to their organization's data to explore the content and to select, control, annotate and access information using their own terminology, with an underpinning of protection and governance.
- Data scientists seeking data for new analytics models.
- Marketeers seeking data for new campaigns.
- Fraud investigators seeking data to understand the details of suspicious activity.
- Day-to-day activity.
- Requiring ad hoc access to a wide variety of data sources.
- Supporting analysis and decision making.
- Using the subject matter experts' terminology.
The aim is to provide the flexibility of spreadsheets at a scale that handles large volumes and a wide variety of information types, whilst protecting sensitive information and optimizing data storage and provisioning.

The interesting dilemma
A man goes into a jeweller's and buys an expensive watch.
- Threat: Is it fraud? In which case the bank must stop it.
- Obligation: Is it money-laundering? In which case the bank must report it.
- Opportunity: Does he have an expensive trophy partner, in which case perhaps he would be interested in a loan? Has he just won the lottery, and should the bank improve the services offered?
The same event is of interest to different departments.
There is major overlap in the data required to answer the question.
It may not be possible to determine the answer with just the information in the channel: previous or subsequent activity is required.
It is all a matter of coordination and timing.

Application Groupings
- Systems of Engagement
- Systems of Record
- Systems of Insight
Characterised by:
- Availability requirements
- Performance
- Skills
- Rate of change / stability

A growing demand
Business teams want:
- Open access to more information
- More powerful analysis and visualization tools
IT teams are:
- Concerned about cost
- Concerned about governance and regulatory requirements

Architecture for the Future: THE DATA RESERVOIR

How do we access information?
Access in place:
- Up-to-date information
- Cost-effective
- Slower access path
- Remote access
- Reformatting
Make a local copy:
- Specially formatted for use case
- Local access
- Local control
- Local cost
- Potentially stale values
Considerations:
- How much information?
- How rapidly is it changing?
- How frequently is it accessed?
- How much transformation is required to consume the information?
- When is the information available?
- Who owns the information?
- How easily can it be changed?

How does the data reservoir support analytics development?
[Diagram, built up over four slides. Numbered steps: (1, 2) Advertise, (3) Discover, (4) Provision, (5) Explore, (6) Deploy. Components: Catalog, Sandbox, Reservoir]
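The steps named on this slide (advertise, discover, provision, explore, deploy) can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the `Reservoir` and `CatalogEntry` classes, their fields, and the example data are all hypothetical and are not taken from the presentation or from any product API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical description of a data source advertised to the catalog."""
    name: str
    owner: str
    tags: set
    rows: list


@dataclass
class Reservoir:
    """Toy reservoir: a catalog, some sandboxes and a set of deployed models."""
    catalog: dict = field(default_factory=dict)
    sandboxes: dict = field(default_factory=dict)
    deployed: dict = field(default_factory=dict)

    def advertise(self, entry):
        # Advertise: the owner describes a source so that it appears in the catalog.
        self.catalog[entry.name] = entry

    def discover(self, tag):
        # Discover: an analyst searches the catalog by tag.
        return sorted(e.name for e in self.catalog.values() if tag in e.tags)

    def provision(self, source_name, sandbox_name):
        # Provision: copy the data into a sandbox so exploration cannot alter the source.
        rows = [dict(row) for row in self.catalog[source_name].rows]
        self.sandboxes[sandbox_name] = rows
        return rows

    def deploy(self, model_name, model):
        # Deploy: register the finished model for use outside the sandbox.
        self.deployed[model_name] = model


reservoir = Reservoir()
reservoir.advertise(CatalogEntry(
    name="orders",
    owner="sales-ops",
    tags={"customer", "transactions"},
    rows=[{"customer": "c1", "amount": 120.0}, {"customer": "c2", "amount": 80.0}],
))
found = reservoir.discover("customer")
sample = reservoir.provision(found[0], "churn-study")
# Explore: work on the sandbox copy to derive a simple rule.
average = sum(row["amount"] for row in sample) / len(sample)
reservoir.deploy("high_value", lambda row: row["amount"] > average)
```

The one design point worth noting is that provisioning hands out a copy, which mirrors the slide's separation between the reservoir and the sandbox.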

Active decision making in real-time
[Diagram: Activity, Decision, Action; flows labelled Context; Facts, Recent Events, Options; Decision Input, Actions and Outcomes; Feedback; SOA]
1. An activity occurs that calls for a decision.
2. The context from the activity is passed to the decision process.
3. The decision process augments the context with stored information and runs the decision model.
4. One or more actions are recommended to the activity.
5. The activity feeds back the results, which are used to tune the model over time.
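The five-step loop on this slide could look roughly like the following. It is a minimal sketch, assuming a single numeric threshold stands in for the decision model; `DecisionService`, its fields, the action names and the tuning rule are hypothetical illustrations, not something the presentation specifies.

```python
from dataclasses import dataclass, field


@dataclass
class DecisionService:
    """Hypothetical decision process with a threshold as its 'model'."""
    stored_facts: dict              # customer id -> stored information
    threshold: float = 1000.0
    feedback_log: list = field(default_factory=list)

    def decide(self, context):
        # Step 3: augment the context with stored information, then run the model.
        facts = self.stored_facts.get(context["customer"], {})
        enriched = {**context, **facts}
        limit = self.threshold * (2 if enriched.get("trusted") else 1)
        # Step 4: recommend one or more actions back to the activity.
        if enriched["amount"] <= limit:
            return ["allow"]
        return ["hold", "notify_fraud_team"]

    def feedback(self, context, actions, outcome):
        # Step 5: record the outcome and tune the model over time.
        self.feedback_log.append((context, actions, outcome))
        if outcome == "false_alarm":
            self.threshold += 100.0


service = DecisionService(stored_facts={"c1": {"trusted": True}})

# Steps 1 and 2: an activity occurs and passes its context to the decision process.
purchase = {"customer": "c2", "amount": 1500.0}
actions = service.decide(purchase)
service.feedback(purchase, actions, "false_alarm")
```

Here the unknown customer `c2` is held, and the reported false alarm relaxes the threshold slightly for later decisions.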

How does the data reservoir support data distribution?
[Diagram. Numbered steps: (1) Provision, (2) Distribute, (3) Access. Components: Catalog, Reservoir]

Big Lakes or Swamps?
As we collect data:
- Can we preserve clarity?
- Do we know what we are collecting?
- Can we find the data we need?
- Are we creating a data swamp?
- How do we build trust in big data?
- Do we know what data is being used for?

"The need for increased agility and accessibility for data analysis is the primary driver for data lakes," said Andrew White, vice president and distinguished analyst at Gartner. "Nevertheless, while it is certainly true that data lakes can provide value to various parts of the organization, the proposition of enterprise wide data management has yet to be realized." 24 http://www.gartner.com/newsroom/id/2809117

The Data Reservoir
[Diagram: Reservoir Services, Reservoir Repositories, Management and Governance Fabric]
Data Reservoir = Efficient Management, Governance, Protection and Access.

The data reservoir connects to many types of systems
[Diagram: the data reservoir and the systems around it.
- Enterprise IT: Systems of Engagement, System of Record, New Sources (Third Party Feeds, Third Party APIs, Internal Sources), connected through the Enterprise Service Bus (Events to Evaluate, Notifications, Service Calls, Out, In).
- Decision Model Management and Analytics Tools: Deploy Real-time Decision Models, Deploy Decision Models, Service Calls, Access, Deposit.
- Line of Business: Search Requests, Curation Interaction, Service Calls, Deposit, Access, Report Requests. Consumers of insight use Analytical Insight, Simple ad hoc Discovery and Analysis, and Reporting.
- Other Systems of Insight and other reservoirs.
- Management and Reservoir Operations.]

The data reservoir supports real-time and batched ingestion of data
[Diagram: the data reservoir and its surrounding systems, with the ingestion mechanisms highlighted: MANUAL REQUEST, SCHEDULED EXTRACT, INFORMATION SERVICE CALL, CHANGE DATA CAPTURE, EXTERNAL FEED, QUEUED MESSAGE]

Refineries provide data movement, preparation, governance
[Diagram: the data reservoir and its surrounding systems, with the Enterprise IT Interaction subsystem highlighted: Continuous Analytics (STREAMING ANALYTICS, EVENT CORRELATION), Service Interfaces, Publishing Feeds, Ingestion. Also labelled: Reservoir Repositories, Understand Sources]

Big data needs a variety of repositories for cost, access and performance reasons
[Diagram: the data reservoir and its surrounding systems, with the Reservoir Repositories highlighted.
- Repository groupings: Descriptive, Shared Operational, Deposited, Historical, Harvested, Published.
- Repositories: CATALOG, SEARCH INDEX, INFORMATION VIEWS, CONTENT HUB, ASSET HUB, ACTIVITY HUB, CODE HUB, OPERATIONAL HISTORY, AUDIT DATA, DEEP DATA, INFORMATION WAREHOUSE, OFFLINE ARCHIVE.
- Published through View-based Interaction: OBJECT CACHE, SAND BOXES, REPORTING DATA MARTS.
- Annotations: Master and Reference; System-level (Pre-Archive); All types of data; Structured and Optimized.]

Like a well-run library, the data reservoir has a catalog
[Diagram: the data reservoir and its surrounding systems, with the catalog highlighted. New roles: Curator; Governance, Risk and Compliance Team. Catalog Interfaces support: Advertise Source, Understand Sources, Understand Compliance, Report Compliance]

Differing user perspectives
- Provision sand boxes.
- Search for, locate and download data and related artifacts.
- Define governance policies, rules and classifications. Monitor compliance.
- View lineage (business and technical) and perform impact analysis.
- Add additional insight into data sources through automated analysis.
- Develop data management models and implementations.
[Diagram labels: Sand Box; Governance Catalogue; Curation of Metadata about Stores, Models, Definitions; Stores]

Information governance provides the mechanism for building trust
[Diagram: the data reservoir and its surrounding systems, with the Integration & Governance subsystem highlighted: INFORMATION BROKER, CODE HUB, STAGING AREAS, OPERATIONAL GOVERNANCE HUB, MONITOR, WORKFLOW, GUARDS]

Information governance delivers
[Diagram labels: Governance, Standards, Protection, Compliance Requirements, Usage, Privacy, Policy Administration, Identification, Architecture, Lifecycle, Quality, Retention, Disposal, Policy Implementation, Policy Enforcement, Dependencies, Values, Quality, Supply Chain, Integrity, Policy Monitoring]
- Understanding of the information that an organization has
- Confidence to share and reuse information
- Protection from unauthorised use of information
- Monitoring of activity around the information
- Implementation of key business processes that manage information
- Tracking the provenance of information
- Management of the growth and distribution of its information

Three interlocking lifecycles of information governance
[Diagram labels: Policy, Metadata, Operations, Policy Development]

Classification Schemes
Classification is at the heart of information governance. It characterizes the type, value and cost of information, or the mechanisms that manage it. The design of the classification schemes is key to controlling the cost and effectiveness of the information governance program.
- Business classifications characterize information from a business perspective. This captures its value, how it is used, and the impact to the business if it is misused.
- Resource classifications characterize the capability of the IT infrastructure that supports the management of information. A resource's capability is partly due to its innate functions and partly controlled by the way it has been configured.
- Activity classifications help to characterize procedures, actions and automated processes.
- Semantic classification identifies the meaning of an information element. The classification scheme is a glossary of concepts from relevant subject areas. These glossaries are industry specific and they are shipped with our industry models. The semantic classifications are defined at two levels: subject area classification and business term classification.

Policy support inside the Governance Catalogue
[Diagram. Elements: Principle, Policy, Governance Rule, Classification, Metadata Description, Governance Rule Implementations, Modelled Metadata Implementations, Asset. Relationship labels: Implications, Actioned by, Governs, Classified by, Implemented by, Describes, Deployed to, Executed by, Monitored by]

Governance Rules
Defined for each classification for each situation.
[Diagram: the data reservoir and its surrounding systems, with two callouts marking points in the flow: "Sensitive information masked here" and "Personal information masked here"]
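One way to picture "a rule for each classification for each situation" is a lookup table keyed on both. The sketch below is an assumption-laden illustration: the situation names, the field classifications, the mask value and the default of masking any classified field that has no explicit rule are all hypothetical choices, not rules stated in the presentation.

```python
MASK = "****"

# Hypothetical rule table: (classification, situation) -> action.
RULES = {
    ("personal", "line_of_business_view"): "mask",
    ("sensitive", "line_of_business_view"): "mask",
    ("personal", "analytics_sandbox"): "allow",
    ("sensitive", "analytics_sandbox"): "mask",
}

# Hypothetical classification of each field; None means unclassified.
CLASSIFICATIONS = {"name": "personal", "salary": "sensitive", "city": None}


def apply_rules(record, situation, classifications=CLASSIFICATIONS, rules=RULES):
    """Return a copy of the record with fields masked according to the rules."""
    result = {}
    for field_name, value in record.items():
        classification = classifications.get(field_name)
        if classification is None:
            action = "allow"  # assumption: unclassified fields pass through
        else:
            # Assumption: a classified field with no matching rule is masked.
            action = rules.get((classification, situation), "mask")
        result[field_name] = MASK if action == "mask" else value
    return result


employee = {"name": "Ann", "salary": 50000, "city": "Leeds"}
for_business = apply_rules(employee, "line_of_business_view")
for_sandbox = apply_rules(employee, "analytics_sandbox")
```

The same record therefore looks different depending on where it is delivered, which is the point the two masking callouts on the slide make.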

Integrated Metadata
- Lineage (traceability): Where does this data come from? Why is this data incorrect? Why is this data incomplete? Can I trust this value?
- Impact analysis: Where is this element used? What happens if I change this?
- Optimization: Where is the redundancy? How can I make this run more efficiently?
- Understanding: What does this mean? How is this used?
- Control: Why is this parameter set to this value? Who made this change? I can change this to meet new business requirements.
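Lineage and impact analysis are the same question asked in opposite directions over a graph of data flows. The following is a small sketch of that idea, assuming metadata is reduced to "source feeds target" edges; the `LineageGraph` class and the element names are hypothetical.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph of data flows between information elements."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)

    def add_flow(self, source, target):
        # Record that `source` feeds `target`.
        self._downstream[source].add(target)
        self._upstream[target].add(source)

    @staticmethod
    def _reach(start, edges):
        # Breadth-first search over one edge direction.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in edges.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen

    def lineage(self, element):
        """Where does this data come from? (everything upstream)"""
        return self._reach(element, self._upstream)

    def impact(self, element):
        """What happens if I change this? (everything downstream)"""
        return self._reach(element, self._downstream)


graph = LineageGraph()
graph.add_flow("crm.customer", "staging.customer")
graph.add_flow("staging.customer", "warehouse.customer_dim")
graph.add_flow("billing.invoice", "warehouse.customer_dim")
graph.add_flow("warehouse.customer_dim", "report.churn")

sources_of_report = graph.lineage("report.churn")
affected_by_crm = graph.impact("crm.customer")
```

Keeping both edge directions indexed makes either question a single traversal.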

The Governance Ecosystem
[Diagram labels: Policy and Standards, Curation, Refineries, Exception Management, Reporting and Audit]
Governance is built on metadata management, curation of information resources, policy-aware runtimes such as MDM, Information Server, Guardium and Optim, workflow and reporting.

Data is delivered in appropriate forms for consumers
[Diagram: the data reservoir and its surrounding systems, with the consumer-facing subsystems highlighted: Raw Interaction (SAND BOXES), Catalog Interfaces, and View-based Interaction (Access and Feedback; Published: OBJECT CACHE, SAND BOXES, REPORTING DATA MARTS)]

Virtualization hides the complexity of the information landscape
[Diagram labels: Search and View Values, Add Insight, Define Views, Create APIs, Browse Sources, Analyze Values, Provision, Virtualization]

Building a data reservoir
[Diagram: Integration & Governance subsystem: INFORMATION BROKER, CODE HUB, STAGING AREAS, OPERATIONAL GOVERNANCE HUB, MONITOR, WORKFLOW, GUARDS]
The data reservoir needs governance and change management to ensure that information is protected and managed efficiently.
The first step in creating the reservoir is to establish the information integration and governance components, the staging areas for integration, the catalog and the common data standards.
The build-out of the reservoir then proceeds iteratively, based on the following processes:
- Governance of a data reservoir subject area.
- Managing an information source.
- Managing an information view.
- Enabling analytics.
- Maintaining the data reservoir infrastructure.

Data reservoir logical architecture
[Diagram: the complete data reservoir with its surrounding systems.
- Enterprise IT Interaction: Continuous Analytics (STREAMING ANALYTICS, EVENT CORRELATION), Service Interfaces, Publishing Feeds, Ingestion.
- Reservoir Repositories: CATALOG, SEARCH INDEX, INFORMATION VIEWS, CONTENT HUB, ASSET HUB, ACTIVITY HUB, CODE HUB, OPERATIONAL HISTORY, AUDIT DATA, DEEP DATA, INFORMATION WAREHOUSE, OFFLINE ARCHIVE.
- Raw Interaction: SAND BOXES.
- Catalog Interfaces: Advertise Source, Understand Sources, Understand Compliance, Report Compliance.
- View-based Interaction: Access and Feedback; Published: OBJECT CACHE, SAND BOXES, REPORTING DATA MARTS.
- Integration & Governance: INFORMATION BROKER, CODE HUB, STAGING AREAS, OPERATIONAL GOVERNANCE HUB, MONITOR, WORKFLOW, GUARDS.
- Surrounding systems: Enterprise IT, Decision Model Management, Analytics Tools, Curator, Governance, Risk and Compliance Team, Line of Business, Other Systems of Insight, other reservoirs, Management and Reservoir Operations.]

The data reservoir
[Diagram: Reservoir Services, Reservoir Repositories, Management and Governance Fabric]
As organizations experiment with analytics they discover:
- Creating new analytics requires access to historical data from many systems.
- This data includes valuable and sensitive data that is core to the organization's operation.
- Hadoop is a flexible platform for storing many types of data but is not necessarily fast enough for the production deployment of some analytics.
- Data needs to be reformatted and copied onto specialist analytics platforms such as Netezza.
A data reservoir provides:
- Single extraction of data from operational systems and distribution to multiple analytics platforms.
- Cataloguing and governance of the data in the analytics platforms.
- Simple interfaces for the line of business to access the information they need.
Data Reservoir = Efficient Management, Governance, Protection and Access.

Questions?

Architecture for the Future: PRODUCT MAPPINGS

Systems interfacing with the Reservoir
System/Subsystem Name: Description
- Mobile and other channels: These are operational applications that support interaction with people such as customers, suppliers and employees. The data reservoir may supply key data values and analytical insight to a high-speed cache for these applications to improve the performance of simple lookups. The data reservoir is able to refresh this cache after an outage.
- System of Record: These are operational applications that drive an organization's daily business. They supply information to the data reservoir that describes this daily operation and its associated master data. They receive analytical insights and other derived information such as micro-segmentation and alerts.
- New Sources: New sources describe information outside of the business data managed by the system of record applications. This may be log files from customer interactions, or information from third parties such as social media services and data providers.
- Other Data Lakes: This data reservoir may be exchanging information with other data lakes, swamps or reservoirs that are owned by this organization, part of a cloud deployment, or owned by an external party.
- Decision Model Management: Decision model management describes the systems used by data scientists and business analysts as they configure analytics models and rules to execute inside the data reservoir. This is where the advanced analytics and data mining are managed from. The team needs access to samples of the data, formatted for analysis tools, with sufficient performance capacity to handle intense, lumpy workloads from the mining and testing processing.
- Curator: An individual or group of people in the organization that has information sources to share.
- Governance, risk and compliance platform: The information governance, risk and compliance platform is a reporting environment used to demonstrate compliance with industry regulations and business policies. Some of these policies apply to the management of information. These policies are defined and managed in the data reservoir (operational governance component) and can be monitored by this external application.
- Line of Business: These are applications designed to provide reports, search and simple analytics capabilities that are under the control of the lines of business. The interfaces they use are designed to be self-service, with simple configuration to define new insight.

Data Reservoir Services Components Summary

Component: Data Ingestion
Description: Data ingestion is where data from the information sources is loaded into the data reservoir. This data is treated as reference data (read only) by the processes in the data reservoir. The data ingestion component is responsible for validating the incoming data, transforming relevant structured data to the data reservoir format and routing it to the appropriate data reservoir repositories.
Product: InfoSphere Information Server
Pattern: Broker

Component: Publishing Feeds
Description: Publishing feeds is responsible for distributing data from the data reservoir repositories to systems outside of the reservoir. This includes other data reservoirs and the operational systems of record.
Product: InfoSphere Information Server
Pattern: Broker

Component: Real-time Interfaces
Description: Real-time interfaces (a) provide services to access data in the data reservoir repositories and (b) provide real-time interfaces for querying data outside of the reservoir. These interfaces may be services or SQL-style interfaces.
Product: InfoSphere MDM; InfoSphere Information Server (Information Services Director)
Pattern: Service

Component: Real-time Analytics
Description: Real-time analytics provides complex event processing and real-time analytics based on activity within the organization and externally.
Product: InfoSphere Streams
Pattern: Streaming Analytics Node

Component: Raw Data Interaction
Description: Raw data interaction provides access to most of the data (security permitting) in the data reservoir for advanced analytics. It is responsible for masking sensitive personal information where appropriate.
Product: InfoSphere Information Server; GaianDB/InfoSphere Federation Server; InfoSphere BigInsights
Pattern: Provisioning

Component: Catalog Interfaces
Description: The catalog interfaces provide information about the data in the data reservoir. This includes details of the information collections (both repositories and views), the meaning and types of information stored, and the profile of the information values within each information collection.
Product: InfoSphere Information Server (InfoSphere Information Governance Catalog)
Pattern: Identification

Component: View-based Interaction
Description: Provides access to data in the data reservoir (subject to security permissions) for line of business teams that wish to perform ad hoc queries, search, simple analytics and data exploration. The structure of this information has been simplified and it is labeled using business-relevant terminology.
Product: InfoSphere Information Server; GaianDB/InfoSphere Federation Server; InfoSphere Data Explorer
Pattern: Service; Provisioning; Search Node

Component: Reporting Data Marts
Description: The reporting data marts provide departmental or subject-oriented data marts targeted at line of business reports.
Product: Database
Pattern: Mart
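The three responsibilities of the data ingestion component (validate, transform, route) can be illustrated with a minimal Python sketch. This is not how the products listed above implement ingestion; the repository names, the validation rule and the "reservoir format" (lower-cased keys) are all assumptions made up for this example.

```python
from dataclasses import dataclass, field

# Hypothetical repository names, used only for this illustration.
REPOSITORIES = {"structured": "information_warehouse", "unstructured": "deep_data"}


@dataclass
class Reservoir:
    stores: dict = field(
        default_factory=lambda: {name: [] for name in REPOSITORIES.values()}
    )
    rejected: list = field(default_factory=list)


def validate(record):
    """Assumed rule: a record must name its source and carry a payload."""
    return bool(record.get("source")) and "payload" in record


def transform(record):
    """Stand-in for conversion to a reservoir format: lower-case structured keys."""
    payload = record["payload"]
    if isinstance(payload, dict):
        payload = {str(key).lower(): value for key, value in payload.items()}
    return {"source": record["source"], "payload": payload}


def route(record):
    """Pick a target repository from the shape of the payload."""
    kind = "structured" if isinstance(record["payload"], dict) else "unstructured"
    return REPOSITORIES[kind]


def ingest(reservoir, records):
    """Validate, transform and route each incoming record."""
    for record in records:
        if not validate(record):
            reservoir.rejected.append(record)
            continue
        clean = transform(record)
        reservoir.stores[route(clean)].append(clean)
    return reservoir
```

Rejected records are kept rather than dropped so that a stewardship process could inspect them later; that choice is also an assumption of the sketch.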

Data Reservoir Repositories: Harvested, Descriptive and Deposited

Type: Harvested
Name: Operational History
Description: A repository providing a historical record of the data from the systems of record.
Product: Database
Pattern: Operational Status Node

Type: Harvested
Name: Warehouse
Description: A repository optimized for high speed analytics. This data is structured and contains a correlated and consolidated collection of information.
Product: PureData for Analytics; Industry Models
Pattern: Warehouse

Type: Harvested
Name: Deep Data
Description: A repository holding a copy of most of the data in the data reservoir. It provides a place where raw data can be landed for analysis. The data may be annotated, linked and consolidated in deep data. Data may be mapped to data structures after it is stored, so effort is spent as needed rather than at the time of storing. This repository is designed for flexibility, supporting both high volumes and a wide variety of data.
Product: InfoSphere BigInsights; Industry Models
Pattern: Map-Reduce Node

Type: Harvested
Name: Audit
Description: A repository used to keep a record of the activity in the data reservoir. It is used for auditing the use of data: who is accessing it, when, and for what purpose.
Product: InfoSphere BigInsights
Pattern: Event Node

Type: Descriptive
Name: Catalog
Description: A repository and applications for managing the catalog of information stored in the data reservoir.
Product: InfoSphere Information Server; Industry Models
Pattern: Identification

Type: Descriptive
Name: Views
Description: Definitions of simplified subsets of information stored in the data reservoir repositories. These views are created with the information consumer in mind.
Product: Relational Database; InfoSphere MDM; InfoSphere Federation Server
Pattern: Virtual Collection

Type: Deposited
Name: Deposited Data
Description: Information collections that have been stored by the data reservoir information users. These information collections may contain new types of information, analysis results or notes.
Product: InfoSphere BigInsights
Pattern: Physical Collection
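The deep data description says that data may be mapped to data structures after it is stored. A minimal Python sketch of that idea follows, assuming raw records arrive as JSON text; the function names `land` and `read_as` and the field names in the usage example are hypothetical.

```python
import json

# Raw records are landed exactly as received; no schema is applied on write.
deep_data = []


def land(raw_line):
    """Store the raw text untouched."""
    deep_data.append(raw_line)


def read_as(mapping):
    """Apply a mapping (target field -> source key) only when the data is read.

    Missing source keys become None instead of failing, because raw data
    is not guaranteed to be complete.
    """
    rows = []
    for raw_line in deep_data:
        doc = json.loads(raw_line)
        rows.append({target: doc.get(source) for target, source in mapping.items()})
    return rows
```

For example, after `land('{"cust": "C1", "amt": 12.5}')`, a call to `read_as({"customer_id": "cust", "amount": "amt"})` yields one row with the fields `customer_id` and `amount`. A different consumer could apply a different mapping to the same landed data without any reload.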

Data Reservoir Repositories: Shared Operational Data

Type: Shared Operational
Name: Asset Hub
Description: A repository for slowly changing operational master data (information assets) such as customer profiles, product definitions and contracts. This repository provides authoritative operational master data for the real-time interfaces, real-time analytics and for data validation in data ingestion. It is a reference repository of the operational MDM systems but may also be extended with new attributes that are maintained by the reservoir. When this hub is taking data from more than one operational system, there may also be additional quality and deduplication processes running that will improve the data. These changes are published from the asset hub for distribution both inside and outside the reservoir.
Product: InfoSphere MDM Advanced Edition
Pattern: Asset and Asset Hub

Type: Shared Operational
Name: Activity Hub
Description: A repository for storing recent activity related to a master entity. This repository is needed to support the real-time interfaces and real-time analytics. It may be loaded through the data ingestion process and through the real-time interfaces. However, many of its values will have been derived from analytics running inside the data reservoir.
Product: InfoSphere MDM Custom Domain Hub; Industry Models
Pattern: Activity and Activity Hub

Type: Shared Operational
Name: Code Hub
Description: A repository of common code tables and mappings used for joining information sources to create information views.
Product: InfoSphere Reference Data Management Hub (RDM)
Pattern: Code and Code Hub

Type: Shared Operational
Name: Content Hub
Description: A repository of documents, media files and other content that has been managed under a content management repository and is classified with relevant metadata to understand its content and status.
Product: FileNet
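To make the role of the code hub concrete, the Python sketch below joins two information sources that use different local codes for the same country by translating both to a canonical code first. The source systems, the code values and the function names are invented for illustration and do not reflect the RDM product's interfaces.

```python
# Hypothetical code table: (source system, local code) -> canonical code.
CODE_MAP = {
    ("billing", "UK"): "GB",
    ("billing", "GB"): "GB",
    ("billing", "US"): "US",
    ("crm", "826"): "GB",
    ("crm", "840"): "US",
}


def canonical(system, code):
    """Translate a local code to the canonical code, or None if unmapped."""
    return CODE_MAP.get((system, code))


def join_by_country(billing_rows, crm_rows):
    """Build a simple view by joining rows whose canonical country codes match."""
    crm_by_country = {}
    for row in crm_rows:
        key = canonical("crm", row["country"])
        if key is not None:  # unmapped codes never join
            crm_by_country.setdefault(key, []).append(row)

    view = []
    for row in billing_rows:
        key = canonical("billing", row["country"])
        if key is None:
            continue
        for match in crm_by_country.get(key, []):
            view.append(
                {"country": key, "invoice": row["invoice"], "contact": match["contact"]}
            )
    return view
```

Rows with unmapped codes are silently left out of the view here; a real deployment would more likely raise them as exceptions for a steward to resolve.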

Integration and governance components
This component provides control of information movement and consumption within the data reservoir (more details follow).

Name: Broker
Description: The runtime server environment for running the integration processes (such as the information deployment process) that move data in and out of the data reservoir and amongst the components within the reservoir.
Product: InfoSphere Information Server
Pattern: Broker

Name: Code Hub
Description: A repository managing the code tables used in the internal management of the data reservoir.
Product: InfoSphere Reference Data Management Hub (RDM)
Pattern: Code and Code Hub

Name: Staging Areas
Description: A server supporting the staging areas used to move information around the data reservoir.
Product: Database, InfoSphere BigInsights or WebSphere MQ
Pattern: Staging Area

Name: Operational Governance Hub
Description: A repository and applications for managing the information flow and information governance within the data reservoir. This information node supports the metadata services.
Product: InfoSphere Information Server
Pattern: Governance Node

Name: Monitor
Description: A mechanism to monitor the overall function and responsiveness of the data reservoir to ensure it works consistently.
Product: InfoSphere Information Server
Pattern: Probe and Monitoring

Name: Workflow
Description: A server running stewardship processes that coordinate the work of individuals responsible for fixing any problems with the data in the data reservoir.
Product: WebSphere Business Process Manager
Pattern: Agile Process and Application Node

Name: Guards
Description: Capability to control access to information.
Product: Guardium plus user access directory
Pattern: Guard
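The guards component controls access to information, and the audit repository records who accessed data, when and for what purpose. A minimal Python sketch of how the two could work together is shown here; the access directory, the function `guarded_read` and the audit record fields are assumptions for illustration, not the Guardium interface.

```python
from datetime import datetime, timezone

# Hypothetical access directory: user -> collections the user may read.
ACCESS = {"ann": {"customer_view"}, "bob": set()}

# Stand-in for the audit repository.
audit_log = []


def guarded_read(user, collection, purpose, store):
    """Check permission, record the attempt, then return the data if allowed."""
    allowed = collection in ACCESS.get(user, set())
    # Every attempt is audited, including refused ones.
    audit_log.append(
        {
            "user": user,
            "collection": collection,
            "purpose": purpose,
            "when": datetime.now(timezone.utc).isoformat(),
            "allowed": allowed,
        }
    )
    if not allowed:
        raise PermissionError(f"{user} may not read {collection}")
    return store[collection]
```

Writing the audit record before the permission check raises means that refused requests are visible to an auditor as well as successful ones.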

Architecture for the Future
REFERENCE MATERIAL

Architecture for a New Era of Computing
A high-level description of the Big Data and Analytics Reference Architecture.
http://www.redbooks.ibm.com/redbooks.nsf/redbookabstracts/redp5012.html?Open

Taking the Journey to IBM Cognitive Systems
Describes how an organization should prepare for cognitive computing. Includes an example roadmap of solutions to develop key skills and capabilities.
http://www.redbooks.ibm.com/abstracts/redp5043.html?open

Next Best Action Redguide
The NBA Redguide is a customer guide to the solution, suitable for C-suite executives. It explains the value of the solution and describes the solution's architecture using the same diagrams as we have just covered. It also includes case studies from different industries.
http://www.redbooks.ibm.com/abstracts/redp4888.html?open

Ethics for Big Data and Analytics

Context: For what purpose was the data originally surrendered? For what purpose is the data now being used? How far removed from the original context is its new use?
Consent & Choice: What are the choices given to an affected party? Do they know they are making a choice? Do they really understand what they are agreeing to? Do they really have an opportunity to decline? What alternatives are offered?
Reasonable: Is the depth and breadth of the data used, and of the relationships derived, reasonable for the application it is used for?
Substantiated: Are the sources of data used appropriate, authoritative, complete and timely for the application?
Owned: Who owns the resulting insight? What are their responsibilities towards it in terms of its protection and the obligation to act?
Fair: How equitable are the results of the application to all parties? Is everyone properly compensated?
Considered: What are the consequences of the data collection and analysis?
Access: What access to data is given to the data subject?
Accountable: How are mistakes and unintended consequences detected and repaired? Can the interested parties check the results that affect them?

http://www.ibmbigdatahub.com/whitepaper/ethics-bigdata-and-analytics

Staying Ahead in the Cyber Security Game
http://www-01.ibm.com/common/ssi/cgibin/ssialias?subtype=WH&infotype=sa&appname=swge_ti_se_usEN&htmlfid=TIL14103USEN&attachment=til14103usen.pdf#loaded

Industry Models and Big Data
Whitepaper on the use of our industry models with big data.

Roles within the Reservoir

Governor: Appoint an individual to coordinate the definition of policies related to information governance and their implementation.
Steward: Appoint an individual to coordinate the manual activity necessary to monitor and verify that an information collection is meeting agreed quality levels. Create user interfaces and access rights to involve this individual in information quality processes such as the exception management process.
Quality Analyst: Appoint an individual to monitor and analyze the state of the information flowing through the information supply chain.
Integration Developer: Appoint an individual to maintain the data movement functionality in, around and out of the data reservoir.
Infrastructure Operator: Appoint an individual responsible for starting, maintaining and monitoring the systems that support the information supply chain.
Data Scientist: Appoint an individual to analyze the information that the organization is collecting in order to understand patterns of success.
Business Analyst: Appoint an individual to analyze the way people are working, understand where the processes can be improved, and define new procedures, rules and requirements for the IT systems.
Owner: Appoint an individual to be the owner of the information collection, responsible and accountable for ensuring it is capable of supporting the organization's activities.
Auditor: Appoint an individual or team of individuals to review key aspects of how the organization is actually operating and compare it with agreed processes.
Worker: Appoint individuals who are responsible for the manual steps in the core business activity. Create user interfaces and access rights to give these individuals access to the information supply chain through the information processes.
Curator: Appoint individuals who are responsible for maintaining the catalog of information in the data reservoir.
