Mastering Big Data
Steve Hoskin, VP and Chief Architect, INFORMATICA MDM
October 2015
Agenda
- About Big Data
- MDM and Big Data
- The Importance of Relationships
- Big Data Use Cases
About Big Data
Big Data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. Big Data is also sometimes used as a generic label for the Hadoop frameworks that allow for the processing and management of these data sets.
Need for Hadoop
HDFS is not the only place for Big Data
Big Data Stack without Big Data?
You don't need really big data to gain the benefits of the Big Data stack:
- Scalable batch execution environment
- Reduced database costs
- Open source projects provide capabilities
  - Machine Learning
  - Graph Analytics
Downsides?
- Numerous distributions with frequent releases
- Steep learning curves
- Solutions only work for the 80%: how do you build the rest?
- Currently better suited to analytics than operational use cases
- It's just tooling; you still need to build or buy business solutions
- So many choices, potential dead ends
- Requires different hardware deployment
A Master Data Management Primer
- Data Acquisition
- Data Quality and Enrichment
- Authoring
- Matching & Deduplication
- Relationship Discovery and Management
- Survivorship (aka Golden Record and Best Version of Truth)
- Search
- Workflow
- Governance
- Real-time consumption
- Publishing and consumption
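Two of the functions listed above, Matching & Deduplication and Survivorship, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the sample records, the field names, the 0.75 similarity threshold, and the "most recent non-empty value wins" survivorship rule. It is not how Informatica MDM implements these functions; real match engines use far richer rules.

```python
from difflib import SequenceMatcher

# Hypothetical source records; field names are illustrative only.
records = [
    {"id": 1, "name": "John Q. Jones", "email": "", "updated": "2015-01-10"},
    {"id": 2, "name": "John Quincy Jones", "email": "jqj@example.com", "updated": "2015-06-02"},
    {"id": 3, "name": "Mary Smith", "email": "mary@example.com", "updated": "2015-03-15"},
]

def similar(a, b, threshold=0.75):
    """Return True when two names are close enough to be treated as a match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def cluster(records):
    """Greedy matching: attach each record to the first group whose first member matches."""
    groups = []
    for rec in records:
        for group in groups:
            if similar(rec["name"], group[0]["name"]):
                group.append(rec)
                break
        else:
            groups.append([rec])
    return groups

def golden_record(group):
    """Survivorship (assumed rule): per field, keep the newest non-empty value."""
    newest_first = sorted(group, key=lambda r: r["updated"], reverse=True)
    golden = {}
    for field in ("name", "email"):
        golden[field] = next((r[field] for r in newest_first if r[field]), "")
    golden["source_ids"] = sorted(r["id"] for r in group)
    return golden

golden = [golden_record(g) for g in cluster(records)]
for g in golden:
    print(g)
```

With this sample data the two "Jones" records collapse into one golden record and "Mary Smith" stays on her own. The greedy grouping is order-dependent, which is acceptable for a sketch but not for production matching.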
Hadoop is a good fit for these MDM functions today
- Data Acquisition
- Data Quality and Enrichment
- Authoring
- Matching & Deduplication
- Relationship Discovery and Management
- Survivorship (aka Golden Record and Best Version of Truth)
- Search
- Workflow
- Governance
- Operational consumption
- Publishing
Intelligent Layers of Big Data
- Big Data Consumption and Analytics
- Big Data Relationship Management: De-dup, Enrich & Relate
- Trusted Reference Data: Organize, Fix & Enrich
- Big Data Catalog: Catalog, Relate & Score
What data do we have, and how useful is it?
- Content Inference, Sensitive Data Tracking, Stewardship, Smart Suggestions
- Crawl, Index, Cluster, Classify, Relate, Infer Semantics
- Catalog of Data Assets: Relationships, Quality Score, Statistics, Rules, Glossary, Ratings
- All IT Repositories: Applications, Business Semantics; 3rd Party BI, Modeling, Big Data; User Ratings, Feedback, Operational Stats
Big Data Quality: Make Sense of Big Data
Ingest, then:
- Explore: identify common patterns; find outliers; help ask the right questions
- Recommend: suggest actions based on the data; recommend the next best step; predict outcomes
- Learn: from system recommendations; from user actions; from the data itself
Deliver
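The "find outliers" step under Explore can be sketched in a few lines of Python. This is only one possible technique, chosen here for illustration: a modified z-score based on the median absolute deviation (MAD). The 3.5 cutoff is a commonly cited rule of thumb, and the sample claim amounts are hypothetical; the slides do not say which method any product uses.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Return values whose MAD-based modified z-score exceeds the threshold."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or most of them) are identical; nothing can be scored.
        return []
    return [v for v in values if abs(0.6745 * (v - med) / mad) > threshold]

# Hypothetical claim amounts with one suspicious entry.
claim_amounts = [120, 135, 128, 140, 131, 125, 2900]
print(find_outliers(claim_amounts))
```

The median-based score is used instead of mean and standard deviation because a single extreme value distorts the mean, which can hide the very outlier being looked for.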
Relationships & Social MDM
Relationships & Social MDM
1. Single Person View (ASSERTED)
   - John Q. Jones, John Quincy Jones, Jonathan Quincy Jones
   - Location, Product, Customer, Account
2. 360° View of Person Relationships (OBSERVED)
   - Family & Business Relationship, Transactional Data, Social Data
   - Purchase History, Claims, Product Reviews, Complaints, Payment
3. Complete View of Person Interactions and Predictions (DERIVED)
   - Customer Segmentation, Churn Prediction, Sentiment Analysis, Fraud Management
   - RFM Calculation, Fraud Detection, Product Sentiment, Customer Churn
Social MDM: Governance, Visualization, Prediction
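The "RFM Calculation" named among the derived data can be sketched as follows. RFM stands for recency, frequency, and monetary value, computed per customer from transaction history. The transaction tuples, customer IDs, and the `as_of` date below are hypothetical, and the sketch stops at raw values rather than the scoring or segmentation a real system would layer on top.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("C1", date(2015, 9, 20), 120.0),
    ("C1", date(2015, 8, 1), 80.0),
    ("C2", date(2015, 3, 5), 40.0),
]

def rfm(transactions, as_of):
    """Compute recency (days since last purchase), frequency, and monetary total."""
    summary = defaultdict(lambda: {"last": None, "frequency": 0, "monetary": 0.0})
    for customer, day, amount in transactions:
        s = summary[customer]
        s["last"] = day if s["last"] is None else max(s["last"], day)
        s["frequency"] += 1
        s["monetary"] += amount
    return {
        customer: {
            "recency_days": (as_of - s["last"]).days,
            "frequency": s["frequency"],
            "monetary": s["monetary"],
        }
        for customer, s in summary.items()
    }

scores = rfm(transactions, as_of=date(2015, 10, 1))
print(scores)
```

Values like these are typical inputs to the customer segmentation and churn prediction mentioned on the slide.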
MDM Relationships Add Value
Common Graphs in MDM
- Organizational Hierarchy
- Social Network
- Product Hierarchy
Relate Business Entities in MDM
- Vertex/Node: business entities (BEs) such as Party, Product, Claims, Complaints, etc.
- Edges: relationships (Accident, Bad Service, etc.)
MDM Graph Database
MDM GRAPH
- Asserted Data: Customer BE, Party BE, Sales Person BE, Product BE
- Observed Data: Transaction Data, Relationship, Social Data
- Derived Data: Prediction
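The vertex/edge model described on these slides can be sketched as a small in-memory graph in Python. The class name `EntityGraph`, its methods, the entity IDs, and the `origin` tag (asserted, observed, or derived) are all hypothetical names invented for this illustration; they do not correspond to any Informatica API or to a real graph database.

```python
from collections import defaultdict, deque

class EntityGraph:
    """Business entities as vertices, relationships as undirected labeled edges."""

    def __init__(self):
        self.vertices = {}                  # entity_id -> entity type
        self.adjacency = defaultdict(list)  # entity_id -> [(neighbor, relationship, origin)]

    def add_entity(self, entity_id, entity_type):
        self.vertices[entity_id] = entity_type

    def relate(self, source, target, relationship, origin):
        # Store the edge in both directions so traversal works either way.
        self.adjacency[source].append((target, relationship, origin))
        self.adjacency[target].append((source, relationship, origin))

    def within_hops(self, start, max_hops):
        """Breadth-first search: map each reachable entity to its hop distance."""
        distance = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if distance[node] == max_hops:
                continue
            for neighbor, _, _ in self.adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        del distance[start]
        return distance

graph = EntityGraph()
graph.add_entity("party:jones", "Party")
graph.add_entity("party:smith", "Party")
graph.add_entity("product:auto-policy", "Product")
graph.add_entity("claim:1001", "Claim")
graph.relate("party:jones", "product:auto-policy", "Holds", origin="asserted")
graph.relate("party:jones", "claim:1001", "Accident", origin="observed")
graph.relate("party:smith", "claim:1001", "Accident", origin="observed")

print(graph.within_hops("party:jones", 2))
```

Here a two-hop traversal from one party reaches a second party through a shared claim, which is the kind of indirect relationship a graph view makes easy to surface.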
Big Data MDM Use Cases
Use Cases
- Financial Services: Fraud Detection; Risk & Portfolio Analysis; Investment Recommendations
- Retail & Telco: Proactive Customer Engagement; Location Based Services
- Media & Entertainment: Online & In-Game Behavior; Customer X/Up-Sell
- Manufacturing: Connected Vehicle; Predictive Maintenance
- Healthcare & Pharma: Predicting Patient Outcomes; Total Cost of Care; Drug Discovery
- Public Sector: Health Insurance Exchanges; Public Safety; Tax Optimization; Fraud Detection
Large Insurance Company: Customer Intelligence Example
1200+ Input Files; 718 Million Records; 7 Use Cases; 10 Nodes
Hadoop, Cloudera, Informatica, Datameer
Business need
- 360 degree view of consumers for marketing, planning, and analytics
- Discover and mine relationships
- Create highly targeted and individualized marketing programs
Challenge
- Rich data environment across organizational business units, comprised of many source systems across various platforms
- Providing a consistent enterprise view of data across business units
- Seven use cases with increasing complexity
Solution and results
- Provides single platform to house customer and prospect data from disparate sources
- Provides for rapid intake of new data sources (structured and unstructured)
- Eliminates data intake and append bottleneck
- Empowers Analysts to explore all data elements
- Increases processing power for statistical analysis
Fraud & Intelligence System Use Case
Unrelated events? Or fraud?
MDM can be leveraged to build a linearly scalable fraud management system that provides link analysis and data clustering, and also offers the very best search and match against large data volumes.
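The link analysis mentioned above can be sketched as a connected-components problem: claims that share an identifying attribute are linked, and links chain transitively. The Python sketch below uses a union-find structure for this. The claim records, the choice of `phone` and `address` as linking keys, and the function name `linked_groups` are assumptions made for illustration; a real fraud system would use fuzzy matching rather than exact equality and would run on a distributed platform.

```python
from collections import defaultdict

# Hypothetical claims that look unrelated when viewed one at a time.
claims = [
    {"claim": "A", "phone": "555-0100", "address": "1 Elm St"},
    {"claim": "B", "phone": "555-0100", "address": "9 Oak Ave"},
    {"claim": "C", "phone": "555-0199", "address": "9 Oak Ave"},
    {"claim": "D", "phone": "555-0142", "address": "3 Pine Rd"},
]

def linked_groups(records, keys=("phone", "address")):
    """Group claims that are connected through any chain of shared attribute values."""
    parent = {r["claim"]: r["claim"] for r in records}

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # (key, value) -> first claim that carried it
    for r in records:
        for key in keys:
            value = (key, r[key])
            if value in first_seen:
                parent[find(r["claim"])] = find(first_seen[value])
            else:
                first_seen[value] = r["claim"]

    groups = defaultdict(list)
    for r in records:
        groups[find(r["claim"])].append(r["claim"])
    return sorted(sorted(g) for g in groups.values())

print(linked_groups(claims))
```

Claims A and C share nothing directly, yet they end up in the same group because both are linked to B. That transitive link is what turns apparently unrelated events into a cluster worth investigating.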
Questions?