Enterprise Information Catalog: Self-Service Data Discovery through Enterprise Information Catalog
Safe Harbor The information being provided today is for informational purposes only. The development, release and timing of any Informatica product or functionality described today remain at the sole discretion of Informatica and should not be relied upon in making a purchasing decision. Statements made today are based on currently available information, which is subject to change. Such statements should not be relied upon as a representation, warranty or commitment to deliver specific products or functionality in the future.
Agenda: Value Proposition; Features and Functionality; Demo; Product Architecture; Questions
Value Proposition
Market Trends Driving Next-Gen Metadata Management: More Data, Many Systems; Dark Data; Increasing Regulations; From Truth to Trust
Live Data Map: Knowledge graph of enterprise metadata assets. Metadata Services: Indexed catalog of extracted metadata. System Characteristics: Massive Scale; High Availability; Extensible; High Load & Search Performance. Data profile & statistical information; Enrichment of content; Discovered data domains; Derived and discovered relationships; Human input & behavior; Classification, clustering; Open APIs.
A Common Foundation for Data Intelligence Applications. Enterprise Information Catalog: Unified view into enterprise data assets. Secure@Source: Enterprise-wide visibility into sensitive data risks. Intelligent Data Lake: Self-service data preparation on big data. Universal Metadata Services + Storage Layer. Live Data Map: Knowledge graph of enterprise metadata assets.
Live Data Map: Foundation for Data Intelligence. [Diagram] Data Discovery; Sensitive Data Tracking; Stewardship & Governance; Smart Suggestions. Exploration; Semantic Search; Relationship Discovery; Recommendations; 360 degree views; User Ratings. Live Data Map: Knowledge graph of all enterprise data assets (Relationships, Rules, EIC Catalog, Glossary, Statistics, Ratings). All Informatica repositories; 3rd party BI, modeling, big data, RDBMS; applications; business glossary & context; user ratings, feedback, operational stats.
Enterprise Information Catalog: Vision. Enterprise Information Catalog enables business and IT users to realize the full potential of their enterprise data assets by providing a unified metadata view that includes technical metadata, business context, user annotations, relationships, data quality and usage.
Enterprise Information Catalog: Unified view into enterprise information assets. Business-user oriented solution; Semantic search with dynamic facets; Data lineage; Change impact; Relationships discovery; High-level data profiling; Data domains; Custom attributes with business classifications; Broad metadata source connectivity; Big data scale.
With Big Data come bigger questions: Where is data of this type? How did it get here? What data was used to create this attribute? Is my report using the right data? Who owns this dataset? Is this dataset good for my analysis?
Enterprise Information Catalog: Powered by Live Data Map
1. Data Classification: Use machine learning to tag semantics to data elements in data sets. Domain Discovery; Smart Domains (Roadmap). "Without associating semantics to technical assets in the catalog, it will be useless. However, we also don't have an army of people who can perform those associations for us."
2. Data Discovery: "Google for Enterprise Metadata", a large-scale distributed metadata index. Semantic Search; Dynamic Facets. "We have 20000+ databases and no idea what is in them."
3. Data Governance: Extract and infer lineage relationships. Lineage; Business Glossary Integration (Roadmap). BCBS 239: Principle 9 - A bank should develop an inventory and classification of risk data items which includes a reference to the concepts used to elaborate the reports.
INFORMATICA CONFIDENTIAL DO NOT DISTRIBUTE
Enterprise Information Catalog: Powered by Live Data Map
4. Broad Connectivity: Scanners for DB, DWH, ETL, BI, Big Data, Applications and more. Purpose-Built Connectors; Scanner SDK (Roadmap). "We have data management software from multiple vendors in our IT environment. Need to see metadata from all for an end-to-end picture."
5. Big Data Scale: Deployed on Hadoop and internally uses Titan, Solr and Spark. Parallel Metadata Ingestion; 24x7 Availability. "Mission critical system with 1 Billion Objects in 4 Applications and growing."
6. Crowdsourced Annotations: Typed and free-form user annotations. Custom Attributes; Business Classifications. Leverage Wisdom of the Crowds: Enrich datasets with tribal knowledge, making classifications, comments and more available to everyone in the organization.
Features and Functionality
Data Discovery: Data discovery through a powerful search engine to find relevant data. Advanced keyword search with token matching finds the most relevant data assets in the catalog. Search Auto-Complete provides suggestions as the user types into the field. Intelligent Facets are provided based on the search results, allowing users to narrow the search to the most relevant data assets.
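The three search behaviors on this slide (token matching, auto-complete, and facets computed from the result set) can be illustrated with a small sketch. This is not the product's implementation or API: the catalog entries, field names, and the functions `search`, `facets`, and `autocomplete` are hypothetical, and the scoring is a simple count of shared tokens.

```python
import re
from collections import Counter

# Hypothetical catalog entries; the field names are illustrative only.
CATALOG = [
    {"name": "CUSTOMER_ORDERS", "type": "Table", "source": "Oracle"},
    {"name": "CUSTOMER_ADDRESS", "type": "Table", "source": "Hive"},
    {"name": "Order Summary", "type": "Report", "source": "Tableau"},
]


def tokenize(text):
    """Lower-case the text and split it on anything that is not a letter or digit."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def search(query, catalog):
    """Rank assets by how many query tokens appear in the asset name."""
    query_tokens = set(tokenize(query))
    hits = []
    for asset in catalog:
        score = len(query_tokens & set(tokenize(asset["name"])))
        if score:
            hits.append((score, asset))
    # Highest score first; the sort is stable, so ties keep catalog order.
    hits.sort(key=lambda pair: -pair[0])
    return [asset for _, asset in hits]


def facets(results, fields=("type", "source")):
    """Count the values of each facet field within the current result set."""
    return {f: dict(Counter(r[f] for r in results)) for f in fields}


def autocomplete(prefix, catalog, limit=5):
    """Suggest name tokens that start with what the user has typed so far."""
    p = prefix.lower()
    tokens = {t for a in catalog for t in tokenize(a["name"]) if t.startswith(p)}
    return sorted(tokens)[:limit]


results = search("customer orders", CATALOG)
print([r["name"] for r in results])
print(facets(results))
print(autocomplete("o", CATALOG))
```

Because the facet counts are derived from the results rather than from the whole catalog, they change with every query, which is what lets a user narrow a search step by step.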
Data Lineage: Interactively trace data origin through summarized lineage views for business users. A simplified view of lineage highlights the end points and not the transformations in between. Drill down to expand any lineage path to see more details.
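The difference between the summarized view and the drill-down view can be sketched on a toy lineage graph. The graph, node names, and both functions below are hypothetical and assume the lineage is acyclic; the sketch only shows the idea of keeping the end points of each path and hiding the intermediate transformations.

```python
# Hypothetical lineage: each key is fed by the listed upstream nodes.
UPSTREAM = {
    "sales_report": ["agg_sales"],
    "agg_sales": ["join_orders_customers"],
    "join_orders_customers": ["orders", "customers"],
    "orders": [],
    "customers": [],
}


def full_paths(target, upstream):
    """Drill-down view: every path from an origin to the target (acyclic graphs only)."""
    parents = upstream.get(target, [])
    if not parents:
        # A node with nothing upstream is an origin.
        return [[target]]
    return [path + [target] for p in parents for path in full_paths(p, upstream)]


def summarized_lineage(target, upstream):
    """Summary view: keep only the end points of each path, dropping the steps between."""
    return sorted({(path[0], path[-1]) for path in full_paths(target, upstream)})


print(summarized_lineage("sales_report", UPSTREAM))
for path in full_paths("sales_report", UPSTREAM):
    print(" -> ".join(path))
```

The summary answers "where did this come from?" in one hop, while `full_paths` holds the detail a user would see after expanding a lineage path.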
360 Relationship Views: Discover related datasets by uncovering relationships. Get a 360 degree view of a data asset using the relationship view, which includes related tables, views, domains and reports. Expand relationship circles to get more details on relationship types and objects.
Integrated Data Profiling Statistics: Understand data quality statistics before using data sets for analysis. Profiling statistics are available in data asset views. Detailed profiling statistics for columns include value distributions, patterns, data types and data domains.
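As a rough illustration of the column-level statistics listed above, the following sketch computes a value distribution, character patterns, and a candidate data domain for a list of values. It is not the product's profiling engine: the function names, the pattern notation (9 for digits, X for letters), the e-mail rule, and the 0.7 threshold are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical domain rule: a domain name mapped to a regular expression.
DOMAIN_RULES = {"email": r"[^@\s]+@[^@\s]+\.[A-Za-z]+"}


def value_pattern(value):
    """Reduce a value to its shape: digits become 9, letters become X."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("X")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Basic statistics for one column; None and empty strings count as nulls."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "value_distribution": dict(Counter(non_null)),
        "patterns": dict(Counter(value_pattern(v) for v in non_null)),
    }


def infer_domains(values, rules=DOMAIN_RULES, threshold=0.7):
    """Return the domains whose rule matches at least `threshold` of non-null values."""
    non_null = [v for v in values if v not in (None, "")]
    if not non_null:
        return []
    found = []
    for name, regex in rules.items():
        matched = sum(1 for v in non_null if re.fullmatch(regex, v))
        if matched / len(non_null) >= threshold:
            found.append(name)
    return found


column = ["a@x.com", "b@y.org", None, "a@x.com", "n/a"]
print(profile_column(column))
print(infer_domains(column))
```

Using a conformance threshold rather than requiring every value to match lets a domain still be suggested for a column that contains a few placeholder values such as "n/a".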
Custom Attributes: Create and Assign. Leveraging Wisdom of Crowds. Define custom attributes in Live Data Map Administrator and choose the applicable data assets. Users assign values to the attributes in the search results or asset views.
Custom Attributes - Search and Filter. Leveraging Wisdom of Crowds. Custom attribute values are searchable so that users can get to annotated data assets quickly. You can also choose to filter the search results by values of custom attributes.
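The two behaviors described here, searching on custom attribute values and filtering results by a specific attribute, can be sketched as follows. The asset records, attribute names, and functions are hypothetical and only illustrate the distinction; they do not reflect how the catalog stores or indexes annotations.

```python
# Hypothetical annotated assets; attribute names and values are illustrative.
ASSETS = [
    {"name": "CUSTOMER_ORDERS", "custom": {"Data Steward": "Alice", "Certified": "Yes"}},
    {"name": "CUSTOMER_ADDRESS", "custom": {"Data Steward": "Bob", "Certified": "No"}},
    {"name": "WEB_CLICKS", "custom": {}},
]


def search_by_attribute_value(term, assets):
    """Search: match the term against the value of any custom attribute."""
    needle = term.lower()
    return [
        a for a in assets
        if any(needle in str(v).lower() for v in a["custom"].values())
    ]


def filter_by_attribute(assets, attribute, value):
    """Filter: keep only assets where one named attribute has exactly this value."""
    return [a for a in assets if a["custom"].get(attribute) == value]


print([a["name"] for a in search_by_attribute_value("alice", ASSETS)])
print([a["name"] for a in filter_by_attribute(ASSETS, "Certified", "No")])
```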
Business Classifications: Provide Business Context. Extract classifications from Business Glossary and use them to annotate datasets in EIC.
Enterprise Information Catalog DEMO
Live Data Map - Architecture. [Diagram] Applications: EIC; Data Governance; Data Security; IDL; Informatica and 3rd Party. REST API. Services: Search; Smart Tags; Admin; Lineage; Job Management; Relationships; Evolution; Scheduler. Scanners; Plugins; Data Profiler Plugin; Inference Analyzers; Ingestion Service. Processing: Data Profiling Engine; Hadoop Grid (Yarn). Storage: Titan; MRS; PWH; HBASE.
Big Data Scale. Designed for Large Scale: Internally uses a fully managed Hadoop cluster to support enterprise-scale deployments. Can also be deployed on an existing Hadoop cluster. Graph technologies store and query large enterprise knowledge graphs. High Load and Search Performance: Supports parallel metadata ingestion to quickly update the catalog from multiple sources. High-speed indexing provides the most up-to-date catalog content to users. Distributed indexes provide unmatched search performance over millions of data assets. High Availability: Fault tolerance and high availability provide 24x7 catalog uptime.
Supported Connectivity. Big Data: Cloudera Navigator; Hive (Cloudera/Hortonworks/MapR). Informatica: Informatica PowerCenter; Informatica BDM; User Scanner; Business Glossary Classifications. Business Intelligence: IBM Cognos; SAP Business Objects. Database: Oracle; DB2; DB2 for z/OS; SQL Server; Sybase; JDBC; Teradata; Netezza. Cloud: Salesforce. Tableau.
Enterprise Information Catalog Benefits
Catalog all data and processing assets in the enterprise. All Data Sources: Databases, DW, Big Data, ETL tools, BI reports and more. Data Sources for All: Make data source discovery available to everyone, including data scientists, data analysts, enterprise architects and developers, reducing data silos and increasing collaboration.
Find and explore the most relevant datasets for your data needs. Find Available Datasets: Data discovery through a powerful search engine to find relevant data from the catalog. Explore Data Potential: Understand underlying semantics and correlations across datasets to quickly explore data potential. Increase Productivity: Spend less time searching for data and more time analyzing it.
Enrich datasets by capturing and sharing context across the organization. Leverage Wisdom of the Crowds: Enrich datasets with tribal knowledge, making classifications, comments and more available to everyone in the organization. Provide Business Context: Increase IT-Business collaboration by providing the right business context to technical data assets.
Questions & Answers. Gaurav Pathak (Director, Product Management); G. Srinivasa Raghavan (Director, Development); Sanjeev Cherian (Director, Development); Vamsi Krishna (Senior Manager, Development); Darren Wrigley (Product Specialist)
User Groups: Informatica User Groups are a great way for you to invest in your professional development and learn about new Informatica offerings. Local Chapter Leaders manage each IUG online and via in-person meetings. Network and socialize. Find and share content, best practices & tips. Learn about the latest technologies and solutions from Informatica. Discover how colleagues and peers use Informatica. https://network.informatica.com/welcome/ LEARN MORE AT IW16: Go to the Solutions Expo, Informatica Pavilion / Ecosystem & Innovation Area. Talk to regional user group leaders. Learn about meeting plans. Join your regional user group. When: Monday 6:00pm to 8:30pm; Tuesday 10:45am to 2:15pm; Wednesday 10:30am to 1:45pm. Where: Moscone West Hall, Level One.
Enterprise Information Catalog: Addressing Key User Challenges. Data Analysts; IT; Data Architects. "I can't easily discover and explore data without IT help." "I have no insight into what data sets are related or where the data come from." "I need a self-service environment where I can find data & information about it easily." "I have to deliver data assets to my business users." "The business wants an easy way for consumers to discover and use existing data assets in the organization." "I need a single catalog for the organization, reducing data silos." "I have too much redundant data sprawl." "I want a single place for all users to discover and enrich enterprise data assets."
LDM Administrator
Add Catalog Resource: Define connection settings. Provide metadata extract and profiling settings. Schedule scanner runs.
Resource Library: See configured resources and schedules in the Library view with filters.
Connection Assignments: Connection assignment through an easy-to-use interface.
Custom Attributes: Create and Assign. Define custom attributes in Live Data Map Administrator and choose the applicable data assets. Users assign values to the attributes in the search results or asset views.
Reusable Configurations: Create reusable configurations for DIS settings used in profiling.
Monitor Tasks: Monitor task status and view logs.