Informatica Platform v10 for Next Generation Analytics, Cloud Modernization, and Data Archiving
  Presented by Ilya Gershanov, 12.05.2016
Contents
  Introducing Informatica
  Next Generation Analytics
  Tinkoff Bank Success Story Video
  Cloud Modernization
  Data Archiving
  Q&A
[Diagram: Business Applications, Data Stores]
A Better Way?
  [Diagram: Business Applications, Data Stores]
Informatica v10 Intelligent Data Platform
  Map Once. Deploy Anywhere.
  Layers: Data Intelligence; Data Infrastructure; Vibe Virtual Data Machine
  Capabilities: Ingest, Transform, Validate, Cleanse, Master, Secure, Mask, Archive
  [Diagram: Business Applications, Data Stores]
Disruptive Technology Trends Addressed by Informatica v10
  Computing: Cloud vs. On-Premise
  Data: Interaction vs. Transaction
  Analytics: Predictive vs. Historical
  Security: Pervasive vs. Perimeter
Becoming A Data Ready Enterprise
  To Unleash Potential, Need Timely, Relevant and Secured Data Everywhere
Informatica: The #1 Independent Leader in Data Integration
  Founded: 1993
  Headquarters: Redwood City, CA
  Executives: Anil Chakravarthy (CEO), Lou Attanasio (CRO), Doug Barnett (CFO), Jim Davis (CMO), Ansa Sekharan (EVP Support), Amit Walia (CPO)
  2015 Revenue: $1.1 billion
  2005-2015 Total Revenue CAGR = 17%
  Annual Total Revenue ($ millions):
    2005: 267, 2006: 325, 2007: 391, 2008: 456, 2009: 501, 2010: 650,
    2011: 784, 2012: 812, 2013: 948, 2014: 1,048, 2015: 1,100
  Partners: Over 500
  Customers: Over 6,000; 4,500 customers using Informatica Cloud
  Customers in 82 countries, Informatica offices in 26 countries
  Ranked #1 in TNS Customer Loyalty rankings for 10 consecutive years
  300 billion+ transactions per month
  Employees: Over 3,620
  Technology Leadership: Gartner positions Informatica in the leaders quadrant for Data Integration, Data Quality, MDM Customer Data, Integration Platform as a Service (iPaaS), Structured Data Archiving and Application Retirement, and Data Masking
  * A reconciliation of GAAP and non-GAAP results is provided in the Appendix section, as well as on Informatica's Investor Relations website.
Proven Technology Leadership
  Enterprise Data Integration, Cloud Data Integration, Data Quality, Data Masking, Data Archiving, Master Data Management
What's changing in BI and Analytics?
  Classical analytics: Dashboards, Reports, KPIs; Batch oriented; Structured data; Rear view mirror
  Next generation analytics: Predictive maintenance, Fraud detection, Operational Intelligence; Real-time & streaming; Structured & Unstructured data; Forward looking
Next Generation Analytics Use Cases
  A phased approach to implementing Big Data initiatives, from "Driven by IT" to "Driven by Business"
  Data sources: Relational, Mainframe; Documents and Emails; Social Media, Web Logs; Machine Device, Cloud
  Phases: First Pilot(s); Data Warehouse Optimization; Intelligent Data Lake; Real-Time Operational Intelligence
  Phase notes: What is Hadoop? How does it work?; Lower IT cost, Lower Infrastructure Cost; Starting point for real Big Data Analytics; Full use of the Hadoop ecosystem, Added Business Value
  Use cases: Fraud Detection; Customer X/Up-Sell; Predictive Maintenance; Lower Total Cost of Care; Public Safety
How Does Informatica Fit in the New Analytics Stack?
  Analytical Applications
  Informatica Platform v10: Data Integration, Data Quality & Governance, Data Security
  Infrastructure: Data Warehouses, Data Lakes, Hadoop, NoSQL
Use case: DWH optimization
  Business Class: High performance platform; High value data; Small/mid volumes (aggregated); Low latency / many users; Structured data
  Economy: Commodity platform; All types of data (structured & unstructured); Large volumes of data (detail level); Complex analytics / fewer users
Traditional Data Warehouse (DWH) Architecture: Informatica Added Value
  [Diagram labels: Enterprise Applications; Informatica; Business Intelligence; Extract, Transform, Load; Transform; Query; OLTP; Data Warehouse; Operational Data Stores (ODS)]
Added value from DWH Optimization: Technology benefits result in business value
  Achieved by off-loading data processing and data storage to cheap and linearly scalable Hadoop
  Expensive and/or non-scalable DWH resources are freed up
  Technology Benefit: Linear scalability with predictable cost; Total cost of ownership reduction; Ability to use semi-structured and unstructured data sources
  Business Benefit: Increase ROI from current investments; Improve quality of service, meet critical SLAs; Improved user experience (faster, more interactive queries); Better analysis (fresher, higher quality, larger volumes)
DWH Optimization. Scalable Compute
  Scalable data processing at low cost for all data processing
  Sources: Relational, Mainframe; Documents and Emails; Social Media, Web Logs; Machine Device, Cloud
  Informatica: Extract, Transform, Load, Profile, Match
  [Diagram labels: Business Intelligence; MPP/; Netezza, SQL Server, Oracle, SAS]
DWH Optimization. Scalable Storage
  Off-loading historic and detailed data to Hadoop
  [Diagram labels: DWH; Detailed Historic Data (2016); Aggregated Data (Σ); Detailed via External Tables (20xx); Archived Detailed Historic Data (2005, 2015, 2016); Business Intelligence; MPP/; Netezza, SQL Server, Oracle, SAS]
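The off-loading idea on this slide can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the function name `offload_history`, the row fields `year` and `amount`, and the rule "anything older than the current year moves to the archive" are all hypothetical and are not taken from the slide or from any Informatica product.

```python
from collections import defaultdict


def offload_history(rows, current_year):
    """Split detail rows into a DWH set and an archive set.

    Hypothetical rule: rows from `current_year` onward stay in the DWH,
    older rows move to the archive (e.g. Hadoop), and the DWH keeps only
    a per-year aggregate of what was moved.
    """
    keep_in_dwh, archive = [], []
    yearly_totals = defaultdict(float)
    for row in rows:
        if row["year"] >= current_year:
            keep_in_dwh.append(row)
        else:
            archive.append(row)
            # Aggregate kept in the DWH so summary queries stay fast.
            yearly_totals[row["year"]] += row["amount"]
    return keep_in_dwh, archive, dict(yearly_totals)
```

In this sketch, summary reports would read the aggregates, while detail queries on old years would go to the archived rows, for example through external tables as the slide mentions.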
US Bank DWH Optimization: Improve processing capacity for AML
  The Challenge: AML could not process required volumes
  Sources: Mainframe SR (VSAM, IMS), Oracle DBMS, Siebel CRM, SalesForce
  Informatica components: PWX, Metadata Manager, ILM, Business Glossary, Informatica Big Data Edition
  Capabilities: Connectivity (Mainframe, Cloud, Web Services); Persistent Data Masking for sensitive data; Self-service Data Profiling, Data Integration; Identify changes; Audit and Control (Compliance); Data Harmonization and Transformation, Aggregation; End-to-end Data Lineage; Business Glossary to support Data Governance
  User: Data Analyst
  The Result: Able to run AML on the full data set; Able to quickly adjust AML algorithms to changing requirements
  Because of: Achieved linear scalability for data ingestion and AML via parallel execution in Hadoop; Migrated COBOL AML application logic to PowerCenter
DWH Optimization is a great solution: Requirements where you need Informatica
  Data Ingestion: Scalable, support all sources/targets, operate in all modes from batch to real-time
  Unstructured Data: Parse and transform
  Data Quality: Evaluate, improve data quality in Hadoop
  Investment Protection: Application must continue running after Hadoop upgrade or move to another distribution
  Skills Availability: Must be easy to staff, even though skilled Big Data developers are scarce
Next Gen. Analytics. Beyond DWH optimization
  "All this new data... let's just spin up a Hadoop cluster."
  "The sandbox is up... experiments are so much fun!!!"
  "Now all we have to do is ingest and analyze the data."
  "Oops! So many issues with data... just hand-code!"
  "Biz wants more insights... let's put it in the data lake!"
  "Need MPP/Hadoop developers... where are they?"
  "STOP! Business cannot use it! How do we operationalize the results? Reuse?"
  "No real business value... no ROI... we are STUCK!"
Next Gen Analytics. Data Lake is the Answer
  [Diagram labels: Transaction, Public Cloud, Social Media, Web; Data Governance; Analytics; Query Access, Visualization, Statistics; Discovery Zone; Data Preparation; Databases, Not Only SQL, Hadoop, MPP Appliances]
Informatica v10 for the Data Lake
  Data Integration: Simple Visual Environment; Optimized Execution & Flexible Deployment; Dynamic schemas & Templates; 100s of Pre-built Transforms, Connectors & Parsers
  Data Governance: Data Quality & Profiling; 360 Relationship Views; Universal Metadata Catalog with End-to-end Data Lineage; Self-service Collaboration Tools; Business Glossary
  Data Security: Self-Service Sensitive Data Discovery & Classification; Proliferation Analysis; Risk Assessment; Non-intrusive Data Masking
Taking Big Data Governance to the Next Level
  Rich Metadata Foundation for Agile, Data-Driven Applications
  [Diagram labels: Data Discovery; Sensitive Data Tracking; Stewardship & Governance; Smart Suggestions; Exploration; Semantic Search; Relationship Discovery; Live Data Map; Map Relationships; Knowledge Graph; Rules; Catalog of all enterprise data assets; EIC; Glossary; Statistics; Ratings; Recommendations; 360 degree views; User Ratings]
  Inputs: All Informatica repositories; 3rd party BI, Modeling, Big Data, RDBMS; Applications, Business Glossary & context; User ratings, Feedback, Operational stats
Next Gen. Analytics - Security: Track and Protect Sensitive Private Big Data
  Secure data on the fly: Dynamic Data Masking (Informatica Dynamic Data Masking policy)
  Fields: National ID, Credit Card; Blocking
  Sr. Analyst, Original Values: 5992-9989-1333-5429, 3724-6743-8000-2421
  IT Administrator, Masked Values: xxxx-xxxx-xxxx-0093, xxxx-xxxx-xxxx-7658
  Offshore Support, Masked Values: 1234-6789-1000-4422, 2233-6789-3456-5555
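The role-based behaviour on this slide can be sketched in a few lines of Python. This is an illustration of the general technique only, not Informatica's Dynamic Data Masking API. The role names, the `POLICY` table, the `demo-secret` key and all function names are hypothetical. Keeping the real last four digits is also an assumption of the sketch, and the substitution shown is a toy, not a production-grade format-preserving scheme.

```python
import hashlib


def block_all(value: str) -> str:
    """Replace every digit with 'x', keeping separators."""
    return "".join("x" if ch.isdigit() else ch for ch in value)


def mask_keep_last4(value: str) -> str:
    """Replace every digit except the last four with 'x'."""
    total = sum(ch.isdigit() for ch in value)
    seen, out = 0, []
    for ch in value:
        if ch.isdigit():
            seen += 1
            out.append(ch if seen > total - 4 else "x")
        else:
            out.append(ch)
    return "".join(out)


def substitute_digits(value: str, secret: str = "demo-secret") -> str:
    """Swap each digit for one derived from a keyed hash.

    Deterministic and format-preserving for inputs of up to 64 digits
    (the length of the hex digest); slightly biased, for illustration only.
    """
    digest = hashlib.sha256((secret + value).encode()).hexdigest()
    digits = iter(int(c, 16) % 10 for c in digest)
    return "".join(str(next(digits)) if ch.isdigit() else ch for ch in value)


# Hypothetical policy: which masking rule applies to which role.
POLICY = {
    "sr_analyst": lambda v: v,             # sees original values
    "it_admin": mask_keep_last4,           # sees xxxx-xxxx-xxxx-NNNN
    "offshore_support": substitute_digits, # sees realistic fake values
}


def read_card(role: str, value: str) -> str:
    """Apply the masking rule for `role`; unknown roles are fully blocked."""
    return POLICY.get(role, block_all)(value)
```

The design point is that the stored value never changes: the rule is applied at read time according to who is asking, so one policy table serves every consumer.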
Do you really want to use 10 startups or
  Stages: ACQUIRE, INGEST, TRANSFORM, SECURE, MASTER, GOVERN, BLEND, CONSUME
  [Diagram labels: Weblogs, Device data, Files, Social, Applications, Relational, Files; Informatica Platform v10; Hand-coding (repeated five times); Data Mining, Dashboards; TRADITIONAL INFRASTRUCTURE, BIG DATA INFRASTRUCTURE]
Why Informatica for Next Generation Analytics?
  Easy start: ready-to-use libraries for data integration and data quality, out-of-the-box connectors to data sources and targets
  Available skillset: utilize the 100,000s of Informatica developers available worldwide. No special (Hadoop) skills necessary
  Informatica as abstraction layer: no need to know Hadoop. It will talk to Hadoop for you and tell it what needs to be done!
  Performance and scalability: data processing happens in Hadoop, not in the Informatica grid
  Ease of use and support: visual development, self-documenting, release and metadata management
  Protect your investment in case the Hadoop stack changes (TEZ, Spark, Flink to come. What is next?)
Tinkoff Bank Data Lake
  The Challenge: Enrich DWH data with semi-structured and unstructured data
  Sources: Clickstream, Application Logs, External Datasets
  Data lake layers: Gate; 1. RAW; 2. Processed (ODD); 3. Business Model (DDS); 4. Data Marts
  Informatica components: Metadata Manager, Big Data Edition
  [Diagram labels: Analytics, Reporting, Business Applications, Data Marts, Data Governance; DWH; DWH Core; Data Marts]
  The Result:
    Enabled big data integration processes including visual development, out-of-the-box connectivity to sources and targets including Hadoop and Pivotal Greenplum Database
    Enabled end-to-end big data metadata management (DBMS, Hadoop, ETL, BI)
    Achieved desired adoption rate by internal data consumers
Cloud is Disrupting IT
  $204B Public Cloud in 2016 (Source: Gartner)
  Applications, Data, Compute, Storage
  Many applications and workloads are moving to the cloud
Hybrid Cloud Is a Common Approach
  Cloud + On-Premise
  [Diagram labels: AWS Redshift, Azure SQL, Azure; Legacy RDBMS (twice), ERP & On-Premise Apps, Traditional Data Warehouse]
Lift and Shift your Workloads
  Use Case Summary: Moving on-premises databases, systems and/or data warehouse to AWS/MS Azure-based workloads
  Cloud: Microsoft Azure, Azure SQL, Azure SQL Data Warehouse, Amazon RDS, Amazon Redshift
  Firewall
  On premise: Other Databases, Your Data Integration Platform, On-premise Data Warehouse
Hybrid App Integration / Data Warehousing
  Use Case Summary: Load multiple data sources from cloud and/or on premise to AWS/MS Azure using Informatica Cloud
  Cloud: Social Media, Logs, IoT; Microsoft Azure, Azure SQL, Azure SQL Data Warehouse, Amazon RDS, Amazon Redshift; Analytics Tools
  Firewall
  On premise: ERP, On-Premise Apps; Legacy RDBMS; Your Data Integration Platform; On-Premise Data Warehouse
4,500 iPaaS Customers; 70+ OEMs; Over 1000 customers; 300B Transactions per month; 130% growth YoY; >1M Integration jobs/processes per day
25K Active Citizen Integrators; 150+ iPaaS Connectors; 98% Renewal Rate; >100 iPaaS Integration Templates
Informatica Leadership Among Leading Analysts
Informatica Cloud Portfolio
  Cloud Data Integration, Cloud Application Integration, Cloud Test Data Management, Data as a Service, Cloud Customer 360
CIO's Challenge with Legacy Applications
  2 of 3 CIOs say their organizations do not have a single view of legacy system data for compliance reporting
  Source: NCC survey of companies with over 50 IT staff
Annual Cost for Maintaining Legacy Applications
  Approximately how much does it cost your organization annually to maintain its legacy applications? (Percent of respondents, N=232)
  Categories: Less than $100,000 annually; $100,000 to $499,999 annually; $500,000 to $999,999 annually; $1 million to $4.999 million annually; $5 million or more annually; Don't know
  Bar values as extracted (chart order not preserved): 29%, 23%, 19%, 14%, 8%, 6%
  Source: ESG Research Report, 2011 Data Management Survey, August 2011.
IDC Research
  Maintaining legacy servers is a costly proposition
  Paying for licenses that are hardly ever used is costlier
  Management and administration are the real issue!
Why Organizations Keep Legacy Applications Running
  For which of the following reasons does your organization keep legacy applications running? (Percent of respondents, N=232, multiple responses accepted)
  Users still access data for reporting: 59%
  Other reasons: We have plans to migrate the data to a new application; Users do not want the application to be retired; Regulatory compliance reasons (i.e., we are required by law to retain the data)
  Bar values for the other reasons as extracted (chart order not preserved): 36%, 35%, 39%
  Source: ESG Research Report, 2011 Data Management Survey, August 2011.
Informatica Data Archive Benefits
  Connectivity and Discovery:
    Single platform to connect and retire a wide variety of applications, technologies and platforms
    Packaged application metadata templates and accelerators
    Integrated metadata discovery for unknown applications and data models
  Lower Costs and Improve Productivity:
    Reduce data footprint by up to 98%
    Archive once, access everywhere
    Users access archive data with flexible access options
  Meet Compliance Requirements:
    Automate audit reports with archive validate
    Streamline retention management with an integrated compliance manager
    Improve eDiscovery processes with keyword search
  [Screenshots: Generate Retention Expiry Reports; Keyword Search]
INFORMATICA DATA ARCHIVE ADVANTAGES
  Secure: can be read but not modified
  Highly compressed (93% in BZWBK)
  Scalable: designed for large data volumes
  No maintenance costs for old systems
  Google-like search
  Tools for finding keys and relationships between the tables
  Retention policy
  Single engine for all data sources
  Optimal use of disk space
  Possibility of assigning new tasks to the personnel responsible for maintenance of old systems
ARCHITECTURE
  Stages: Archive and Retire; Store; Access
  [Diagram labels: Production and Legacy Databases; Custom Apps; Optimized File Archive; Cloud; Informatica Data Discovery; BI / Reporting / SQL Tools; Extract to XML or CSV; ODBC/JDBC]
COMPRESSION RATES
Informatica Leadership Among Leading Analysts
  2014 Gartner Magic Quadrant for Structured Data Archiving and Application Retirement
  2015 Gartner Magic Quadrant for Structured Data Archiving and Application Retirement