Holistic Data Management


Holistic Data Management for Hadoop
Michael Kohs, Senior Sales Consultant, @mikchaos

The Problem with Big Data Projects in 2016
Data sources: Relational, Mainframe; Documents and Emails; Social Media, Web Logs; Machine Device, Cloud
Roles: Data Modeler, Data Scientist, Data Analyst, Data Steward, Data Engineer, Business
Targets: Analytic Apps, Enterprise Apps, etc.; Data Lakes, DW, DM, NoSQL
Pain points: Takes too long to get the data. Can I trust the data? We have a lot of sensitive data. Too many one-off projects. Hard to build and maintain. Many compliance requirements.
Business goals: Increase Customer Loyalty, Improve Fraud Detection, Reduce Security Risk, Improve Predictive Maintenance, Increase Operational Efficiency
Stages: Laboratory (insights), Factory (actions)

Criteria for Successful Big Data Projects
Criteria: Self-Service Autonomy, Operational Agility
Data sources: Relational, Mainframe; Documents and Emails; Social Media, Web Logs; Machine Device, Cloud
Roles: Data Modeler, Data Scientist, Data Analyst, Data Steward, Data Engineer, Business
Targets: Analytic Apps, Enterprise Apps, etc.; Data Lakes, DW, DM, NoSQL
Business goals: Increase Customer Loyalty, Improve Fraud Detection, Reduce Security Risk, Improve Predictive Maintenance, Increase Operational Efficiency
Stages: Laboratory (insights), Factory (actions)

Introducing Informatica Big Data Management
Pillars: Big Data Integration, Big Data Governance, Big Data Security
Data sources: Relational, Mainframe; Documents and Emails; Social Media, Web Logs; Machine Device, Cloud
Roles: Data Modeler, Data Scientist, Data Analyst, Data Steward, Data Engineer, Business
Targets: Analytic Apps, Enterprise Apps, etc.; Data Lakes, DW, DM, NoSQL
Business goals: Increase Customer Loyalty, Improve Fraud Detection, Reduce Security Risk, Improve Predictive Maintenance, Increase Operational Efficiency

The 3 Pillars of Informatica Big Data Management
Big Data Integration: Simple Visual Environment; Optimized Execution & Flexible Deployment; Dynamic Schemas & Templates; 100s of Pre-built Transforms, Connectors & Parsers
Big Data Governance: Data Quality & Profiling; 360 Relationship Views; Universal Metadata Catalog with End-to-end Data Lineage; Business Glossary; Self-service Collaboration Tools
Big Data Security: Sensitive Data Discovery & Classification; Proliferation Analysis; Risk Assessment; Non-intrusive Data Masking

Big Data Integration

Data Warehouse Optimization
Diagram labels: Phase 1, Phase 2; Data Warehouse; BI Reports & Apps
Data sources: Relational, Mainframe; Documents and Emails; Social Media, Web Logs; Machine Device, Cloud
1. Offload data & ELT processing to Hadoop
2. Batch load raw data (e.g. transactions, multi-structured)
3. Replicate changes & schemas for relational data
4. Collect & stream real-time machine data
5. Parse & prepare (e.g. ETL, data quality) data for analysis
6. Move high-value curated data into the data warehouse

Abstracting complexity and protecting investments
Design: develops universal mapping (Source, Filter, Sorter, Aggregator Σ, Target)
Workload & Resource Mgmt: Informatica, YARN
Purpose-built Execution Engines: Native Informatica Engine, map/reduce, Blaze
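The idea of designing one mapping and running it on interchangeable execution engines can be sketched in a few lines. This is only an illustration of the separation between logical design and execution: the names Mapping, LocalEngine and PartitionedEngine are hypothetical and are not Informatica APIs, and the partitioned engine merely imitates a map/reduce-style run in memory.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Callable, Dict

Row = Dict[str, object]


@dataclass(frozen=True)
class Mapping:
    """Engine-independent design: Source -> Filter -> Sorter -> Aggregator -> Target."""
    keep: Callable[[Row], bool]   # Filter
    group_key: str                # Sorter and Aggregator key
    sum_field: str                # Aggregator input


class LocalEngine:
    """Hypothetical single-process engine: filter, sort, then aggregate sorted groups."""

    def run(self, mapping, source):
        rows = sorted(
            (r for r in source if mapping.keep(r)),
            key=lambda r: r[mapping.group_key],
        )
        return [
            {mapping.group_key: key, "total": sum(r[mapping.sum_field] for r in group)}
            for key, group in groupby(rows, key=lambda r: r[mapping.group_key])
        ]


class PartitionedEngine:
    """Hypothetical map/reduce-style engine: partial sums per partition, then a merge."""

    def __init__(self, partitions=2):
        self.partitions = partitions

    def run(self, mapping, source):
        rows = list(source)
        chunks = [rows[i::self.partitions] for i in range(self.partitions)]
        merged = {}
        for chunk in chunks:
            partial = {}
            for r in chunk:  # "map" side: filter and pre-aggregate one partition
                if mapping.keep(r):
                    key = r[mapping.group_key]
                    partial[key] = partial.get(key, 0) + r[mapping.sum_field]
            for key, value in partial.items():  # "reduce" side: merge partial sums
                merged[key] = merged.get(key, 0) + value
        return [{mapping.group_key: k, "total": merged[k]} for k in sorted(merged)]


# Illustrative data and mapping (made up for this sketch).
ORDERS = [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 5},
    {"region": "EU", "amount": 7},
    {"region": "US", "amount": -1},
]
POSITIVE_TOTALS = Mapping(keep=lambda r: r["amount"] > 0, group_key="region", sum_field="amount")

local_result = LocalEngine().run(POSITIVE_TOTALS, ORDERS)
partitioned_result = PartitionedEngine(partitions=3).run(POSITIVE_TOTALS, ORDERS)
```

Both engines receive the same Mapping object and produce the same totals, so the design does not need to change when the execution engine does.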

Big Data Quality & Governance

Data Profiling on Hadoop
1. Profiling stats: min/max values, NULLs, data types, etc.
2. Frequency distribution
3. Value drill-down
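The three profiling outputs listed above can be illustrated with a small in-memory sketch. It is not how the product computes profiles on Hadoop; the function names and sample rows are hypothetical, and the sketch assumes the non-null values of a column are mutually comparable.

```python
from collections import Counter


def profile_column(values):
    """Basic profiling stats for one column; None stands in for NULL."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "types": sorted({type(v).__name__ for v in non_null}),
        # Frequency distribution, most common value first.
        "frequencies": Counter(non_null).most_common(),
    }


def drill_down(rows, column, value):
    """Value drill-down: return the rows behind one value of the distribution."""
    return [row for row in rows if row.get(column) == value]


# Illustrative rows (made up for this sketch).
CUSTOMERS = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 34},
    {"id": 4, "age": 51},
]

age_profile = profile_column([row["age"] for row in CUSTOMERS])
rows_aged_34 = drill_down(CUSTOMERS, "age", 34)
```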

Hadoop Data Domain Discovery
Finding functional meaning of Hadoop data
1. Leverage INFA rules/mapplets to identify the functional meaning of Hadoop data
Sensitive data (e.g. SSN, credit card number, etc.)
Liability and compliance risk?
PHI: Protected Health Information
PII: Personally Identifiable Information
Scalable to look for/discover ANY domain type
2. View/share a report of the data domains/sensitive data contained in Hadoop. Ability to drill down to see suspect data values.
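A rule-based domain scan of this kind can be approximated as follows. This is a simplified sketch, not the Informatica rules or mapplets: the two rules (a US SSN format check and a Luhn checksum for card numbers), the 0.8 threshold and all names are assumptions chosen for illustration.

```python
import re

SSN_RE = re.compile(r"^\d{3}-\d{2}-\d{4}$")
CARD_CHARS_RE = re.compile(r"^[\d -]+$")


def looks_like_ssn(value):
    """Format check only; it does not validate that the number was issued."""
    return bool(SSN_RE.match(value))


def luhn_valid(value):
    """Luhn checksum over 13 to 19 digits, ignoring spaces and hyphens."""
    if not CARD_CHARS_RE.match(value):
        return False
    digits = [int(c) for c in value if c.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Hypothetical domain rules; more domains could be added to this table.
DOMAIN_RULES = {"SSN": looks_like_ssn, "CREDIT_CARD": luhn_valid}


def discover_domains(rows, threshold=0.8):
    """Report, per column, each domain matched by at least `threshold` of non-null values."""
    columns = {}
    for row in rows:
        for column, value in row.items():
            if value is not None:
                columns.setdefault(column, []).append(str(value))
    report = {}
    for column, values in columns.items():
        for domain, rule in DOMAIN_RULES.items():
            hits = [v for v in values if rule(v)]
            if len(hits) / len(values) >= threshold:
                # Keep the matching values so a reviewer can drill down.
                report.setdefault(column, {})[domain] = hits
    return report


# Illustrative rows; 4111111111111111 is a widely used test card number.
SAMPLE_ROWS = [
    {"id": "1", "tax_id": "123-45-6789", "card": "4111 1111 1111 1111"},
    {"id": "2", "tax_id": "987-65-4321", "card": "4111111111111111"},
]

domain_report = discover_domains(SAMPLE_ROWS)
```

The report keeps the suspect values per column and domain, which corresponds to the drill-down described in step 2.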

Governance & Metadata Management

Metadata Manager Architecture
Components: Consolidated Metadata Catalog; Data Lineage; Business Glossary; Business Glossary Desktop; 3rd party BI; Metadata Reports; Metadata Repository; Metadata Bookmarks
Metadata sources: Mainframe, ERP, Database, Flat Files, Data Modeling, BI Tools, Custom

End-to-End Hadoop Lineage
Displays lineage information about data that has been loaded into or extracted from Hadoop
Displays lineage about map/reduce and Blaze jobs generated by Informatica Big Data Management
Shows transformations
Connects source to target systems
Works with all supported distributions
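Connecting source to target systems amounts to walking a directed graph of data sets and jobs. The sketch below shows that traversal in its simplest form; the LineageGraph class and the node names are hypothetical and do not reflect how Metadata Manager stores lineage.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph in which an edge means 'data flows from source to target'."""

    def __init__(self):
        self._parents = defaultdict(set)

    def add_edge(self, source, target):
        self._parents[target].add(source)

    def upstream(self, node):
        """Return every data set or job that feeds `node`, directly or indirectly."""
        seen = set()
        queue = deque([node])
        while queue:
            current = queue.popleft()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    queue.append(parent)
        return seen


# Illustrative lineage: a relational source, a generated Hadoop job, a Hive table, a report.
lineage = LineageGraph()
lineage.add_edge("oracle.orders", "blaze_job.clean_orders")
lineage.add_edge("blaze_job.clean_orders", "hive.orders_clean")
lineage.add_edge("hive.orders_clean", "bi.revenue_report")

report_sources = lineage.upstream("bi.revenue_report")
```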

End-To-End Lineage with Informatica and Cloudera
Diagram labels: Data Source; Data Prep on Hadoop via Informatica; Hive HQL; Target BI/Analytic App
https://www.youtube.com/watch?v=rf63wfn8kik

Data Intelligence: Live Data Map
2015 Informatica. Proprietary and Confidential

Live Data Map: Foundation for Data Intelligence
Capabilities: Data Discovery; Sensitive Data Tracking; Stewardship & Governance; Smart Suggestions; Exploration; Semantic Search; Relationship Discovery
Live Data Map: Knowledge Graph of all enterprise data assets
Diagram labels: Map; Relationships; Rules; EIC; Catalog; Glossary; Statistics; Ratings; Recommendations; 360 degree views; User Ratings
Inputs: All Informatica Repositories; 3rd party BI, Modeling, Big Data, RDBMS; Applications, Business glossary & context; User Ratings, Feedback, Operational Stats

Big Data Security

Persistent Data Masking
1. Users load data into Hadoop, masked or unmasked
2. Security analyst uses Sensitive Data Discovery to scan and discover where sensitive data exists in Hadoop
3. Sensitive data is masked using Persistent Masking and moved to Analytics or Test Environments within the Hadoop instance or in a separate instance
4. Persistently masked data is queried by BI Analysts
Diagram labels: Hadoop (HBase, HDFS, Hive); Sensitive Data Discovery; Persistent Data Masking; BI & Analytic Layer (Query, Reporting, Data Mining, Predictive Analytics); Hadoop Analytics Environment; Hadoop Test Environment
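Step 3 above can be illustrated with a minimal sketch that writes a masked copy of the data for an analytics or test environment. The slides do not say which masking technique is used; keyed hashing (HMAC-SHA256) is swapped in here as one common way to get deterministic, irreversible tokens, and the function names, key and sample rows are hypothetical.

```python
import hashlib
import hmac


def mask_value(value, key):
    """Deterministic, irreversible token: the same input and key give the same output."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability; an assumption of this sketch


def mask_rows(rows, sensitive_columns, key):
    """Return a masked copy of the rows; the input rows are left untouched."""
    masked = []
    for row in rows:
        masked.append({
            column: mask_value(str(value), key)
            if column in sensitive_columns and value is not None
            else value
            for column, value in row.items()
        })
    return masked


# Illustrative input; in practice the key would come from a secrets store.
MASKING_KEY = b"example-key-not-for-production"
PATIENTS = [
    {"id": 1, "ssn": "123-45-6789", "city": "Berlin"},
    {"id": 2, "ssn": "987-65-4321", "city": "Halle"},
    {"id": 3, "ssn": "123-45-6789", "city": "Bonn"},
]

masked_patients = mask_rows(PATIENTS, {"ssn"}, MASKING_KEY)
```

Because equal inputs map to equal tokens, joins and group-bys on the masked column still behave consistently in the target environment.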

Dynamic Data Masking
In-line proxy server delivers a seamless security layer for Hive and Hadoop*
Role-based anonymization and real-time prevention
The Dynamic Data Masking layer applies real-time HQL rewrites to mask the returned result set
Private information stored in Hadoop: BLAKE, JONES, KING
Values presented on the business user application screen: BLAKE, JONES, KING
Values presented on application screens and tools used by production support, DBAs, outsourced or unauthorized workforce: BL****, JO****, KI****
(2) Select substring(name,1,2) *** from table1
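The role-based rewrite can be sketched as a function that changes the select list before the query reaches Hive. This is an illustration only: the role name, the helper names and the exact rewritten expression are assumptions, since the rewritten query on the slide is only partly legible. The masking rule itself (first two characters followed by four asterisks) follows the values shown on the slide.

```python
# Roles that may see clear-text values; hypothetical for this sketch.
UNMASKED_ROLES = frozenset({"business_user"})


def mask_name(value):
    """Client-side equivalent of the masking rule: BLAKE -> BL****."""
    return value[:2] + "****"


def rewrite_select(columns, table, sensitive_columns, role):
    """Build a SELECT in which sensitive columns are masked for non-privileged roles.

    Identifiers are interpolated without escaping, so this is a sketch and not
    something to expose to untrusted input.
    """
    expressions = []
    for column in columns:
        if column in sensitive_columns and role not in UNMASKED_ROLES:
            expressions.append(f"concat(substring({column}, 1, 2), '****') AS {column}")
        else:
            expressions.append(column)
    return f"SELECT {', '.join(expressions)} FROM {table}"


clear_query = rewrite_select(["name"], "table1", {"name"}, "business_user")
masked_query = rewrite_select(["name"], "table1", {"name"}, "production_support")
masked_names = [mask_name(n) for n in ["BLAKE", "JONES", "KING"]]
```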

Project Sonoma: The Intelligent Data Lake

A sample Data Lake architecture
Users: IT; Data Scientists; Analysts, Business Users
Data sources: Relational, Mainframe; Documents and Emails; Social Media, Web Logs; Machine Device, Cloud
Activities: Integrate Systems; Operationalize Discovery results; Monitor and manage; Discover Data; Profile Data; Combine data; Aggregate; Develop Patterns; Standardized reports; Statistics
Lake areas: Swamp, Pool, Reservoir
Zones: Landing Zone, Discovery, Consumption

Project Sonoma: The Intelligent Data Lake
Self-Service for Analysts: Search & Discover; Prepare & Publish
Governance for IT: Usage tracking & monitoring; Lineage & Security
Operate at scale
Components: Self-Service Data Discovery (Portal); Prepare (Rev); Live Data Map; IT Monitoring & Tracking
Diagram labels: DATA; METADATA; Raw Data; Published Data Sets; BI & Analytics
Project Sonoma Demo at Informatica World 2015

Informatica + Hadoop: The best of both worlds
Best of Informatica: 20 years of DI innovations
Best of Open Source: scalable distributed computing
SOURCE: Databases, Files; Servers & Mainframe; Social; Sensor data
INGEST: Batch, Replicate, Stream, Archive
Prepare, Refine, Govern
DELIVER: Batch, Services, Events, Topics
TARGET: Analytics Teams; Backend DBs; MDM; Analytics & Op Dashboards; EDW; Mobile Apps

Q&A
Thank you!
Visit us at: BARC BI & Big Data Forum, Hall 5, Stand B36

Big Data Edition Trial Sandbox
60-day free trial
Available for Cloudera 5.0 and Hortonworks HDP 2.1.3
1-node cluster
Sample data/mappings, documentation, videos
Mappings can be transferred/reused
Go to the Big Data Mall to download

Resources
Project Sonoma (Data Lake) demo at Informatica World 2015
Project Atlantic (Machine Data Parsing) demo at Informatica World 2015
Informatica solutions for Big Data
Informatica Big Data Management Editions
Informatica Big Data Management Editions datasheet
Big Data Management Deep Dive Webinar & Demo
Informatica Blaze Executive Brief
Big Data Relationship Manager
Metadata Management with Cloudera Navigator