Taming the Elephant with Big Data Management Deep Dive
Big Data Management Introduction
Safe Harbor: The information being provided today is for informational purposes only. The development, release, and timing of any Informatica product or functionality described today remain at the sole discretion of Informatica and should not be relied upon in making a purchasing decision. Statements made today are based on currently available information, which is subject to change. Such statements should not be relied upon as a representation, warranty, or commitment to deliver specific products or functionality in the future.
Overview of Data Integration Solutions
- PowerCenter (Traditional Workloads): Data Warehousing, Agile BI, Real-time DI, Data Migration, Apps Integration (on-prem)
- Big Data Management (Next-Gen Workloads): DW Offloading/Optimization, Data Lakes, Big Data Analytics, NoSQL Integration
- Cloud Data Integration (Cloud & SaaS Workloads): Apps Integration (Hybrid), Cloud & Hybrid DI, DW & Analytics (Cloud DBs)
Informatica's Big Data Journey: 2012
- 1st release of Informatica Big Data Edition
- 1st data integration platform to natively execute on Hadoop
- Support for MapReduce
- Support for HDFS/Hive/HBase
- Profile natively on Hadoop
[Diagram: Hadoop 1.0 stack - MapReduce (processing & resource management) over HDFS (distributed storage)]
Informatica's Big Data Journey: 2016
- Polyglot computing: MapReduce, Blaze, Tez, Spark
- Informatica Big Data Management Smart Executor
- Multi-distribution support, both on-prem and cloud
- End-to-end big data management solutions
[Diagram: INFA engine choices - Hive on MapReduce, Hive on Tez, Hive on Spark, Spark, and Blaze - all running on YARN over HDFS]
Big data modes of execution
- Native: runs on Informatica node(s); connects to both Hadoop and non-Hadoop sources/targets
- Hadoop Pushdown: runs on the Hadoop cluster; connects to Hadoop sources/targets
Why Informatica BDM? Informatica mappings capture the business logic once; the Big Data Management solution then executes them either natively with SQL pushdown or with Hadoop pushdown (MapReduce, Tez, Spark, Blaze): polyglot computing.
Big Data Challenges (Source: Gartner)
- 36%: Obtaining the skills and capabilities needed
- 33%: Security, privacy & data quality
- 26%: Integrating multiple data sources
- 26%: Integrating big data technology with existing infrastructure
How Informatica addresses them: mapping-based development, PowerCenter reuse, SQL-to-mapping conversion, Kerberos support, Sentry/Ranger support, data masking, OS profiles, DQ and profiling on Hadoop, PowerExchange, Data Processor, SQOOP, on-prem and cloud distribution support
3 pillars of Informatica Big Data Management Single, Comprehensive and Integrated Platform for End-to-End Big Data Management Data Integration Data Quality & Governance Data Security
Universal connectivity: WebSphere MQ JMS MSMQ SAP NetWeaver XI Oracle DB2 UDB DB2/400 SQL Server Sybase ADABAS Datacom DB2 IDMS IMS Word, Excel PDF StarOffice WordPerfect Email (POP, IMAP) HTTP Pivotal Vertica Netezza Web Services TIBCO webMethods Informix Teradata Netezza ODBC JDBC VSAM C-ISAM Binary Flat Files Tape Formats Flat files ASCII reports HTML RPG ANSI LDAP Teradata Aster JD Edwards Lotus Notes Oracle E-Business PeopleSoft Salesforce CRM Force.com RightNow NetSuite EDI X12 EDIFACT RosettaNet HL7 HIPAA XML LegalXML IFX cXML Facebook Twitter LinkedIn Kapow ADP Hewitt SAP By Design Oracle OnDemand AST FIX SWIFT Cargo IMP MVR. 100+ pre-built parsers, 200+ pre-built connectors, out-of-the-box business rules and data standardization.
Pre-Built Parsers for Industry Standards
- Data storage & transport formats: XML, JSON, AVRO, Parquet, delimited files
- Industry standard formats: financial services, healthcare, EDI
- Organizational formats: PDF, Word, Excel
[Diagram: parsers developed in the Informatica IDE and executed on the Hadoop cluster]
SQOOP
- JDBC-based universal connectivity to many sources
- No need to install database clients on the Hadoop cluster to read/write data
- Seamless integration into Informatica mappings
- Integration at both the connection and data object level
- Works similarly to external loaders in PowerCenter
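Sqoop parallelizes a JDBC import by partitioning the range of a numeric split column among parallel mappers. A minimal Python sketch of that split logic (the function name and values are illustrative, not Sqoop's or Informatica's implementation):

```python
def split_ranges(min_val, max_val, num_mappers):
    """Divide the key range [min_val, max_val] into roughly equal
    sub-ranges, one per mapper, the way Sqoop's --split-by option
    partitions a numeric column for a parallel import."""
    size = (max_val - min_val + 1) / num_mappers
    ranges = []
    lo = min_val
    for i in range(num_mappers):
        # last mapper absorbs any rounding remainder
        hi = max_val if i == num_mappers - 1 else int(min_val + size * (i + 1)) - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

# e.g. primary keys 1..100 split across 4 parallel mappers
print(split_ranges(1, 100, 4))  # [(1, 25), (26, 50), (51, 75), (76, 100)]
```

Each mapper then issues its own bounded query (e.g. `WHERE id BETWEEN lo AND hi`), which is why no database client is needed on the cluster nodes themselves.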
Profiling on Hadoop
- Statistics to identify anomalies
- Value & pattern analysis
- Drill-down analysis
- Multi-tenancy
Run from the Analyst tool, in Informatica Native or Hadoop Pushdown mode
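Value and pattern analysis can be illustrated with a small Python sketch: count distinct values and abstract each value into a character pattern (digit -> 9, letter -> X), which is the general technique column profilers use to surface anomalies. The function and sample data below are hypothetical:

```python
from collections import Counter
import re

def profile_column(values):
    """Compute value frequencies, abstract character patterns
    (9 = digit, X = letter), and a null count for one column."""
    def pattern(v):
        return re.sub(r"[A-Za-z]", "X", re.sub(r"\d", "9", v))
    return {
        "values": Counter(values),
        "patterns": Counter(pattern(v) for v in values),
        "nulls": sum(1 for v in values if not v),
    }

flights = ["AA101", "UA72", "AA102", ""]
prof = profile_column(flights)
print(prof["patterns"])  # XX999 x2, XX99 x1, '' x1
print(prof["nulls"])     # 1
```

A value whose pattern deviates from the dominant one (here `XX999`) is a drill-down candidate.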
Data Quality on Hadoop
- Address validation
- Parsing
- Matching
- Standardization
Run in Informatica Native or Hadoop Pushdown mode
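As a simple stand-in for a standardization transform, the hypothetical sketch below normalizes US phone numbers to a single canonical format; a real DQ rule would cover many more cases, but the idea (strip noise, validate, emit a canonical form) is the same:

```python
import re

def standardize_phone(raw):
    """Normalize a US phone number to (NNN) NNN-NNNN.
    Returns None when the input cannot be standardized."""
    digits = re.sub(r"\D", "", raw)          # strip everything but digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                  # drop leading country code
    if len(digits) != 10:
        return None                          # fails validation
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

print(standardize_phone("415.555.0100"))     # (415) 555-0100
print(standardize_phone("+1 415-555-0100"))  # (415) 555-0100
print(standardize_phone("12345"))            # None
```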
Security has many aspects
- Application: multi-tenancy
- Infrastructure: data encryption, data masking
- Plus: authentication, authorization, auditing, monitoring
http://blogs.informatica.com/2015/07/24/bigdatasecurity-2/
Authentication: Kerberos
Industry-standard authentication for Hadoop clusters. Informatica BDM supports:
- Kerberos authentication in INFA domains
- Connecting to Kerberos-enabled Hadoop clusters
- 360° support: client & server, metadata access & data access
- Polyglot engines: Hive, Blaze & Spark modes
Blaze Security Integration with Ranger/Sentry
[Architecture diagram: the Blaze executor, runtime, and containers run the mapping (source, transforms, target) in memory on the Hadoop cluster; authorization is enforced via an optimizer call from the Hive Metastore service / HiveServer2 to Ranger/Sentry, and again on the HDFS data files themselves. The Informatica node sits outside the cluster.]
Informatica Monitoring
[Screenshots: annotated walkthrough of the monitoring console]
Data Masking
- Supports persistent data masking
- 16 different techniques supported, including SSN, credit card, first & last names, emails
- Masks sensitive data while ingesting and processing
- Polyglot engine: supported in Native, Hive, and Blaze modes
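The defining property of persistent masking is that the same input always yields the same masked output while the format survives, so masked data stays joinable and usable for testing. A hypothetical Python sketch of that property (not Informatica's algorithm):

```python
import hashlib

def mask_ssn(ssn, seed="demo-seed"):
    """Deterministically mask an SSN: each digit is replaced by a
    hash-derived digit, dashes are kept so the format survives.
    Same input + seed always produces the same mask (persistence)."""
    out = []
    for i, ch in enumerate(ssn):
        if ch.isdigit():
            h = hashlib.sha256(f"{seed}:{i}:{ch}".encode()).digest()
            out.append(str(h[0] % 10))  # position-dependent substitution
        else:
            out.append(ch)              # preserve separators
    return "".join(out)

masked = mask_ssn("123-45-6789")
print(masked)  # e.g. a 9-digit value in NNN-NN-NNNN form
```

Because the mapping is seeded and deterministic, re-running ingestion produces identical masked values, which is what keeps referential integrity across masked tables.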
Multi-tenancy
- Application binding: bind multiple Informatica users to one or more system accounts (OS / Hadoop accounts); primarily used in batch use cases and mappings
- User binding (also known as pass-through security): bind individual Informatica users to their corresponding OS / Hadoop accounts; primarily used in BI use cases and data profiling
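The two binding models above reduce to a simple account-resolution rule; the sketch below is a hypothetical illustration (the binding table, account names, and function are invented for the example):

```python
# Hypothetical binding table: Informatica users -> shared system accounts.
APP_BINDINGS = {  # application binding: many users share one account
    "etl_dev1": "hadoop_batch",
    "etl_dev2": "hadoop_batch",
}

def resolve_account(infa_user, passthrough=False):
    """User binding (pass-through) runs the job as the user's own
    OS/Hadoop account; application binding maps users onto shared
    system accounts, falling back to a default."""
    if passthrough:
        return infa_user                      # user binding: 1-to-1
    return APP_BINDINGS.get(infa_user, "hadoop_default")

print(resolve_account("etl_dev1"))                    # hadoop_batch
print(resolve_account("analyst1", passthrough=True))  # analyst1
```

Pass-through matters for profiling and BI because HDFS and Ranger/Sentry permissions are then evaluated against the individual, not a shared batch identity.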
3 pillars of Informatica Big Data Management: Single, Comprehensive and Integrated Platform for End-to-End Big Data Management
- Data Integration: SQOOP, Blaze, DI on Spark
- Data Quality & Governance: SQOOP for profiling, Blaze for profiling, JDBC for reference data*
- Data Security: Kerberos, Sentry / Ranger, data masking
Deep Dive: Hands-on
DEMO use case
Industry: Airlines
Use case: DWH optimization
Scenario: INFA Air receives information from multiple airports on the expected and actual schedules of various flights. It needs to consolidate this information into a Hadoop environment to perform analytics such as flight on-time analysis.
Challenges:
- Data is collected in various formats at various intervals: some arrives as flat files and some is staged in Oracle tables
- All of this data is ingested into a Hive table for cleansing and analysis
- The data from the Hive table is subsequently sent to an alerting system that sends individual alerts to travelers
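Once the feeds are consolidated, the on-time analytic itself is a simple comparison of scheduled versus actual times. A hypothetical Python sketch of that downstream logic (function name, timestamp format, and 15-minute grace period are assumptions for the example):

```python
from datetime import datetime

def on_time_status(scheduled, actual, grace_minutes=15):
    """Classify a flight as on time or delayed by comparing the
    scheduled and actual times from the consolidated feed."""
    fmt = "%Y-%m-%d %H:%M"
    delay = (datetime.strptime(actual, fmt)
             - datetime.strptime(scheduled, fmt)).total_seconds() / 60
    return "ON_TIME" if delay <= grace_minutes else f"DELAYED {int(delay)}m"

print(on_time_status("2016-05-23 09:00", "2016-05-23 09:10"))  # ON_TIME
print(on_time_status("2016-05-23 09:00", "2016-05-23 09:40"))  # DELAYED 40m
```

In the demo pipeline this classification would run over the cleansed Hive table, and the `DELAYED` rows would feed the traveler alerting system.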
Lab environment Private Network Hadoop Cluster Informatica Server Hadoop Node 1 Hadoop Node 2 Informatica Client
Login credentials
Lab access: https://informatica.instructorled.training
Access code: 34762748xx

  Machine         Host name             Username       Password
  Hadoop Node 1   psvrl65iw2016hdp001   iw2016         iw2016
  Hadoop Node 2   psvrl65iw2016hdp002   iw2016         iw2016
  INFA Server     psvrl65iw2016i1001    iw2016         iw2016
  INFA Client     psvw7iw2016i1001      Administrator  iw2016
  Admin/Monitoring consoles             Administrator  Administrator

Desktop tools:
Logging into the lab
Overview of labs
- Lab 1: High-speed ingestion in pushdown mode (read from a flat file, read from Oracle, union the data, write to Hive)
- Lab 2: Extraction with schema-on-read (read from Hive, write to a flat file, dynamically update the schema, use Blaze)
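The schema-on-read idea in Lab 2 is that raw data stays untyped at rest and a schema is applied only at read time, so it can change per consumer without rewriting the data. A hypothetical Python sketch (the sample rows and function are invented for illustration):

```python
import csv
import io

# Raw, schema-less bytes as they might sit in a landing file
RAW = "AA101,SFO,JFK,2016-05-23\nUA72,LAX,ORD,2016-05-24\n"

def read_with_schema(raw, schema):
    """Schema-on-read: the stored data carries no column names;
    a schema is bound to the rows only when they are read."""
    rows = csv.reader(io.StringIO(raw))
    return [dict(zip(schema, r)) for r in rows]

# Two consumers can apply different schemas to the same raw bytes
print(read_with_schema(RAW, ["flight", "origin", "dest", "date"])[0])
# {'flight': 'AA101', 'origin': 'SFO', 'dest': 'JFK', 'date': '2016-05-23'}
```

Dynamically updating the schema in the lab amounts to changing the list passed at read time; the stored file never changes.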
Questions?
User Groups
Informatica User Groups are a great way to invest in your professional development and learn about new Informatica offerings. Local chapter leaders manage each IUG online and via in-person meetings.
- Network and socialize
- Find and share content, best practices & tips
- Learn about the latest technologies and solutions from Informatica
- Discover how colleagues and peers use Informatica
https://network.informatica.com/welcome/
LEARN MORE AT IW16: Go to the Solutions Expo, Informatica Pavilion / Ecosystem & Innovation Area: talk to regional user group leaders, learn about meeting plans, and join your regional user group.
When: Monday 6:00pm-8:30pm; Tuesday 10:45am-2:15pm; Wednesday 10:30am-1:45pm
Where: Moscone West, Hall Level One