Big Data Polyglot Round table
Big Data Management Introduction
Traditional DI Environments: Start simple. Build as you grow.
Big Data World! Too many decisions to begin with.
Security: Sentry, Kerberos, Knox, Ranger
Exec Engines: MapReduce, Spark, Spark Stream, Tez, Pig, Impala
Storage Layers: HDFS, S3, Azure Blob
Data Formats: Avro, ORC, Text, RC, Parquet, Sequence
Compression: GZip, LZO, BZip2, Snappy
Distributions: CDH, MapR, HW
NoSQL: MongoDB, HBase
Data: Legacy, Relational, ERP, Hadoop, Redshift
Big Data World with Informatica BDM
Informatica Big Data Management Edition: Abstract and streamline your data flow.
Focus on business logic, not integration.
Build for data, not technology.
Build once, run anywhere.
Build once: Mappings
Abstraction layers: Data Objects, Connections, Configuration
Deploy anywhere: NoSQL, Data, Storage Layers, Security, Distributions, Exec Engines, Data Formats, Data Compression
Custom coding vs. Informatica BDM
Custom coding vs. Informatica BDM
Simple, Graphical User Interface
Import and Validate Existing PowerCenter Mappings
Ensure Ongoing Maintainability and Reuse
What's new?
10.0 Platform: Dynamic Schemas, Mappings Parameterization; Team Based Development / Versioning; Scheduler Service; Enhanced monitoring; Connectivity; Partitioning
10.0 Big Data Exclusive: Blaze; Live Data Map
10.0 Update 1: PC Reuse Report; Blaze Enhancements; Connectivity & Partitioning; Amazon EMR support; Azure HDInsight support
10.1: Blaze enhancements; OS Profiles; SQOOP; DI on Spark; SQL to Mapping; Live Data Map 2.0; Intelligent Data Lake
Polyglot computing Introduction
Why Informatica Big Data Management?
Mappings: Business Logic
Informatica Big Data Management Solution: Polyglot Computing
Execution options: Informatica Native, SQL Pushdown, Hadoop Pushdown
Polyglot computing
Informatica Big Data Management: Data Connectivity, Data Integration, Data Quality, Data Governance, Data Masking
Smart Executor targets:
Native: Informatica Native Engine
SQL: Database Pushdown
Hadoop Cluster (on YARN and HDFS):
Hive on MapReduce -> MapReduce
Hive on Tez -> Tez
Hive on Spark -> Spark Core
Spark -> Spark Core
Blaze -> INFA Engine
Polyglot computing
Informatica Big Data Management: Smart Executor
Hadoop execution options (on YARN and HDFS):
Hive on MapReduce -> MapReduce
Hive on Tez -> Tez
Hive on Spark -> Spark Core
Spark -> Spark Core
Blaze -> INFA Engine
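The idea on this slide, one mapping that can run on several engines, can be illustrated with a small dispatcher. This is only a sketch of the general pattern, not how Smart Executor is implemented: the engine names, the row-count threshold, and the functions `run_native`, `run_hadoop`, `choose_engine`, and `execute` are all hypothetical, and both "engines" are local stand-ins.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Mapping = Callable[[Row], Row]


def run_native(mapping: Mapping, rows: List[Row]) -> List[Row]:
    """Stand-in for a single-process engine: apply the mapping row by row."""
    return [mapping(row) for row in rows]


def run_hadoop(mapping: Mapping, rows: List[Row]) -> List[Row]:
    """Stand-in for a cluster engine: split into two partitions, then merge."""
    mid = len(rows) // 2
    partitions = [rows[:mid], rows[mid:]]
    return [mapping(row) for part in partitions for row in part]


# Hypothetical engine registry; the mapping itself never refers to an engine.
ENGINES: Dict[str, Callable[[Mapping, List[Row]], List[Row]]] = {
    "native": run_native,
    "hadoop": run_hadoop,
}


def choose_engine(row_count: int, threshold: int = 1000) -> str:
    """Invented selection rule: small inputs run natively, large ones on the cluster."""
    return "hadoop" if row_count >= threshold else "native"


def execute(mapping: Mapping, rows: List[Row], threshold: int = 1000) -> Tuple[str, List[Row]]:
    """Pick an engine at run time and run the unchanged mapping on it."""
    engine = choose_engine(len(rows), threshold)
    return engine, ENGINES[engine](mapping, rows)


def add_total(row: Row) -> Row:
    """Example business logic, defined once."""
    return {**row, "total": row["cars"] + row["trucks"]}


if __name__ == "__main__":
    data = [{"cars": 3, "trucks": 1}, {"cars": 5, "trucks": 2}]
    print(execute(add_total, data))               # small input
    print(execute(add_total, data, threshold=2))  # same mapping, other engine
```

The point of the design is that `add_total` is written once and the choice of engine is a run-time decision made outside the business logic.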
Blaze AND OR? Open Source Innovation
Blaze: Breadth of functionality, Resource Utilization, Abstraction, Performance, Logging, Monitoring
Demonstration: Demo of BDM capabilities
DEMO Use case
Industry: Government
Use case: Data Integration on Hadoop
Scenario: The Govt of Genmark has established sensors to monitor road traffic and pollution. Traffic is measured by the number of vehicles that moved between any two given points in a given timeframe. Pollution data, on the other hand, is recorded per geographical location. The Govt of Genmark would like to leverage Hadoop for processing.
Challenges:
Leverage Hadoop for processing large volumes of data without having to deal with open-source complexities.
Some processes are simple, some are complex. How to manage them together?
Abstract processes from upcoming incubating open-source technologies.
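The traffic measure in the scenario (vehicles moved between two points in a given timeframe) amounts to a grouped count. A minimal sketch follows, assuming a hypothetical record layout of (vehicle id, from point, to point, timestamp) and a one-hour window. The sample readings are invented for illustration and are not the demo's data.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical sensor readings: (vehicle_id, from_point, to_point, timestamp).
READINGS = [
    ("V1", "A", "B", datetime(2016, 5, 23, 8, 5)),
    ("V2", "A", "B", datetime(2016, 5, 23, 8, 40)),
    ("V3", "B", "C", datetime(2016, 5, 23, 8, 55)),
    ("V4", "A", "B", datetime(2016, 5, 23, 9, 10)),
]

EPOCH = datetime(1970, 1, 1)


def window_start(ts, window):
    """Floor a timestamp to the start of its fixed-size window."""
    return EPOCH + ((ts - EPOCH) // window) * window


def traffic_counts(readings, window=timedelta(hours=1)):
    """Count vehicles per (from_point, to_point, window_start)."""
    counts = Counter()
    for _vehicle_id, src, dst, ts in readings:
        counts[(src, dst, window_start(ts, window))] += 1
    return counts


if __name__ == "__main__":
    for (src, dst, start), n in sorted(traffic_counts(READINGS).items()):
        print(f"{src} -> {dst} from {start:%H:%M}: {n} vehicle(s)")
```

At Hadoop scale the same grouping would be expressed as a mapping and pushed to a cluster engine. The Python version only shows the logic.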
Live DEMO
Summary
Customer challenge | Solution | Informatica features used
Use Hadoop without complexities | Use Informatica BDM for GUI based mapping development | Mapping designer and developer
Design processes independent of run-time engine | Separate design-time and run-time aspects | Polyglot engine
Consolidated monitoring for all Data Integration processes | Go to single consolidated monitoring console | Informatica Monitoring
Abstract business logic from changes in open-source technologies | Leverage Smart Executor to dynamically determine the right engine | Smart Executor
Performance of Blaze execution
SF 100: 700 GB
SF 300: 1.2 TB
SF 5000: 3.4 TB
SF 10000: 7 TB
Google search: Why we love Informatica Big Data Management
Performance of Spark execution: We are working on it and will share the results when we have them.
Questions??!
User Groups
Informatica User Groups are a great way for you to invest in your professional development and learn about new Informatica offerings. Local Chapter Leaders manage each IUG online and via in-person meetings.
Network and socialize
Find and share content, best practices & tips
Learn about the latest technologies and solutions from Informatica
Discover how colleagues and peers use Informatica
https://network.informatica.com/welcome/
LEARN MORE AT IW16: Go to the Solutions Expo, Informatica Pavilion / Ecosystem & Innovation Area:
Talk to regional user group leaders
Learn about meeting plans
Join your regional user group
When: Monday 6:00pm to 8:30pm, Tuesday 10:45am to 2:15pm, Wednesday 10:30am to 1:45pm
Where: Moscone West Hall, Level One