Big Data in Cloud Round table
Big Data Management Introduction
Traditional DI Environments: Start simple. Build as you grow.
Big Data World! Too many decisions to begin with
Security: Sentry, Kerberos, Knox, Ranger
Exec Engines: MapReduce, Spark, Spark Stream, Tez, Pig, Impala
Storage Layers: HDFS, S3, Azure Blob
Data Formats: Avro, ORC, Text, RC, Parquet, Sequence
Compression: GZip, LZO, BZip2, Snappy
Distributions: CDH, MapR
Also shown: Legacy HW, Relational, ERP Data, Hadoop, NoSQL, MongoDB, HBase, Redshift
Big Data World with Informatica BDM
Informatica Big Data Management Edition
Deploy anywhere: NoSQL Data, Storage Layers, Security, Distributions, Exec Engines, Data Formats, Data Compression
Build once: Mappings, Data Objects, Connections, Configuration
Abstract and streamline your data flow
Focus on business logic, not integration
Build for data, not technology
Build once, run anywhere
Overview of the Data Integration Solutions
PowerCenter (Traditional Workloads): Data Warehousing, Agile BI, Real-time DI, Data Migration, Apps Integration (on-premise)
Big Data Management (Next-Gen Workloads): DW Offloading/Optimization, Data Lakes, Big Data Analytics, NoSQL Integration
Cloud Data Integration (Cloud & SaaS Workloads): Apps Integration (Hybrid), Cloud & Hybrid DI, DW & Analytics (Cloud DBs)
3 pillars of Big Data Management
Single, Comprehensive and Integrated Platform for End-to-End Big Data Management
Pillars: Data Integration, Data Quality & Governance, Data Security
Custom coding vs. Informatica BDM
Custom coding vs. Informatica BDM
Simple, Graphical User Interface
Import and Validate Existing PowerCenter Mappings
Ensure Ongoing Maintainability and Reuse
What's new?
10.0
Platform: Dynamic Schemas, Mappings Parameterization; Team Based Development / Versioning; Scheduler Service; Enhanced monitoring; Connectivity; Partitioning
Big Data Exclusive: Blaze; Live Data Map
10.0 Update 1: PC Reuse Report; Blaze Enhancements; Connectivity & Partitioning; Amazon EMR support; Azure HDInsight support
10.1: Blaze enhancements; OS Profiles; Sqoop; DI on Spark; SQL to Mapping; Live Data Map 2.0; Intelligent Data Lake
Big Data Management Cloud deployments
Challenges with on-premise deployments
Inflexible and static infrastructure
Difficulty in keeping up with evolving technologies
Lots of planning to get clusters up and running
Limited in-house Hadoop expertise
High total cost of ownership
On-premise deployment: Set up hardware; Manually scale cluster; Select Hadoop distro; Monitor cluster; Configure cluster; Design data flows
Cloud deployment: Set up hardware; Auto scale cluster (instead of manually); Select Hadoop distro; Monitor cluster; Configure cluster; Design data flows
Informatica BDM: On-Premise & Cloud
Informatica Big Data Management Edition: On-Premise, Cloud
Informatica BDM: Cloud connectivity
Informatica Big Data Management Edition: Blob, SQL Server, S3 support, Redshift, HDFS, Azure HDInsight, Amazon
Azure Marketplace
BDM on HDInsight Setup
01 Basics
02 Node Settings
03 Domain
04 Database
05 Cluster
06 Infrastructure
Demonstration: demo of BDM capabilities
DEMO Use case
Industry: Entertainment
Goal: Leverage Hadoop in the cloud
Scenario: GetFlix is an entertainment organization that streams movies and TV shows. GetFlix relies on third-party vendors to provide the ratings of various titles and of individual episodes for each TV show's season. GetFlix would like to analyze this data to identify user interests and recommend new movies and shows.
Challenges: Access the individual rating data in the cloud. Process the data in the cloud and store it back in the cloud.
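The kind of analysis described in the scenario can be sketched in a few lines of plain Python. This is only an illustration of the logic (grouping episode ratings and ranking titles), not an Informatica BDM mapping and not the code used in the demo. The record layout, the sample values, and the function names `average_by_season` and `top_titles` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical vendor records: (title, season, episode, rating).
# The real demo data and its layout are not given in the slides.
RATINGS = [
    ("Show A", 1, 1, 8.0),
    ("Show A", 1, 2, 9.0),
    ("Show B", 1, 1, 6.0),
    ("Show B", 2, 1, 7.0),
]


def average_by_season(records):
    """Group episode ratings by (title, season) and average each group."""
    grouped = defaultdict(list)
    for title, season, _episode, rating in records:
        grouped[(title, season)].append(rating)
    return {key: mean(values) for key, values in grouped.items()}


def top_titles(records, limit=1):
    """Rank titles by their mean episode rating, highest first."""
    per_title = defaultdict(list)
    for title, _season, _episode, rating in records:
        per_title[title].append(rating)
    ranked = sorted(per_title, key=lambda t: mean(per_title[t]), reverse=True)
    return ranked[:limit]


if __name__ == "__main__":
    print(average_by_season(RATINGS))
    print(top_titles(RATINGS))
```

In the demo scenario the same grouping and ranking would run against rating files held in cloud storage, with the results written back to the cloud.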
Live DEMO
Summary
Customer challenge | Solution | Informatica features showcased
Access data in the cloud and on-premise | Rely on hybrid connectivity | Use PowerExchange for hybrid connectivity
Process data in the cloud and on Hadoop | Rely on the industry's leader in cloud & big data | Informatica BDM on cloud
Reuse components between on-premise and cloud | Abstract design-time processes from run-time | Smart Executor
Quickly spin up BDM environments in Microsoft Azure | Launch BDM from Azure Marketplace | Azure HDInsight Marketplace
Questions??!
User Groups
Informatica User Groups are a great way for you to invest in your professional development and learn about new Informatica offerings. Local Chapter Leaders manage each IUG online and via in-person meetings.
Network and socialize
Find and share content, best practices & tips
Learn about the latest technologies and solutions from Informatica
Discover how colleagues and peers use Informatica
https://network.informatica.com/welcome/
LEARN MORE AT IW16: Go to the Solutions Expo, Informatica Pavilion / Ecosystem & Innovation Area
Talk to regional user group leaders
Learn about meeting plans
Join your regional user group
When: Monday 6:00pm to 8:30pm; Tuesday 10:45am to 2:15pm; Wednesday 10:30am to 1:45pm
Where: Moscone West Hall, Level One
EMR Cluster
EC2 Nodes
Informatica Administrator
Informatica Monitoring
Hadoop Monitoring
Connections
EMR Cluster
Hive on Amazon S3
Mapping's execution on EMR
Hive on Amazon S3