Big Data Polyglot. Round table

Big Data Management Introduction

Traditional DI Environments: Start simple. Build as you grow.

Big Data World! Too many decisions to begin with
Security: Sentry, Kerberos, Knox, Ranger
Exec Engines: MapReduce, Spark, Spark Stream, Tez, Pig, Impala
Data Formats: Avro, ORC, Text, RC, Parquet, Sequence
Storage Layers: HDFS, S3, Azure Blob
Distributions: CDH, MapR
Compression: GZip, LZO, BZip2, Snappy
Data sources: Legacy HW, Relational, ERP Data, Hadoop, NoSQL, MongoDB, HBase, Redshift

Big Data World with Informatica BDM
Deploy anywhere: NoSQL, Data Storage Layers, Security, Distributions, Exec Engines, Data Formats, Data Compression
Informatica Big Data Management Edition: Connections, Configuration, Data Objects, Mappings
Abstract and streamline your data flow
Focus on business logic, not integration
Build for data, not technology
Build once, run anywhere

Custom coding vs. Informatica BDM

Custom coding vs. Informatica BDM
Simple, Graphical User Interface
Import and Validate Existing PowerCenter Mappings
Ensure Ongoing Maintainability and Reuse

What's new?
10.0
Platform: Dynamic Schemas, Mappings Parameterization, Team Based Development / Versioning, Scheduler Service, Enhanced monitoring, Connectivity, Partitioning
Big Data Exclusive: Blaze, Live Data Map
10.0 Update 1: PC Reuse Report, Blaze Enhancements, Connectivity & Partitioning, Amazon EMR support, Azure HDInsight support
10.1: Blaze enhancements, OS Profiles, SQOOP, DI on Spark, SQL to Mapping, Live Data Map 2.0, Intelligent Data Lake

Polyglot computing Introduction

Why Informatica Big Data Management?
Mappings: Business Logic
Informatica Big Data Management Solution: Informatica Native, SQL Pushdown, Hadoop Pushdown
Polyglot Computing

Polyglot computing
Informatica Big Data Management: Data Connectivity, Data Integration, Data Quality, Data Governance, Data Masking
Smart Executor targets: Native, Hadoop Cluster, SQL
Hadoop cluster execution options and the engine each runs on:
Hive on MapReduce: MapReduce
Hive on Tez: Tez
Hive on Spark: Spark Core
Spark: Spark Core
Blaze: INFA Engine
Native: Informatica Native Engine
SQL: Database Pushdown
Cluster layer: YARN, HDFS

Polyglot computing
Informatica Big Data Management: Smart Executor
Execution options and the engine each runs on:
Hive on MapReduce: MapReduce
Hive on Tez: Tez
Hive on Spark: Spark Core
Spark: Spark Core
Blaze: INFA Engine
Cluster layer: YARN, HDFS
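The idea on this slide (one mapping, several possible run-time engines) can be illustrated with a small Python sketch. Everything here is hypothetical: the slides do not describe how Smart Executor decides, so the Mapping class, the choose_engine rule and its size threshold are invented for illustration and are not Informatica's API or algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass(frozen=True)
class Mapping:
    """Engine-independent business logic: a name plus a per-row transform."""
    name: str
    transform: Callable[[Row], Row]
    needs_complex_transformations: bool = False  # hypothetical design-time flag


def choose_engine(mapping: Mapping, input_bytes: int, on_hadoop: bool) -> str:
    """Pick an engine name using an invented rule (assumption, not the product's logic)."""
    if not on_hadoop:
        return "native"
    if mapping.needs_complex_transformations:
        return "blaze"
    if input_bytes < 1_000_000_000:  # arbitrary illustrative threshold
        return "hive_on_tez"
    return "spark"


def run(mapping: Mapping, rows: Iterable[Row], engine: str) -> List[Row]:
    """Apply the mapping locally; a real engine would translate it into its own jobs."""
    # The engine argument is accepted only to show that the mapping itself
    # does not change when the engine does.
    return [mapping.transform(dict(row)) for row in rows]


# The business logic is written once, with no reference to any engine.
clean_city = Mapping(
    name="clean_city",
    transform=lambda row: {**row, "city": str(row["city"]).strip().title()},
)

if __name__ == "__main__":
    engine = choose_engine(clean_city, input_bytes=5_000_000_000, on_hadoop=True)
    print(engine, run(clean_city, [{"city": "  oslo "}], engine))
```

The point of the sketch is the separation: clean_city never names an engine, so swapping the selection rule or adding an engine leaves the mapping untouched.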

Blaze AND OR? Open Source Innovation

Blaze: Breadth of functionality, Resource Utilization, Abstraction, Performance, Logging, Monitoring

Demonstration DEMO on BDM capabilities

DEMO Use case
Industry: Government
Use case: Data Integration on Hadoop
Scenario: The Govt of Genmark has installed sensors to monitor road traffic and pollution. Traffic is measured as the number of vehicles that moved between any two given points in a given timeframe. Pollution data, on the other hand, is recorded per geographical location. The Govt of Genmark would like to leverage Hadoop for processing.
Challenges:
Leverage Hadoop for processing large volumes of data without having to deal with open-source complexities
Some processes are simple, some are complex. How to manage them together?
Abstract processes from upcoming, incubating open-source technologies
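The traffic measure in the scenario (vehicles that moved between two points in a timeframe) can be sketched in plain Python. The record layout, sensor names, dates and pollution values below are all made up for illustration; the slides do not show the demo's actual data or mapping.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (vehicle_id, sensor_point, timestamp)
Sighting = Tuple[str, str, datetime]


def count_traffic(
    sightings: Iterable[Sighting],
    start_point: str,
    end_point: str,
    window_start: datetime,
    window_end: datetime,
) -> int:
    """Count vehicles seen at start_point and later at end_point inside the window."""
    seen_at_start: Dict[str, datetime] = {}
    moved = set()
    # Process sightings in time order so "later" is well defined.
    for vehicle, point, ts in sorted(sightings, key=lambda s: s[2]):
        if not (window_start <= ts < window_end):
            continue
        if point == start_point:
            seen_at_start.setdefault(vehicle, ts)
        elif point == end_point and vehicle in seen_at_start:
            moved.add(vehicle)
    return len(moved)


def at(hour: int, minute: int) -> datetime:
    """Helper for sample data; the date itself is arbitrary."""
    return datetime(2016, 1, 1, hour, minute)


SIGHTINGS = [
    ("car-1", "A", at(8, 0)), ("car-1", "B", at(8, 10)),   # A to B inside the window
    ("car-2", "A", at(8, 5)), ("car-2", "B", at(9, 30)),   # reaches B after the window
    ("car-3", "B", at(8, 1)), ("car-3", "A", at(8, 20)),   # travels B to A, not A to B
]

# Pollution is recorded per location, so it is a simple lookup (made-up values).
POLLUTION = {"A": 41.0, "B": 55.5}

vehicles = count_traffic(SIGHTINGS, "A", "B", at(8, 0), at(9, 0))
report = {"segment": ("A", "B"), "vehicles": vehicles, "pollution_at_end": POLLUTION["B"]}

if __name__ == "__main__":
    print(report)
```

On a cluster the same logic would be expressed as a grouped aggregation over the sensor feed; the sketch only fixes what is being counted.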

Live DEMO

Summary

Customer challenge: Use Hadoop without complexities
Solution: Use Informatica BDM for GUI based mapping development
Informatica feature used: Mapping designer and developer

Customer challenge: Design processes independent of run-time engine
Solution: Separate design-time and run-time aspects
Informatica feature used: Polyglot engine

Customer challenge: Consolidated monitoring for all Data Integration processes
Solution: Go to single consolidated monitoring console
Informatica feature used: Informatica Monitoring

Customer challenge: Abstract business logic from changes in open-source technologies
Solution: Leverage Smart Executor to dynamically determine the right engine
Informatica feature used: Smart Executor

Performance of Blaze execution
SF 100: 700 GB
SF 300: 1.2 TB
SF 5000: 3.4 TB
SF 10000: 7 TB
Google search: Why we love Informatica Big Data Management

Performance of Spark execution We are working on it. Will share with you when we have it

Questions??!

User Groups
Informatica User Groups are a great way for you to invest in your professional development and learn about new Informatica offerings. Local Chapter Leaders manage each IUG online and via in-person meetings.
Network and socialize
Find and share content, best practices and tips
Learn about the latest technologies and solutions from Informatica
Discover how colleagues and peers use Informatica
https://network.informatica.com/welcome/

LEARN MORE AT IW16: Go to the Solutions Expo, Informatica Pavilion / Ecosystem & Innovation Area
Talk to regional user group leaders
Learn about meeting plans
Join your regional user group
When: Monday 6:00pm to 8:30pm, Tuesday 10:45am to 2:15pm, Wednesday 10:30am to 1:45pm
Where: Moscone West Hall, Level One