Getting Real Real Time Data Integration Patterns and Architectures




Getting Real Real Time Data Integration Patterns and Architectures
Nelson Petracek, Senior Director, Enterprise Technology Architecture, Informatica
Digital Government Institute's Enterprise Architecture Conference, May 1, 2014, Washington, DC

The World has Changed

User Expectations
More agility; right time; immediate response times; all data; instant trust; proactive vs. reactive; 100% uptime; one place; self-service; fresh information.

Representative Use Cases
Sensor monitoring; customer interaction; security; asset optimization.

Changing Perspectives on Data
It is no longer sufficient to view information after the fact.
Business demands information sooner, with more accuracy, in order to meet competitive and regulatory demands.
Business needs to respond to threats and opportunities sooner.
Reduce decision latency.
Proactive alerts and notifications.
Improve TTA (time to answer).

Traditional Data Management Approaches
Diagram labels: Data Integration, EDW, BI; Store, Analyze, Act.
Valuable for: reporting, historical activity, strategic analysis.

The Challenges with Traditional Approaches
Diagram labels: Store, Analyze, Act.
Takes too long to deliver what is needed.
Lots of wait and waste in the process.
No common and trusted data access.
Information is missing or is stale / delayed.
Too much decision latency.

Next Generation Data Integration
Real-Time Design Patterns and Architectural Approaches

A Shift in Thinking is Needed
Need to shift from building large, monolithic applications to smaller sets of distributed micro-applications based on the principles of Reactive Applications*: resilient, scalable, event-driven, responsive.
Move away from a store-first approach; provide the ability to process event data as it arrives.
Focus on hybrid architectures that facilitate both batch and real-time processing.
* See: http://www.reactivemanifesto.org/#the-need-to-go-reactive

Reactive Applications: Characteristics
Resilient: Able to recover at all levels. Utilize fine-grained resilience on the component level. Bulkhead pattern.
Scalable: Avoid contention on shared resources. Scale out or up as needed (without rewrites). Maintain the programming model as the system is scaled.
Event-Driven: Systems communicate via events. Loosely coupled, asynchronous, Amdahl's Law. Efficient use of resources.
Responsive: Honor response time guarantees regardless of load. Provide users with a rich, interactive experience. Observable models, event streams, stateful clients.
* See: http://www.reactivemanifesto.org/#the-need-to-go-reactive
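The bulkhead pattern named under "Resilient" can be illustrated with a small sketch. This is not taken from the slides: the `Bulkhead` class, the `BulkheadFull` exception, and the slot limit are hypothetical names chosen for illustration, assuming a bulkhead is implemented as a cap on concurrent calls into one component.

```python
import threading


class BulkheadFull(Exception):
    """Raised when the compartment has no free slot (hypothetical name)."""


class Bulkhead:
    """Caps concurrent calls into one component so a slow dependency
    cannot exhaust resources shared with the rest of the system.
    Illustrative sketch only, not an API from the presentation."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args):
        # Fail fast instead of queueing when the compartment is full.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFull("no free slot")
        try:
            return fn(*args)
        finally:
            self._slots.release()


reports = Bulkhead(max_concurrent=1)


def nested():
    # A second call made while the only slot is busy is rejected.
    try:
        return reports.call(lambda: "ok")
    except BulkheadFull:
        return "rejected"


outcome = reports.call(nested)      # inner call is rejected
after = reports.call(lambda: "ok")  # slot is free again afterwards
```

Rejecting immediately rather than blocking is one possible design choice; it keeps a saturated component from holding up its callers, which fits the "honor response time guarantees" point on the same slide.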

Sample Architectural Approach: Reactive Applications
Diagram layers, top to bottom:
Data Warehouse; Hadoop / NoSQL; Analytics; Event Based Applications
Event Processing / Streaming Analytics
Real Time Stream Transport / Delivery (Messaging)
Stream Transformation
CDC / Data Access; Data Integration; Streaming Collection
Various Source Applications / Technologies
Operational Data (field devices, applications, clickstream, IoT, logs, etc.)
Informatica products shown in the diagram: RulePoint, Ultra Messaging, B2B Data Transformation, CDC PWX, PowerCenter, Vibe Data Stream, PowerExchange.

Resulting Activity Based Intelligence Process
Proactive actions instead of reactive.
Diagram labels: Events, Data, OI System, Alerts, Action.
Allows the end-user to define conditions and rules through self-service capabilities.
Users are pushed the information they need, when they need it, in the system that they need it.

Sample Big Data Reference Architectures
Real-Time Component
* Source: http://hortonworks.com/hdp/
* Source: http://www.cloudera.com/content/cloudera/en/products-and-services/

Hybrid Architecture: Batch Plus Real-Time
Data Sources (devices, apps, clickstream, IoT, logs, etc.)
Historical Batch Computation: batch Map / Reduce, YARN, data analytics. Long-term persistence, high latency. E.g. purchase history analysis.
Distributed Real-Time Computation: real-time continuous computations, streaming analytics / event processing. Incremental, low latency. E.g. sensor / infrastructure monitoring.
Data Targets (dashboards, BI, mobile, etc.)
Big Data Supply Chain

Stream Collection
Separate from batch or bulk data loading.
Involves the collection of event data ("streams") as they occur, from various endpoints, systems, and people.
Multiple options available: micro-batch or near real-time data integration; data integration hub pattern; real-time collection; data replication; etc.
A number of factors need to be considered when determining the right pattern to utilize.
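As a rough illustration of the micro-batch option listed above, the following sketch groups an event stream into small batches by size. The function name `micro_batches` and the sample sensor readings are assumptions made for the example; a production collector would typically also flush on a time limit, which is omitted here.

```python
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[dict], max_size: int) -> Iterator[List[dict]]:
    """Group an event stream into batches of at most max_size events.
    Hypothetical helper for illustration; size-based flushing only."""
    batch: List[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) >= max_size:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch


# Made-up sample data: five readings from one sensor.
readings = [{"sensor": "s1", "value": v} for v in range(5)]
batches = list(micro_batches(readings, max_size=2))
```

Smaller batch sizes move the behavior toward real-time collection at the cost of more frequent deliveries; larger ones move it toward bulk loading.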

Stream Collection: Replication
Utilize replication beyond the copying of data from one data store to another.
Event-enable back-end data stores.
Non-intrusively detect changes in data, publish data changes to one or more targets.
Real-time delivery of the latest data changes to target systems.
Diagram labels: Source System; EXTRACT (high speed extraction, checkpoint); intermediate files (committed); APPLY (SQL Apply, Merge Apply, Audit Apply; checkpoint; high speed parallel apply); Target System; Console; Server Manager (source and target side); http://
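The extract, checkpoint, and apply stages in the replication diagram can be sketched in a few lines. This is a simplified in-memory model, not the product's implementation: the `Change` record, the `extract` and `apply` functions, and the integer sequence numbers are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Change:
    """One committed change from the source log (hypothetical shape)."""
    seq: int                      # position in the source change log
    op: str                       # "upsert" or "delete"
    key: str
    value: Optional[dict] = None


def extract(log: List[Change], checkpoint: int) -> List[Change]:
    """Return committed changes recorded after the last checkpoint."""
    return [c for c in log if c.seq > checkpoint]


def apply(target: Dict[str, dict], changes: List[Change], checkpoint: int) -> int:
    """Apply changes in log order and return the new apply checkpoint."""
    for c in sorted(changes, key=lambda c: c.seq):
        if c.op == "delete":
            target.pop(c.key, None)
        else:
            target[c.key] = c.value
        checkpoint = c.seq
    return checkpoint


# Made-up change log for illustration.
log = [
    Change(1, "upsert", "a", {"qty": 1}),
    Change(2, "upsert", "b", {"qty": 5}),
    Change(3, "upsert", "a", {"qty": 2}),
    Change(4, "delete", "b"),
]
target: Dict[str, dict] = {}
cp = apply(target, extract(log, checkpoint=0), checkpoint=0)
# Restarting from the saved checkpoint finds nothing new to apply.
cp_again = apply(target, extract(log, cp), cp)
```

Keeping a checkpoint on both sides is what lets either stage restart without re-reading or re-applying changes it has already handled.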

Stream Collection: Data Integration Hub Pattern
Eliminate point-to-point collection / delivery interfaces.
Provide a location-independent mechanism for data producers (and consumers) to talk to one another.
Publish and subscribe.
Manage data delivery impedance mismatches.
Provide self-service capabilities.
Centralize data quality, masking, transformation logic.
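A minimal in-process sketch of the hub pattern may help make the publish/subscribe and centralized-masking points concrete. The `Hub` class, its method names, the `customers` topic, and the `mask_ssn` transform are hypothetical and are not drawn from any Informatica product API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class Hub:
    """Toy publish/subscribe hub (illustrative assumption only).
    Producers and consumers know topic names, never each other."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self._transforms: DefaultDict[str, List[Callable[[dict], dict]]] = defaultdict(list)

    def add_transform(self, topic: str, fn: Callable[[dict], dict]) -> None:
        # Masking / quality logic lives in the hub, not in each interface.
        self._transforms[topic].append(fn)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: dict) -> int:
        for fn in self._transforms[topic]:
            record = fn(record)
        for handler in self._subscribers[topic]:
            handler(dict(record))  # each consumer gets its own copy
        return len(self._subscribers[topic])


def mask_ssn(record: dict) -> dict:
    """Hypothetical masking rule: keep only the last four digits."""
    masked = dict(record)
    masked["ssn"] = "***-**-" + record["ssn"][-4:]
    return masked


hub = Hub()
hub.add_transform("customers", mask_ssn)
billing: List[dict] = []
marketing: List[dict] = []
hub.subscribe("customers", billing.append)
hub.subscribe("customers", marketing.append)
delivered = hub.publish("customers", {"id": 7, "ssn": "123-45-6789"})
```

With N producers and M consumers, each party maintains one connection to the hub instead of up to N x M point-to-point interfaces.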

Data Integration Hubs: Beyond Collection

Stream Collection: Distributed Agents
Distribute collection across thousands of endpoints.
Perform filtering, transformation, etc. close to the source.
Focus on daemon-less or broker-less designs for improved performance and scalability.
Provide varying qualities of service: streaming, guaranteed, etc.
Allow for dynamic configuration.
Diagram labels: Sources, Stream Nodes, Targets.
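Filtering and transforming close to the source can be sketched as a generator that runs on the endpoint and forwards only what downstream nodes need. The `edge_agent` function, the temperature threshold, and the device names are made up for this example.

```python
from typing import Callable, Iterable, Iterator


def edge_agent(
    events: Iterable[dict],
    keep: Callable[[dict], bool],
    transform: Callable[[dict], dict],
) -> Iterator[dict]:
    """Filter and reshape events at the endpoint before forwarding.
    Hypothetical helper; a real agent would also handle delivery."""
    for event in events:
        if keep(event):
            yield transform(event)


# Made-up raw readings collected on one endpoint.
raw = [
    {"device": "pump-1", "temp_c": 41.0},
    {"device": "pump-1", "temp_c": 85.0},
    {"device": "pump-2", "temp_c": 90.0},
]
forwarded = list(edge_agent(
    raw,
    keep=lambda e: e["temp_c"] >= 80.0,  # assumed alert threshold
    transform=lambda e: {"device": e["device"], "temp_f": e["temp_c"] * 9 / 5 + 32},
))
```

Because the filter and transform are passed in as parameters, they could be swapped at runtime, which is one way to read the "dynamic configuration" point above.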

Stream Collection: Distributed Agents with Collectors
Diagram labels: Agents (edge data filtering and processing); Streaming Data Collection; Local Hub, Regional Hub, Central Hub; Data Transfer; Event Processing; Real Time Actions; Data Integration; EDW; HDFS.

Event Streaming Analytics
Execute logic against real-time streams.
Utilize streaming language constructs.
Logic may be executed at a point-in-time, or over time. Temporal reasoning.
Join or merge multiple streams together for real-time pattern recognition, correlation, etc. across data sources.
Timely and contextual. Augment real-time streams with historical context.
Diagram labels: Distributed Real-Time Computation, RulePoint.
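Logic executed "over time" can be illustrated with a sliding time window. The sketch below raises an alert when a key produces a given number of events within a window; the function name `detect_bursts`, the failed-login scenario, and the thresholds are assumptions for illustration, and it does not use RulePoint or any streaming language.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Iterable, List, Tuple


def detect_bursts(
    events: Iterable[Tuple[float, str]],
    threshold: int,
    window_s: float,
) -> List[Tuple[float, str]]:
    """Return (timestamp, key) alerts when `threshold` events for one key
    fall within `window_s` seconds. Assumes events arrive in time order.
    Hypothetical rule for illustration."""
    recent: Dict[str, Deque[float]] = defaultdict(deque)
    alerts: List[Tuple[float, str]] = []
    for ts, key in events:
        q = recent[key]
        q.append(ts)
        # Drop timestamps that have slid out of the window.
        while q and ts - q[0] > window_s:
            q.popleft()
        if len(q) >= threshold:
            alerts.append((ts, key))
            q.clear()  # reset so one burst raises one alert
    return alerts


# Made-up stream of (seconds, user) failed-login events.
failed_logins = [
    (0, "alice"), (10, "bob"), (20, "alice"),
    (50, "alice"), (200, "bob"), (210, "bob"),
]
alerts = detect_bursts(failed_logins, threshold=3, window_s=60)
```

Only per-key timestamps inside the window are held in memory, so the rule can run continuously on an unbounded stream.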

Event Delivery
Data Integration Hub: Allow data consumers to subscribe to data previously pushed to the hub. Batch + near real-time feed.
Data Integration: Feed content into back-end systems through application interfaces. Batch + near real-time feed.
Streaming Delivery: Push content to end applications, dashboards, etc. Content may consist of derived or raw events. Near real-time + real-time feed.

Lambda Architecture
Data is distributed to both a Batch Layer and a Speed Layer for processing.
Batch Layer manages the append-only master set of raw data.
Serving Layer indexes batch views for low-latency queries.
Speed Layer covers recent data not in the Batch Layer.
Queries merge results from the batch and real-time views.
Source: http://jameskinley.tumblr.com/post/37398560534/the-lambda-architecture-principles-for-architecting
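The way a query merges the batch and real-time views can be sketched with simple page-view counts. Everything here is an illustrative assumption: the `batch_view` and `query` functions, the page names, and the use of in-memory `Counter` objects in place of real batch, serving, and speed layers.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[int, str]  # (timestamp, page); hypothetical event shape


def batch_view(master: List[Event], up_to: int) -> Counter:
    """Recompute counts from the append-only master data up to a cutoff."""
    return Counter(page for ts, page in master if ts <= up_to)


def query(batch: Counter, speed: Counter, page: str) -> int:
    """Merge the batch view with the real-time view for one key."""
    return batch[page] + speed[page]


# Made-up master data set; the last batch run covered timestamps <= 3.
master: List[Event] = [(1, "/home"), (2, "/docs"), (3, "/home")]
batch = batch_view(master, up_to=3)

# Events arriving after the batch cutoff.
speed: Counter = Counter()
for ts, page in [(4, "/home"), (5, "/docs"), (6, "/home")]:
    master.append((ts, page))  # batch layer keeps every raw event
    speed[page] += 1           # speed layer updates incrementally

total_home = query(batch, speed, "/home")

# After the next batch run the speed view for that period can be discarded.
next_batch = batch_view(master, up_to=6)
```

Because the batch view is always recomputed from raw data, any error in the incremental speed view is corrected by the next batch run.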

Data Security with Data Integration

Architectural Implications

Questions?
www.operationalintelligence.me