Large Scale/Big Data Federation & Virtualization: A Case Study

Similar documents

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

TRANSFORM BIG DATA INTO ACTIONABLE INFORMATION

So What s the Big Deal?

Reference Architecture, Requirements, Gaps, Roles

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Big Data Analytics Nokia

Applications for Big Data Analytics

Building Scalable Big Data Pipelines

How To Handle Big Data With A Data Scientist

Big Data and Analytics in Government

Making Sense of Big Data in Insurance

Architecting for the Internet of Things & Big Data

REAL-TIME BIG DATA ANALYTICS

Big Data and Data Science: Behind the Buzz Words

Harnessing big data with Hortonworks Data Platform and Red Hat JBoss Data Virtualization

BIG DATA TOOLS. Top 10 open source technologies for Big Data

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Big Data and Advanced Analytics Applications and Capabilities Steven Hagan, Vice President, Server Technologies

Big Data: Are You Ready? Kevin Lancaster

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January Website:

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Information Builders Mission & Value Proposition

Organisaties groot en klein, beginnen zich meer en meer te realiseren dat inzicht in (real-time) data helpt

SAP and Hortonworks Reference Architecture

Hadoop and Data Warehouse Friends, Enemies or Profiteers? What about Real Time?

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Big Data and Analytics (Fall 2015)

Integrating a Big Data Platform into Government:

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Big Data Are You Ready? Jorge Plascencia Solution Architect Manager

Search and Real-Time Analytics on Big Data

SQL + NOSQL + NEWSQL + REALTIME FOR INVESTMENT BANKS

Enterprise Operational SQL on Hadoop Trafodion Overview

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Oracle Big Data Strategy Simplified Infrastrcuture

Big Data Analytics - Accelerated. stream-horizon.com

Data Virtualization A Potential Antidote for Big Data Growing Pains

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

The Lab and The Factory

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Big Data Architectures. Tom Cahill, Vice President Worldwide Channels, Jaspersoft

Real Time Big Data Processing

SAP Database Strategy Overview. Uwe Grigoleit September 2013

Native Connectivity to Big Data Sources in MSTR 10

TE's Analytics on Hadoop and SAP HANA Using SAP Vora

Cloud Scale Distributed Data Storage. Jürmo Mehine

MDM and Data Warehousing Complement Each Other

Oracle Database 12c Plug In. Switch On. Get SMART.

Introduction to Polyglot Persistence. Antonios Giannopoulos Database Administrator at ObjectRocket by Rackspace

BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata

Composite Software Data Virtualization Turbocharge Analytics with Big Data and Data Virtualization

Extend your analytic capabilities with SAP Predictive Analysis

Transforming the Telecoms Business using Big Data and Analytics

The Future of Data Management

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Exploiting Data at Rest and Data in Motion with a Big Data Platform

Data Integration Checklist

BIG DATA What it is and how to use?

The Enterprise Data Hub and The Modern Information Architecture

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Where is... How do I get to...

HDP Hadoop From concept to deployment.

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

The 4 Pillars of Technosoft s Big Data Practice

Sentimental Analysis using Hadoop Phase 2: Week 2

Internals of Hadoop Application Framework and Distributed File System

SAP BusinessObjects Business Intelligence 4.1 One Strategy for Enterprise BI. May 2013

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Accelerate Data Loading for Big Data Analytics Attunity Click-2-Load for HP Vertica

Integrating Big Data into the Computing Curricula

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Time-Series Databases and Machine Learning

Independent process platform

With the Emergence of Big Data, Where do Relational Technologies Fit? Donna Burbank President, DAMA Rocky Mountain Chapter

Open Source in Financial Services: Meet the challenges of new business models and disruption

Understanding traffic flow

Introducing Red Hat s JBoss Portfolio

What s New with Informatica Data Services & PowerCenter Data Virtualization Edition

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

W H I T E P A P E R. Building your Big Data analytics strategy: Block-by-Block! Abstract

A Scalable Data Transformation Framework using the Hadoop Ecosystem

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Transcription:

Large Scale/Big Data Federation & Virtualization: A Case Study Vamsi Chemitiganti, Chief Solution Architect Derrick Kittler, Senior Solution Architect Bill Kemp, Senior Solution Architect Red Hat 06.29.12

Red Hat Big Data Week In Retrospective Big Data, Volume, Speed & Benefits with Red Hat JBoss Data Grid Red Hat's Big Data Strategy Overview & Optimizing Apache Hadoop with Red Hat Enterprise MRG Grid JBoss Enterprise Middleware & Big Data NoSQL & Big Data at Red Hat Large Scale / Big Data Federation & Virtualization: A Case Study

Goals Big Data by Example Significant Learning/s Solution Architectures

Background Big Data Data growing at 40% per year (McKinsey Global Institute) a lot of analyst reports exist... Big Data Data size and performance requirements become significant design and decision factors for implementing a data management and analysis system. In this definition, there s not an absolute size milestone between data and big data. 4V's How much? formats? speed? change?

Background Where is it coming from? Volume - old and new types of data transaction volumes and other traditional data types Variety more types of information to analyze social media, mobile (context-aware) Velocity how fast is data produced? tabular data (databases), hierarchical data, documents, e-mail, metering data, video, still images, audio, stock ticker data, financial transactions, etc... how fast must the data be processed? Variability - data unpredictability, new forms, risk! UPC barcodes, RFID scanners, sensors (HVAC), etc...

Background Big Data Tech Landscape NoSQL (Not only SQL) Brewers CAP Theorem Data Federation and Virtualization Cassandra, MongoDB, Neo4J, Hive, Pig, JDG, etc... JBoss Enterprise Data Services Platform MapReduce and batch processing And the list goes on...

Background Financial Services Industry Large commercial bank History of large acquisitions Fast moving business and compliance environment Different sources of data that capture only parts of their overall data lifecycle Dodd-Frank and Basel 2 regulations Need agility on top of all their data challenges

The Problem Case Study Problem Domain Acquisitions and partner integration Many sources of financial data with different origins and formats. Primary Business Drivers Business Intelligence (predictive analytics and forecasting) How to harness the volumes of Data; 'Big' Data Compliance and Risk Management Provide exposure to clients via Multiple Channels Large pains around ETL

Reference Architecture Open Source SOA, Jeff Davis, Manning Press

Reference Architecture EDSP Open Source SOA, Jeff Davis, Manning Press

The Problem Data sources App Name Format Size # Table MERCURY MF ASCII Extract ~ 450 GB 1 tbl/day ATLAS M/F ASCII Extract ~ 54 GB 1 tbl/day ARES Oracle DB ~ 350 GB 8-10 tbls HERCULES Oracle DB ~ 200 GB 11-12 tbls APOLLO ASCII File ~ 10 GB 1 tbl ZEUS ASCII File ~ 10 GB 1 tbl ATHENA Oracle DB ~ 20 GB 8-10 tbls HADES Sybase Table ~ 50 GB 8 tbls HERA XML Dump ~ 10GB 9 tbls DEMETER Oracle DB ~ 10 GB 10-12 tbls

The Problem Overall Data Flow Mercury Trades ATLAS Data Trades from ATLAS flows through to APOLLO (domestic trade) If international data ATLAS passes to ARES and then to HADES/ATHENA Position information are captured in the GPDW (HERCULES) application Data from Mercury flows through to ZEUS If international data ZEUS pass to International Settlement systems Data from MERCURY is also logged to Trademart (ARES) system Position information are captured in the GPDW (HERCULES) application

The Problem views of data... Trade View Rationalization, data agility, data source flexibility Easily and rapidly augment models with new sources and/or data attributes Account View Federate across multiple account sources and create a virtualized canonical mode Data augmentation, federation and data source flexibility Service real time data integration challenges Instrument View Rationalization, virtualization, data source flexibility

The Problem data hierarchies Trade Source Data systems Priority 1 2 3 4 5 Source ARES Trade DM HERCULES APOLLO ATHENA HADES Account Source Data systems Priority 1 2 3 4 5 6 Instrument Source Data systems Priority 1 2 3 4 5 6 Source SMC Master ARES Trade DM HERCULES APOLLO ATHENA HADES Source AMC Master ARES Trade DM HERCULES APOLLO ATHENA HADES

Significant Learnings not as easy as you think Source database performance matters Materialization is a read-only solution Write back is disabled! Unstructured data is not query-ready Database tuning is important for push-down model Pre-processing, indexing and optimization is required Data virtualization provides the ability to discover nuances in the source data and relationships Learn more about your data

Solution Speed Layer Serving Layer Batch Layer

The Solution Batch Layer High Latency Lots of data Continual compute Pre-compute Views of Data Master data set / all data Constantly growing! MapReduce is a Canonical Example Red Hat Storage, Hadoop, Etc...

The Solution Serving Layer Loads the batch views Indexes for efficient querying Continual compute Pre-compute Views of Data Master data set / all data Constantly growing! NoSQL is a Canonical Example MongoDB, Cassandra, Neo4J, Etc... Key-Value, Graph, Document, Column??

The Solution Speed Layer Similar to Service Layer but Faster! Incremental Updates vs. Re-computation Updates Doesn't look at all new data at once Updates real-time view as it receives new data Produces views only on RECENT data vs. entire dataset Random reads/writes Way more complex than batch and serving layer Data Federation/Virtualization, Caching, etc.. Enterprise Data Services Platform, JBoss Data Grid

Solution Architecture JDG, EDSP Red Hat Storage, m:r NoSQL Big Data, Nathan Martz, Manning Press

Solution Architecture Enterprise Data Services Platform JBoss Data Grid Speed layer; real time data and batch data access Speed layer and Service layer; in-memory data Cassandra Built for analytics; fast writes, highly consistent Maintains indexes for Speed layer MapReduce/Red Hat Storage Continual processing of master data Many jobs / continual processing

Enterprise Data Services Platform

Red Hat Storage applicability Out-of-box compatible with MapReduce apps Superior Storage Economics NAS, NFS, CIFS, HTTP Access to data Unify Data Storage Eliminate need for NameNode

JBoss Data Grid Core Architecture

Improving Integration of Big Data into Enterprise Application Architectures Red Hat s Solution Existing data producers, standards-based interfaces into the ETL pipeline.

Achieving Greater Transparency into Enterprise Data and Big Data Assets Red Hat s Solution A virtualized view of enterprise data, regardless of the source or type. It copes with large amounts of unstructured data via an ETL intake pipeline that extracts and store metadata in a fully customizable manner, increasing overall data visibility.

QUESTIONS?