2 Large Scale/Big Data Federation & Virtualization: A Case Study Vamsi Chemitiganti, Chief Solution Architect Derrick Kittler, Senior Solution Architect Bill Kemp, Senior Solution Architect Red Hat
4 Red Hat Big Data Week In Retrospective Big Data, Volume, Speed & Benefits with Red Hat JBoss Data Grid Red Hat's Big Data Strategy Overview & Optimizing Apache Hadoop with Red Hat Enterprise MRG Grid JBoss Enterprise Middleware & Big Data NoSQL & Big Data at Red Hat Large Scale / Big Data Federation & Virtualization: A Case Study
5 Goals Big Data by Example Significant Learning/s Solution Architectures
6 Background Big Data: data growing at 40% per year (McKinsey Global Institute); a lot of analyst reports exist... Big Data: data size and performance requirements become significant design and decision factors for implementing a data management and analysis system. In this definition, there's not an absolute size milestone between data and big data. 4 V's: How much? Formats? Speed? Change?
7 Background Where is it coming from? Volume: old and new types of data; transaction volumes and other traditional data types. Variety: more types of information to analyze; social media, mobile (context-aware); tabular data (databases), hierarchical data, documents, metering data, video, still images, audio, stock ticker data, financial transactions, etc... Velocity: how fast is data produced? How fast must the data be processed? Variability: data unpredictability, new forms, risk! UPC barcodes, RFID scanners, sensors (HVAC), etc...
8 Background Big Data Tech Landscape NoSQL (Not only SQL): Cassandra, MongoDB, Neo4J, Hive, Pig, JDG, etc... Brewer's CAP Theorem Data Federation and Virtualization: JBoss Enterprise Data Services Platform MapReduce and batch processing And the list goes on...
9 Background Financial Services Industry Large commercial bank History of large acquisitions Fast moving business and compliance environment Different sources of data that capture only parts of their overall data lifecycle Dodd-Frank and Basel 2 regulations Need agility on top of all their data challenges
10 The Problem Case Study Problem Domain Acquisitions and partner integration Many sources of financial data with different origins and formats. Primary Business Drivers Business Intelligence (predictive analytics and forecasting) How to harness the volumes of Data; 'Big' Data Compliance and Risk Management Provide exposure to clients via Multiple Channels Large pains around ETL
11 Reference Architecture Open Source SOA, Jeff Davis, Manning Press
12 Reference Architecture EDSP Open Source SOA, Jeff Davis, Manning Press
13 The Problem Data sources
App Name   Format              Size      # Tables
MERCURY    M/F ASCII Extract   ~450 GB   1 tbl/day
ATLAS      M/F ASCII Extract   ~54 GB    1 tbl/day
ARES       Oracle DB           ~350 GB   8-10 tbls
HERCULES   Oracle DB           ~200 GB   tbls
APOLLO     ASCII File          ~10 GB    1 tbl
ZEUS       ASCII File          ~10 GB    1 tbl
ATHENA     Oracle DB           ~20 GB    8-10 tbls
HADES      Sybase Table        ~50 GB    8 tbls
HERA       XML Dump            ~10 GB    9 tbls
DEMETER    Oracle DB           ~10 GB    tbls
14 The Problem Overall Data Flow
MERCURY trades:
  Data from MERCURY flows through to ZEUS.
  If the data is international, ZEUS passes it to the International Settlement systems.
  Data from MERCURY is also logged to the Trademart (ARES) system.
  Position information is captured in the GPDW (HERCULES) application.
ATLAS data:
  Trades from ATLAS flow through to APOLLO (domestic trades).
  If the data is international, ATLAS passes it to ARES and then to HADES/ATHENA.
  Position information is captured in the GPDW (HERCULES) application.
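The flow on this slide can be read as a small routing rule. The following is a minimal sketch of that reading, not the bank's actual integration logic: the function name `route_trade` is hypothetical, and it assumes that the domestic and international paths out of ATLAS are alternatives and that every trade ends with a position record in HERCULES.

```python
def route_trade(origin, international):
    """Return the systems a trade passes through, in order.

    Assumption: this mirrors the slide's description only; the real
    hand-offs between systems may differ.
    """
    if origin == "ATLAS":
        path = ["ATLAS"]
        if international:
            # International ATLAS data goes to ARES, then HADES/ATHENA.
            path += ["ARES", "HADES/ATHENA"]
        else:
            # Domestic ATLAS trades flow through to APOLLO.
            path.append("APOLLO")
    elif origin == "MERCURY":
        path = ["MERCURY", "ZEUS"]
        if international:
            path.append("International Settlement")
        # MERCURY data is also logged to Trademart (ARES).
        path.append("ARES")
    else:
        raise ValueError(f"unknown origin system: {origin}")
    # Position information is captured in GPDW (HERCULES).
    path.append("HERCULES")
    return path
```

Writing the flow down this way makes the overlap visible: ARES and HERCULES receive data from both origins, which is one reason a single federated trade view is attractive.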
15 The Problem views of data... Trade View Rationalization, data agility, data source flexibility Easily and rapidly augment models with new sources and/or data attributes Account View Federate across multiple account sources and create a virtualized canonical model Data augmentation, federation and data source flexibility Service real time data integration challenges Instrument View Rationalization, virtualization, data source flexibility
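To make the idea of a virtualized canonical model concrete, here is a small illustrative sketch using Python's built-in `sqlite3` module. It is not the Enterprise Data Services Platform API: a single in-memory SQLite database stands in for two separate account sources, and the table names, column names, and sample rows are all invented for illustration. The point is only that a view can present differently shaped sources under one canonical schema without copying the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Two hypothetical account sources with different column names.
    CREATE TABLE ares_accounts  (acct_no TEXT, holder TEXT);
    CREATE TABLE hades_accounts (account_id TEXT, owner_name TEXT);
    INSERT INTO ares_accounts  VALUES ('A-1', 'Acme Corp');
    INSERT INTO hades_accounts VALUES ('H-7', 'Globex');

    -- The canonical Account View: one schema over both sources.
    CREATE VIEW account_view AS
        SELECT acct_no AS account_id, holder AS account_holder,
               'ARES' AS source
        FROM ares_accounts
        UNION ALL
        SELECT account_id, owner_name, 'HADES'
        FROM hades_accounts;
""")

# Consumers query the view and never see the source-specific layouts.
rows = conn.execute(
    "SELECT account_id, account_holder, source "
    "FROM account_view ORDER BY account_id"
).fetchall()
```

Adding a new account source in this sketch means adding one more `SELECT` branch to the view, which is the "data source flexibility" the slide refers to.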
16 The Problem data hierarchies (source data systems in priority order)
Trade: ARES Trade DM, HERCULES, APOLLO, ATHENA, HADES
Account and Instrument (two lists, in slide order):
  SMC Master, ARES Trade DM, HERCULES, APOLLO, ATHENA, HADES
  AMC Master, ARES Trade DM, HERCULES, APOLLO, ATHENA, HADES
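One way a source priority list like the one above could be applied is sketched below. The slide only gives the ordering; the resolution rule used here (for each attribute, take the first non-missing value from the highest-priority source) is an assumption, and the function name `resolve` and the sample records are hypothetical.

```python
# Trade source priority, as listed on the slide.
TRADE_PRIORITY = ["ARES Trade DM", "HERCULES", "APOLLO", "ATHENA", "HADES"]


def resolve(records, priority):
    """Merge per-source records into one record.

    records:  dict mapping source name -> dict of attribute -> value
              (None means the source has no value for that attribute).
    priority: source names, highest priority first.

    Assumed rule: the highest-priority source with a non-None value wins.
    """
    merged = {}
    for source in priority:
        for attr, value in records.get(source, {}).items():
            if value is not None and attr not in merged:
                merged[attr] = value
    return merged


# Hypothetical example: ARES Trade DM lacks a quantity, so HADES fills it in.
example = resolve(
    {
        "HADES": {"price": 10.0, "qty": 5},
        "ARES Trade DM": {"price": 10.5, "qty": None},
    },
    TRADE_PRIORITY,
)
```

In this example the price comes from ARES Trade DM and the quantity from HADES, so the merged record draws on both sources.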
18 Significant Learnings not as easy as you think Source database performance matters Materialization is a read-only solution Write back is disabled! Unstructured data is not query-ready Database tuning is important for push-down model Pre-processing, indexing and optimization is required Data virtualization provides the ability to discover nuances in the source data and relationships Learn more about your data
19 Solution Speed Layer Serving Layer Batch Layer
20 The Solution Batch Layer High Latency Lots of data Continual compute Pre-compute Views of Data Master data set / all data Constantly growing! MapReduce is a Canonical Example Red Hat Storage, Hadoop, Etc...
21 The Solution Serving Layer Loads the batch views Indexes for efficient querying Continual compute Pre-compute Views of Data Master data set / all data Constantly growing! NoSQL is a Canonical Example MongoDB, Cassandra, Neo4J, Etc... Key-Value, Graph, Document, Column??
22 The Solution Speed Layer Similar to Serving Layer but Faster! Incremental Updates vs. Re-computation Updates Doesn't look at all new data at once Updates real-time view as it receives new data Produces views only on RECENT data vs. entire dataset Random reads/writes Way more complex than batch and serving layer Data Federation/Virtualization, Caching, etc... Enterprise Data Services Platform, JBoss Data Grid
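The division of work between the batch, serving, and speed layers can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the case study's implementation: the view is a simple trade count per account, the names `batch_view`, `SpeedLayer`, and `query` are hypothetical, and in-memory counters stand in for MapReduce, the NoSQL store, and the data grid.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute the whole view from the master data set."""
    return Counter(trade["account"] for trade in master_dataset)


class SpeedLayer:
    """Speed layer: keep a view of recent data only, updated incrementally."""

    def __init__(self):
        self.realtime_view = Counter()

    def on_new_trade(self, trade):
        # Incremental update: touch one counter, no re-computation.
        self.realtime_view[trade["account"]] += 1


def query(account, batch, speed):
    """Answer a query by merging the batch view with the real-time view."""
    return batch[account] + speed.realtime_view[account]


# Hypothetical data: three trades already in the master data set,
# one more arriving after the last batch run.
master = [{"account": "A"}, {"account": "A"}, {"account": "B"}]
batch = batch_view(master)
speed = SpeedLayer()
speed.on_new_trade({"account": "A"})
total_for_a = query("A", batch, speed)
```

Once the next batch run has absorbed the recent trades into the master data set, the speed layer's view for that period can be discarded, which is why it only ever needs to cover recent data.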
23 Solution Architecture JDG, EDSP Red Hat Storage, m:r NoSQL Big Data, Nathan Marz, Manning Press
24 Solution Architecture
Enterprise Data Services Platform: Speed layer; real-time data and batch data access
JBoss Data Grid: Speed layer and Serving layer; in-memory data
Cassandra: built for analytics; fast writes, highly consistent; maintains indexes for Speed layer
MapReduce/Red Hat Storage: continual processing of master data; many jobs / continual processing
25 Enterprise Data Services Platform
26 Red Hat Storage applicability Out-of-box compatible with MapReduce apps Superior Storage Economics NAS, NFS, CIFS, HTTP Access to data Unify Data Storage Eliminate need for NameNode
27 JBoss Data Grid Core Architecture
28 Improving Integration of Big Data into Enterprise Application Architectures Red Hat's Solution Existing data producers, standards-based interfaces into the ETL pipeline.
29 Achieving Greater Transparency into Enterprise Data and Big Data Assets Red Hat's Solution A virtualized view of enterprise data, regardless of the source or type. It copes with large amounts of unstructured data via an ETL intake pipeline that extracts and stores metadata in a fully customizable manner, increasing overall data visibility.