and NoSQL Data Governance for Regulated Industries Using Hadoop Justin Makeig, Director Product Management, MarkLogic October 2013



Similar documents
Data Governance for Regulated Industries

Making Sense of Big Data in Insurance

Endeca Introduction to Big Data Analytics

End to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ

Increase Agility and Reduce Costs with a Logical Data Warehouse. February 2014

The Future of Data Management

You Have Your Data, Now What?

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Unleashing the Power of Hadoop for Big Data Analytics

The Enterprise Data Hub and The Modern Information Architecture

Big Data Analytics Nokia

Data Integration Checklist

The Future of Data Management with Hadoop and the Enterprise Data Hub

Simplifying Data Governance and Accelerating Real-time Big Data Analysis in Financial Services with MarkLogic Server and Intel

Big Data Introduction

Simplifying Data Governance and Accelerating Real-time Big Data Analysis for Government Institutions with MarkLogic Server and Intel

Luncheon Webinar Series May 13, 2013

Big Data and Analytics in Government

Putting Apache Kafka to Use!

Big Data, Integration and Governance: Ask the Experts

How To Create A Data Science System

BIG DATA AND THE ENTERPRISE DATA WAREHOUSE WORKSHOP

Hadoop Trends and Practical Use Cases. April 2014

Optimized for the Industrial Internet: GE s Industrial Data Lake Platform

Architecting for the Internet of Things & Big Data

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015

Simplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!

HDP Hadoop From concept to deployment.

Big Data Technology ดร.ช ชาต หฤไชยะศ กด. Choochart Haruechaiyasak, Ph.D.

Apache Hadoop in the Enterprise. Dr. Amr Awadallah,

Bringing Strategy to Life Using an Intelligent Data Platform to Become Data Ready. Informatica Government Summit April 23, 2015

Simplifying Data Governance and Accelerating Real-time Big Data Analysis for Healthcare with MarkLogic Server and Intel

HADOOP. Unleashing the Power of. for Big Data Analysis. Best Practices Series. MarkLogic. Cisco. Attunity. Couchbase PAGE 14 PAGE 17 PAGE 18 PAGE 19

More Data in Less Time

Building Confidence in Big Data Innovations in Information Integration & Governance for Big Data

Decoding the Big Data Deluge a Virtual Approach. Dan Luongo, Global Lead, Field Solution Engineering Data Virtualization Business Unit, Cisco

Exploiting Data at Rest and Data in Motion with a Big Data Platform

QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM

Ganzheitliches Datenmanagement

Beyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations

Hadoop Data Hubs and BI. Supporting the migration from siloed reporting and BI to centralized services with Hadoop

MarkLogic for Government. July 2014

A Big Data Storage Architecture for the Second Wave David Sunny Sundstrom Principle Product Director, Storage Oracle

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Data Virtualization and ETL. Denodo Technologies Architecture Brief

The Top 10 7 Hadoop Patterns and Anti-patterns. Alex

Il mondo dei DB Cambia : Tecnologie e opportunita`

SAP Database Strategy Overview. Uwe Grigoleit September 2013

BIG DATA CAN DRIVE THE BUSINESS AND IT TO EVOLVE AND ADAPT RALPH KIMBALL BUSSUM 2014

A Whole New World. Big Data Technologies Big Discovery Big Insights Endless Possibilities

Deploying Big Data to the Cloud: Roadmap for Success

Survival Tips for Big Data Impact on Performance Share Pittsburgh Session 15404

Optimized for the Industrial Internet: GE s Industrial Data Lake Platform

Autonomy Consolidated Archive

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth

Accelerate Data Loading for Big Data Analytics Attunity Click-2-Load for HP Vertica

An Integrated Analytics & Big Data Infrastructure September 21, 2012 Robert Stackowiak, Vice President Data Systems Architecture Oracle Enterprise

... Foreword Preface... 19

Composite Software Data Virtualization Five Steps to More Effective Data Governance

Informatica Platform v10 for: Next Generation Analytics Cloud Modernization Data Archiving. Presented by Ilya Gershanov

Balance and maximise your Oracle EBS investment with IBM Optim A Priceline and Travel Industry Case Study Philip McBride

Agile Business Intelligence Data Lake Architecture

SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera

The IBM Cognos Platform

BIG DATA: FIVE TACTICS TO MODERNIZE YOUR DATA WAREHOUSE

Processing and Analyzing Streams. CDRs in Real Time

Oracle Big Data Building A Big Data Management System

<Insert Picture Here> Extending Hyperion BI with the Oracle BI Server

A 360 Degree View of Anything

MDM and Data Warehousing Complement Each Other

Real World Strategies for Migrating and Decommissioning Legacy Applications

How the oil and gas industry can gain value from Big Data?

News and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

Enabling Big Data with Cloud. Go faster Reduce risk Scale as you grow Avoid mistakes

Data Big and Small: How Publisher gain Value out of Data in the Future

Integrating a Big Data Platform into Government:

By Makesh Kannaiyan 8/27/2011 1

HITACHI DATA SYSTEMS HADOOP SOLUTION JUNE 12, 2012

Data Warehousing and Analytics Infrastructure at Facebook. Ashish Thusoo & Dhruba Borthakur athusoo,dhruba@facebook.com

Data warehouse Architectures and processes

So What s the Big Deal?

MarkLogic and Cisco: A Next-Generation, Real-Time Solution for Big Data

Choosing A Customer Data Platform

The Principles of the Business Data Lake

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

Data Virtualization for Agile Business Intelligence Systems and Virtual MDM. To View This Presentation as a Video Click Here

Traditional BI vs. Business Data Lake A comparison

SAP HANA SPS 09 - What s New? HANA IM Services: SDI and SDQ

Information Governance in the Cloud

Getting Started Practical Input For Your Roadmap

AnalytiX MappingManager Big Data Edition

Information Architecture

Independent process platform

MarkLogic Enterprise Data Layer

One Size Doesn t Fit All Choosing which big data, NoSQL or database technology to use

Transcription:

Data Governance for Regulated Industries Using Hadoop and NoSQL Justin Makeig, Director Product Management, MarkLogic October 2013

Who am I? Product Manager for 6 years at MarkLogic Background in FinServ and web development Passionate about data, infrastructure, and user experience What is MarkLogic? Enterprise NoSQL since 2001 Distributed database + search + app platform 250+ paying customers, 500+ production applications SLIDE: 2

Agenda Data governance considerations Legacy approaches: Why it s hard New generation: Hadoop + Enterprise NoSQL Enterprise NoSQL Case studies: FATCA, ediscovery, Dodd-Frank Q&A SLIDE: 3

Data Governance Considerations Security Privacy Provenance Retention Continuity Compliance SLIDE: 4

Why is this difficult? And risky? And expensive? And behind schedule? SLIDE: 5

Last Generation ETL Reference Data Unstructured OLTP ETL Warehouse ETL ETL ETL Archives ETL Data Marts Documents, Messages { } Metadata Social Video ETL Search Audio Signals, Logs, Streams SLIDE: 6

Enter Hadoop Updates? Queries Hadoop Staging Analytics Persistence Aggregates, Models SLIDE: 7

Why must we choose? Legacy RDBMS Indexes Transactions Security Enterprise operations NoSQL Flexible data model Commodity scale out Distributed, fault-tolerant Hadoop sink/source SLIDE: 8

New Generation ETL Updates Queries MarkLogic! NoSQL Hadoop Aggregates, Models SLIDE: 9

Enterprise NoSQL Flexible data model, comprehensive indexes SLIDE: 10 o Documents: Hierarchy, text, values, tags schema on-demand o Scalars: Aggregates and range filters, including geospatial o Triples: Linked facts and inferencing o Permissions: Users, roles, compartments, and privileges o Queries: Reverse indexes for alerting, matching Ad hoc dimensions, lock-free reads Real-time transformation Strict consistency throughout

Preserving Context with Documents Before movement of materials was observed en route to Abattabad some time after 14:30 Inline Enrichment After movement of materials was observed en route to <place lat=" " long=" " version="2.2.1"> <original>abattabad</original> <canonical ref=" ">Abbottabad</canonical> <source>/sources/1234</source> <confidence>0.87</confidence> </place> some time after 14:30 Transactional updates SLIDE: 11

Complementary Approaches NoSQL Online applications Delivery Decision-making Real-time Granular updates Distributed indexes Hadoop Offline analytics Staging Model-building Long-haul batch Write-once, read-many Distributed file system SLIDE: 12

Case Studies SLIDE: 13

KPMG: FATCA Compliance for Customer On-Boarding Thousands of rules, 1 2M accounts, 30 40M documents Encoding, adjusting, and matching rules must scale Impossible to pre-define dimensions, relationships Vet new accounts and show your work Real-time decision-making 6 48 hours to 3 seconds SLIDE: 14

KPMG: FATCA Compliance for Customer On-Boarding Validate MarkLogic! Enrich Rules Documents Archive SLIDE: 15

Tier 1 European Bank: Compliance and Legal Holds Accurately respond to discovery as part of litigation Hold, review, produce data across current, legacy systems Repatriate and reconcile distributed data Demonstrate fidelity and audit trail Reduce infrastructure and maintenance costs Estimated $16M savings SLIDE: 16

Tier 1 European Bank: Compliance and Legal Holds Ingest Query Oracle 100TB 40TB Mainframe MarkLogic! MarkLogic! 87 total systems Sybase Offline Replication Shared Storage NAS HDFS SLIDE: 17

Ernst & Young: Dodd-Frank Compliance Trace lineage of order lifecycle for OTC derivatives Search, link supporting communications, documents Strict reporting and retention rules, response times Existing policies, point solutions don t scale SLIDE: 18

Current State Missing key relationships between pre-/post-trade data No way to query across silos Segregated reporting and surveillance SLIDE: 19

Ernst & Young: Dodd-Frank Compliance { } Metadata Documents Reporting Surveillance Ad hoc analysis Reference Data MarkLogic! Staging Quality Assurance Categorization Enrichment Linking SLIDE: 20

Enrichment and Linking SLIDE: 21

Management Dashboard SLIDE: 22

What now? SLIDE: 23

Take-Aways New and more data is both an opportunity and a threat Last generation of data management is not sufficient More copies, representations, transformations increase risk Index once and reuse across workloads, lifecycle NoSQL: indexing and updates for interactive apps Hadoop: staging, persistence, and analytics SLIDE: 24

DO MORE WITH HADOOP SECURE Minimize duplication, costly ETL, reduce risk REAL-TIME Enterprise-class database for real-time search, delivery & analytics RUN APPLICATIONS Run mission critical applications directly on HDFS SLIDE: 25