Data Governance for Regulated Industries Amir Halfon CTO, Worldwide Financial Service
Agenda Components of Data Governance Challenges Solutions and Case Studies Q&A SLIDE: 2
Data Governance Considerations Security SLIDE: 3
Data Governance Considerations Security Privacy SLIDE: 4
Data Governance Considerations Security Privacy Provenance SLIDE: 5
Data Governance Considerations Security Retention Privacy Provenance SLIDE: 6
Data Governance Considerations Security Retention Privacy Continuity Provenance SLIDE: 7
Data Governance Considerations Security Retention Privacy Provenance Continuity Compliance SLIDE: 8
Why is this difficult? And risky? And expensive? And behind schedule? SLIDE: 9
It s Complicated Reference Data Unstructured OLTP Warehouse Documents, Messages { } Social Metadata Video Audio Signals, Logs, Streams Archives Data Marts Search SLIDE: 10
Can Anything be Done? SLIDE: 11
Case Studies Records Retention and Investigations Trade Operational Data Store Regulatory Compliance Customer On-Boarding SLIDE: 12
Case Study: Records Retention and Investigations Accurately respond to litigation Hold, review, produce data across current, legacy systems Repatriate and reconcile distributed data Demonstrate fidelity and audit trail Reduce infrastructure and maintenance costs SLIDE: 13
Old Generation Records Retention and Investigations Oracle Mainframe 87 total systems Sybase SLIDE: 14
New Generation Records Retention and Investigations Ingest Query Oracle 100TB 40TB Mainframe MarkLogic MarkLogic 87 total systems Sybase Offline Replication Shared Storage NAS HDFS SLIDE: 15
Data Retention and Tiered Storage Provide multiple Service Level Agreements (SLAs) in a single system Decrease time and costs of ETL to bring offline content back online Empower your operations team without imposing burdens on your developers SLIDE: 16
Tiered Storage Architecture Data tiers are defined based on indexes balanced into forests by tier Query one tier or the other tier or both at once! All with no downtime, and 100% consistency! SLIDE: 17
Case Study: Operational Trade Data Store Comply with regulations requiring operational insights Quickly operationalize business innovation Support risk management requirements Reduce costs per trade Trade processing exceptions infrastructure and maintenance costs SLIDE: 18
Old Generation Operations and Analytics Limited, fragmented analytics and reporting capabilities Long, costly development cycles Expensive, error-prone post-trade processing Derivatives Rates FX ETL ETL ETL Matching Clearing Settlement etc. etc. Multiple Relational Data Stores for different instrument types SLIDE: 19
New Generation Operations, Compliance and Analytics Single source of truth Surveillance, Risk & Compliance Matching Clearing Settlement etc. Single ODS for all instrument types persisted as-is off a message bus Executed Trades MarkLogic Simplified workflow archtecture Post Trade Processing Exceptions Management HDFS Historical Analysis Tiered Storage using Hadoop SLIDE: 20
Event Handling and Workflow Integration Queries can be serialized and indexed, just like documents Index these queries For a given data document, quickly find all possible matching queries Narrow down these candidates to exact matches using a unified expression tree Fire off events to drive workflow, or alert users SLIDE: 21
Tiered Storage with Hadoop Minimize duplication and ETL, reduce risk MarkLogic Active ~$25/GB Historical ~$1/GB HDFS SLIDE: 22
Case Study: On-Boarding Compliance Thousands of rules, 1 2M accounts, 30 40M documents Encoding, adjusting, and matching rules must scale Impossible to pre-define dimensions, relationships Vet new accounts and show your work Real-time decision-making SLIDE: 23
Old Generation On-Boarding Compliance Regulations Policies Documents SLIDE: 24
New Generation On-Boarding Compliance MarkLogic Onboarding Workflow Regulations Policies Documents SLIDE: 25
Semantics Facts about the data, expressed in a flexible triple format Bring context to the content Improve query precision with search and semantic queries SLIDE: 26
Data Provenance Using Semantics <Trade> <Cashflows> <PartyIdentifier> <TradeID> 123456 </TradeID> </PartyIdentifier> </Cashflows> <provenance> <triple> <subject> Cashflows </subject> <predicate> wasderivedfrom </predicate> <object> CDS_xyz </object> </triple> <triple> <subject> TradeID </subject> <predicate> wasattributedto </predicate> <object> System_123 </object> </triple> </provenance> </Trade> SLIDE: 27
Case Study: Dodd Frank Compliance Trace lineage of order lifecycle for OTC derivatives Search, link supporting communications, documents Strict reporting and retention rules, response times Existing policies, point solutions don t scale SLIDE: 28
Old Generation Regulatory Compliance email Operations Reporting Trade Records Categorization Linking Reference Data SLIDE: 29
New Generation Regulatory Compliance email { } Metadata Reporting Surveillance Ad hoc analysis Operations Trade Records Reference Data MarkLogic Categorization Enrichment Linking SLIDE: 30
Enrichment and Linking SLIDE: 31
Management Dashboard SLIDE: 32
Recap SLIDE: 33
New Generation Data Governance Security Retention Privacy Provenance Continuity Compliance SLIDE: 34
Parting Thoughts One way to do more with less is to do less.. ETL, schema design, database maintenance.. to spend less on complex systems and human intervention and to focus more on data governance AND business agility SLIDE: 35
Thank You SLIDE: 36