Riding the Data Wave
New Capabilities, New Techniques
Bill Chute, Acadiant Limited

There are new challenges
New technologies are on your side

MiFID II & MiFIR, MAD II & MAR, Basel III, Dodd-Frank, EMIR, Volcker, FATCA, NAV, VaR
  Who loaned that security?
  Which transaction hedged that other transaction?
  Which model calculated that ratio?
  What's your exposure to ...?

Historic Scarcity
  Constrained Storage
  Constrained CPU
  Constrained RAM
  Constrained Bandwidth
  Computing was Expensive
  Response to Scarcity:
    Structured Query Language
    Offline Storage
    Batch Processing

Modern Abundance
  Cheap Storage on Demand
  Cheap CPU on Demand
  Cheap RAM on Demand
  Cheap Bandwidth on Demand
  Unit Costs Tend Towards Zero
  Response to Abundance:
    NoSQL / Document Oriented DBs
    Online Archive
    Asynchronous Processing

Scarcity Built the Grid
  Row / Column was a crude, cheap way to organise data
    RDBMS, Spreadsheets, Fixed Data Formats
    Rigid Reporting Systems
  Time Grids coped with limited time, limited power
    Batch: gather once, process once
  BREAK THE GRID

BREAK THE GRID
  Exploit Mobile Social Technologies
  Loosely Structured Data
    Store everything, parse when needed
  Flexible Query Systems
  Asynchronous computing
    Process data whenever available

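A minimal sketch of "process data whenever available", using Python's asyncio rather than any product named in this deck. The message format and the "symbol" field are hypothetical, chosen only for illustration. Raw messages are queued as they arrive and parsed only when the consumer picks them up, with no batch window.

```python
import asyncio
import json


async def producer(queue, messages):
    """Push raw messages onto the queue as they 'arrive'."""
    for raw in messages:
        await queue.put(raw)
        await asyncio.sleep(0)  # yield control, as a real network read would
    await queue.put(None)  # sentinel: no more data


async def consumer(queue, results):
    """Handle each message as soon as it is available."""
    while True:
        raw = await queue.get()
        if raw is None:
            break
        record = json.loads(raw)  # parse only when needed
        results.append(record["symbol"])  # "symbol" is a hypothetical field


async def run(messages):
    queue = asyncio.Queue()
    results = []
    await asyncio.gather(producer(queue, messages), consumer(queue, results))
    return results


def process_stream(messages):
    """Synchronous entry point for the sketch."""
    return asyncio.run(run(messages))
```

Calling process_stream(['{"symbol": "ABC"}', '{"symbol": "XYZ"}']) returns the symbols in arrival order. In a real system the producer would read from a feed or a service bus instead of a list.
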
Ride That Wave
  NASDAQ OMX UltraFeed
    8 hours per day: ~230GB per day
    Sustained 8MB/s, peaks ~3X to 4X
    Packed Binary, Optimised for Real-Time Trading
  Twitter Firehose
    24 hours per day: ~900GB per day
    Sustained 10MB/s, peaks ~3X to 4X
    JSON, Optimised for Rich Data

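The sustained rates follow from the daily volumes and the hours each feed is active. The sketch below reproduces that arithmetic, assuming decimal gigabytes (10^9 bytes); the function names are illustrative only. Under that assumption the daily volumes work out to roughly 8 and 10 megabytes per second, not megabits.

```python
SECONDS_PER_HOUR = 3600
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def sustained_mb_per_s(gb_per_day, hours_per_day):
    """Average rate in megabytes per second over the hours the feed is active."""
    total_bytes = gb_per_day * BYTES_PER_GB
    return total_bytes / (hours_per_day * SECONDS_PER_HOUR) / 10**6


def to_mbit_per_s(mb_per_s):
    """Convert megabytes per second to megabits per second."""
    return mb_per_s * 8


for name, gb, hours in [("NASDAQ OMX UltraFeed", 230, 8), ("Twitter Firehose", 900, 24)]:
    rate = sustained_mb_per_s(gb, hours)
    print(f"{name}: {rate:.1f} MB/s ({to_mbit_per_s(rate):.0f} Mbit/s)")
```

A peak of 3X to 4X on top of these averages is what the receiving systems would need to absorb.
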
What is in a tweet?
  Along with our new #Twitterbird, we've also updated our Display Guidelines: https://t.co/ed4omjys ^JC

What is in a tweet?
{
  "coordinates": null,
  "favorited": false,
  "truncated": false,
  "created_at": "Wed Jun 06 20:07:10 +0000 2012",
  "id_str": "210462857140252672",
  "entities": {
    "urls": [
      {
        "expanded_url": "https://dev.twitter.com/terms/display-guidelines",
        "url": "https://t.co/ed4omjys",
        "indices": [
          76,
          97
        ],
        "display_url": "dev.twitter.com/terms/display-\u2026"
      }
    ],
    "hashtags": [
      {
        "text": "Twitterbird",
        "indices": [
          19,
          31
        ]
      }
    ],
    "user_mentions": [

    ]
  },
  "in_reply_to_user_id_str": null,
  "contributors": [
    14927800
  ],
  "text": "Along with our new #Twitterbird, we've also updated our Display Guidelines: https://t.co/ed4omjys ^JC",
  "retweet_count": 66,
  "in_reply_to_status_id_str": null,
  ...
    "show_all_inline_media": false,
    "screen_name": "twitterapi"
  },
  "in_reply_to_screen_name": null,
  "source": "web",
  "in_reply_to_status_id": null
}

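The "indices" pairs in the entities block are character offsets into "text". A short sketch, using a trimmed copy of the tweet above and a hypothetical helper named entity_spans, shows how the rich structure can be read back out of the stored document when it is needed:

```python
import json

# Trimmed copy of the tweet shown above; only the fields used here are kept.
RAW_TWEET = r'''
{
  "text": "Along with our new #Twitterbird, we've also updated our Display Guidelines: https://t.co/ed4omjys ^JC",
  "entities": {
    "urls": [
      {
        "expanded_url": "https://dev.twitter.com/terms/display-guidelines",
        "url": "https://t.co/ed4omjys",
        "indices": [76, 97]
      }
    ],
    "hashtags": [
      {"text": "Twitterbird", "indices": [19, 31]}
    ],
    "user_mentions": []
  },
  "retweet_count": 66
}
'''


def entity_spans(tweet):
    """Return, per entity kind, the slices of the tweet text its indices point at."""
    text = tweet["text"]
    spans = {}
    for kind, items in tweet["entities"].items():
        spans[kind] = [text[item["indices"][0]:item["indices"][1]] for item in items]
    return spans


tweet = json.loads(RAW_TWEET)
spans = entity_spans(tweet)
print(spans["hashtags"])  # the hashtag as it appears in the text
print(spans["urls"])      # the shortened URL as it appears in the text
```

Nothing here depends on a fixed schema: a field that is absent from one document simply is not read.
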
Process Data At Rest
  Use an Aggregation Framework like MapReduce
  Store in Structures like BigTable, Cassandra, DynamoDB, MongoDB
  Think About Your Data Server & Application Server
  Use many CPUs

And In Motion
  Update asynchronously. Do not wait for batch time.
  Use an Enterprise Service Bus
  Use many CPUs

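To make the aggregation idea concrete, here is a plain Python sketch of the map/reduce pattern over tweet-shaped documents. It is not the API of Hadoop, MongoDB or any other store named above; the function names and sample data are illustrative assumptions. The map step looks at one document at a time, so it could be spread across many CPUs, and the reduce step merges the partial results.

```python
from collections import Counter
from functools import reduce


def map_hashtags(tweet):
    """Map step: one document in, a partial hashtag count out."""
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return Counter(tag["text"].lower() for tag in hashtags)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_hashtags(tweets):
    """Run the map step over every document, then fold the partial counts together."""
    return reduce(reduce_counts, map(map_hashtags, tweets), Counter())


# Hypothetical sample documents, shaped like the tweet shown earlier.
sample = [
    {"entities": {"hashtags": [{"text": "Twitterbird"}]}},
    {"entities": {"hashtags": [{"text": "twitterbird"}, {"text": "API"}]}},
    {"text": "a document with no entities at all"},
]
totals = count_hashtags(sample)
print(totals.most_common())
```

Because the reduce step is associative, partial counts can be merged in any grouping, which is what lets the work be split across machines.
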
Use the Cloud
  Unit Cost Tends Towards Zero
    2ECU, 4GB RAM, 24x365: 400
    88ECU, 60GB RAM, 24x365: 11,000
    Data Warehouse 1TB: 600/year
    Online Archive 1TB: 100/year

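The annual figures can be turned into hourly unit costs. This sketch assumes the two compute prices are annual totals for running 24x365, and treats the amounts as plain currency units because the slide shows no currency symbol; the helper name is illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def cost_per_ecu_hour(annual_cost, ecus):
    """Annual cost divided over every ECU for every hour of the year."""
    return annual_cost / (ecus * HOURS_PER_YEAR)


# (label, annual cost in unspecified currency units, ECUs), as given on the slide
instances = [
    ("2ECU, 4GB RAM", 400, 2),
    ("88ECU, 60GB RAM", 11_000, 88),
]

for label, annual_cost, ecus in instances:
    hourly = annual_cost / HOURS_PER_YEAR
    print(f"{label}: {hourly:.3f} per hour, {cost_per_ecu_hour(annual_cost, ecus):.4f} per ECU-hour")
```

On these numbers the larger instance is cheaper per ECU-hour, and both work out to only a few hundredths of a currency unit.
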
Use the Cloud
  Security, Audit, Compliance Can Be Managed
    PCI, ITAR, FIPS, HIPAA, ISO 27001
  An Opportunity for Enhanced Governance

Acadiant
  Every data element is timestamped and attributed for audit
  Nothing is ever deleted or overwritten
  History is always available
  Multilingual
  Multiple Character Sets
  Multi Currency

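One way to picture "timestamped, attributed, never overwritten" is an append-only store in which every write adds a new version. The sketch below is an illustration under that assumption, not Acadiant's implementation; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class Version:
    value: Any
    author: str            # who recorded the value (attribution)
    recorded_at: datetime  # when it was recorded (UTC)


class AppendOnlyStore:
    """Every write appends a version; nothing is deleted or overwritten."""

    def __init__(self):
        self._history = {}  # key -> list of Version, oldest first

    def put(self, key, value, author, now=None):
        stamp = now or datetime.now(timezone.utc)
        version = Version(value, author, stamp)
        self._history.setdefault(key, []).append(version)
        return version

    def latest(self, key):
        return self._history[key][-1]

    def as_of(self, key, moment):
        """Most recent version recorded at or before the given moment, if any."""
        candidates = [v for v in self._history.get(key, []) if v.recorded_at <= moment]
        return candidates[-1] if candidates else None

    def history(self, key):
        return tuple(self._history.get(key, ()))


store = AppendOnlyStore()
t1 = datetime(2012, 6, 6, tzinfo=timezone.utc)
t2 = datetime(2012, 6, 7, tzinfo=timezone.utc)
store.put("price", "100 GBP", "alice", now=t1)
store.put("price", "101 GBP", "bob", now=t2)
```

Here store.latest("price") gives the second version, while store.as_of("price", t1) still returns the first one together with its author and timestamp. The sketch assumes versions are appended in chronological order.
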
Acadiant
  Modern Graphics: SVG
  Server Side: Calculation and Storage
  Client Side: Display and Interaction

Back End Stack: Asynchronous Services
  Ruby
  Golang
  R
  MongoDB
  Node.js
  ESB

Front End Stack: Asynchronous Mobile Clients
  HTML
  JavaScript
  Cascading Style Sheets
  Scalable Vector Graphics
  Intelligent Local Cache
  Protocols Optimised for High Latency Mobile Networks

A New Model for Collaboration
  Define a Product, Not a One-Off Fix
  Reduce Implementation Risk
  Scale Up From a Small Proof of Concept

Thank You
  bill.chute@acadiant.com
  @AcadiantLtd