Big Data for Good or Evil: Lessons from the NSA PRISM Scandal
Jason Bloomberg

About Jason Bloomberg
- President of ZapThink, a Dovel Technologies Company
- One of the original Managing Partners of ZapThink LLC
  - Acquired by Dovel Technologies in August 2011
- Global thought leader in the areas of Cloud Computing, EA, & SOA
- Created the Licensed ZapThink Architect (LZA) SOA course & associated credential
- Run LZA course & Enterprise Cloud Computing course around the world
- Analyst for GigaOM and blogger for DevX
- New book, The Agile Architecture Revolution, is now available!
What are Big Data?
- Datasets whose size is beyond the ability of typical database software tools to capture, store, manage & analyze

[Figure: 2012 Big Data Technology Landscape]
Today's Big Data are Tomorrow's Small Data?
- The definition is intentionally subjective: a moving definition of how big a dataset must be
- No fixed threshold
- As technology advances, the size of datasets that qualify will increase

Big Data May Include Historical Data
- What about yesterday's data?
- If the amount of data doubles every two years, then half your data are always over two years old
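The "half your data" claim follows from exponential growth. The short Python sketch below assumes a total volume V(t) = V0 * 2 ** (t / T) with nothing ever deleted; the share of today's data older than a given age is then V(t - age) / V(t). The function name is hypothetical.

```python
def fraction_older_than(age_years: float, doubling_period_years: float = 2.0) -> float:
    """Share of today's data that already existed `age_years` ago, assuming the
    total doubles every `doubling_period_years` and nothing is deleted."""
    # V(t - age) / V(t) = 2 ** (-age / T); the starting volume V0 cancels out.
    return 2 ** (-age_years / doubling_period_years)

for age in (1, 2, 4, 6):
    print(f"older than {age} years: {fraction_older_than(age):.1%}")
```

With a two-year doubling period, exactly 50% of the data is more than two years old, 25% is more than four years old, and so on. The starting volume drops out, which is why the statement holds "always," whatever the current size.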
Big Data Crisis Point
[Figure: curves for "Quantity & complexity of information" and "Ability to deal with quantity & complexity of information" plotted over Time, with "The Big Data crisis point" marked]

Parkinson's Law (Big Data Corollary)
- Quantity of data will always expand to exceed available capacity for storing & processing it
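One way to make the crisis-point curve concrete is a toy model in Python. It assumes data volume grows geometrically while the ability to handle it grows only linearly, and it finds the first year in which data overtakes capacity. The function name, the growth model, and all numbers are hypothetical and chosen only for illustration.

```python
from typing import Optional

def crisis_year(data0: float, data_growth: float,
                capacity0: float, capacity_step: float,
                max_years: int = 100) -> Optional[int]:
    """First year in which data volume exceeds capacity, or None if that does
    not happen within max_years. Data is multiplied by data_growth each year
    (geometric growth); capacity gains capacity_step each year (linear growth)."""
    data, capacity = data0, capacity0
    for year in range(max_years + 1):
        if data > capacity:
            return year
        data *= data_growth
        capacity += capacity_step
    return None

# Hypothetical units: data doubles every two years (x sqrt(2) per year),
# capacity starts ten times larger and gains 20 units per year.
print(crisis_year(10, 2 ** 0.5, 100, 20))  # 10
```

Under these assumptions, a tenfold head start in capacity buys about a decade. Any geometric growth rate above 1 eventually overtakes linear growth, which matches the corollary.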
If Someone Can Collect Big Data, then Someone Will
- Corollary to Parkinson's Law in action
- If you're not collecting Big Data, then someone else is
- The easier it is to collect Big Data, the more important it is to govern them

Metadata may be Big Data as Well
- You must govern your metadata
- Metadata may even contain most of the business value
  - Not just technical value
- Metadata governance at least as important as data governance
Govern the Data You Don't Want
- Big Data analytics focuses on finding the nuggets of gold in the dross
- The data you don't want must still be governed, secured, & managed
- As Big Data sets grow, governing the dross is increasingly challenging

Big Data Results May be Dangerous
- Not just valuable but dangerous
  - Personally identifiable information
  - Risk of false positives
- Nuggets of uranium, not gold
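Base-rate arithmetic shows why false positives are a real risk at Big Data scale. The Python sketch below screens a large population for a rare target. The population size, number of real targets, and detection rates are hypothetical and chosen only for illustration.

```python
def expected_hits(population: int, true_targets: int,
                  sensitivity: float, false_positive_rate: float):
    """Expected true positives, false positives, and precision for a screen."""
    innocents = population - true_targets
    tp = true_targets * sensitivity      # real targets that get flagged
    fp = innocents * false_positive_rate # innocent records flagged anyway
    precision = tp / (tp + fp)           # share of flagged records that are real
    return tp, fp, precision

# Hypothetical numbers, for illustration only.
tp, fp, precision = expected_hits(
    population=1_000_000_000,   # records screened
    true_targets=1_000,         # records that really matter
    sensitivity=0.99,           # 99% of real targets are flagged
    false_positive_rate=0.001,  # 0.1% of innocent records are flagged
)
print(f"true positives:  {tp:,.0f}")
print(f"false positives: {fp:,.0f}")
print(f"precision:       {precision:.4%}")
```

Even with a screen that is wrong only 0.1% of the time, the false alarms here outnumber the genuine hits by roughly a thousand to one. Fewer than one flagged record in a thousand is a real target.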
Big Data Used to Mislead
"According to the figures published by a major tech provider, the Internet carries 1,826 Petabytes of information per day. In its foreign intelligence mission, NSA touches about 1.6% of that. However, of the 1.6% of the data, only 0.025% is actually selected for review. The net effect is that NSA analysts look at 0.00004% of the world's traffic in conducting their mission; that's less than one part in a million. Put another way, if a standard basketball court represented the global communications environment, NSA's total collection would be represented by an area smaller than a dime on that basketball court."
Source: NSA

In Other Words
- 7.5 terabytes of analytical results to process manually every day
- Would be the equivalent of Call Detail Records for 5 million calls every day per person on the planet!
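The quoted figures can be multiplied out directly, as in the Python sketch below. Only the three inputs come from the quote; the rest is arithmetic. The result is worth comparing with the quote itself. 1.6% of 0.025% is 0.0004%, or four parts per million, which is ten times the quoted 0.00004%. The 7.5 terabytes per day under "In Other Words" matches these inputs approximately, and only if a petabyte is counted as 1,024 terabytes.

```python
# Inputs as quoted on the slide; everything else is arithmetic.
DAILY_TRAFFIC_PB = 1_826   # petabytes carried by the Internet per day
TOUCHED_SHARE = 0.016      # "NSA touches about 1.6%"
SELECTED_SHARE = 0.00025   # "only 0.025%" of that 1.6% is selected for review

reviewed_share = TOUCHED_SHARE * SELECTED_SHARE  # share of all traffic reviewed
reviewed_pb = DAILY_TRAFFIC_PB * reviewed_share  # petabytes reviewed per day

print(f"share of world traffic reviewed: {reviewed_share:.4%}")
print(f"parts per million: {reviewed_share * 1e6:.1f}")
print(f"TB/day with 1 PB = 1,000 TB: {reviewed_pb * 1_000:.2f}")
print(f"TB/day with 1 PB = 1,024 TB: {reviewed_pb * 1_024:.2f}")
```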
Wrong Conclusion?
- NSA spies on data in the US, so keep your data out of the US, right?
- Assumes:
  - Your country isn't spying on you too!
  - Your country isn't working with the NSA!
  - The NSA can't spy on data outside the US!
- What are your Big Data policies?

Governance the Old Way
[Diagram: Information Problem | Tools | Policies for using the tools | Governance]
Today's Data Governance (simplified)
"Our data are unclean!"
"Let's use this data quality tool."
"Great! Here are policies & processes for how to manage data quality using our tool."

Governance the New Way
[Diagram: Information Problem | Tools | Policies for using the tools | Meta-policies for dealing with governance | Next-generation governance tools]
- Best practice approach to Big Data Crisis
Meta Thinking
- Meta-requirement: requirement that applies to other requirements
  - E.g., Business Agility requirement
- Meta-methodology: methodology for creating or modifying methodologies
  - Following the Agile principle "responding to change over following a plan," even if the plan is to follow Agile
- Meta-policy: policy for how to perform governance

Dealing with Change
- Meta thinking doesn't look at something
- Meta thinking means looking at how something changes
- Meta thinking is typically manual
  - Always includes people
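The split between policy and meta-policy can be sketched in Python. In this sketch, ordinary policies are data with automated checks. The meta-policy, meaning the rule for how policies themselves may change, stays a manual step that needs a named person. This is a hypothetical design: the class and method names (`Policy`, `Governance`, `enforce`, `change_policy`) are illustrative, not any product's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]

@dataclass
class Policy:
    """A governance policy: a name plus an automated check (hypothetical design)."""
    name: str
    check: Callable[[Record], bool]  # True when the record complies

@dataclass
class Governance:
    policies: Dict[str, Policy] = field(default_factory=dict)
    change_log: List[str] = field(default_factory=list)

    def enforce(self, record: Record) -> List[str]:
        """Policy level, automated: names of the policies this record violates."""
        return [p.name for p in self.policies.values() if not p.check(record)]

    def change_policy(self, policy: Policy, approved_by: str) -> None:
        """Meta-policy level: how policies change. Deliberately not automated;
        every change must name the person who approved it."""
        if not approved_by:
            raise PermissionError("policy changes require a human approver")
        self.policies[policy.name] = policy
        self.change_log.append(f"{policy.name} approved by {approved_by}")

gov = Governance()
gov.change_policy(Policy("no-raw-ssn", lambda r: "ssn" not in r), approved_by="data steward")
print(gov.enforce({"name": "Alice", "ssn": "000-00-0000"}))  # ['no-raw-ssn']
print(gov.enforce({"name": "Bob"}))                          # []
```

Keeping `change_policy` manual reflects the point that meta thinking always includes people. It also avoids the hall-of-mirrors problem of automating enforcement of the meta-policy.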
Avoiding the Hall of Mirrors Problem
- Meta-policy: how do we automate policy enforcement?
- Meta-meta-policy: how do we automate meta-policy enforcement?
- Answer: we don't (yet)!

Big Data Governance (even more simplified), part 1
"We have too much information!"
"Let's use this Big Data tool."
"Great! Here are policies & processes for how to use the Big Data tool."
"Uhh, our Big Data got too big for the tool."
Big Data Governance, part 2
"Dang."
"Here's our policy for how to deal with ever-increasing quantities of data."
"Huh?"
- We need a way to manage policies for dealing with ongoing Big Data challenges

Big Data Analytics Tools May be Governance Tools
- Especially if the central challenge is data quality
- Big Data sets tend to be unclean
  - Structured, semi-structured, & unstructured
  - Good and bad mixed together
- The move from traditional analytics to Big Data analytics is a move to poorer levels of data quality
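The Python sketch below shows how one pass over mixed, unclean inputs can sort records by structure and report quality problems at the same time. That is the sense in which an analytics tool can double as a governance tool. It assumes a hypothetical rule that every usable record needs `id` and `timestamp` fields. The function names and sample records are illustrative.

```python
import json
from typing import Any, Dict, Iterable, Optional, Tuple

# Hypothetical rule for this sketch: every usable record needs these fields.
REQUIRED_FIELDS = ("id", "timestamp")

def classify(raw: Any) -> Tuple[str, Optional[dict]]:
    """Label one raw input and return it as a dict when that is possible."""
    if isinstance(raw, dict):
        return "structured", raw
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            return "unstructured", None
        if isinstance(parsed, dict):
            return "semi-structured", parsed
    return "unstructured", None

def quality_report(raw_records: Iterable[Any]) -> Dict[str, int]:
    """One pass that counts record kinds and flags missing required fields."""
    report = {"structured": 0, "semi-structured": 0,
              "unstructured": 0, "missing_required": 0}
    for raw in raw_records:
        kind, record = classify(raw)
        report[kind] += 1
        if record is not None and any(record.get(f) in (None, "") for f in REQUIRED_FIELDS):
            report["missing_required"] += 1
    return report

records = [
    {"id": 1, "timestamp": "2013-06-07T12:00:00"},  # clean structured row
    '{"id": 2}',                                     # JSON missing a timestamp
    "call dropped, customer angry",                  # free text
    {"id": 3, "timestamp": ""},                      # structured but blank field
]
print(quality_report(records))
```

Half of these sample records fail the hypothetical rule. A real governance tool would route such records to a policy (repair, quarantine, or discard) rather than just count them.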
Not your Parents' Governance!
- Not just governance of technology
- Governance with technology
- Largely automated
- Proactive
- Inherently iterative
- Agile

Governance Leads to Empowerment
- The more powerful the tools, the more important it is that people know how to use them properly
- IT should empower the people in the organization
SOA Governance (Supposedly) Works this Way!
[Diagram: SOA Policy (Security Policies, Routing Policies, etc.) | Registry/Repository | Policies for handling governance in the reg/rep | ESB | Meta-policy]

How Cloud Changes the Equation
- Cloud shifts IT provisioning & management to the user
- Cloud automates previously manual tasks
- Greater risk of mucking things up
- Increased need for governance
Next-Generation Governance
The Key to the Big Data Explosion
Our Tools are Only as Good as our Architecture
- Big Data will always be too big
- The Big Data challenge will always be changing
- Next-generation data governance tools must drive business agility
- Tools will always fall short without architecture that supports and drives change

Book Giveaway!
Jason Bloomberg
President, ZapThink, a Dovel Technologies Company
jbloomberg@zapthink.com
@theebizwizard