Big Data and Mass Surveillance
Dr. Paul J. Ennis, Trinity College Dublin

Big Data is one of those terms that comes along every few years that academics are invited to reflect upon. The problem with reflection, like the good comeback, is that it always comes too late. By the time it becomes real to us that this technology is proliferating it has already been disseminated, put into use, and its virtues memetically spread by corporate innovators. Understanding Big Data is even more pressing when considered in light of government entities, and especially so when we consider the role of intelligence agencies. Since the National Security Agency sits at the top of the Western intelligence community it will be my focus, but you can translate much of what follows to its partners, especially the Five Eyes network of which the United Kingdom is a member.

The roots of the NSA are in the science of cryptography, of encryption and decryption, and its traditional role in serving the national interest was to develop military-grade encryption standards for secure communication and to decrypt those of its adversaries. Of course, to find communications to decrypt you have to intercept them, and this part of the NSA's mission, known as signals intelligence, has become central to its contemporary identity. Signals intelligence is more commonly known in the media as mass surveillance, and fears about it have contributed to the agency's post-Snowden reputation as a kind of ghostly Orwellian institution. Even a sympathetic observer such as myself must admit there is some truth to this caricature. The NSA is, no doubt about it, a powerful deep-state entity with the ability to sway the direction of the highest office of its country through intelligence reports. In this capacity the NSA can clearly be seen, as they sometimes are not, in their militarised colours.
It can be startling when one first comes across footage of their home, Fort Meade, expecting racks of servers and busy nerds, only to see uniformed men and women: a reminder that the NSA is constantly on a quasi-war footing, engaged in an endless information-collection battle. Even when one thinks of the days of cloistered mathematicians engaged in cryptography, one must remember that this was once a field so militarily delicate that exporting specific encryption ciphers was a criminal offence, because the cipher was a munition, a weapon.

Things have changed since those days. Now variations on these munitions live in your browser address bar, as when that little lock secures your data when you use, for example, online banking. Behind that lock lie complex cryptographic processes operating on the principle that it is harder to decrypt without the key than it is to encrypt, sort of like how it's easy to smash an egg, but try putting it back together again. Filling in the gaps, you can infer that at some point the NSA had to loosen their grip on encryption, and in some cases it was loosened for them. The spread of encryption certainly altered the nature of the organisation, but more
important than this is the simple fact that the internet and the spread of mobile phones necessitated a change in attitude and a shift of focus from encryption to interception. They were no longer going to be just plucking Soviet ciphers from the sky. Rather, the bulk collection of data in its own right took precedence, and access to ever larger volumes became a new priority. In fact the NSA became so good at this that the sheer amount of data has become a problem.

We can see a conflicted attitude to volume at work in a leaked NSA internal message-board discussion. In it an analyst makes reference to the "Big Data Problem", but he is nonetheless impressed by their ability to "pull bits off the internet and bring them back to the mother-base to evaluate and build intelligence off of," which he considers "just plain awesome!" He adds, "One of the coolest things about it is how much data we have at our fingertips."

So how much data? Well, let's take a quick look at Boundless Informant, which is a Big Data tool allowing us to see the metadata produced by 504 SIGINT activity designators (SIGADs). Using the Boundless Informant tool the foreign signals intelligence infrastructure can be viewed in real time. The FAQ happily notes that there is no need for human intervention. Keen eyes will note it is built upon MapReduce, which is a programming model for processing and generating large data sets. They'll also know it allows for two types of view, since one can switch between map and org views, and here we are actually dealing with the smaller view. Which is surprising, because the numbers here are approximately 97 billion internet records and 124 billion telephony records for March (that's 97 billion digital network intelligence (DNI) records and 124 billion dialed number recognition (DNR) records in NSA-speak). You can find larger numbers with some digging, but these will suffice for our purposes. Note this is metadata and not content.
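
MapReduce, mentioned above, is easiest to grasp through a toy example. The sketch below is only an illustration, assuming a handful of invented metadata records; the record layout, country codes and function names are hypothetical and are not drawn from Boundless Informant. A map step emits one count per record, a shuffle step groups the counts by key, and a reduce step sums them. A real deployment would distribute these steps across many machines.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical metadata records: (record_type, country_code). No content.
RECORDS = [
    ("DNI", "AA"),
    ("DNR", "AA"),
    ("DNI", "BB"),
    ("DNR", "BB"),
    ("DNR", "BB"),
]


def map_record(record):
    """Map step: emit one ((country, type), 1) pair per metadata record."""
    record_type, country = record
    yield (country, record_type), 1


def shuffle(pairs):
    """Shuffle step: group the emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts collected for a single key."""
    return key, sum(values)


def count_records(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_record(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


totals = count_records(RECORDS)
for key in sorted(totals):
    print(key, totals[key])
```

Here `totals` maps each (country, record type) pair to its record count, the kind of aggregate a map view could plausibly display.
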
Think of the header of your email rather than the message itself. You would be surprised at just how much detail can be extracted using metadata alone: for instance, who you are talking to, how often, at what times, and so on. In the following slide we see an analyst using this kind of data in tandem with a Big Data tool, very likely IBM's Analyst's Notebook, to perform a pattern-of-life analysis on a selector, which can be a phone number, an email address, an IP address and so on. You can see here how useful metadata is in building a picture of a target's activity and wider network.

However, one should stress that our target has become a central node only after some consideration. Their selection has likely been examined by other overseers to satisfy, at the very least, the relevant Executive Order. Other forms of legislation are stricter, but the Order is broadly, and somewhat casually, applied to foreign targets and sometimes, if they fit an exemption, to Americans as well. However, innocent associates will in such cases be considered either as incidental collection or used in contact chaining, that is, to help establish links to further possible targets. If the contacts are innocent Americans this means they must have their details
minimized. Nonetheless I have found that what truly worries people is not this form of collection, even though the media is very focused on metadata. In this format, as an outlying node, one is just a data point. It's intrusive, but this is not what elicits strong reactions in people when it comes to surveillance. Rather it is the possibility, raised by the leaked PRISM slides, that the NSA might have direct access to the servers of major service providers such as Facebook, Google and Yahoo.

It is important to stress that PRISM produces the most serialised reports for the NSA, meaning it is responsible for most of the good intelligence produced by the agency. It utterly outshines the bulk interception programs, known as non-corporate programs, and there is a good reason for this. Whilst metadata is useful, it pales in comparison to the information that can be derived from content-rich and exceptionally popular services such as Google, which can, when compelled to do so, provide email exchanges, for example. PRISM produces good intelligence because it allows for access to the stored communications hoarded by corporations. Better yet, the data is cleaner: it is simply less raw. Sometimes data is just unavailable through other means. Or there may be some gems in the backups held by these corporations that had been lost in the noise of the internet. This is what helps strengthen the cases made by the intelligence agencies when presenting targets to the Foreign Intelligence Surveillance Court.

Under PRISM the NSA is not actually responsible for interacting with these corporations at all, and we can see in the following slide that this is the job of the domestic intelligence outfit, the Federal Bureau of Investigation. We can get a glimpse here of just how many hoops an analyst must jump through to begin monitoring or accessing the data of their PRISM selector. It is important to keep this in mind because it provides the context that helps defuse the famous "collection directly" slide.
Perhaps no other clumsy PowerPoint decision has led to such worldwide confusion. We already know from the tasking slide that direct collection of clean stored communications is not possible. In fact, the next slide states explicitly that there is no direct relationship with Communications Providers. When read in isolation, out of context, the "collection directly" slide does look ominous. When read against an awareness of how an analyst actually operates it emerges as little more than shorthand gone wrong: a bad decision, but an easy one to make if you are assuming your audience is just fellow NSA analysts and not the wider public. We therefore find that the protestations of the major service providers after the leaks were sincere.

Another issue worth highlighting is what kind of people are actually targeted. Here's a slide concerning "A week in the life of PRISM reporting", and it helps situate neatly the kinds of issues that concern the NSA, and in particular helps us see them again in their militarised context. Colombia, for instance, comes under the topics Trafficking and FARC. Japan is under Trade and Israel. Who are the targets? Anyone who can provide intelligence about
these topics. And since the process of getting the go-ahead to watch such targets can take a little time, at least when we mean access to corporate servers, and granting that most analysts are assigned to quite specific topics such as those we can see here, it is likely that pragmatic thinking kicks in concerning who to focus upon: valuable targets, not a generic sweep of everyone. Hence it is not having "nothing to hide" that protects you, as many claim. Rather it is having nothing of value to hide. Or, in terms of incidental collection, not associating, as best as one can ascertain, with those that do. One must also remember that as a foreigner there are few legal protections assigned to you once you come under the gaze of a foreign intelligence agency.

Here is the rub. The ability to sweep up and access vast volumes of data is not one restricted to the hallways of Fort Meade. Less scrupulous foreign agencies do so as well. The list of potential adversaries online is extensive: corporate competitors, hackers, stalkers, data miners, sophisticated advertising companies, and it goes on. They are all incentivised to gather your data. It is for this reason I believe you should act as if the NSA was worth worrying about. The NSA represents the strongest possible adversary around which you should build your defences, granting its status as the most advanced intelligence agency in existence. If this sounds like an ultra-paranoid strategy, just keep in mind the guiding principle: whilst your data might not, for now, be pursued, that it can be pursued is a fact.

With this in mind here are some means to defend yourself, and they all fall under the remit of cryptography.

(1) PGP encryption: used in email to scramble the contents of the message, allowing for private communication (albeit not always anonymity, due to metadata).
(2) Bitcoin: a form of virtual currency, specifically cryptocurrency, that allows for the possibility, albeit again not perfect, of relatively anonymous transactions (relative to PayPal, for instance).
(3) The Tor network or The Onion Router: a means of surfing the internet anonymously behind three layers of encryption.

All three have a suspicious tag attached to them since they comprise the three means of purchasing narcotics on dark net marketplaces, wherein one uses Tor to access the site, Bitcoin to purchase, and PGP to communicate with the vendor. However none of these were developed for that purpose, and the means of implementation, as with all technology, depends entirely on the human agent responsible for their use.

As Big Data innovations escalate so too will their use in intelligence. Ever more comprehensive pattern-of-life analysis will occur, ever more detailed snapshots will be obtained, and volume limits will be broken. Nothing will have changed in the fundamentals. The game remains exactly the same as the low-tech variations on it played out throughout history. In the particulars the game has changed. As it moves into the new era of Big Data the NSA will adapt.
Nonetheless what will haunt it is precisely the fruit of its own heritage: the suite of cryptographic techniques that scramble the signal at the moment of inception such that what is intercepted is just noise. It is in this manner that you might hide, in the cryptographic noise born of your highest possible adversary.
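
The point that an interceptor sees only noise can be made concrete with a toy sketch. It assumes a one-time pad (XOR with a random key as long as the message) rather than the ciphers PGP or Tor actually use, and the function names and sample message are hypothetical. Three layers are applied, echoing the three layers mentioned for Tor; because XOR commutes, the layers here could be removed in any order, which is not true of real onion routing.

```python
import secrets


def xor_bytes(data, key):
    """XOR data with a key of the same length (a one-time pad)."""
    if len(data) != len(key):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


def wrap(message, keys):
    """Add one encryption layer per key, finishing with keys[0] outermost."""
    for key in reversed(keys):
        message = xor_bytes(message, key)
    return message


def peel(ciphertext, key):
    """Remove a single layer using its key."""
    return xor_bytes(ciphertext, key)


message = b"meet at noon"  # hypothetical plaintext
keys = [secrets.token_bytes(len(message)) for _ in range(3)]  # one key per layer

wrapped = wrap(message, keys)  # what an interceptor would capture

recovered = wrapped
for key in keys:  # each key holder strips one layer in turn
    recovered = peel(recovered, key)

print(wrapped.hex())  # random-looking bytes, different on every run
print(recovered)      # the original message
```

Without all three keys the captured bytes carry no usable information about the message; with them, the original text comes back exactly.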