Challenges of Data Privacy in the Era of Big Data. Rebecca C. Steorts, Vishesh Karwa Carnegie Mellon University November 18, 2014


1 Challenges of Data Privacy in the Era of Big Data Rebecca C. Steorts, Vishesh Karwa Carnegie Mellon University November 18, 2014

2 Outline Why should we care? What is privacy? How do we achieve privacy? Big Data and the challenge of privacy

3 Why Enormous amounts of data being collected Huge research potential Data sharing is important But

4 Acxiom It knows who you are. It knows where you live. It knows what you do.

5 Acxiom Its servers process more than 50 trillion data transactions a year. Company executives have said its database contains information about 500 million active consumers worldwide, with about 1,500 data points per person. That includes a majority of adults in the United States. Such large-scale data mining and analytics based on information available in public records, consumer surveys and the like are perfectly legal. Acxiom's customers have included big banks like Wells Fargo and HSBC, investment services like E*Trade, automakers like Toyota and Ford, department stores like Macy's. "If someone is listed as diabetic or pregnant, what is happening with this information? Where is the information going?" she asks. "We need to figure out what the rules should be as a society."

6 Data Privacy and Big Data

7 Smart Phones

8 Why?

9

10 What is privacy? Security? Anonymity? Confidentiality? Controlling the data? Ownership of the data? Eg. Health record patient?

11 Privacy in Statistical Databases People, Database, Queries, Answers, Users: government, researchers, businesses (or) a malicious adversary. Goal: Utility: release data with statistical utility. Risk: preserve the privacy of individuals in the database.

12 Protecting Privacy Restricted access Restricted data Anonymization Restrict the amount of information. Combination of the two Interactive setting Synthetic databases

13 Anonymization

14 87% (216 million of 248 million) of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}. In general, few characteristics are needed to uniquely identify a person.

Population uniqueness:
                                Zip Code   County
Year of birth                   0.2 %      0 %
Year and month of birth         4.2 %      0.2 %
Year, month and day of birth    63.3 %     14.8 %

Table 1 from: P. Golle, "Revisiting the Uniqueness of Simple Demographics in the U.S. Population", WPES 2006
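As a rough illustration of how uniqueness figures like these are computed, the sketch below counts the fraction of records whose combination of quasi-identifiers appears only once. The records and field names are made up for illustration; they are not taken from Golle's data.

```python
from collections import Counter

def fraction_unique(records, keys):
    """Fraction of records whose combination of `keys` values appears exactly once."""
    if not records:
        return 0.0
    signature = lambda r: tuple(r[k] for k in keys)
    combos = Counter(signature(r) for r in records)
    unique = sum(1 for r in records if combos[signature(r)] == 1)
    return unique / len(records)

# Hypothetical toy records (not real data).
people = [
    {"zip": "15213", "gender": "F", "dob": "1980-01-02"},
    {"zip": "15213", "gender": "F", "dob": "1980-07-19"},
    {"zip": "15213", "gender": "M", "dob": "1980-01-02"},
    {"zip": "15217", "gender": "F", "dob": "1975-03-30"},
]
# Add a coarser attribute: year of birth only.
coarse = [dict(p, yob=p["dob"][:4]) for p in people]

print(fraction_unique(coarse, ["zip", "yob"]))            # 0.25
print(fraction_unique(coarse, ["zip", "gender", "dob"]))  # 1.0
```

Coarsening the date of birth to a year makes three of the four toy records indistinguishable, which is the same effect the table shows at population scale.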

15

16 The Netflix Prize Netflix Recommends Movies to its Subscribers Seeks improved recommendation system Offers $1,000,000 for 10% improvement Publishes de-identified training data. Auxiliary Information: Internet Movie Database (IMDb) Individuals may register for an account and rate movies Need not be anonymous Visible material includes ratings, dates, comments. A linkage attack: Narayanan & Shmatikov (2006) With 8 movie ratings (of which we allow 2 to be completely wrong) and dates that may have a 3-day error, 96% of Netflix subscribers whose records have been released can be uniquely identified in the dataset.
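The idea of a linkage attack can be sketched in a few lines. This is a simplified illustration, not Narayanan and Shmatikov's actual algorithm: it counts how many auxiliary (movie, rating, date) observations agree with each released record, tolerating a few wrong ratings and a small date error. All record IDs, movies, and dates are hypothetical.

```python
from datetime import date

def matches(aux_item, record, max_days):
    """True if one auxiliary observation agrees with the released record."""
    movie, rating, when = aux_item
    if movie not in record:
        return False
    rec_rating, rec_when = record[movie]
    return rec_rating == rating and abs((rec_when - when).days) <= max_days

def link(aux, released, allowed_wrong=2, max_days=3):
    """Return the single released ID consistent with `aux`, or None if zero or several fit."""
    needed = len(aux) - allowed_wrong
    candidates = [
        rid for rid, rec in released.items()
        if sum(matches(a, rec, max_days) for a in aux) >= needed
    ]
    return candidates[0] if len(candidates) == 1 else None

# Hypothetical "de-identified" release: id -> {movie: (rating, date)}.
released = {
    "u1": {"A": (5, date(2005, 1, 10)), "B": (3, date(2005, 2, 1)), "C": (4, date(2005, 3, 5))},
    "u2": {"A": (2, date(2005, 1, 11)), "B": (3, date(2005, 6, 1)), "D": (1, date(2005, 4, 2))},
}
# Hypothetical auxiliary knowledge from public reviews; the rating for "C" is wrong.
aux = [("A", 5, date(2005, 1, 12)), ("B", 3, date(2005, 2, 2)), ("C", 1, date(2005, 3, 5))]

print(link(aux, released, allowed_wrong=1))  # u1
```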

17 De-anonymizing Social Networks - Attacker knows two social nets whose membership partially overlaps - Additionally, a few 'seed' nodes that are in both networks - Idea: Attacker iteratively establishes correspondences between nodes Picture from Andreas Haeberlen's slides

18 Anonymity & Auxiliary Information Anonymity is required for privacy, but it is not sufficient Surprisingly little information needed to identify an individual 63% of U.S. citizens uniquely identifiable by DoB+zip code Unique Work/Home location Google - names are noise

19 Statistical Disclosure Limitation Methods Traditional approaches: Removing obvious identifiers/near-identifiers Data transformations: Matrix masking X = AXB + C, Noise addition Data suppression Deleting cases / sampling Cell suppression. Modern approaches: Remote access servers Secure computation Synthetic data Partial information releases Rigorous privacy guarantee Privacy cartoon by Chris Slane at: cagle.msnbc.com/news/PrivacyCartoons/main.asp

20 R-U Confidentiality Map (Duncan, et al. 2001) [Figure: Disclosure Risk plotted against Data Utility, with points for No Data, Original Data and Released Data, and a Maximum Tolerable Risk threshold]

21 Partial Data Release of Tabular Data Contingency tables: cross-classify individuals by attributes Balance between data utility and disclosure risk Is it safe to release the row and column sums? [Table: Delinquent Children by County & Education Level. Rows: County (Alpha, Beta, Gamma, Delta, Total). Columns: Education Level of Head of Household (Low, Medium, High, Very High, Total).] Source: OMB Statistical Policy Working Paper 22

22 Partial Data Release of Tabular Data [Table: Rows: County (Alpha, Beta, Gamma, Delta, Total). Columns: Education Level of Head of Household (Low, Medium, High, Very High, Total).] ,272,363,056 tables have our margins (De Loera & Sturmfels). Bounds? Distributions? Data Source: OMB Statistical Policy Working Paper 22 & S. Roehrig
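One way to approach the "Bounds?" question for a two-way table is the classical Fréchet bounds: given only the row total, column total and grand total, each cell count is confined to an interval. The sketch below is a minimal illustration with made-up margins; it does not use the OMB table's values, which are not reproduced in this transcript.

```python
def frechet_bounds(row_total, col_total, grand_total):
    """Bounds on a cell of a two-way table of counts, given only its margins."""
    lower = max(0, row_total + col_total - grand_total)
    upper = min(row_total, col_total)
    return lower, upper

# Hypothetical margins: a row of 20, a column of 25, and 40 individuals overall.
print(frechet_bounds(20, 25, 40))  # (5, 20)

# A narrow interval signals disclosure risk: here the cell is known exactly.
print(frechet_bounds(3, 40, 40))   # (3, 3)
```

When the lower and upper bounds coincide or are close, releasing the margins alone effectively reveals the cell.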

23 Synthetic Data Sampling & imputation technique so that released data look like actual data. Does it guarantee confidentiality? Is it valid for inferences? It depends on the model used to generate the data. Not unless we are careful in how it is synthesized. Choosing the models and posterior distributions can be tricky.

24 Modern Approach Privacy Guarantees? Earlier methods were ad hoc, without any precise promises. Absolute Privacy: "Releasing a dataset will leak no information about me." Not possible if we want data to be useful

25 Defining Privacy/Confidentiality Releasing any data inevitably reveals something about a respondent. Dalenius (1977): "If the release of statistics S makes it possible to determine the value (of private information) more accurately than is possible without access to S, a disclosure has taken place." Unachievable: auxiliary information. E.g., Vishesh is 2 years older than the average CMU postdoc. Dwork (2006): Differential Privacy

26 Cryptographic Example What is the average age of people in this room? What if someone leaves the room? What if we do the counting again? Example by Avrim Blum
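The risk in this example can be made concrete with a little arithmetic: if the exact average is released before and after one person leaves, that person's age follows from the two answers. The ages below are hypothetical.

```python
def recover_leaver(avg_before, n_before, avg_after):
    """Infer the value removed from a group, given exact averages before and after."""
    return avg_before * n_before - avg_after * (n_before - 1)

ages = [34, 29, 41, 52, 38]  # hypothetical ages of people in the room
avg_before = sum(ages) / len(ages)

remaining = ages[:-1]        # the last person leaves
avg_after = sum(remaining) / len(remaining)

# Two "harmless" aggregate answers reveal one individual's age (up to float rounding).
print(round(recover_leaver(avg_before, len(ages), avg_after)))  # 38
```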

27 Differential Privacy - Intuition Lets you leave in peace Output is independent of my data "I am okay with giving my data for the study, but I need to protect my privacy." "Don't worry, no one will learn anything more about you than what they already know."

28 Differential privacy [Dwork, McSherry, Nissim, Smith 06] Hides presence/absence of an individual person People Database Users f(g) queries answers Government, researchers, businesses (or) Malicious adversary

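29 Differential privacy ϵ-differential privacy: a randomized mechanism $f$ is ϵ-differentially private if, for all pairs of neighbors $D, D'$ and for all events $S$: $\Pr[f(D) \in S] \le e^{\epsilon} \, \Pr[f(D') \in S]$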
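A minimal sketch of one standard way to satisfy ϵ-differential privacy for a counting query is the Laplace mechanism: add Laplace noise with scale 1/ϵ, since adding or removing one person changes a count by at most 1. The slide does not show a mechanism; the function names and the toy records here are assumptions made for illustration.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon, rng):
    """Noisy count of records satisfying `predicate`, using the Laplace mechanism."""
    true_count = sum(1 for r in records if predicate(r))
    # A counting query has sensitivity 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical records; the true count of people aged 65 or over is 3.
records = [{"age": a} for a in (25, 40, 67, 71, 80)]
rng = random.Random(0)  # seeded only so the sketch is repeatable
print(private_count(records, lambda r: r["age"] >= 65, epsilon=1.0, rng=rng))
```

Smaller ϵ means a larger noise scale, so answers are more private and less accurate. A production system would also need to guard against floating-point side channels, which this sketch ignores.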

30 Differential privacy Hides extreme outputs. ϵ measures information leakage Handles arbitrary auxiliary information What can be done privately? Histograms, inference for simple models Convex optimization Recommendation Systems and much more!

31 Protecting Network Data What are we trying to protect? Nodes Edges Subgraphs?

32 Data Privacy for Network Data A lot of the data are already public! Name, current city, pictures, friends list etc. are now 'public information' on Facebook

33

34 Privacy and Facebook Create a Facebook account As users, do we understand every time privacy settings change? Recent controversy: The FTC last year settled with Facebook, resolving charges that it had deceived users with changes to its privacy settings. State regulators recently fined Google for harvesting e-mails and passwords of unsuspecting users during its Street View mapping project. The White House proposed a privacy bill of rights to give consumers greater control over how their personal data is used.

35 Facebook and privacy Tracked a cohort of more than 5,000 undergrads and grads. While people revealed more and more of their personal history in response to Facebook's prompts, they were also restricting who could see it. Over time, they were, on the whole, less likely to let everyone see their date of birth, for instance, and what high school they had attended.

36 Implications for Data Analyses Preserving confidentiality is a complex task. Need to be careful about how the data were generated: noise addition, model used for synthetic data generation. Direct access to the dataset may be difficult. If DP becomes the norm, need to be careful with the use of data due to limitations on the epsilon budget.
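The epsilon budget mentioned here can be pictured as a running total: under basic sequential composition, the ϵ values of successive differentially private queries on the same data add up, and analysis must stop once the total is spent. The class below is a hypothetical bookkeeping sketch, not the interface of any particular system.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return self.total - self.spent

    def spend(self, epsilon):
        """Reserve `epsilon` for one query, or refuse if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
        return self.remaining

budget = PrivacyBudget(total_epsilon=1.0)
budget.spend(0.4)  # first query
budget.spend(0.4)  # second query
try:
    budget.spend(0.4)  # a third query would exceed the total of 1.0
except RuntimeError as err:
    print(err)  # privacy budget exhausted
```

This is why analysts working under differential privacy have to plan which queries matter most before touching the data.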

37 Privacy in Statistical Databases at CMU Main theme: integrating computer science and statistical approaches to data privacy. Faculty: Stephen Fienberg, Jing Lei, Rebecca Steorts Postdoc: Vishesh Karwa Privatization of social networks data Private blocking Privacy preserving record linkage Differential privacy and relations in machine learning

38 Conclusion Lots of useful data with huge potential of answering scientific questions Comes with privacy risks! Lots of ongoing work to address these issues! Questions?

CS346: Advanced Databases

CS346: Advanced Databases CS346: Advanced Databases Alexandra I. Cristea A.I.Cristea@warwick.ac.uk Data Security and Privacy Outline Chapter: Database Security in Elmasri and Navathe (chapter 24, 6 th Edition) Brief overview of

More information

Differentially Private Analysis of

Differentially Private Analysis of Title: Name: Affil./Addr. Keywords: SumOriWork: Differentially Private Analysis of Graphs Sofya Raskhodnikova, Adam Smith Pennsylvania State University Graphs, privacy, subgraph counts, degree distribution

More information

Privacy Techniques for Big Data

Privacy Techniques for Big Data Privacy Techniques for Big Data The Pros and Cons of Syntatic and Differential Privacy Approaches Dr#Roksana#Boreli# SMU,#Singapore,#May#2015# Introductions NICTA Australia s National Centre of Excellence

More information

Privacy and Data-Based Research

Privacy and Data-Based Research Journal of Economic Perspectives Volume 28, Number 2 Spring 2014 Pages 75 98 Privacy and Data-Based Research Ori Heffetz and Katrina Ligett On n August 9, 2006, the Technology section of the New York Times

More information

Database Security. The Need for Database Security

Database Security. The Need for Database Security Database Security Public domain NASA image L-1957-00989 of people working with an IBM type 704 electronic data processing machine. 1 The Need for Database Security Because databases play such an important

More information

Discussion on papers on anonymisation

Discussion on papers on anonymisation Discussion on papers on anonymisation Josep Domingo-Ferrer 1 and Aleksandra Slavkovic 2 1 Universitat Rovira i Virgili, Tarragona josep.domingo@urv.cat 2 Pennsylvania State University sesa@stat.psu.edu

More information

Principles and Best Practices for Sharing Data from Environmental Health Research: Challenges Associated with Data-Sharing: HIPAA De-identification

Principles and Best Practices for Sharing Data from Environmental Health Research: Challenges Associated with Data-Sharing: HIPAA De-identification Principles and Best Practices for Sharing Data from Environmental Health Research: Challenges Associated with Data-Sharing: HIPAA De-identification Daniel C. Barth-Jones, M.P.H., Ph.D Assistant Professor

More information

Differential privacy in health care analytics and medical research An interactive tutorial

Differential privacy in health care analytics and medical research An interactive tutorial Differential privacy in health care analytics and medical research An interactive tutorial Speaker: Moritz Hardt Theory Group, IBM Almaden February 21, 2012 Overview 1. Releasing medical data: What could

More information

Practicing Differential Privacy in Health Care: A Review

Practicing Differential Privacy in Health Care: A Review TRANSACTIONS ON DATA PRIVACY 5 (2013) 35 67 Practicing Differential Privacy in Health Care: A Review Fida K. Dankar*, and Khaled El Emam* * CHEO Research Institute, 401 Smyth Road, Ottawa, Ontario E mail

More information

Differential Privacy Tutorial Simons Institute Workshop on Privacy and Big Data. Katrina Ligett Caltech

Differential Privacy Tutorial Simons Institute Workshop on Privacy and Big Data. Katrina Ligett Caltech Differential Privacy Tutorial Simons Institute Workshop on Privacy and Big Data Katrina Ligett Caltech 1 individuals have lots of interesting data... 12 37-5 π 2 individuals have lots of interesting data...

More information

Policy-based Pre-Processing in Hadoop

Policy-based Pre-Processing in Hadoop Policy-based Pre-Processing in Hadoop Yi Cheng, Christian Schaefer Ericsson Research Stockholm, Sweden yi.cheng@ericsson.com, christian.schaefer@ericsson.com Abstract While big data analytics provides

More information

Privacy-preserving Data-aggregation for Internet-of-things in Smart Grid

Privacy-preserving Data-aggregation for Internet-of-things in Smart Grid Privacy-preserving Data-aggregation for Internet-of-things in Smart Grid Aakanksha Chowdhery Postdoctoral Researcher, Microsoft Research ac@microsoftcom Collaborators: Victor Bahl, Ratul Mahajan, Frank

More information

CS377: Database Systems Data Security and Privacy. Li Xiong Department of Mathematics and Computer Science Emory University

CS377: Database Systems Data Security and Privacy. Li Xiong Department of Mathematics and Computer Science Emory University CS377: Database Systems Data Security and Privacy Li Xiong Department of Mathematics and Computer Science Emory University 1 Principles of Data Security CIA Confidentiality Triad Prevent the disclosure

More information

Arnab Roy Fujitsu Laboratories of America and CSA Big Data WG

Arnab Roy Fujitsu Laboratories of America and CSA Big Data WG Arnab Roy Fujitsu Laboratories of America and CSA Big Data WG 1 The Big Data Working Group (BDWG) will be identifying scalable techniques for data-centric security and privacy problems. BDWG s investigation

More information

Survey of Research on Information Security in Big Data

Survey of Research on Information Security in Big Data Survey of Research on Information Security in Big Data Zhang Hongjun 1, Hao Wenning 1, He Dengchao 1, Mao Yuxing 1 1 PLA university of Industry and Technology Nan Jing, China hdchao1989@163.com Abstract.

More information

Statistical Data Stewardship in the 21st Century: An Academic Perspective

Statistical Data Stewardship in the 21st Century: An Academic Perspective Statistical Data Stewardship in the 21st Century: An Academic Perspective George T. Duncan Carnegie Mellon University Joint Statistical Meetings New York City 2002 August 11 863 Statistical Data Stewardship

More information

future proof data privacy

future proof data privacy 2809 Telegraph Avenue, Suite 206 Berkeley, California 94705 leapyear.io future proof data privacy Copyright 2015 LeapYear Technologies, Inc. All rights reserved. This document does not provide you with

More information

Privacy Issues and Data Protection in Technology Enhanced Learning. Seda Gürses COSIC, K.U. Leuven datatel 2011 Alpines Rendez-vous

Privacy Issues and Data Protection in Technology Enhanced Learning. Seda Gürses COSIC, K.U. Leuven datatel 2011 Alpines Rendez-vous Privacy Issues and Data Protection in Technology Enhanced Learning Seda Gürses COSIC, K.U. Leuven datatel 2011 Alpines Rendez-vous 1 - mendeley: - group: privacy and datatel - slides: - after talk: - http://www.esat.kuleuven.be/~sguerses/

More information

A Practical Application of Differential Privacy to Personalized Online Advertising

A Practical Application of Differential Privacy to Personalized Online Advertising A Practical Application of Differential Privacy to Personalized Online Advertising Yehuda Lindell Eran Omri Department of Computer Science Bar-Ilan University, Israel. lindell@cs.biu.ac.il,omrier@gmail.com

More information

The Algorithmic Foundations of Differential Privacy

The Algorithmic Foundations of Differential Privacy Foundations and Trends R in Theoretical Computer Science Vol. 9, Nos. 3 4 (2014) 211 407 c 2014 C. Dwork and A. Roth DOI: 10.1561/0400000042 The Algorithmic Foundations of Differential Privacy Cynthia

More information

ACCESS METHODS FOR UNITED STATES MICRODATA

ACCESS METHODS FOR UNITED STATES MICRODATA ACCESS METHODS FOR UNITED STATES MICRODATA Daniel Weinberg, US Census Bureau John Abowd, US Census Bureau and Cornell U Sandra Rowland, US Census Bureau (retired) Philip Steel, US Census Bureau Laura Zayatz,

More information

Top Ten Security and Privacy Challenges for Big Data and Smartgrids. Arnab Roy Fujitsu Laboratories of America

Top Ten Security and Privacy Challenges for Big Data and Smartgrids. Arnab Roy Fujitsu Laboratories of America 1 Top Ten Security and Privacy Challenges for Big Data and Smartgrids Arnab Roy Fujitsu Laboratories of America 2 User Roles and Security Concerns [SKCP11] Users and Security Concerns [SKCP10] Utilities:

More information

Big Data and Privacy. Fritz Henglein Dept. of Computer Science, University of Copenhagen. Finance IT Day Riga, 2015-03-26

Big Data and Privacy. Fritz Henglein Dept. of Computer Science, University of Copenhagen. Finance IT Day Riga, 2015-03-26 Big Data and Privacy Fritz Henglein Dept. of Computer Science, University of Copenhagen Finance IT Day Riga, 2015-03-26 About me Professor, Programming Languages and Systems, University of Copenhagen Director,

More information

Big Data and Consumer Privacy in the Internet Economy 79 Fed. Reg. 32714 (Jun. 6, 2014)

Big Data and Consumer Privacy in the Internet Economy 79 Fed. Reg. 32714 (Jun. 6, 2014) Big Data and Consumer Privacy in the Internet Economy 79 Fed. Reg. 32714 (Jun. 6, 2014) Comment of Solon Barocas, Edward W. Felten, Joanna N. Huey, Joshua A. Kroll, and Arvind Narayanan Thank you for the

More information

Li Xiong, Emory University

Li Xiong, Emory University Healthcare Industry Skills Innovation Award Proposal Hippocratic Database Technology Li Xiong, Emory University I propose to design and develop a course focused on the values and principles of the Hippocratic

More information

No Free Lunch in Data Privacy

No Free Lunch in Data Privacy No Free Lunch in Data Privacy Daniel Kifer Penn State University dan+sigmod11@cse.psu.edu Ashwin Machanavajjhala Yahoo! Research mvnak@yahoo-inc.com ABSTRACT Differential privacy is a powerful tool for

More information

Computer Security (EDA263 / DIT 641)

Computer Security (EDA263 / DIT 641) Computer Security (EDA263 / DIT 641) Lecture 12: Database Security Erland Jonsson Department of Computer Science and Engineering Chalmers University of Technology Sweden Outline Introduction to databases

More information

(Big) Data Anonymization Claude Castelluccia Inria, Privatics

(Big) Data Anonymization Claude Castelluccia Inria, Privatics (Big) Data Anonymization Claude Castelluccia Inria, Privatics BIG DATA: The Risks Singling-out/ Re-Identification: ADV is able to identify the target s record in the published dataset from some know information

More information

Differential Privacy Preserving Spectral Graph Analysis

Differential Privacy Preserving Spectral Graph Analysis Differential Privacy Preserving Spectral Graph Analysis Yue Wang, Xintao Wu, and Leting Wu University of North Carolina at Charlotte, {ywang91, xwu, lwu8}@uncc.edu Abstract. In this paper, we focus on

More information

Tomislav Križan Consultancy Poslovna Inteligencija d.o.o

Tomislav Križan Consultancy Poslovna Inteligencija d.o.o In-Situ Anonymization of Big Data Tomislav Križan Consultancy Director @ Poslovna Inteligencija d.o.o Abstract Vast amount of data is being generated from versatile sources and organizations are primarily

More information

Andree E. Widjaja Jengchung Victor Chen

Andree E. Widjaja Jengchung Victor Chen Andree E. Widjaja Jengchung Victor Chen Institute of International Management National Cheng Kung University, Tainan, Taiwan Andree/Victor 1 Agenda Introduction Cloud Computing Information Security and

More information

WEBSITE PRIVACY POLICY. Last modified 10/20/11

WEBSITE PRIVACY POLICY. Last modified 10/20/11 WEBSITE PRIVACY POLICY Last modified 10/20/11 1. Introduction 1.1 Questions. This website is owned and operated by. If you have any questions or concerns about our Privacy Policy, feel free to email us

More information

Aircloak Analytics: Anonymized User Data without Data Loss

Aircloak Analytics: Anonymized User Data without Data Loss Aircloak Analytics: Anonymized User Data without Data Loss An Aircloak White Paper Companies need to protect the user data they store for business analytics. Traditional data protection, however, is costly

More information

No silver bullet: De-identification still doesn't work

No silver bullet: De-identification still doesn't work No silver bullet: De-identification still doesn't work Arvind Narayanan arvindn@cs.princeton.edu Edward W. Felten felten@cs.princeton.edu July 9, 2014 Paul Ohm s 2009 article Broken Promises of Privacy

More information

Technical Approaches for Protecting Privacy in the PCORnet Distributed Research Network V1.0

Technical Approaches for Protecting Privacy in the PCORnet Distributed Research Network V1.0 Technical Approaches for Protecting Privacy in the PCORnet Distributed Research Network V1.0 Guidance Document Prepared by: PCORnet Data Privacy Task Force Submitted to the PMO Approved by the PMO Submitted

More information

NORTH CAROLINA DEPARTMENT OF PUBLIC INSTRUCTION. Division of Data, Research and Federal Policy July 29, 2013

NORTH CAROLINA DEPARTMENT OF PUBLIC INSTRUCTION. Division of Data, Research and Federal Policy July 29, 2013 NORTH CAROLINA DEPARTMENT OF PUBLIC INSTRUCTION Transmitting Private Information Electronically Best Practices Guide for Communicating Personally Identifiable Information by Email, Fax or Other Electronic

More information

UNILEVER PRIVACY PRINCIPLES UNILEVER PRIVACY POLICY

UNILEVER PRIVACY PRINCIPLES UNILEVER PRIVACY POLICY UNILEVER PRIVACY PRINCIPLES Unilever takes privacy seriously. The following five principles underpin our approach to respecting your privacy: 1. We value the trust that you place in us by giving us your

More information

Secure Thinking Bigger Data. Bigger risk?

Secure Thinking Bigger Data. Bigger risk? Secure Thinking Bigger Data. Bigger risk? MALWARE HACKERS REPUTATION PROTECTION RISK THEFT There has always been data. What is different now is the scale and speed of data growth. Every day we create 2.5

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Information Leakage in Encrypted Network Traffic

Information Leakage in Encrypted Network Traffic Information Leakage in Encrypted Network Traffic Attacks and Countermeasures Scott Coull RedJack Joint work with: Charles Wright (MIT LL) Lucas Ballard (Google) Fabian Monrose (UNC) Gerald Masson (JHU)

More information

Big Data Big Security Problems? Ivan Damgård, Aarhus University

Big Data Big Security Problems? Ivan Damgård, Aarhus University Big Data Big Security Problems? Ivan Damgård, Aarhus University Content A survey of some security and privacy issues related to big data. Will organize according to who is collecting/storing data! Intelligence

More information

The Need for Training in Big Data: Experiences and Case Studies

The Need for Training in Big Data: Experiences and Case Studies The Need for Training in Big Data: Experiences and Case Studies Guy Lebanon Amazon Background and Disclaimer All opinions are mine; other perspectives are legitimate. Based on my experience as a professor

More information

Shroudbase Technical Overview

Shroudbase Technical Overview Shroudbase Technical Overview Differential Privacy Differential privacy is a rigorous mathematical definition of database privacy developed for the problem of privacy preserving data analysis. Specifically,

More information

DATA MINING - 1DL360

DATA MINING - 1DL360 DATA MINING - 1DL360 Fall 2013" An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/per1ht13 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Privacy Committee. Privacy and Open Data Guideline. Guideline. Of South Australia. Version 1

Privacy Committee. Privacy and Open Data Guideline. Guideline. Of South Australia. Version 1 Privacy Committee Of South Australia Privacy and Open Data Guideline Guideline Version 1 Executive Officer Privacy Committee of South Australia c/o State Records of South Australia GPO Box 2343 ADELAIDE

More information

Securing Big Data Learning and Differences from Cloud Security

Securing Big Data Learning and Differences from Cloud Security Securing Big Data Learning and Differences from Cloud Security Samir Saklikar RSA, The Security Division of EMC Session ID: DAS-108 Session Classification: Advanced Agenda Cloud Computing & Big Data Similarities

More information

Arnab Roy Fujitsu Laboratories of America and CSA Big Data WG

Arnab Roy Fujitsu Laboratories of America and CSA Big Data WG Arnab Roy Fujitsu Laboratories of America and CSA Big Data WG 1 Security Analytics Crypto and Privacy Technologies Infrastructure Security 60+ members Framework and Taxonomy Chair - Sree Rajan, Fujitsu

More information

DRAFT NISTIR 8053 De-Identification of Personally Identifiable Information

DRAFT NISTIR 8053 De-Identification of Personally Identifiable Information 1 2 3 4 5 6 7 8 DRAFT NISTIR 8053 De-Identification of Personally Identifiable Information Simson L. Garfinkel 9 10 11 12 13 14 15 16 17 18 NISTIR 8053 DRAFT De-Identification of Personally Identifiable

More information

ICS 351: Today's plan. DNS WiFi

ICS 351: Today's plan. DNS WiFi ICS 351: Today's plan DNS WiFi Domain Name System Hierarchical system of names top-level domain names include.edu,.org,.com,.net, and many country top-level domains root is just "." so the fully qualified

More information

Big Data and Open Data

Big Data and Open Data Big Data and Open Data Bebo White SLAC National Accelerator Laboratory/ Stanford University!! bebo@slac.stanford.edu dekabytes hectobytes Big Data IS a buzzword! The Data Deluge From the beginning of

More information

INTRODUCTION TO BUSINESS INTELLIGENCE What to consider implementing a Data Warehouse and Business Intelligence

INTRODUCTION TO BUSINESS INTELLIGENCE What to consider implementing a Data Warehouse and Business Intelligence INTRODUCTION TO BUSINESS INTELLIGENCE What to consider implementing a Data Warehouse and Business Intelligence Summary: This note gives some overall high-level introduction to Business Intelligence and

More information

MANAGED WORKSTATIONS: Keeping your IT running

MANAGED WORKSTATIONS: Keeping your IT running MANAGED WORKSTATIONS: Keeping your IT running What state are your PCs in? Systems running slowly? PCs or laptops crashing for no reason? Too much time trying to resolve simple IT issues? Out-of-date software?

More information

De-identification Koans. ICTR Data Managers Darren Lacey January 15, 2013

De-identification Koans. ICTR Data Managers Darren Lacey January 15, 2013 De-identification Koans ICTR Data Managers Darren Lacey January 15, 2013 Disclaimer There are several efforts addressing this issue in whole or part Over the next year or so, I believe that the conversation

More information

A Practical Differentially Private Random Decision Tree Classifier

A Practical Differentially Private Random Decision Tree Classifier 273 295 A Practical Differentially Private Random Decision Tree Classifier Geetha Jagannathan, Krishnan Pillaipakkamnatt, Rebecca N. Wright Department of Computer Science, Columbia University, NY, USA.

More information

Privacy Preserving Similarity Evaluation of Time Series Data

Privacy Preserving Similarity Evaluation of Time Series Data Privacy Preserving Similarity Evaluation of Time Series Data Haohan Zhu Department of Computer Science Boston University zhu@cs.bu.edu Xianrui Meng Department of Computer Science Boston University xmeng@cs.bu.edu

More information

When Security, Privacy and Forensics Meet in the Cloud

When Security, Privacy and Forensics Meet in the Cloud When Security, Privacy and Forensics Meet in the Cloud Dr. Michaela Iorga, Senior Security Technical Lead for Cloud Computing Co-Chair, Cloud Security WG Co-Chair, Cloud Forensics Science WG March 26,

More information

The Challenge of Commercial Data Mining in Public Sector Cloud Services

The Challenge of Commercial Data Mining in Public Sector Cloud Services The Challenge of Commercial Data Mining in Public Sector Cloud Services SafeGov.org jeff.gould@safegov.org SafeGov Non-Profit Organization whose mission is to promote safe and secure cloud computing in

More information

STATE OF HAWAI I INFORMATION PRIVACY AND SECURITY COUNCIL

STATE OF HAWAI I INFORMATION PRIVACY AND SECURITY COUNCIL STATE OF HAWAI I INFORMATION PRIVACY AND SECURITY COUNCIL Category Security, Breach Title Breach Best Practices Document: IPSC2009-02 Revision: 2009.08.28-01 Posted URL: http://ipsc.hawaii.gov Status Under

More information

Big Data, Official Statistics and Social Science Research: Emerging Data Challenges

Big Data, Official Statistics and Social Science Research: Emerging Data Challenges Big Data, Official Statistics and Social Science Research: Emerging Data Challenges Professor Paul Cheung Director, United Nations Statistics Division Building the Global Information System Elements of

More information

Introduction to DISC and Hadoop

Introduction to DISC and Hadoop Introduction to DISC and Hadoop Alice E. Fischer April 24, 2009 Alice E. Fischer DISC... 1/20 1 2 History Hadoop provides a three-layer paradigm Alice E. Fischer DISC... 2/20 Parallel Computing Past and

More information

Dude, Where's My Car? And Other Questions in Context-Awareness

Dude, Where's My Car? And Other Questions in Context-Awareness Dude, Where's My Car? And Other Questions in Context-Awareness Jason I. Hong James A. Landay Group for User Interface Research University of California at Berkeley The Context Fabric: Infrastructure Support

More information

Dawn Song dawnsong@cs.berkeley.edu

Dawn Song dawnsong@cs.berkeley.edu Privacy and Anonymity Dawn Song dawnsong@cs.berkeley.edu Current state of the world II EU directive 2006/24/EC: 3 year data retention For ALL traffic, requires EU ISPs to record:» Sufficient information

More information

Triangle Census Research Data Center Notes from information sessions

Triangle Census Research Data Center Notes from information sessions Triangle Census Research Data Center Notes from information sessions About TCRDC Administrator Bert Grider visited Appalachian State University on February 28, 2011 to tell researchers about the resources

More information

The Challenges of Effectively Anonymizing Network Data

The Challenges of Effectively Anonymizing Network Data The Challenges of Effectively Anonymizing Network Data Scott E. Coull Fabian Monrose Michael K. Reiter Michael Bailey Johns Hopkins University Baltimore, MD coulls@cs.jhu.edu University of North Carolina

More information

Privacy Patterns in Public Clouds

Privacy Patterns in Public Clouds Privacy Patterns in Public Clouds Sashank Dara Security Technologies Group, Cisco Systems, Bangalore email: krishna.sashank@gmail.com January 25, 2014 Abstract Internet users typically consume a wide range

More information

In fact, one of the biggest challenges that the evolution of the Internet is facing today, is related to the question of Identity Management [1].

1. Introduction Using the Internet has become part of the daily habits of a constantly growing number of people, and there are few human activities that can be performed without accessing the enormous

Personalization vs. Privacy in Big Data Analysis

Benjamin Habegger 1, Omar Hasan 1, Lionel Brunie 1, Nadia Bennani 1, Harald Kosch 2, Ernesto Damiani 3 1 University of Lyon, CNRS, INSA-Lyon, LIRIS, UMR5205,

ICTR Cloud Efforts developing canonical SIGINT analytics, finding hard targets and exploratory data analysis at scale

Data Mining Research ICTR, GCHQ Dr Building a SIGINT toolbox for BIG DATA Cloud analytics

Airavat: Security and Privacy for MapReduce

Indrajit Roy Srinath T.V. Setty Ann Kilzer Vitaly Shmatikov Emmett Witchel The University of Texas at Austin {indrajit, srinath, akilzer, shmat, witchel}@cs.utexas.edu

Attack and defense. 2nd ATE Symposium University of New South Wales Business School December, 2014

Attack and defense Simona Fabrizi 1 Steffen Lippert 2 José Rodrigues-Neto 3 1 Massey University 2 University of Auckland 3 Australian National University 2nd ATE Symposium University of New South Wales

ARX A Comprehensive Tool for Anonymizing Biomedical Data

Fabian Prasser, Florian Kohlmayer, Klaus A. Kuhn Chair of Biomedical Informatics Institute of Medical Statistics and Epidemiology Rechts der Isar

Information Security in Big Data using Encryption and Decryption

International Research Journal of Computer Science (IRJCS) ISSN: 2393-9842 Information Security in Big Data using Encryption and Decryption SHASHANK -PG Student II year MCA S.K.Saravanan, Assistant Professor

Limits of Computational Differential Privacy in the Client/Server Setting

Adam Groce, Jonathan Katz, and Arkady Yerukhimovich Dept. of Computer Science University of Maryland {agroce, jkatz, arkady}@cs.umd.edu

DESTINATION MELBOURNE PRIVACY POLICY

2 Destination Melbourne Privacy Policy Statement Regarding Privacy Policy Destination Melbourne Limited recognises the importance of protecting the privacy of personally

The Social Impact of Open Data

United States of America Federal Trade Commission The Social Impact of Open Data Remarks of Maureen K. Ohlhausen 1 Commissioner, Federal Trade Commission Center for Data Innovation The Social Impact of

On the features and challenges of security and privacy in distributed internet of things. C. Anurag Varma achdc@mst.edu CpE 6510 3/24/2016

On the features and challenges of security and privacy in distributed internet of things C. Anurag Varma achdc@mst.edu CpE 6510 3/24/2016 Outline Introduction IoT (Internet of Things) A distributed IoT

Managing Incompleteness, Complexity and Scale in Big Data

Nick Duffield Electrical and Computer Engineering Texas A&M University http://nickduffield.net/work Three Challenges for Big Data Complexity Problem:

Collaborations between Official Statistics and Academia in the Era of Big Data

World Statistics Day October 20-21, 2015 Budapest Vijay Nair University of Michigan Past-President of ISI vnn@umich.edu What

Tammy Pirmann HS CS teacher in PA NSF RET in Big Data with Temple University Teach CS Principles course

Slobodan Vucetic Temple University NSF research project involving Big Data education through the

2. A Note about Children. We do not intentionally gather Personal Data from visitors who are under the age of 13.

PRIVACY POLICY Macromeasures Inc. ("Macromeasures") is committed to protecting your privacy. We have prepared this Privacy Policy to describe to you our practices regarding the Personal Data (as defined

Formal Methods for Preserving Privacy for Big Data Extraction Software

M. Brian Blake and Iman Saleh Abstract University of Miami, Coral Gables, FL Given the inexpensive nature and increasing availability

NoSQL Database Options

Introduction For this report, I chose to look at MongoDB, Cassandra, and Riak. I chose MongoDB because it is quite commonly used in the industry. I chose Cassandra because it has

Privacy and Health Information Technology

Testimony of Dr. Alan F. Westin Director of the Program on Information Technology, Health Records and Privacy Professor of Public Law and Government Emeritus,

Big Data Security and Privacy

Kevin T. Smith, Novetta Solutions AFCEA CyberSecurity Symposium 2014 June 25, 2014 Ksmith Novetta.com KevinTSmith Comcast.Net Big Data With the increase of computing

Why NoSQL? Your database options in the new non-relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non-relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL's roots

RE-IDENTIFICATION RISK IN SWAPPED MICRODATA RELEASE

Krish Muralidhar, Department of Marketing and Supply Chain Management, University of Oklahoma, Norman, OK 73019, (405)-325-2677, krishm@ou.edu ABSTRACT

Identifying Broken Business Processes

A data-centric approach to defining, identifying, and enforcing protection of sensitive documents at rest, in motion, and in use 6/07 I www.vericept.com Abstract The

Privacy and Transparency for Decision Making. Simone Fischer-Hübner Karlstad University, Sweden MDAI 2015

Privacy and Transparency for Decision Making Simone Fischer-Hübner Karlstad University, Sweden MDAI 2015 Content I. Profiling, Big Data & Decision Making - Privacy Challenges II. III. IV. Peer Profiling

MapReduce Algorithms. Sergei Vassilvitskii. Saturday, August 25, 12

MapReduce Algorithms A Sense of Scale At web scales... Mail: Billions of messages per day Search: Billions of searches per day Social: Billions of relationships 2 A Sense of Scale At web scales... Mail:

Hui(Wendy) Wang Stevens Institute of Technology New Jersey, USA. VLDB Cloud Intelligence workshop, 2012

Integrity Verification of Cloud-hosted Data Analytics Computations (Position paper) Hui(Wendy) Wang Stevens Institute of Technology New Jersey, USA 9/4/ 1 Data-Analytics-as-a-Service (DAaS) Outsource:

Introduction to predictive modeling and data mining

Rebecca C. Steorts Predictive Modeling and Data Mining: STA 521 August 25 2015 1 Today's Menu 1. Brief history of data science (from slides of Bin Yu)

Aircloak Anonymized Analytics: Better Data, Better Intelligence

An Aircloak White Paper Data is the new gold. The amount of valuable data produced by users today through smartphones, wearable devices,

Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014

Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)

REVIEW OF SECURITY AND PRIVACY ISSUES IN CLOUD STORAGE SYSTEM

International Journal of Computer Science and Engineering (IJCSE) ISSN(P): 2278-9960; ISSN(E): 2278-9979 Vol. 2, Issue 5, Nov 2013, 55-60 IASET REVIEW OF SECURITY AND PRIVACY ISSUES IN CLOUD STORAGE SYSTEM

Protein Protein Interaction Networks

Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

HKU Big Data and Privacy Workshop. Privacy Risks of Big Data Analytics From a Regulator's Point of View

HKU Big Data and Privacy Workshop Privacy Risks of Big Data Analytics From a Regulator's Point of View 30 November 2015 Henry Chang, IT Advisor Office of the Privacy Commissioner for Personal Data, Hong

Big Data Big Privacy. Setting the scene. Big Data; Big Privacy 29 April 2013 Privacy Awareness Week 2013 Launch.

Big Data Big Privacy Privacy Awareness Week SPEAKING NOTES Stephen Wilson Lockstep Group Setting the scene Practical experience shows a gap in the understanding that technologists as a class have regarding

Grand Challenges Making Drill Down Analysis of the Economy a Reality. John Haltiwanger

Grand Challenges Making Drill Down Analysis of the Economy a Reality By John Haltiwanger The vision Here is the vision. A social scientist or policy analyst (denoted analyst for short hereafter) is investigating

Probabilistic Prediction of Privacy Risks

Probabilistic Prediction of Privacy Risks in User Search Histories Joanna Biega Ida Mele Gerhard Weikum PSBD@CIKM, Shanghai, 07.11.2014 Or rather: On diverging towards user-centric privacy Traditional

Business Intelligence meets Big Data: An Overview on Security and Privacy

Claudio A. Ardagna Ernesto Damiani Dipartimento di Informatica - Università degli Studi di Milano NSF Workshop on Big Data Security