Big Data - Security and Privacy

Similar documents
NSF Workshop on Big Data Security and Privacy

Secure Computation Martin Beck

An Efficient and Secure Data Sharing Framework using Homomorphic Encryption in the Cloud

Associate Prof. Dr. Victor Onomza Waziri

Arnab Roy Fujitsu Laboratories of America and CSA Big Data WG

Security and Privacy in Big Data, Blessing or Curse?

Top Ten Security and Privacy Challenges for Big Data and Smartgrids. Arnab Roy Fujitsu Laboratories of America

Arnab Roy Fujitsu Laboratories of America and CSA Big Data WG

Computing on Encrypted Data

NIST Big Data Public Working Group

Homomorphic Encryption Schema for Privacy Preserving Mining of Association Rules

Securing Big Data Learning and Differences from Cloud Security

EU Threat Landscape Threat Analysis in Research ENISA Workshop Brussels 24th February 2015

Privacy and Security in Cloud Computing

Privacy-preserving Data-aggregation for Internet-of-things in Smart Grid

1. Understanding Big Data

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October-2013 ISSN

Survey on Securing Data using Homomorphic Encryption in Cloud Computing

Overview of Information Security. Murat Kantarcioglu

CONTROLLING DATA IN THE CLOUD: OUTSOURCING COMPUTATION WITHOUT OUTSOURCING CONTROL

Survey of Big Data Architecture and Framework from the Industry

Privacy-preserving Data Mining: current research and trends

Global Soft Solutions JAVA IEEE PROJECT TITLES

Digital Signatures. Prof. Zeph Grunschlag

Developing. and Securing. the Cloud. Bhavani Thuraisingham CRC. Press. Taylor & Francis Group. Taylor & Francis Croup, an Informs business

Paillier Threshold Encryption Toolbox

Advanced Cryptography

3-6 Toward Realizing Privacy-Preserving IP-Traceback


1 st Symposium on Colossal Data and Networking (CDAN-2016) March 18-19, 2016 Medicaps Group of Institutions, Indore, India

Homomorphic encryption and emerging technologies COSC412

Boarding to Big data

DATA MINING - 1DL360

3rd International Symposium on Big Data and Cloud Computing Challenges (ISBCC-2016) March 10-11, 2016 VIT University, Chennai, India

A NOVEL APPROACH FOR MULTI-KEYWORD SEARCH WITH ANONYMOUS ID ASSIGNMENT OVER ENCRYPTED CLOUD DATA

Network Security. Computer Networking Lecture 08. March 19, HKU SPACE Community College. HKU SPACE CC CN Lecture 08 1/23

Security for Cloud & Big Data

preliminary experiment conducted on Amazon EC2 instance further demonstrates the fast performance of the design.

Information and Network Security & Privacy Research

I. System Activities that Impact End User Privacy

Business Intelligence meets Big Data: An Overview on Security and Privacy

Cloud Data Storage Services Considering Public Audit for Security

Tackling The Challenges of Big Data. Tackling The Challenges of Big Data Big Data Systems. Security is a Negative Goal. Nickolai Zeldovich

WORKSHOP REPORT BIG DATA SECURITY AND PRIVACY Sponsored by the National Science Foundation September 16-17, 2014 The University of Texas at Dallas

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

CS377: Database Systems Data Security and Privacy. Li Xiong Department of Mathematics and Computer Science Emory University

Efficient Similarity Search over Encrypted Data

Turning Big Data into Big Decisions Delivering on the High Demand for Data

K-NN CLASSIFICATION OVER SECURE ENCRYPTED RELATIONAL DATA IN OUTSOURCED ENVIRONMENT

Information Security, PII and Big Data

Secure Deduplication of Encrypted Data without Additional Independent Servers

International Journal of Engineering Research ISSN: & Management Technology November-2015 Volume 2, Issue-6

Some Research Challenges for Big Data Analytics of Intelligent Security

A Fully Homomorphic Encryption Implementation on Cloud Computing

Fully homomorphic encryption equating to cloud security: An approach

Sunnie Chung. Cleveland State University

A Privacy-preserving Approach for Records Management in Cloud Computing. Eun Park and Benjamin Fung. School of Information Studies McGill University

IEEE JAVA Project 2012

Privacy-preserving Digital Identity Management for Cloud Computing

Security and Privacy for Web Databases and Services

Global Network Initiative Protecting and Advancing Freedom of Expression and Privacy in Information and Communications Technologies

Professor Radha Poovendran EE Department, University of Washington, Seattle, WA & Professor Dawn Song EECS Department, University of California,

Homomorphic Encryption Method Applied to Cloud Computing

How Can Data Sources Specify Their Security Needs to a Data Warehouse?

Security/Privacy Models for "Internet of things": What should be studied from RFID schemes? Daisuke Moriyama and Shin ichiro Matsuo NICT, Japan

IT Privacy Certification Outline of the Body of Knowledge (BOK) for the Certified Information Privacy Technologist (CIPT)

BIG DATA: BIG BOOST TO BIG TECH

Elements of Applied Cryptography Public key encryption

Chapter 23. Database Security. Security Issues. Database Security

Digital Identity Management

Big Data Working Group. Comment on Big Data and the Future of Privacy

I n t e r S y S t e m S W h I t e P a P e r F O R H E A L T H C A R E IT E X E C U T I V E S. In accountable care

Approaches for privacy-friendly Smart Metering: Architecture using homomorphic encryption and homomorphic MACs

Secure cloud access system using JAR ABSTRACT:

Privacy-Preserving In Big Data with Efficient Metrics

Privacy-Preserving Network Aggregation

Information Security in Big Data using Encryption and Decryption

CIS 6930 Emerging Topics in Network Security. Topic 2. Network Security Primitives

Data Outsourcing based on Secure Association Rule Mining Processes

Hey! Cross Check on Computation in Cloud

Encryption for Cloud Services Security: Problem or / CTO /

Data collection architecture for Big Data

Introduction. Digital Signature

Formal Methods for Preserving Privacy for Big Data Extraction Software

Privacy-Preserving Social Network Analysis for Criminal Investigations

IEEE JAVA TITLES

An Overview of Cloud & Big Data Technology and Security Implications. Tim Grance Senior Computer Scientist NIST Information Technology Laboratory

NEW CRYPTOGRAPHIC CHALLENGES IN CLOUD COMPUTING ERA

Lecture 10: CPA Encryption, MACs, Hash Functions. 2 Recap of last lecture - PRGs for one time pads

Introduction to Cyber Security / Information Security

Chapter 23. Database Security. Security Issues. Database Security

Global Network Initiative Protecting and Advancing Freedom of Expression and Privacy in Information and Communications Technologies

Security Aspects of. Database Outsourcing. Vahid Khodabakhshi Hadi Halvachi. Dec, 2012

Verifiable Outsourced Computations Outsourcing Computations to Untrusted Servers

BIG DATA TECHNOLOGY. Hadoop Ecosystem

Analysis of Privacy-Preserving Element Reduction of Multiset

Cryptography for the Cloud

Improving data integrity on cloud storage services

ATTPS Publication: Trustworthy ICT Taxonomy

Transcription:

Big Data - Security and Privacy Elisa Bertino CS Department, Cyber Center, and CERIAS Purdue University Cyber Center!

Big Data EveryWhere! Lots of data is being collected, warehoused, and mined Web data, e-commerce Purchases at department/ grocery stores Bank/Credit Card transactions Social networks Surveillance devices and systems Embedded systems and IoT Drones

Industry View of Big Data! O Reilly Radar definition: " Big data is when the size of the data itself becomes part of the problem! EMC/IDC definition of big data: " Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.! IBM view: three characteristics define big data: " Volume (Terabytes -> Zettabytes) " Variety (Structured -> Semi-structured -> Unstructured) " Velocity (Batch -> Streaming Data) Multi-source is another important characteristic Big data is often obtained by aggregating many small datasets from very large numbers of sources

Cyber Security Security information and event management (SIEM) Authentication (biometrics, federated digital identity management, continuous data authentication) Access control (e.g. attribute-based, location-based and contextbased access control) Insider threat (anomaly detection) and user monitoring Homeland Protection Identification of links and relationships among individuals in social networks Prediction of attacks Management of emergence and disasters Healthcare Monitoring and prevention of disease spreading Evidence-based healthcare Use of Data for Security

Privacy Risks Exchange and integration of data across multiple sources Data becomes available to multiple parties Re-identification of anonymized user data becomes easier Security tasks such as authentication and access control may require detailed information about users For example, location-based access control requires information about user location and may lead to collecting data about user mobility Continuous authentication requires collecting information such as typing speed, browsing habits, mouse movements

Can security and privacy be reconciled? And if so which are the methods and techniques that make this reconciliation possible?

Relevant Initiatives Internet Rights and Principles (IRP) Dynamic Coalition Developed a Charter of Human Rights and Principles for Internet Two salient principles 4. Expression and association: Everyone has the right to seek, receive, and impart information freely on the Internet without censorship or other interference. Everyone also has the right to associate freely through and on the Internet, for social, political, cultural or other purposes. 5. Privacy and data protection: Everyone has the right to privacy online. This includes freedom from surveillance, the right to use encryption, and the right to online anonymity. Everyone also has the right to data protection, including control over personal data collection, retention, processing, disposal and disclosure. http://internetrightsandprinciples.org/wpcharter. Accessed Sept. 14, 2014

Relevant Initiatives Global Network Initiative (GNI) Participant ICT companies: Google, Microsoft, and Yahoo! On privacy, companies agree to employ protections with respect to personal information in all countries where they operate in order to protect the privacy rights of users, and to respect and protect the privacy rights of users when confronted with government demands, laws and regulations that compromise privacy in a manner inconsistent with internationally recognized laws and standards. https://www.globalnetworkinitiative.org/sites/default/files/gni_brochure.pdf. Accessed Sept. 14, 2014

Relevant Initiatives GNI Implementation Guidelines Interpret government requests as narrowly as possible and challenge requests that are not legally binding Establish a clear policy and process in the company for evaluating and responding to government requests Inform users about the nature and volume of government demands and how the company responds to them ( transparency reporting ) Conduct human rights impact assessment prior to entering new markets, entering into new partnerships or launching new products in order to identify human rights risks in advance, and factor the conclusions of those assessments into the company s decision about whether and how to proceed On all matters where the company has control, make best efforts (technical, legal, operational) to mitigate the impact of government laws or other actions that violate international human rights standards http://globalnetworkinitiative.org/implementationguidelines/index.php

Can security and privacy can be reconciled? The research side

Privacy-Enhancing Techniques Significant examples Privacy-preserving data matching protocols based on hybrid strategies (by E. Bertino, M. Kantarcioglu et al.) open issues: Scalability Support for complex matching, such as semantic matching Definition of new security models

Significant examples Department of Computer Science Privacy-Enhancing Techniques Privacy-preserving collaborative data mining (earlier work by C. Clifton, M. Kantarcioglu et al. open issues: Scalability Privacy-preserving biometric authentication (by E. Bertino et al.) open issues: Reducing false rejection rates Using homomorphic techniques

Significant examples (con t) Department of Computer Science Privacy-preserving data management on the cloud CryptDB DBMask (by E. Bertino et al.) Issues: Weak security (CryptDB) Privacy-Enhancing Techniques Weak protection (or lack or protection) of access patterns

An Example Privacy-Preserving Complex Query Evaluation over Semantically Secure Encrypted Data Bharath K. Samanthula, Wei Jiang, and Elisa Bertino (ESORICS 2014)

Federated Cloud Model Two non-colluding semi-honest cloud service providers, denoted by C 1 and C 2 (they together form a federated cloud) Alice (data owner) generates (pk,sk), computes T using pk and outsources it to C 1, where T i,j = E pk (t i,j ), for 1 i n and 1 j m She also outsources sk to C 2

Problem Definition Bob issues a complex query Q to the cloud and wants to retrieve t i s that satisfy Q. Q is defined as a query with arbitrary number of subqueries where each sub-query consists of conjunctions of an arbitrary number of relational predicates Q : G 1 G 2 G l-1 G l {0,1} G j is a clause with a number b j of predicates and is given by P j,1 P j,2 P j, bj -1 P j,b j Eg: Q = ((Age 40) (Disease = Diabetes)) ((Sex = M) (Marital Status = Married) (Disease = Diabetes))

The Paillier Cryptosystem Additive homomorphic and probabilistic encryption scheme (E pk, D sk ): encryption and decryption functions Homomorphic addition: D sk (E pk (x+y)) = D sk (E pk (x)*e pk (y) mod N 2 ) Homomorphic multiplication: D sk (E pk (x*y)) = D sk (E pk (x) y mod N 2 ) Semantic security: Given a ciphertext, the adversary cannot deduce any information about the corresponding plaintext

Secure Primitives Secure Multiplication (SM): C 1 holds E pk (a), E pk (b) and C 2 holds sk, it computes E pk (a*b) Secure Bit-OR (SBOR): C 1 holds E pk (o 1 ), E pk (o 2 ) and C 2 holds sk, it computes E pk (o 1 o 2 ) Secure Comparison (SC): C 1 holds E pk (a), E pk (b) and C 2 holds sk, it computes E pk (c), where c =1 if a > b and c=0 otherwise. Here we assume 0 a,b < 2 w Note: the outputs are revealed only to C 1

Secure Multiplication Require: C 1 has E pk (a) and E pk (b); C 2 has sk 1.C 1 : (a). Pick two random numbers r a, r b Z N (b). a E pk (a) E pk (r a ) (c). b E pk (b) E pk (r b ); send a, b to C 2 2.C 2 : (a). Receive a and b from C 1 (b). h a D sk (a ) (c). h b D sk (b ) (d). h h a h b mod N (e). h E pk (h); send h to C 1 3.C1: (a). Receive h from C 2 (b). s h E pk (a) N rb (c). s s E pk (b) N ra (d). E pk (a b) s E pk (N r a r b ) Note: a b = (a + r a ) (b + r b ) a r b b r a r a r b

Evaluation of complex predicates Naïve solution Use SM to evaluate each clause G j Given E pk (P j,h (t i )), compute E pk (G j (t i )) = E pk (P j,1 (t i )... P j,bj (t i )) using SM Use SBOR to compute final query result Given E pk (G j (t i )), compute E pk (Q(t i )) = E pk (G 1 (t i )... G l (t i )) Expensive for large number of predicates and clauses

Evaluation of complex predicates Our solution To compute E pk (G j (t i )): Compute E pk (Σ h P j,h (t i )) Compare it with b j using SC Key Observation: G j (t i ) = 1 iff Σ h P j,h (t i ) = b j Example: G([p 1 (t i ) AND p 2 (t i ) AND p 3 (t i )] =1 (e.g. is true) if P 1 (t j )=1 AND P 2 (t j )=1 AND P 2 (t j )=1 that is if P 1 (t j )+P 2 (t j )+ P 2 (t j ) =3 To compute E pk (Q(t i )): Compute E pk (Σ j G j (t i )) Compare it with 0 using SC Key Observation: Q(t i ) = 1 iff Σ j G j (t i ) > 0 Example: G([p 1 (t i ) OR p 2 (t i )] =1 (e.g. is true) if P 1 (t j )=1 OR P 2 (t j )=1 that is if P 1 (t j )+P 2 (t j ) > 0

Future Work Implementation with MapReduce framework Extension to malicious setting In current work, we considered basic relational operators {<, >,,,=} Focus on other SQL queries, such as JOIN and GROUP BY, and evaluate their complexities

Research Agenda For which domains security with privacy is critical? Which are the policy issues related to the use of data for security? Ethical use of data Ownership of data perhaps we need to move from the notion of data owner to that of data stakeholder Is control by users something which is possible in all domains? Which research advances are needed to make it possible to reconcile security with privacy? Efficient techniques for performing computations on encrypted data Privacy-preserving data mining techniques Privacy-aware software engineering How do we balance personal privacy with collective security? Could a risk-based approach to this problem work? Partially based on results of the NSF Workshop on Big Data Security and Privacy, Dallas, TX, Sept. 16 th -17 th, 2014, http:// csi.utdallas.edu/events/nsf/nsf%20papers%202014.htm

Research Agenda (con t) Access control for big data techniques for: Automatically merging, and evolving large number of heterogeneous access control policies Automatic authorization administration Enforcing access control policies on heterogeneous multimedia data Privacy-preserving data correlation techniques Techniques to control what is extracted from multiple correlated datasets and to check that what is extracted can be shared and/or used Approaches for data services monetization How would the business model around data change if privacy-preserving data analytic tools were available? Also if data is considered as a good to be sold, are there regulations concerning contracts for buying/selling data? Can these contracts include privacy clauses be incorporated requiring for example that users to whom this data pertains to have been notified? Privacy implications on data quality

Thank You! Questions? Elisa Bertino bertino@cs.purdue.edu