CS377: Database Systems
Data Security and Privacy
Li Xiong
Department of Mathematics and Computer Science
Emory University
Principles of Data Security: the CIA Triad
  Confidentiality: prevent the disclosure of information to unauthorized users
  Integrity: prevent improper modification
  Availability: make data available to legitimate users
Security Measures
  Access control: restrict access to the data (or a subset or view of it) to authorized users
  Inference control: restrict inference from accessible data to additional data
  Flow control: prevent information from flowing from authorized use to unauthorized use
  Encryption: use cryptography to protect information from unauthorized disclosure while in transit and in storage
Access Control
  Identification and authentication: verify identity (who you are)
    Something you know, something you have, something you are
  Authorization: specify access rights (what you can access)
  Access control policies
    Discretionary access control
    Mandatory access control
    Role-based access control
Discretionary AC
  Restricts access to objects based on the identity of users
  Access matrix model
    Row: subject (users, accounts, programs)
    Column: object (relations, records, attributes, views)
    Entry (i, j): privileges that subject i holds on object j (e.g., select, update)
  [Figure: individuals linked directly to resources (Database 1, Database 2, Database 3)]
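The access matrix model can be sketched in a few lines of Python. This is only an illustration: the subject names (`alice`, `bob`), the relation name `EMPLOYEE`, and the helper functions `grant`, `revoke`, and `is_allowed` are hypothetical and do not come from any DBMS.

```python
from collections import defaultdict

# Access matrix: each (subject, object) entry holds the set of privileges
# that the subject has on the object. All names here are made up.
access_matrix = defaultdict(set)

def grant(subject, obj, privilege):
    """Add a privilege to the (subject, object) entry."""
    access_matrix[(subject, obj)].add(privilege)

def revoke(subject, obj, privilege):
    """Remove a privilege from the (subject, object) entry, if present."""
    access_matrix[(subject, obj)].discard(privilege)

def is_allowed(subject, obj, privilege):
    """Check the matrix entry; unknown pairs have no privileges."""
    return privilege in access_matrix.get((subject, obj), set())

grant("alice", "EMPLOYEE", "select")
grant("alice", "EMPLOYEE", "update")
grant("bob", "EMPLOYEE", "select")

print(is_allowed("alice", "EMPLOYEE", "update"))  # True
print(is_allowed("bob", "EMPLOYEE", "update"))    # False
```

The decision depends only on who the subject is and what has been granted to that subject, which is what makes the policy discretionary.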
Mandatory AC
  Governs access based on the classification of subjects and objects
  Assign a security level to all information (sensitivity of information)
  Assign a security level to each user (security clearance)
  Military and government: top secret (TS) > secret (S) > confidential (C) > unclassified (U)
  Access principles
    Read down: a subject's clearance must dominate the security level of the object being read
    Write up: a subject's clearance must be dominated by the security level of the object being written
Mandatory AC (cont)
  Information can only be accessed by users of the same or higher class
  Information can only flow to the same or higher class
  [Figure: individuals with clearances TS, S, C, U and resources Database 1 (TS), Database 2 (S), Database 3 (C), with arrows labeled "write up" and "read down"]
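The read-down and write-up principles reduce to two comparisons on security levels. The sketch below assumes the four-level ordering from the slides; the numeric encoding and the function names `dominates`, `can_read`, and `can_write` are illustrative choices.

```python
# Security levels in increasing order of sensitivity (U < C < S < TS).
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}

def dominates(a, b):
    """Level a dominates level b when a is the same as or higher than b."""
    return LEVELS[a] >= LEVELS[b]

def can_read(clearance, classification):
    """Read down: the subject's clearance must dominate the object's level."""
    return dominates(clearance, classification)

def can_write(clearance, classification):
    """Write up: the object's level must dominate the subject's clearance."""
    return dominates(classification, clearance)

print(can_read("S", "C"))    # True: a secret-cleared user reads confidential data
print(can_read("C", "S"))    # False: no reading up
print(can_write("C", "S"))   # True: writing up is allowed
print(can_write("TS", "S"))  # False: no writing down
```

Blocking writes to lower levels is what keeps information from flowing out of a high class into a lower one.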
Discretionary Access Control vs. Mandatory Access Control
  Discretionary access control (DAC)
    High degree of flexibility
    No information flow control, which leaves it vulnerable to malicious attacks such as Trojan horses embedded in application programs
  Mandatory access control (MAC)
    High degree of protection; in a way, it also prevents illegal information flow
    Too rigid; applicable only in limited environments
  In many practical situations, DAC is preferred
  Next: role-based access control (RBAC)
Role-Based AC
  Governs access based on roles
  Users are assigned to appropriate roles
  Privileges are associated with roles
  [Figure: individuals mapped to roles (Role 1, Role 2, Role 3), and roles mapped to resources (Database 1, Database 2, Database 3)]
ROLES
  A role is typically a job function with some associated semantics regarding responsibility and authority (permissions).
  Examples: Developer, Help Desk Representative, Budget Manager, Director
UA (user assignment)
  Relates the USERS set to the ROLES set
  A user can be assigned to one or more roles
  A role can be assigned to one or more users
  Example roles: Developer, Help Desk Rep
PA (permission assignment)
  Relates the PRMS set to the ROLES set
  A permission can be assigned to one or more roles
  A role can be assigned one or more permissions
  Example permissions: Create, Delete, Drop, View, Update, Append
  Example roles: Developer, Help Desk Rep
Role Hierarchies: Tree Hierarchies
  Director
    Project Lead 1
      Production Engineer 1
      Quality Engineer 1
    Project Lead 2
      Production Engineer 2
      Quality Engineer 2
Role-based Access Control Benefits
  Authorization management: assign users to roles and assign access rights to roles
  Hierarchical roles: privileges are inherited based on the hierarchy of roles
  Least privilege: allow a user to sign on with the least privilege required for a particular task
  Separation of duties: no single user should be given enough privileges to misuse the system alone
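User assignment (UA), permission assignment (PA), and a role hierarchy can be combined in a small sketch. The user names and the specific role-to-permission pairs below are hypothetical, and the sketch assumes the common convention that a senior role inherits the permissions of the roles beneath it.

```python
# UA: users -> roles. PA: roles -> permissions. Names are illustrative only.
UA = {"ann": {"Director"}, "raj": {"Production Engineer 1"}}
PA = {
    "Production Engineer 1": {"view", "update"},
    "Project Lead 1": {"create"},
    "Director": {"drop"},
}
# Role hierarchy as a tree: senior role -> its direct junior roles.
JUNIORS = {
    "Director": {"Project Lead 1", "Project Lead 2"},
    "Project Lead 1": {"Production Engineer 1", "Quality Engineer 1"},
    "Project Lead 2": {"Production Engineer 2", "Quality Engineer 2"},
}

def effective_roles(role):
    """Return the role together with every role below it in the hierarchy."""
    roles, stack = {role}, [role]
    while stack:
        for junior in JUNIORS.get(stack.pop(), ()):
            if junior not in roles:
                roles.add(junior)
                stack.append(junior)
    return roles

def permissions(user):
    """Union of permissions over the user's roles and all inherited roles."""
    perms = set()
    for role in UA.get(user, ()):
        for r in effective_roles(role):
            perms |= PA.get(r, set())
    return perms

print(sorted(permissions("ann")))  # ['create', 'drop', 'update', 'view']
print(sorted(permissions("raj")))  # ['update', 'view']
```

Changing what a job function may do means editing one PA entry rather than touching every user, which is the authorization-management benefit listed above.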
Access Control in a Database System
  A typical DBMS enforces discretionary access control based on the granting and revoking of privileges.
  The account level: the DBA specifies the particular privileges that each account holds, independently of the relations in the database.
  The relation level (or table level): the DBA specifies the privileges to access each individual relation or view in the database.
Types of Discretionary Privileges
  The privileges at the account level apply to the capabilities provided to the account itself and can include:
    the CREATE SCHEMA or CREATE TABLE privilege, to create a schema or base relation;
    the CREATE VIEW privilege;
    the ALTER privilege, to apply schema changes such as adding or removing attributes from relations;
    the DROP privilege, to delete relations or views;
    the MODIFY privilege, to insert, delete, or update tuples;
    the SELECT privilege, to retrieve information from the database by using a SELECT query.
Types of Discretionary Privileges
  In SQL, the following types of privileges can be granted on each individual relation R:
    SELECT (retrieval or read) privilege on R: gives the account retrieval privilege.
    MODIFY privileges on R: give the account the capability to modify tuples of R. In SQL this privilege is further divided into UPDATE, DELETE, and INSERT privileges to apply the corresponding SQL command to R. In addition, both the INSERT and UPDATE privileges can specify that only certain attributes can be updated by the account.
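Relation-level privileges with an optional attribute restriction can be modeled as follows. This is a plain Python model of the idea, not SQL and not any DBMS's API; the account name `A3`, the relation `EMPLOYEE`, the attribute names, and the helper `may` are assumptions made for the example.

```python
# privileges[(account, relation)] maps a privilege name to the set of
# attributes it covers; None means the privilege covers all attributes.
privileges = {}

def grant(account, relation, privilege, attributes=None):
    """Grant a privilege on a relation, optionally limited to some attributes."""
    entry = privileges.setdefault((account, relation), {})
    entry[privilege] = None if attributes is None else set(attributes)

def revoke(account, relation, privilege):
    """Revoke a privilege on a relation; do nothing if it was never granted."""
    privileges.get((account, relation), {}).pop(privilege, None)

def may(account, relation, privilege, attributes=()):
    """True if the account holds the privilege for all requested attributes."""
    entry = privileges.get((account, relation), {})
    if privilege not in entry:
        return False
    allowed = entry[privilege]
    return allowed is None or set(attributes) <= allowed

grant("A3", "EMPLOYEE", "SELECT")
grant("A3", "EMPLOYEE", "UPDATE", ["Salary"])

print(may("A3", "EMPLOYEE", "UPDATE", ["Salary"]))  # True
print(may("A3", "EMPLOYEE", "UPDATE", ["Name"]))    # False
print(may("A3", "EMPLOYEE", "DELETE"))              # False
```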
Inference Control
  Inference control: prevent inference from de-identified, anonymized, or statistical information (accessible) to individual information (not accessible)
  Attack incidents
    Massachusetts Group Insurance Commission (GIC) medical encounter database
    AOL search queries
    Netflix Prize
Massachusetts GIC
  The Massachusetts GIC released anonymized data on state employees' hospital visits
  Governor William Weld assured the public of its privacy
  GIC data (original):
    Name     SSN        Birth date  Zip    Diagnosis
    Alice    123456789  44          48202  AIDS
    Bob      323232323  44          48202  AIDS
    Charley  232345656  44          48201  Asthma
    Dave     333333333  55          48310  Asthma
    Eva      666666666  55          48310  Diabetes
  Anonymized release:
    Birth date  Zip    Diagnosis
    44          48202  AIDS
    44          48202  AIDS
    44          48201  Asthma
    55          48310  Asthma
    55          48310  Diabetes
Massachusetts GIC
  Linking the released data with the voter roll for Cambridge identified Governor Weld's record
  Data owner's table:
    Name     SSN        Age  Zip    Diagnosis  Income
    Alice    123456789  44   48202  AIDS       17,000
    Bob      323232323  44   48202  AIDS       68,000
    Charley  232345656  44   48201  Asthma     80,000
    Dave     333333333  55   48310  Asthma     55,000
    Eva      666666666  55   48310  Diabetes   23,000
  Released table:
    Age  Zip    Diagnosis  Income
    44   48202  AIDS       17,000
    44   48202  AIDS       68,000
    44   48201  Asthma     80,000
    55   48310  Asthma     55,000
    55   48310  Diabetes   23,000
  Voter roll for Cambridge:
    Name     Age  Zip
    Alice    44   48202
    Charley  44   48201
    Dave     55   48310
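The linking step is a join on the attributes the two tables share. The sketch below uses the toy rows from the slide (the income column is left out); the function name `link` and the dictionary layout are choices made for this example.

```python
# Released (anonymized) table: names and SSNs removed.
released = [
    {"age": 44, "zip": "48202", "diagnosis": "AIDS"},
    {"age": 44, "zip": "48202", "diagnosis": "AIDS"},
    {"age": 44, "zip": "48201", "diagnosis": "Asthma"},
    {"age": 55, "zip": "48310", "diagnosis": "Asthma"},
    {"age": 55, "zip": "48310", "diagnosis": "Diabetes"},
]
# Public voter roll: names together with the same age and zip attributes.
voters = [
    {"name": "Alice", "age": 44, "zip": "48202"},
    {"name": "Charley", "age": 44, "zip": "48201"},
    {"name": "Dave", "age": 55, "zip": "48310"},
]

def link(voters, released, keys=("age", "zip")):
    """For each voter, collect the diagnoses of released rows matching on keys."""
    result = {}
    for v in voters:
        result[v["name"]] = {
            r["diagnosis"] for r in released if all(r[k] == v[k] for k in keys)
        }
    return result

for name, diagnoses in link(voters, released).items():
    print(name, sorted(diagnoses))
# Alice ['AIDS']
# Charley ['Asthma']
# Dave ['Asthma', 'Diabetes']
```

Charley matches a single row and is re-identified outright. Alice matches two rows, but both carry the same diagnosis, so her diagnosis is disclosed anyway. Only Dave's diagnosis stays ambiguous.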
A Face Is Exposed for AOL Searcher No. 4417749
  20 million Web search queries released by AOL
  User IDs were replaced by random IDs
  Queries by user 4417749:
    "numb fingers"
    "60 single men"
    "dog that urinates on everything"
    "landscapers in Lilburn, Ga"
    several people's names with the last name Arnold
    "homes sold in shadow lake subdivision gwinnett county georgia"
  The searcher was identified as Thelma Arnold, a 62-year-old widow who lives in Lilburn, Ga
Statistical Database
  Statistical databases are used to produce statistics on various populations.
  The database may contain confidential data on individuals, which should be protected from user access.
  Users are permitted to retrieve statistical information on the populations, such as averages, sums, counts, maximums, and minimums.
  A population is a set of tuples of a relation (table) that satisfy some selection condition.
  Inference control: prevent inference from statistical information to individual information.
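A Possible Attack
  Q1: Count(Sex = Female) = A
  Q2: Count(Sex = Female OR (Age = 42 AND Sex = Male AND Employer = ABC)) = B
  If B = A + 1, exactly one individual is a 42-year-old male employed by ABC.
  Q3: Count(Sex = Female OR (Age = 42 AND Sex = Male AND Employer = ABC AND Diagnosis = Schizophrenia))
  If Q3 also returns A + 1, that individual's diagnosis is schizophrenia; if it returns A, it is not.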
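The attack can be replayed against a toy table using count queries only. The five rows below are invented for illustration, and the diagnosis condition in Q3 is read as applying to the 42-year-old male ABC employee only (AND binding tighter than OR).

```python
# Hypothetical table; the attacker can only ask for counts.
people = [
    {"sex": "F", "age": 30, "employer": "ABC", "diagnosis": "Asthma"},
    {"sex": "F", "age": 42, "employer": "XYZ", "diagnosis": "Schizophrenia"},
    {"sex": "M", "age": 42, "employer": "ABC", "diagnosis": "Schizophrenia"},
    {"sex": "M", "age": 42, "employer": "XYZ", "diagnosis": "Diabetes"},
    {"sex": "M", "age": 50, "employer": "ABC", "diagnosis": "Asthma"},
]

def count(predicate):
    """The only interface exposed to the attacker: a count over a condition."""
    return sum(1 for p in people if predicate(p))

def is_female(p):
    return p["sex"] == "F"

def is_target(p):
    return p["age"] == 42 and p["sex"] == "M" and p["employer"] == "ABC"

a = count(is_female)                               # Q1
b = count(lambda p: is_female(p) or is_target(p))  # Q2

target_has_diagnosis = None  # stays None if the target is not unique
if b == a + 1:
    # Exactly one person matches the target condition, so Q3 reveals
    # that person's diagnosis by differing from Q1 by 0 or 1.
    q3 = count(
        lambda p: is_female(p)
        or (is_target(p) and p["diagnosis"] == "Schizophrenia")
    )
    target_has_diagnosis = (q3 == a + 1)

print(a, b, target_has_diagnosis)  # 2 3 True
```

No query returns an individual row, yet the difference between two legitimate counts isolates one person.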
Techniques
  Result size restriction
  Result overlap size restriction
  Query restriction
  Output perturbation
  Data perturbation
  Data generalization
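Two of these techniques, result size restriction and output perturbation, are simple enough to sketch. The threshold `MIN_SIZE`, the noise range `spread`, and the sample `ages` list are arbitrary assumptions, and the uniform integer noise is an ad hoc choice for illustration rather than a calibrated mechanism.

```python
import random

MIN_SIZE = 2  # hypothetical threshold on the query set size

def restricted_count(rows, predicate, min_size=MIN_SIZE):
    """Result size restriction: refuse to answer when the query set,
    or its complement, is smaller than min_size."""
    n = sum(1 for r in rows if predicate(r))
    if n < min_size or len(rows) - n < min_size:
        return None  # query refused
    return n

def perturbed_count(rows, predicate, rng, spread=2):
    """Output perturbation: add small random noise to the true count."""
    n = sum(1 for r in rows if predicate(r))
    return max(0, n + rng.randint(-spread, spread))

ages = [30, 42, 42, 50, 61, 61, 61]  # made-up data

print(restricted_count(ages, lambda x: x == 50))  # None: only one match
print(restricted_count(ages, lambda x: x == 61))  # 3
print(perturbed_count(ages, lambda x: x == 61, random.Random(0)))
```

Size restriction alone does not stop the difference attack from the previous slide when both queries are large, which is one reason the list also includes overlap restriction and perturbation.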
Encryption and Public Key Infrastructures
  Encryption consists of applying an encryption algorithm to data using some prespecified encryption key.
  The resulting data has to be decrypted using a decryption key to recover the original data.
  Symmetric key algorithms: secret key encryption
  Asymmetric key algorithms: public key encryption
Public Key Encryption
  Plaintext: the data or readable message that is fed into the algorithm as input.
  Encryption algorithm: performs various transformations on the plaintext.
  Public and private keys: a pair of keys selected so that if one is used for encryption, the other is used for decryption.
  Ciphertext: the scrambled message produced as output.
  Decryption algorithm: accepts the ciphertext and the matching key and produces the original plaintext.
Public Key Encryption
  The essential steps are as follows:
    1. Each user generates a pair of keys to be used for the encryption and decryption of messages.
    2. Each user places one of the two keys in a public register or other accessible file. This is the public key. The companion key is kept private (the private key).
    3. If a sender wishes to send a private message to a receiver, the sender encrypts the message using the receiver's public key.
    4. When the receiver receives the message, he or she decrypts it using the receiver's private key.
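The steps above can be traced with a toy example. The slides do not name a specific algorithm; the sketch below uses textbook RSA with tiny primes as one possible instance. It is insecure by design (no padding, trivially factorable modulus) and is meant only to show the roles of the two keys. The prime values, the exponent, and the message are arbitrary.

```python
# Step 1: the receiver generates a key pair (toy parameters, not secure).
p, q = 61, 53                # two small primes, kept secret
n = p * q                    # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent, chosen coprime with phi
d = pow(e, -1, phi)          # private exponent: inverse of e mod phi (Python 3.8+)

# Step 2: the public key is published; the private key stays with the receiver.
public_key = (e, n)
private_key = (d, n)

def encrypt(message, key):
    """Encrypt an integer message (0 <= message < n) with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)

def decrypt(ciphertext, key):
    """Decrypt with the matching private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)

# Step 3: the sender encrypts with the receiver's public key.
message = 65
ciphertext = encrypt(message, public_key)

# Step 4: the receiver decrypts with the receiver's private key.
recovered = decrypt(ciphertext, private_key)
print(recovered == message)  # True
```

Anyone can compute the ciphertext from the published key, but recovering the message requires `d`, which is never shared.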
Summary
  Access control
  Inference control
  Encryption
  CS573: Data Privacy and Security