Database and Data Mining Security

Threats/Protections to the System
1. External procedures
   - security clearance of personnel
   - password protection
   - control of application programs
   - audit
2. Physical environment
   - secure areas for DB/hardware
   - radiation shielding
3. Data storage
   - encryption
   - duplicate copies
4. Processor software
   - user authentication
   - access control
   - threat monitoring
   - audit trail
5. Processor hardware
   - memory protection
   - states of privilege
   - reliability
6. Communication lines
   - data encryption, implemented with cost in mind

- A DB holds essential data that reflects the organization's core competencies.
- Protecting data is at the heart of a secure system.
- Users rely on the DBMS to manage protection.
- DB organization and contents are valuable corporate assets that must be carefully protected.

Two major security issues in the DB context
- Integrity
- Secrecy

Two major problems
- Inference
- Multilevel security

6.1 Introduction to DB
- DB = an organized collection of data together with a set of rules that organize the data by specifying certain relationships among the data
- Data mining = the process of extracting hidden patterns from a DB using statistical and mathematical inference
- DB administrator (DBA) = the person who defines the rules that organize the DB and control its usage
- DBMS = software providing the front end, or interface, through which users interact with the DB
DBMS functions:
- create
- manage
- protect
- provide access

Advantages of DB
- Shared access
- Minimal redundancy
- Data consistency
- Data integrity
- Controlled access

6.2 Security Requirements of DB
- A DB system may be attacked at many levels.
- Attackers are usually end users rather than programmers.
- Violations consist of unauthorized people reading, modifying, or destroying information.
- Basic problems: access control, exclusion of spurious data, authentication of users, reliability

Requirements for DB Security
1. Physical DB integrity: the DB must be immune to physical problems such as power failure and hardware malfunction, and it can be reconstructed if it is destroyed.
2. Logical DB integrity: the structure of the DB must be preserved, e.g., modifying one field does not affect other fields.
3. Element integrity: the data contained in each element must be correct and accurate. It can be provided by:
   - field checks: test for appropriate values
   - access control: control over concurrent access
   - change log: lists every change made to the DB, so that all previous actions can be traced and, when an error occurs, the changes can be undone (rolled back)
4. Auditability: it must be possible to track who or what has accessed the elements in the DB.
   - pass-through problem: an access that transfers no data to the user, e.g., a record examined while evaluating a select, is difficult to audit
   - as a result, the log may overstate or understate what a user actually knows
5. Access control: a user is allowed to access only authorized data. Different users can be restricted to different modes of access.
6. User authentication: every user is positively identified, both for auditing and for access permission.
7. Availability: data must be available to the right person at the right time.

6.3 Reliability & Integrity
- Reliable software = software that runs for a very long time without failing
- DB reliability and integrity can be viewed from 3 dimensions:
  - DB integrity: the DB as a whole is protected against damage, such as from HW/SW failure
  - Element integrity: the value of a data element is changed only by authorized users
  - Element accuracy: only correct values are written into the elements of a DB

Protection Features for Reliability
A DB can be monitored and controlled by the following methods:
- Field checks: checks on the validity of values in DB fields, usually applied at data entry
- Change logs: whenever the DB is changed, a log file keeps both the old and the new values so that the DBA can examine, verify, or correct them if an error occurs
- Access control: procedures that keep track of all users who access the DB, so that the DB status and every user's accesses before a system crash or conflict are known
- User authentication: checks that allow only authorized users to access the DB
- Integrity checks: information should be checked for integrity, accuracy, and completeness
- Audits: performed by an internal or external party to make sure that the system performs as designed
- Monitors: check the structural integrity of the DB, e.g., whether a value being entered is consistent with other parts of the DB
- Range comparisons: verify that a new value is within the acceptable range
- State constraints: check whether DB values violate constraints that hold for the entire DB
- Transition constraints: check the conditions that must hold before a change can be applied to the DB, e.g., before a new employee can be added, there must be a vacant position
- Boundary checks: check whether sensitive values fall between a lower and an upper bound without revealing the actual values, e.g., checking a sensitive salary against its boundary values
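
The range comparison, transition constraint, and boundary check above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the position table, and the salary limits are hypothetical and do not come from any particular DBMS.

```python
# Illustrative sketch only; all names and limits below are assumptions.

def range_check(value, low, high):
    """Range comparison: accept a new value only if it lies in [low, high]."""
    return low <= value <= high


def add_employee(positions, name):
    """Transition constraint: a new employee may be added only if some
    position is vacant (its holder is None)."""
    vacant = [title for title, holder in positions.items() if holder is None]
    if not vacant:
        raise ValueError("no vacant position")
    positions[vacant[0]] = name
    return vacant[0]


def boundary_check(sensitive_value, low, high):
    """Boundary check: report only where the value lies relative to the
    bounds, never the sensitive value itself."""
    if sensitive_value < low:
        return "below"
    if sensitive_value > high:
        return "above"
    return "within"


positions = {"clerk": "Ann", "analyst": None}   # hypothetical position table
filled = add_employee(positions, "Bob")          # succeeds: "analyst" was vacant
```

A second call to `add_employee` on the same table would raise `ValueError`, because no vacant position remains.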
Two-phase update: a secure method for updating
- Intent phase: prepare everything needed for the update, e.g., gather data, create dummy records, open files, lock records, compute final answers. If this phase fails, it can simply be repeated.
- Commit phase: make the changes permanent, marked by writing a commit flag. If this phase fails, recovery must be performed, e.g., undo (roll back) or redo (roll forward).

Redundancy/internal consistency
- error detection/correction
- shadow fields: keep a 2nd copy of a field/record for use in case the 1st copy fails or an error occurs during updating

Data recovery
- roll back/roll forward
- compensating transaction
- backup/restore

Concurrency/consistency control
- serializability, data locking

6.4 Sensitive Data
- Sensitive data = data that should not be made public because disclosure would cause damage to an individual
- Security concerns not only the data elements but also their context and meaning (Table 6-6).
- Different degrees of sensitivity should also be taken into account.
- Access control problem: how to limit access so that sensitive data are not released to unauthorized people

Factors that can make data sensitive
- inherently sensitive, e.g., the location of defensive missiles
- from a sensitive source, e.g., information from an informer whose identity may be compromised if the information were disclosed
- declared sensitive, e.g., classified military data
- part of a sensitive attribute/record, e.g., the salary attribute of a personnel DB, the record of a secret space mission program
- sensitive in relation to previously disclosed information, e.g., the longitude coordinate of a secret gold mine, which pinpoints the location when it appears together with the latitude
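
Returning briefly to the two-phase update of Section 6.3, the intent and commit phases can be sketched as below. This is a minimal in-memory illustration, not how any real DBMS implements it: the class `TwoPhaseStore`, its methods, and the `fail_after` crash switch are all hypothetical names introduced here.

```python
class TwoPhaseStore:
    """Toy store illustrating intent phase, commit flag, and recovery."""

    def __init__(self, data):
        self.data = dict(data)     # the "permanent" DB values
        self.shadow = None         # new values prepared in the intent phase
        self.commit_flag = False   # set when the commit phase begins

    def intent(self, compute_changes):
        """Intent phase: compute the new values without touching permanent
        data. It can be repeated freely if it fails."""
        self.shadow = compute_changes(dict(self.data))

    def commit(self, fail_after=None):
        """Commit phase: write the commit flag, then make the changes
        permanent. fail_after simulates a crash after that many writes."""
        self.commit_flag = True
        for i, (key, value) in enumerate(self.shadow.items()):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated crash during commit")
            self.data[key] = value
        self.commit_flag = False
        self.shadow = None

    def recover(self):
        """Recovery: if the commit flag is set, redo (roll forward) from the
        shadow values; otherwise just discard the unfinished intent work."""
        if self.commit_flag:
            self.data.update(self.shadow)
            self.commit_flag = False
        self.shadow = None


def transfer(snapshot):
    # Hypothetical update: move 10 units from account "a" to account "b".
    return {"a": snapshot["a"] - 10, "b": snapshot["b"] + 10}


store = TwoPhaseStore({"a": 100, "b": 0})
store.intent(transfer)
try:
    store.commit(fail_after=1)   # crash after only "a" has been written
except RuntimeError:
    store.recover()              # the commit flag is set, so roll forward
```

After the simulated crash only `a` has been written, so the data are inconsistent; because the commit flag is set, `recover` rolls forward and both values end up updated.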
Access Decision
- An access decision must be based on the access policy.
- Factors affecting the decision:
  - availability of data: whether the access would block the data permanently or lock it for a very long time, resulting in denial of service
  - acceptability of access: whether the access could release sensitive information that the user did not ask for but that comes out along with non-sensitive data
  - assurance of authenticity: whether the access is made by an authorized person, since unauthorized people can reveal sensitive data

Types of Disclosures
- exact data: the most serious disclosure
- bounds: disclosing that a sensitive value lies between two bounds; bounds can also be a useful way to present sensitive data
- negative results: when data are separated into 2 groups, learning that an item is not in one group reveals that it is in the other
- existence: revealing that data exist, regardless of the actual value, is sometimes sensitive
- probable value: combining the results of several less sensitive queries may disclose sensitive data with some probability

Security vs Precision
- Security goal: protect data as securely as possible
- Precision goal: reveal as much data as possible
- The situation is complicated by the desire to share non-sensitive data while protecting sensitive data.
- Ideal combination of security and precision: maintain perfect security with maximum precision

6.5 Inference
- Inference is a way to infer, derive, or deduce sensitive information from non-sensitive data.
- The sensitive information is usually deduced from the most extreme values of the available information.
- To protect against inference, a rule-based semantic layer can be created between the logical DB design and the physical implementation; its rules are the criteria against which queries are examined.

Methods of Inference

Direct Attack
- The attacker issues a query with conditions chosen so that very little output, or a single data item, comes out.
- Ex.: student data containing a sensitive field DRUG with values 0, 1, 2, 3
  (obvious query)
  List NAME where SEX=M and DRUG=1
  (less obvious query)
  List NAME where (SEX=M and DRUG=1) or (SEX = M and SEX = F) or (DORM=AYRES)
  The added clauses are padding: the second can never be true, and the third is presumably chosen to match no further records, so the output still identifies the same individuals.
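
The two queries above can be compared on a small table. The sketch below uses made-up student records (the names, dorms, and DRUG values are hypothetical) and assumes that no student lives in dorm AYRES, which is what makes the third clause harmless padding.

```python
# Hypothetical student records, invented for illustration only.
STUDENTS = [
    {"NAME": "Avery", "SEX": "M", "DORM": "NORTH", "DRUG": 1},
    {"NAME": "Blake", "SEX": "M", "DORM": "SOUTH", "DRUG": 0},
    {"NAME": "Casey", "SEX": "F", "DORM": "NORTH", "DRUG": 0},
    {"NAME": "Devon", "SEX": "M", "DORM": "SOUTH", "DRUG": 3},
    {"NAME": "Ellis", "SEX": "F", "DORM": "SOUTH", "DRUG": 1},
]


def list_names(predicate):
    """Return the NAME of every record satisfying the predicate."""
    return [r["NAME"] for r in STUDENTS if predicate(r)]


# Obvious query: List NAME where SEX=M and DRUG=1
obvious = list_names(lambda r: r["SEX"] == "M" and r["DRUG"] == 1)

# Less obvious query: same first clause, padded with two clauses that
# match nothing (SEX cannot be both M and F; nobody lives in AYRES here).
less_obvious = list_names(
    lambda r: (r["SEX"] == "M" and r["DRUG"] == 1)
    or (r["SEX"] == "M" and r["SEX"] == "F")
    or (r["DORM"] == "AYRES")
)
```

Both `obvious` and `less_obvious` contain the single name `"Avery"`, so the padded query leaks exactly as much as the direct one.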
Indirect Attack
- An indirect attack is usually carried out outside the DB, using anonymous statistics to infer facts about an individual.
- Statistical information released from a DB normally must
  - eliminate anything that identifies an individual, e.g., name, address, telephone number
  - present only neutral statistics, e.g., count, sum, mean, without extreme values
- However, an indirect attack may still use these for inference, e.g.:
  - sum, count, mean, median
  - tracker attack
  - linear system vulnerability
- By combining Table 6.8 and Table 6.9 we can infer who the individuals are (for the cells highlighted in yellow).
Tracker Attack
- Fools the DB manager into locating the desired data by using additional queries.
- The tracker adds additional records to be retrieved for two different queries.
- The two sets of records cancel each other out, leaving only the data required (given the results for n and n-1 records, the single element is easily computed).

Linear system vulnerability: a single value c can be solved for from 5 queries.

Control for statistical inference attacks
- Query controls are effective primarily against direct attacks.
- Precision checks determine whether a given query discloses sensitive data.
- Suppression: a query on sensitive data is rejected and terminated without any response or indication.
- Concealing: presents information that is not exact but close to the actual data, by slightly modifying data values with random numbers.
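
To see why rejecting obviously sensitive queries is not enough on its own, the tracker attack above can be sketched against a naive suppression rule. Everything here is an assumption made for illustration: the records, the `MIN_COUNT` threshold, and the `count_and_sum` interface are hypothetical.

```python
# Hypothetical personnel records, invented for illustration only.
RECORDS = [
    {"NAME": "Avery", "SEX": "F", "DEPT": "R&D",   "SALARY": 71000},
    {"NAME": "Blake", "SEX": "M", "DEPT": "R&D",   "SALARY": 64000},
    {"NAME": "Casey", "SEX": "M", "DEPT": "R&D",   "SALARY": 58000},
    {"NAME": "Devon", "SEX": "M", "DEPT": "Sales", "SALARY": 50000},
    {"NAME": "Ellis", "SEX": "F", "DEPT": "Sales", "SALARY": 52000},
]

MIN_COUNT = 2  # assumed rule: refuse any query matching fewer records


def count_and_sum(predicate):
    """Statistical interface: return (count, salary sum), or None when the
    query is suppressed because too few records match."""
    matches = [r for r in RECORDS if predicate(r)]
    if len(matches) < MIN_COUNT:
        return None
    return len(matches), sum(r["SALARY"] for r in matches)


# Direct query on the only woman in R&D matches 1 record: suppressed.
direct = count_and_sum(lambda r: r["DEPT"] == "R&D" and r["SEX"] == "F")

# Tracker: two permitted queries whose results differ by exactly one record.
n_count, n_sum = count_and_sum(lambda r: r["DEPT"] == "R&D")
m_count, m_sum = count_and_sum(lambda r: r["DEPT"] == "R&D" and r["SEX"] == "M")

inferred_salary = n_sum - m_sum  # the n and n-1 results isolate one element
```

The direct query is refused, yet the difference between the two permitted results is exactly the suppressed salary, which is why controls such as concealing or query analysis are also considered.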
- Limited response suppression
- Combined results
- Random sample: results are computed from data randomly selected from the whole data set
- Random data perturbation: slightly modify statistics before presenting them to the requester, e.g., add some noise (error) to the output data that has little effect on the statistics. Ex.: the average salary may be multiplied by a small factor, and each record may have a small random number added to or subtracted from it.
- Query analysis
  - used to analyze whether a query can be used to infer sensitive data
  - the technique considers the query history and the query context
  - more complicated and difficult to do

Inference conclusion
- No perfect solution
- Three approaches
  1. Suppress obviously sensitive information
  2. Track what the user knows
  3. Disguise the data

Aggregation
- This attack builds a sensitive result from less sensitive data.
- Several less sensitive data items can be combined to arrive at sensitive data.
- It is rather complicated and difficult to do.
- At present, advances in data mining can be applied to perform this type of attack.
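
As a closing illustration of the "disguise the data" approach, the random data perturbation control from this section can be sketched as follows. The function `perturbed_mean`, the noise bound of 500, and the salary values are assumptions chosen for the example; a real system would need to choose the noise distribution much more carefully.

```python
import random


def perturbed_mean(values, max_noise, rng):
    """Add a small random number in [-max_noise, max_noise] to each record
    before averaging, so the reported mean is close but not exact."""
    noisy = [v + rng.uniform(-max_noise, max_noise) for v in values]
    return sum(noisy) / len(noisy)


salaries = [71000, 64000, 58000, 50000, 52000]  # hypothetical values
rng = random.Random(0)                          # seeded for repeatability

true_mean = sum(salaries) / len(salaries)
reported_mean = perturbed_mean(salaries, 500, rng)
error = abs(reported_mean - true_mean)          # never larger than max_noise
```

Because each record's noise is bounded by `max_noise`, the reported mean stays within that bound of the true mean, while differences between two reported statistics no longer reveal an individual value exactly.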