Quantification of Integrity
Michael Clarkson and Fred B. Schneider
Cornell University
RADICAL, May 10, 2010
Goal
Information-theoretic quantification of programs' impact on integrity of information [Denning 1982]
(relationship to database privacy)
What is Integrity?
- Databases: constraints that relations must satisfy; provenance of data; utility of anonymized data
- Common Criteria: protection of assets from unauthorized modification
- Biba (1977): guarantee that a subsystem will perform as it was intended; isolation necessary for protection from subversion; dual to confidentiality
There is no universal definition.
Our Notions of Integrity
Corruption: damage to integrity.

Starting Point       | Corruption Measure
Taint analysis       | Contamination
Program correctness  | Suppression

Contamination: bad information present in output.
Suppression: good information lost from output.
The two are distinct, but interact.
Contamination
Goal: model taint analysis.
[Diagram: the Attacker sends untrusted input and the User sends trusted input to the Program; both the Attacker and the User observe its output.]
Untrusted input contaminates trusted output.
Contamination
    o := (t, u)
u contaminates o. (Can't u be filtered from o?)
Quantification of Contamination
Use information theory: information is surprise.
- X, Y, Z: distributions
- I(X, Y): mutual information between X and Y (in bits)
- I(X, Y | Z): conditional mutual information between X and Y, given Z
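These quantities are standard; as a concrete reference, here is a minimal Python sketch (mine, not from the talk) that computes I(X, Y) and I(X, Y | Z) exactly from an explicit joint distribution, represented as a dict from outcome tuples to probabilities.

    from math import log2
    from collections import defaultdict

    def entropy(dist):
        """H(X) in bits, for a dict mapping outcomes to probabilities."""
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    def marginal(joint, idx):
        """Marginalize a joint dict (keyed by tuples) onto the components in idx."""
        m = defaultdict(float)
        for outcome, p in joint.items():
            m[tuple(outcome[i] for i in idx)] += p
        return m

    def mutual_info(joint):
        """I(X, Y) = H(X) + H(Y) - H(X, Y), for a joint over (x, y) pairs."""
        return entropy(marginal(joint, [0])) + entropy(marginal(joint, [1])) - entropy(joint)

    def cond_mutual_info(joint):
        """I(X, Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z), for a joint over (x, y, z)."""
        return (entropy(marginal(joint, [0, 2])) + entropy(marginal(joint, [1, 2]))
                - entropy(joint) - entropy(marginal(joint, [2])))

    # One fair bit, copied: I(X, Y) = 1 bit.
    print(mutual_info({(0, 0): 0.5, (1, 1): 0.5}))  # 1.0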
Quantification of Contamination
[Diagram: the Attacker sends untrusted input U_in and the User sends trusted input T_in to the Program; the User observes trusted output T_out.]
Contamination = I(U_in, T_out | T_in) [Newsome et al. 2009]
Dual of [Clark et al. 2005, 2007]
Example of Contamination
    o := (t, u)
Contamination = I(U, O | T) = k bits, if U is uniform on [0, 2^k - 1]
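A minimal check (mine, not from the talk; it assumes T is a single uniform bit independent of U) that confirms this value by exact enumeration:

    from math import log2
    from collections import defaultdict

    def entropy(dist):
        return -sum(p * log2(p) for p in dist.values() if p > 0)

    def marginal(joint, idx):
        m = defaultdict(float)
        for outcome, p in joint.items():
            m[tuple(outcome[i] for i in idx)] += p
        return m

    k = 4
    joint = defaultdict(float)       # joint distribution over (u, o, t)
    for t in range(2):               # trusted input: one uniform bit
        for u in range(2 ** k):      # untrusted input: uniform on [0, 2^k - 1]
            o = (t, u)               # the program: o := (t, u)
            joint[(u, o, t)] += 0.5 * 2 ** -k

    # I(U, O | T) = H(U, T) + H(O, T) - H(U, O, T) - H(T)
    contamination = (entropy(marginal(joint, [0, 2])) + entropy(marginal(joint, [1, 2]))
                     - entropy(joint) - entropy(marginal(joint, [2])))
    print(contamination)  # 4.0 = k bits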
Program Suppression
Goal: model program (in)correctness.
[Diagram: the Sender's trusted input flows through the (correct) Specification to the Receiver as the correct output; in the (real) Implementation, the Attacker also supplies untrusted input.]
Information about the correct output is suppressed from the real output.
Example of Program Suppression
Spec. (a[0..m-1]: trusted):
    for (i = 0; i < m; i++) { s := s + a[i]; }
Impl. 1:
    for (i = 1; i < m; i++) { s := s + a[i]; }
  Suppression: a[0] missing. No contamination.
Impl. 2:
    for (i = 0; i <= m; i++) { s := s + a[i]; }
  Suppression: a[m] added. Contamination.
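A minimal runnable rendering (mine, not from the talk) of the spec and both implementations in Python, modeling Impl. 2's out-of-bounds read a[m] as an attacker-chosen untrusted value u:

    def spec(a):
        return sum(a[i] for i in range(len(a)))      # s = a[0] + ... + a[m-1]

    def impl1(a):
        return sum(a[i] for i in range(1, len(a)))   # skips a[0]: suppression, no contamination

    def impl2(a, u):
        return sum(a) + u                            # adds a[m] = u: suppression and contamination

    a = [3, 1, 4]
    print(spec(a), impl1(a), impl2(a, u=9))  # 8 5 17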
Suppression vs. Contamination
[Diagram: example programs, including output := input, positioned by how much contamination and how much suppression the attacker can cause.]
Quantification of Program Suppression
[Diagram: the Sender's trusted input In flows through the Specification, yielding output Spec; the Sender's trusted input T_in and the Attacker's untrusted input U_in flow through the Implementation, yielding output Impl.]
Program transmission = I(Spec, Impl)
Quantification of Program Suppression
- H(X): entropy (uncertainty) of X
- H(X | Y): conditional entropy of X given Y

Program transmission = I(Spec, Impl) = H(Spec) - H(Spec | Impl)
- H(Spec): total info to learn about Spec
- I(Spec, Impl): info actually learned about Spec by observing Impl
- H(Spec | Impl): info NOT learned about Spec by observing Impl

Program suppression = H(Spec | Impl)
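A minimal estimator (mine, not from the talk) for H(Spec | Impl) from joint samples of the two programs' outputs on common trusted inputs; it is exact when the samples enumerate the input distribution:

    from math import log2
    from collections import Counter

    def cond_entropy(pairs):
        """H(X | Y) = H(X, Y) - H(Y), in bits, from a list of (x, y) samples."""
        def H(counts, n):
            return -sum((c / n) * log2(c / n) for c in counts.values())
        n = len(pairs)
        return H(Counter(pairs), n) - H(Counter(y for _, y in pairs), n)

    # X uniform on two bits, Y reveals only the high bit: one bit is suppressed.
    samples = [(x, x >> 1) for x in range(4)]
    print(cond_entropy(samples))  # 1.0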
Example of Program Suppression
Spec.:
    for (i = 0; i < m; i++) { s := s + a[i]; }
Impl. 1:
    for (i = 1; i < m; i++) { s := s + a[i]; }
  Suppression = H(A)
Impl. 2:
    for (i = 0; i <= m; i++) { s := s + a[i]; }
  Suppression ≤ H(A)
A = distribution of individual array elements
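A quick numerical check (mine; it assumes m = 2 and array elements that are independent uniform bits) that Impl. 1's suppression equals H(A) = 1 bit:

    from math import log2
    from collections import Counter

    def cond_entropy(pairs):
        def H(counts, n):
            return -sum((c / n) * log2(c / n) for c in counts.values())
        n = len(pairs)
        return H(Counter(pairs), n) - H(Counter(y for _, y in pairs), n)

    # Enumerate (Spec, Impl 1) = (a[0] + a[1], a[1]) over uniform bits a[0], a[1].
    pairs = [(a0 + a1, a1) for a0 in (0, 1) for a1 in (0, 1)]
    print(cond_entropy(pairs))  # 1.0 bit = H(A): Impl 1 suppresses a[0]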
Suppression and Confidentiality
Declassifier: program that reveals (leaks) some information and suppresses the rest.
Leakage: [Denning 1982, Millen 1987, Gray 1991, Lowe 2002, Clark et al. 2005, 2007, Clarkson et al. 2005, McCamant & Ernst 2008, Backes et al. 2009]
Leakage + Suppression is a constant: what isn't leaked is suppressed.
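One way to see the identity (assuming, as a reading of the slide, that leakage is measured as I(S, O) for secret input S and declassifier output O, and that what is suppressed from O is H(S | O)):

    Leakage + Suppression = I(S, O) + H(S | O) = H(S),

which is fixed once the distribution on S is fixed.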
Database Privacy
Statistical database anonymizes query results:
[Diagram: the User's query goes to the Database; the Database's response passes through the Anonymizer, which returns an anonymized response to the User.]
- sacrifices utility for privacy's sake
- suppresses to avoid leakage
- sacrifices integrity for confidentiality's sake
k-Anonymity
DB: every individual must be anonymous within a set of size k. [Sweeney 2002]
Programs: every output corresponds to at least k inputs.
But what about background knowledge?
- No bound on leakage
- No bound on suppression
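A minimal sketch (mine, not from the talk) of the program reading of k-anonymity, checking by enumeration that every output has at least k preimages; the example programs are hypothetical:

    from collections import Counter

    def is_k_anonymous(program, inputs, k):
        """True iff every output of program on inputs arises from >= k distinct inputs."""
        preimage_sizes = Counter(program(x) for x in inputs)
        return all(size >= k for size in preimage_sizes.values())

    # Dropping the low bit gives 2-anonymity on 4-bit inputs; the identity does not.
    print(is_k_anonymous(lambda x: x >> 1, range(16), k=2))  # True
    print(is_k_anonymous(lambda x: x,      range(16), k=2))  # False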
L-Diversity
DB: every individual's sensitive information should appear to have L (roughly) equally likely values. [Machanavajjhala et al. 2007]
Entropy L-diversity: H(anon. block) ≥ log L [Øhrn and Ohno-Machado 1999, Machanavajjhala et al. 2007]
Programs: H(T_in | t_out) ≥ log L implies suppression ≥ log L (if T_in uniform)
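A minimal sketch (mine; it assumes T_in uniform over the enumerated inputs, so the posterior given an output is uniform on that output's preimage) computing H(T_in | t_out) for each output; the anonymizer is hypothetical:

    from math import log2
    from collections import defaultdict

    def per_output_entropy(program, inputs):
        preimages = defaultdict(list)
        for t in inputs:
            preimages[program(t)].append(t)
        # With T_in uniform, H(T_in | t_out) = log2(preimage size of t_out).
        return {o: log2(len(pre)) for o, pre in preimages.items()}

    # Hypothetical anonymizer: report only the high two bits of a 4-bit value.
    print(per_output_entropy(lambda t: t >> 2, range(16)))
    # every output has H(T_in | t_out) = 2.0 >= log2(4): entropy 4-diverse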
Summary
Measures of information corruption:
- Contamination (generalizes taint analysis)
- Suppression (generalizes program correctness)
Application: database privacy (model anonymizers; relate utility and privacy)
More Integrity Measures
- Channel suppression: same as the channel model from information theory, but with an attacker
- Attacker- and program-controlled suppression
- Belief-based measures [Clarkson et al. 2005], which generalize the information-theoretic measures
- Granularity: average over all executions; single executions; sequences of executions