# No Free Lunch in Data Privacy

Save this PDF as:

Size: px
Start display at page:

## Transcription

3 Definition 2.1. (Discriminant ω). Given an integer k > 1, a privacy-infusing query processor A, and some constant c, we say that the discriminant ω(k, A) c if there exist k possible database instances D 1,..., D k and disjoint sets S 1,..., S k such that P (A(D i) S i) c for i = 1,..., k (the randomness only depends on A). We illustrate this definition with an example. Suppose we are asking a query about how many cancer patients are in the data. If the privacy-infusing query processor A can answer this query with reasonable accuracy, then the discriminant ω(k, A) is close to 1. To see this, we set k = 3 (for example) and D 1 to be any database with 0 cancer patients, D 2 to be any database with 10, 000 cancer patients and D 3 to be any database with 20, 000 cancer patients. Suppose A works as follows: if there are 0 cancer patients, it outputs some number in the range [0, 1000] with probability 0.95; if there are 10, 000 cancer patients, it outputs some number in the range [9000, 11000] with probability 0.95; if there are 20, 000 cancer patients it outputs a number in the range [19000, ] with probability Then we set S 1 = [0, 1000] (in Definition 2.1), S 2 = [9000, 11000] and S 3 = [19000, ]. Since P (A(D i) S i) 0.95 and the S i are pairwise disjoint we must have ω(k, A) From this example we see that if ω(k, A) is close to one, then we may get a reasonably useful answer to some query (or we may not, this is only a lower bound). On the other hand, if ω(k, A) is close to 1/k (which can happen when the output of A is uniformly distributed and does not depend on the data), then we cannot get useful answers to any query. Since the goal of modern privacy definitions is to answer queries relatively accurately, they all allow the use of algorithms with discriminants close to 1. For example, the discriminant of proposed k-anonymity algorithms [31] is 1. Proposed differentially private algorithms also have discriminants close to or equal to 1. For example, the Laplace Mechanism [8] is an algorithm satisfying differential privacy that answers a count query by adding noise from the Laplace distribution with density function ɛ 2 e x ɛ. Proposition 2.1. The discriminant ω(k, A) of the Laplace Mechanism for unbounded differential privacy is 1 for any k. For bounded differential privacy, the discriminant of the Laplace Mechanism becomes arbitrarily close to 1 as the data size increases. Proof. Fix k and δ > 0 and let X be a random variable with density function ɛ 2 e x ɛ. Choose n large enough so that P (X > 2n/3k) < δ. For i = 1,..., k let D i be a database with in/k cancer patients and define S i = [in/k n/3k, in/k + n/3k] (an interval centered around in/k with width 2n/3k). Note that all the S i are disjoint. We will use the Laplace Mechanism A to answer the query how many cancer patients are there. By construction, for any r, P (A(D i) S i) > 1 δ. Thus ω(k, A) 1 δ. Since δ is arbitrary, we get ω(k, A) = 1 for unbounded differential privacy. The result for bounded differential privacy follows since we can make δ arbitrarily small by making n large Non-privacy Motivated by Dwork and Naor s proof [8, 11] on the impossibility of Dalenius vision of statistical privacy [6], we define a game between an attacker and a data curator. Let D be a database schema and let q be a sensitive query, with k possible outputs, such that q(d) is considered sensitive for any database instance D. For example, q(d) may return the first record in D. Let P be a data-generating mechanism. The game proceeds as follows. Definition 2.2. (Non-Privacy Game). The data curator obtains a database instance D from the data-generating mechanism P. The data curator provides the attacker with the output A(D) of a privacy-infusing query processor A. The attacker then guesses the value of q(d). If the guess is correct, the attacker wins and non-privacy is achieved. The following no-free-lunch theorem states that if there are no restrictions on the data-generating mechanism and if the privacy-infusing query processor A has sufficient utility (discriminant close to 1) then there exists a data-generating mechanism P such that the attacker s guess, without access to A(D), is correct with probability 1/k (k is the number of possible outputs of the query). 1 However, with access to A(D), the attacker wins with probability close to 1. Theorem 2.1 (No Free Lunch). Let q be a sensitive query with k possible outcomes. Let A be a privacy-infusing query processor with discriminant ω(k, A) > 1 ɛ (for some ɛ > 0). Then there exists a probability distribution P over database instances D such that q(d) is uniformly distributed but the attacker wins with probability at least 1 ɛ when given A(D). Proof. Choose database instances D 1,..., D k such that i, q(d i) = i. Choose sets S 1,..., S k as in Definition 2.1 (discriminant) so that ω(k, A) min i P (A(D i) S i) > 1 ɛ, with the probability depending only on the randomness in A. Let P be the uniform distribution over D 1,..., D k. The attacker s strategy is to guess q(d) = i if A(D) S i and to guess randomly if A(D) is not in any of the S i. Clearly the attacker wins with probability at least 1 ɛ. This theorem can be viewed as a weaker reformulation and significant simplification of Dwork and Naor s result [8, 11]: we eliminated the use of auxiliary side information, introduced a more natural notion of utility, and gave a shorter and transparent proof. It is now clear why assumptions are needed: without restrictions on P, we cannot guarantee that the privacy-infusing query processor A avoids non-privacy No-Free-Lunch and Differential Privacy We now explain how the no-free-lunch theorem relates to differential privacy and evidence of participation and how an assumption that P generates records independently can prevent an attacker from winning against a differentially private algorithm. Suppose Bob s sensitive attribute R can take one of the values 1,..., k. Consider a P which generates data where Bob s record always appears and when Bob s sensitive attribute R = j then there are j 10, 000 cancer patients in the data. An attacker can ask how many cancer patients are there? and to satisfy 0.1-differential privacy, the query processor A can add Laplace(10) noise to the true answer (with variance 200). The attacker then can divide the noisy answer by 10, 000, round to the nearest integer j, and, with high probability, this will be the correct value of Bob s sensitive attribute R (since 200 << 10, 000). In this case, 1 The theorem easily extends to probabilities other than 1/k.

4 the number of cancer patients provide evidence about Bob s participation. This example relies on a correlation between Bob s sensitive attribute and the other records in the database to provide evidence of participation. In this particular case, such a correlation seems artificial and so we are likely to assume that there is no such dependence between records, so that the number of cancer patients can no longer provide evidence about Bob. If we assume that P generates records independently (and if the assumption is correct), the attacker would not be able to guess Bob s sensitive attribute from the number of cancer patients. Thus under the assumption of independence of records, differential privacy would be safe from this attack. In Sections 3 and 4 we will see examples of correlations which are not artificial and for which the use of differential privacy can lead to privacy breaches. Before concluding this section, note that there do exist privacy definitions which make no assumptions about the data. One example is: Definition 2.3. (Free-Lunch Privacy). A randomized algorithm A satisfies ɛ-free-lunch privacy if for any pair of database instances D 1, D 2 (not just those differing in one tuple) and for any set S, P (A(D 1) S) e ɛ P (A(D 2) S). It is not hard to see that the discriminant ω(k, A) of any algorithm A satisfying ɛ-free-lunch privacy is bounded by e ɛ. Notice that this is bounded away from 1 whether k 1+e ɛ we restrict the size of the database or not. In contrast, virtually all noise-infusing privacy definitions (such as bounded differential privacy, randomized response [32], etc.) have discriminants that become arbitrarily close to 1 as the data size increases. Thus with free-lunch privacy, we would have difficulty distinguishing between small and large databases (let alone finer details such as answers to range queries). This lack of utility is also clear from its definition; this is the price we must pay for making no assumptions about the data-generating mechanism. 2.2 Knowledge vs. Privacy Risk Before discussing privacy breaches in social networks and tabular data (with pre-released exact query answers) in Sections 3 and 4, we present in this section a general guideline for determining whether a privacy definition is suitable for a given application. This guideline is based on hiding evidence of participation of an individual from an attacker. In many cases this guideline calls for limiting the inferences made by attackers that are less knowledgeable than those considered by differential privacy (i.e. attackers who know about all but one tuple in a database). Somewhat counterintuitively, this can actually decrease the probability of disclosing sensitive information. Thus we first show that privacy definitions that limit the inference of more knowledgeable attackers can sometimes leak more sensitive information than privacy definitions that protect against less knowledgeable attackers. Then we present our evidence of participation guideline. Let us start with two more forms of differential privacy attribute and bit differential privacy. Definition 2.4. (Attribute Differential Privacy). An algorithm A satisfies ɛ-attribute differential privacy if for every pair D 1, D 2 of databases such that D 2 is obtained by changing one attribute value of one tuple in D 1, we must have P (A(D 1) S) e ɛ P (A(D 2) S) (for any set S). Definition 2.5. (Bit Differential Privacy). An algorithm A satisfies ɛ-attribute differential privacy if for every pair D 1, D 2 of databases such that D 2 is obtained by changing one bit of one attribute value of one tuple in D 1, we must have P (A(D 1) S) e ɛ P (A(D 2) S) (for any set S). Attribute differential privacy corresponds to an attacker who has full information about the database except for one attribute of one record. Bit differential privacy corresponds to an attacker who knows everything except one bit in the database encoding. Both definitions limit the attacker s inference about the remainder of the data. Clearly the attacker for bit-differential privacy knows more than the attacker for attribute-differential privacy who knows more than the corresponding attacker for bounded-differential privacy (Definition 1.2) who only knows all records but one. It is easy to see that an algorithm A that satisfies boundeddifferential privacy also satisfies attribute-differential privacy and an algorithm that satisfies attribute-differential privacy also satisfies bit-differential privacy. In these cases, an algorithm that protects against (i.e. limits the inference of) a less knowledgeable attacker also protects against the more knowledgeable attacker. However, we now show that an algorithm that limits the inference of a more knowledgeable attacker can leak private information to a less knowledgeable attacker. Consider the following table with 1 tuple (Bob) and two R 2-bit attributes R 1 and R 1 R 2 2: Bob There are 16 possible databases instances with 1 tuple. Bounded-differential privacy allows an algorithm A 1 to output Bob s tuple with probability ɛ and any other tuple e 15+e ɛ 1 with probability. Attribute-differential privacy allows 15+e ɛ an algorithm A 2 to output Bob s record with probability e 2ɛ (each of the 6 records where only one attribute 9+6e ɛ +e 2ɛ e differs from Bob can be output with probability ɛ, 9+6e ɛ +e and each of the 9 remaining records can be output with 2ɛ 1 probability ). The allowable distribution for bitdifferential privacy 2ɛ is more complicated, but Bob s record 9+6e ɛ +e e can be output with probability 4ɛ. Figure e 4ɛ +4e 3ɛ +6e 2ɛ +4e ɛ +1 1 plots the different probabilities of outputting Bob s record as ɛ ranges from 0 to 1. We clearly see that bit-differential privacy, with which we associate the most knowledgeable attacker, outputs Bob s record with the highest probability, followed by attribute-differential privacy and followed by bounded-differential privacy (which we associate with the least knowledgeable attacker). The intuition about why bit-differential privacy leaked the most amount of private information is that, since the attacker already knows so much about Bob s record, there is little need to output a record drastically different from it. There is very little left for the attacker to learn and so we can concentrate most of the randomness on this area of the attacker s uncertainty, which would be the set of records that differ from Bob s by 1 bit. Another example is an attacker who knows everything: we can release the entire database without the attacker learning anything new. We can extend these ideas to correlated data. Consider the following example: Example 2.1. Bob or one of his 9 immediate family members may have contracted a highly contagious disease, in which case the entire family would have been infected. An

7 30 25 With Edge Without Edge Copying Model With Edge Without Edge Copying Model E[cross community edges] E[cross community edges] p(random link) Final Network Size Figure 2: Causal effects in the Copying Model as a function of the random link probability. Figure 3: Causal effects in the Copying Model as a function of the final network size E[cross community edges] With Edge Without Edge MVS Model E[cross community edges] With Edge Without Edge MVS Model xi (friend introduction probability) Time (number of iterations) Figure 4: Causal effects in the MVS Model as a function of ξ, the friend-introduction probability. Figure 5: Causal effects in the MVS Model as a function of time (nearness to steady state). distribution. The alternatives are to set ɛ very small (thus destroying utility), or to ignore causality (i.e. naïve apply differential privacy) and possibly reveal the participation of Bob s edge. MVS Model [24]. Our final model is the MVS model [24]. In contrast to the forest fire and copying models, this model is actually an ergodic Markov chain (whose state space consists of graphs) and has a steady state distribution. Thus the number of edges does not have a tendency to increase over time as with the other two models. The original MVS model is a continuous time process, which we discretize in a way similar to [24]. In this model, the number of nodes remains constant. At each timestep, with probability η, a link is added between two random nodes. With probability ξ, a randomly selected node selects one of its neighbors at random and asks for an introduction to one of its neighbors. Finally k edges are deleted, where k is a random variable drawn from a Poisson(λ) distribution. Initially both communities have an equal number of nodes and community membership does not change. Initially each node has 2 within community links. We perform two sets of experiments. In the first set of experiments, the network has 20 nodes and we vary ξ, the friend-introduction probability. We compute the number of cross-community edges after 100 time steps (before the network has reached a steady state). This is shown in Figure 4. In the second set of experiments, we set the network size to be 20, we set ξ = 1, and we vary the number of iterations to let the network achieve its steady state. This is shown in Figure 5. Note that once the process achieves steady state, the initial conditions (or, more generally, network configurations in the distant past) are irrelevant and the evidence that other edges provide about the participation of Bob s initial link is automatically destroyed by the network model itself. However, there can still be evidence about edges formed in the not-so-distant past. We see from Figure 4, that depending on network parameters, the expected difference in cross-community edges can be moderately large, but we see that this difference disappears as the network is allowed to reach its steady state (Figure 5). Thus differential privacy may be a reasonable choice for limiting inference about the participation of Bob s edge in such a scenario. However, applying differential privacy here still requires the assumption that the data were gener-

10 answers. Bounded differential privacy then guarantees that an attacker who knows that the true table is either T i or T j will have great difficulty distinguishing between these two tables. Note that knowing the true table is either T i or T j is the same as knowing n 1 tuples in the true database and being unsure about whether the remaining tuple is t 1 or t Neighbors induced by prior statistics. Now let us generalize this reasoning to a case where additional exact query answers have been previously released. Example 4.2. Let T be a contingency table with attributes R 1,..., R k. Due to overriding utility concerns, the following query has already been answered exactly: SELECT GENDER, COUNT(*) FROM T GROUP BY GENDER Suppose n is the number of individuals in the table T (i.e. the total cell count), n F is the number of females and n M is the number of males. The quantities n M, n F, and n = n F + n M are now known. Suppose Drew is in this table. Some limited information about Drew has been disclosed by this query, and now Drew would like a variant of differential privacy to limit any possible damage due to future (noisy) query answers. We seek some form of plausible deniability: the attacker should have difficulty distinguishing between tables T i and T j that both have n F females and n M males and are otherwise similar except that Drew s tuple takes different values in T i and T j. Thus we need a good choice of neighbors of T (the original table) which should be difficult to distinguish from each other. Unbounded neighbors N 1, used in the definition of unbounded differential privacy (see Definitions 1.1 and 4.3), do not work here; removing or adding a tuple to T changes the number of males and females thus it is not clear what kind of plausible deniability this would provide. Bounded neighbors N 2, used in the definition of bounded differential privacy (see Definitions 1.1 and 4.3), also do not work; we cannot arbitrarily modify a single tuple and hope to be consistent with the previously released query answers. However, we can make limited changes to 1 or 2 tuples to maintain consistency. For example, we can choose one tuple and modify the attributes other than gender arbitrarily. We can also choose 2 tuples, 1 male and 1 female, swap their gender attributes, and modify the rest of their attributes arbitrarily. Both transformations keep the number of males and females constant. Thus we can provide some level of plausible deniability by making it difficult for an attacker to detect whether one of these transformations has been performed. This example illustrates 3 important points. First, in the absence of additional correlations (as discussed in Sections 2 and 3), privacy is guaranteed by ensuring that an attacker has difficulty distinguishing between the original table T and a set of neighboring tables that are similar to T. Each neighboring table T differs from T by a small set of changes (for example, by modifying the value of Drew s tuple and possibly a few others), thus the attacker would have difficulty in detecting these changes (i.e. the attacker would be uncertain about the value of Drew s tuple). Second, each neighboring table T of T must be consistent with the previously released exact query answers; the reason is that if T were not consistent, then it would be easy to distinguish between T and T (since an attacker would know that T could not possibly be the original table). The third point is that differential Cell A Cell B Cell C Cell D Table T a Cell A Cell B Cell C Cell D Table T b Cell A Cell B Cell C Cell D Table T c Figure 6: Three Contingency Tables privacy can be made aware of prior data release simply by choosing the appropriate notion of neighbors in Definition 4.3 (such modifications have also been suggested by [20]). Based on this discussion, we can formally define the concept of neighbors induced by exact query answers (Definition 4.4). Plugging this into generic differential privacy (Definition 4.3), we arrive at the version of differential privacy that takes into account previously released exact query answers. Before presenting Definition 4.4, first note that if T a and T b are contingency tables, then the smallest number of moves needed to transform T a into T b is the sum of the absolute difference in cell counts. For example, in Figure 6, this smallest number of moves is 4: first +1 for cell A, then +1 for cell D, then 1 for cell C, and finally 1 for cell B. Note that any permutation of these moves also transforms T a into T b, and these permutations account for all of the shortest sequence of moves (i.e. of length 4) for turning T a into T b. Definition 4.4. (Induced Neighbors N Q). Given a set Q = {Q 1,..., Q k } of constraints, let T Q be the set of contingency tables satisfying those constraints. Let T a and T b be two contingency tables. Let n ab be the smallest number of moves necessary to transform T a into T b and let m 1,..., m nab be one such sequence of moves. 3 We say that T a and T b are neighbors induced by Q, denoted as N Q(T a, T b ) =true, if the following holds. T a T Q and T b T Q. No subset of {m 1,..., m nab } can transform T a into some T c T Q. We illustrate this notion of induced neighbors N Q (which can now be plugged into generic differential privacy in Definition 4.3) with the following example. Consider contingency table T a and T b from Figure 6. Suppose the constraints Q are imposed by our knowledge that the sum of cells A and B is 9 and the sum of cells C and D is 10. We can transform T a into T b using the following move sequence: first +1 for cell A, then +1 for cell D, then 1 for cell C, and finally 1 for cell B. However, using only a subset of these moves, namely +1 for cell A and 1 for cell B, we can transform T a into T c. Since T a, T b and T c satisfy all of our constraints, T a and T b are not neighbors. Continuing this example, it is easy to see that T a and T c are neighbors and T b and T c are neighbors (even though T a and T b are not). Thus T a and T c are as similar as possible while still satisfying constraints imposed by previously released exact query answers (and similarly for T b and T c). Thus using this definition of neighbors in generic differential privacy (Definition 4.3), we would be guaranteeing that an 3 Note that all other such sequences of moves are simply permutations of this sequence.

11 attacker has difficulty distinguishing between T a and T c and also between T b and T c. 4.4 Neighbor-based Algorithms for Differential Privacy In this section we adapt two algorithms for (generic) differential privacy that use the concept of induced neighbors N Q (Definition 4.4) to guarantee privacy. We also show that in general this problem (i.e. accounting for previously released exact query answers) is computationally hard. The first algorithm is an application of the exponential mechanism [26]. To use it, we first need to define a distance function d(t a, T b ) between two contingency tables. We define the distance function d(t a, T b ) as follows. Let T Q be the set of all contingency tables consistent with previously released query answers. Create a graph where the tables in T Q are the nodes, and where there is an edge between tables T a and T b if and only if T a and T b are induced neighbors. Define d(t a, T b ) to be the length of the shortest path from T a to T b in this graph. The exponential mechanism now works as follows. If T is the true table, we output a table T with probability proportional to e ɛd(t,t ). Following the proof in [26], we can show that this will guarantee 2ɛ-generic differential privacy (Definition 4.3, using N Q as the definition of neighbors). The exponential mechanism [26] generally does not lead to efficient algorithms, even without the complication of induced neighbors N Q. However, we can modify the Laplace mechanism [8] to suit our purposes. To do this, we first need the concept of sensitivity. Definition 4.5 (Sensitivity). Let Q = {Q 1,..., Q k } be a set of constraints, let T Q be the set of contingency tables satisfying those constraints, and let N Q be the associated induced neighbors relation. The sensitivity S Q(q) of a query q over T Q is defined as: S Q(q) = max q(t 1) q(t 2) (1) N Q (T 1,T 2 )=true The Laplace mechanism adds Laplace(S Q(q)/ɛ) noise (the ɛ probability density function is f(x) = 2S Q (q) e ɛx/s Q(q) ) to the query answer. The proof of correctness is similar to [8]. Thus we need to be able to compute (an upper bound on) the sensitivity of queries. In some cases (which we describe below) this leads to efficient algorithms, but in general the constraints Q imposed by previously released exact query answers will make the data release problem computationally intractable, as the following results show. We omit detailed proofs due to space constraints. Proposition 4.1. Let Q = {Q 1,..., Q k } be a set of constraints. Let S be a table with 3 attributes and let T be the associated contingency table. The following two decision problems are NP-hard in the number of possible attribute values. Checking whether T has at least one neighbor. Checking whether two tables T a, T b that are consistent with the constraints in Q are not neighbors. Proposition 4.2. Given a set Q = {Q 1,..., Q k } of constraints, let T Q be the set of contingency tables satisfying those constraints. For any count query q, deciding whether S Q(q) > 0 is NP-hard Figure 7: + + The above propositions show that the general problem of finding an upper bound on the sensitivity of a query (S Q(q) < d) is at least co-np-hard, and we suspect that the problem is Π p 2 -complete. In certain cases, efficient algorithms do exist. For example, consider a 2-dimensional table with attributes R 1 and R 2, and suppose all the row and columns sums are fixed. Let the query q all be defined as: SELECT R_1, R_2, COUNT(*) FROM T GROUP BY R_1, R_2. The following theorem computes the sensitivity of q all, which can then be answered using the Laplace mechanism. Lemma 4.1. Let Q be the row and column sums for a 2- dimensional r c table. The sensitivity S Q(q all ) is given by min(2r, 2c). Proof. (sketch) The key insight is that for a set of moves corresponding to the shortest path from table T 1 to T 2, there can only be exactly 1 tuple added and 1 tuple deleted from cells in the same row or same column. Suppose the contrary: i.e., without loss of generality, there is a designated row where more than 1 tuple is added. Since the row sums are constant, if m tuples are added in a row, then m tuples must have been deleted from cells in the same row. Now suppose we mark every addition to a cell with + and a deletion with -. Also, if + and - appear in the same row or column, then draw an edge from the + cell to the - cell. It is easy to see that if the row and column sums are fixed, then either all or a subset of the marked cells are part of a Hamiltonian cycle alternating in + and -. If a subset of the marked cells forms the Hamiltonian cycle then we can remove them to get a shorter path from T 1 to T 2 (the cells in the cycle have row and column sums equal to 0, by construction, so their removal preserves the row and column sums). Thus after finitely many such removals we may assume that all of the cells containing a + or - are part of a Hamiltonian cycle alternating in + and -. A sequential walk on this loop is a valid set of alternating tuple addition and deletion moves to change T 1 to T 2. This loop enters the designated row for the first time at cell c 1 and exits the designated row for the last time at cell c 2. Cells c 1 and c 2 must have distinct signs. If we drop the part of the loop from c 1 to c 2 and replace it with an edge from c 1 to c 2, we get a new valid cycle from T 1 to T 2. The new cycle is shorter and thus gives a smaller set of moves to change T 1 to T 2; contradicting the fact that the original set of moves was the smallest set of moves. Therefore, each row and column has either no marked cells or exactly one cell marked + and one cell marked -. Thus the maximum number of moves between any pair of neighbors is at most min(2r, 2c). To show that the sensitivity can achieve min(2r, 2c), we consider pairs of l l tables that differ in the counts as shown (for l = 3, 4, 5) in Figure 7. It is easy to show pairs of tables for which the smallest set of moves is as described in Figure 7.

12 5. RELATED WORK Differential privacy was developed in a series of papers [1, 10, 8] following the impossibility results of Dinur and Nissim [7] that showed that answering many queries with bounded noise can allow an attacker to reproduce the original database almost exactly. Since then, many additional variations of differential privacy have been proposed [9, 4, 29, 2, 23, 20, 28]. With the success of differentially private algorithms for tabular data [16, 8, 1, 25, 26, 13, 18, 5], there has been interest in developing differentially private mechanisms for other domains, such as social networks [12]. For social networks, examples include [17, 27, 30]. Rastogi et al. [30] proposed a relaxation of differential privacy for social networks by proving an equivalence to adversarial privacy (which makes assumptions about the data), and then adding further constraints on the data-generating mechanism. By the no-freelunch theorem, the other applications also require assumptions about the data to avoid non-privacy. Dwork and Naor [8, 11] have several results stating that any privacy definition which provides minimal utility guarantees cannot even safeguard the privacy of an individual whose records are not collected by the data curator (because there always exists some auxiliary information that can be combined with query answers to create a threat to privacy). Thus Dwork proposed the principle that privacy definitions should only guarantee privacy for individuals in the data. Kasiviswanathan and Smith [19] formalized the notion of resistance to various attackers including those who know about all but one record in a table and showed an equivalence to differential privacy. 6. CONCLUSIONS In this paper we addressed several popular misconceptions about differential privacy. We showed that, without further assumptions about the data, its privacy guarantees can degrade when applied to social networks or when deterministic statistics have been previously released. We proposed a principle for evaluating the suitability of a privacy definition to an application: can it hide evidence of an individual s participation in the data generating process from an attacker? When tuples are believed to be independent, deleting one tuple is equivalent to hiding evidence of participation, in which case differential privacy is a suitable definition to use. Providing privacy for correlated data is an interesting direction for future work. 7. REFERENCES [1] A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In PODS, [2] A. Blum, K. Ligett, and A. Roth. A learning theory approach to non-interactive database privacy. In STOC, [3] P. J. Cantwell, H. Hogan, and K. M. Styles. The use of statistical methods in the u.s. census: ütah v. evans. The American Statistician, 58(3): , [4] K. Chaudhuri and N. Mishra. When random sampling preserves privacy. In CRYPTO, [5] K. Chaudhuri, C. Monteleoni, and A. D. Sarwate. Differentially private empirical risk minimization. [6] T. Dalenius. Towards a methodology for statistical disclosure control. Statistik Tidskrift 15, [7] I. Dinur and K. Nissim. Revealing information while preserving privacy. In PODS, [8] C. Dwork. Differential privacy. In ICALP, [9] C. Dwork, K. Kenthapadi, F. McSherry, and I. Mironov. Our data, ourselves: Privacy via distributed noise generation. In EUROCRYPT, [10] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, pages , [11] C. Dwork and M. Naor. On the difficulties of disclosure prevention in statistical databases or the case for differential privacy. JPC, 2(1), [12] C. Dwork and A. Smith. Differential privacy for statistics: What we know and what we want to learn. JPC, 1(2), [13] A. Friedman and A. Schuster. Data mining with differential privacy. In KDD, [14] S. R. Ganta, S. P. Kasiviswanathan, and A. Smith. Composition attacks and auxiliary information in data privacy. In KDD, [15] G. Ghinita, P. Karras, P. Kalnis, and N. Mamoulis. A framework for efficient data anonymization under privacy and accuracy constraints. ACM TODS, 34(2), [16] M. Hardt and K. Talwar. On the geometry of differential privacy. In STOC, [17] M. Hay, C. Li, G. Miklau, and D. Jensen. Accurate estimation of the degree distribution of private networks. In ICDM, [18] G. Jagannathan, K. Pillaipakkamnatt, and R. N. Wright. A practical differentially private random decision tree classifier. In ICDM Workshops, [19] S. P. Kasiviswanathan and A. Smith. A note on differential privacy: Defining resistance to arbitrary side information [20] D. Kifer and B.-R. Lin. Towards an axiomatization of statistical privacy and utility. In PODS, [21] R. Kumar, P. Raghavan, S. Rajagopalan, D. Sivakumar, A. Tomkins, and E. Upfal. Stochastic models for the web graph. In FOCS, [22] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters. TKDD, 1(1), [23] A. Machanavajjhala, D. Kifer, J. Abowd, J. Gehrke, and L. Vihuber. Privacy: From theory to practice on the map. In ICDE, [24] M. Marsili, F. Vega-Redondo, and F. Slanina. The rise and fall of a networked society: A formal model. PNAS, 101(6), [25] F. McSherry and I. Mironov. Differentially private recommender systems: building privacy into the net. In KDD, [26] F. McSherry and K. Talwar. Mechanism design via differential privacy. In FOCS, [27] D. J. Mir and R. N. Wright. A differentially private graph estimator. In ICDM Workshops, [28] I. Mironov, O. Pandey, O. Reingold, and S. Vadhan. Computational differential privacy. In CRYPTO, [29] K. Nissim, S. Raskhodnikova, and A. Smith. Smooth sensitivity and sampling in private data analysis. In STOC, [30] V. Rastogi, M. Hay, G. Miklau, and D. Suciu. Relationship privacy: Output perturbation for queries with joins. In PODS, [31] P. Samarati. Protecting respondents identities in microdata release. TKDE, pages , [32] S. L. Warner. Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 1965.

### A Practical Application of Differential Privacy to Personalized Online Advertising

A Practical Application of Differential Privacy to Personalized Online Advertising Yehuda Lindell Eran Omri Department of Computer Science Bar-Ilan University, Israel. lindell@cs.biu.ac.il,omrier@gmail.com

CS346: Advanced Databases Alexandra I. Cristea A.I.Cristea@warwick.ac.uk Data Security and Privacy Outline Chapter: Database Security in Elmasri and Navathe (chapter 24, 6 th Edition) Brief overview of

### Practicing Differential Privacy in Health Care: A Review

TRANSACTIONS ON DATA PRIVACY 5 (2013) 35 67 Practicing Differential Privacy in Health Care: A Review Fida K. Dankar*, and Khaled El Emam* * CHEO Research Institute, 401 Smyth Road, Ottawa, Ontario E mail

### Privacy-Preserving Aggregation of Time-Series Data

Privacy-Preserving Aggregation of Time-Series Data Elaine Shi PARC/UC Berkeley elaines@eecs.berkeley.edu Richard Chow PARC rchow@parc.com T-H. Hubert Chan The University of Hong Kong hubert@cs.hku.hk Dawn

### arxiv:1402.3426v3 [cs.cr] 27 May 2015

Privacy Games: Optimal User-Centric Data Obfuscation arxiv:1402.3426v3 [cs.cr] 27 May 2015 Abstract Consider users who share their data (e.g., location) with an untrusted service provider to obtain a personalized

### Differential privacy in health care analytics and medical research An interactive tutorial

Differential privacy in health care analytics and medical research An interactive tutorial Speaker: Moritz Hardt Theory Group, IBM Almaden February 21, 2012 Overview 1. Releasing medical data: What could

### MINITAB ASSISTANT WHITE PAPER

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### 1 Construction of CCA-secure encryption

CSCI 5440: Cryptography Lecture 5 The Chinese University of Hong Kong 10 October 2012 1 Construction of -secure encryption We now show how the MAC can be applied to obtain a -secure encryption scheme.

### Challenges of Data Privacy in the Era of Big Data. Rebecca C. Steorts, Vishesh Karwa Carnegie Mellon University November 18, 2014

Challenges of Data Privacy in the Era of Big Data Rebecca C. Steorts, Vishesh Karwa Carnegie Mellon University November 18, 2014 1 Outline Why should we care? What is privacy? How do achieve privacy? Big

### Big Data & Scripting Part II Streaming Algorithms

Big Data & Scripting Part II Streaming Algorithms 1, 2, a note on sampling and filtering sampling: (randomly) choose a representative subset filtering: given some criterion (e.g. membership in a set),

### Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints

Adaptive Tolerance Algorithm for Distributed Top-K Monitoring with Bandwidth Constraints Michael Bauer, Srinivasan Ravichandran University of Wisconsin-Madison Department of Computer Sciences {bauer, srini}@cs.wisc.edu

### Database Security. The Need for Database Security

Database Security Public domain NASA image L-1957-00989 of people working with an IBM type 704 electronic data processing machine. 1 The Need for Database Security Because databases play such an important

### Data Preprocessing. Week 2

Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.

### 1 Message Authentication

Theoretical Foundations of Cryptography Lecture Georgia Tech, Spring 200 Message Authentication Message Authentication Instructor: Chris Peikert Scribe: Daniel Dadush We start with some simple questions

### Applied Algorithm Design Lecture 5

Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design

### A Sublinear Bipartiteness Tester for Bounded Degree Graphs

A Sublinear Bipartiteness Tester for Bounded Degree Graphs Oded Goldreich Dana Ron February 5, 1998 Abstract We present a sublinear-time algorithm for testing whether a bounded degree graph is bipartite

Load Balancing Backtracking, branch & bound and alpha-beta pruning: how to assign work to idle processes without much communication? Additionally for alpha-beta pruning: implementing the young-brothers-wait

### Lecture 10: CPA Encryption, MACs, Hash Functions. 2 Recap of last lecture - PRGs for one time pads

CS 7880 Graduate Cryptography October 15, 2015 Lecture 10: CPA Encryption, MACs, Hash Functions Lecturer: Daniel Wichs Scribe: Matthew Dippel 1 Topic Covered Chosen plaintext attack model of security MACs

### PUBLIC HEALTH OPTOMETRY ECONOMICS. Kevin D. Frick, PhD

Chapter Overview PUBLIC HEALTH OPTOMETRY ECONOMICS Kevin D. Frick, PhD This chapter on public health optometry economics describes the positive and normative uses of economic science. The terms positive

### COLORED GRAPHS AND THEIR PROPERTIES

COLORED GRAPHS AND THEIR PROPERTIES BEN STEVENS 1. Introduction This paper is concerned with the upper bound on the chromatic number for graphs of maximum vertex degree under three different sets of coloring

### Social Media Mining. Data Mining Essentials

Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

### Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

### 2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

### A THEORETICAL COMPARISON OF DATA MASKING TECHNIQUES FOR NUMERICAL MICRODATA

A THEORETICAL COMPARISON OF DATA MASKING TECHNIQUES FOR NUMERICAL MICRODATA Krish Muralidhar University of Kentucky Rathindra Sarathy Oklahoma State University Agency Internal User Unmasked Result Subjects

### l-diversity: Privacy Beyond k-anonymity

l-diversity: Privacy Beyond k-anonymity Ashwin Machanavajjhala Daniel Kifer Johannes Gehrke Muthuramakrishnan Venkitasubramaniam Department of Computer Science, Cornell University {mvnak, dkifer, johannes,

### Chapter 20: Data Analysis

Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

### (Big) Data Anonymization Claude Castelluccia Inria, Privatics

(Big) Data Anonymization Claude Castelluccia Inria, Privatics BIG DATA: The Risks Singling-out/ Re-Identification: ADV is able to identify the target s record in the published dataset from some know information

### Efficient Algorithms for Masking and Finding Quasi-Identifiers

Efficient Algorithms for Masking and Finding Quasi-Identifiers Rajeev Motwani Stanford University rajeev@cs.stanford.edu Ying Xu Stanford University xuying@cs.stanford.edu ABSTRACT A quasi-identifier refers

### ECON 459 Game Theory. Lecture Notes Auctions. Luca Anderlini Spring 2015

ECON 459 Game Theory Lecture Notes Auctions Luca Anderlini Spring 2015 These notes have been used before. If you can still spot any errors or have any suggestions for improvement, please let me know. 1

### Lecture 11: Graphical Models for Inference

Lecture 11: Graphical Models for Inference So far we have seen two graphical models that are used for inference - the Bayesian network and the Join tree. These two both represent the same joint probability

### An Empirical Study of Two MIS Algorithms

An Empirical Study of Two MIS Algorithms Email: Tushar Bisht and Kishore Kothapalli International Institute of Information Technology, Hyderabad Hyderabad, Andhra Pradesh, India 32. tushar.bisht@research.iiit.ac.in,

### Privacy-preserving Data Mining: current research and trends

Privacy-preserving Data Mining: current research and trends Stan Matwin School of Information Technology and Engineering University of Ottawa, Canada stan@site.uottawa.ca Few words about our research Universit[é

### Stochastic Inventory Control

Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the

### The Basics of Graphical Models

The Basics of Graphical Models David M. Blei Columbia University October 3, 2015 Introduction These notes follow Chapter 2 of An Introduction to Probabilistic Graphical Models by Michael Jordan. Many figures

### Some Polynomial Theorems. John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom.

Some Polynomial Theorems by John Kennedy Mathematics Department Santa Monica College 1900 Pico Blvd. Santa Monica, CA 90405 rkennedy@ix.netcom.com This paper contains a collection of 31 theorems, lemmas,

### Lecture 3: Linear Programming Relaxations and Rounding

Lecture 3: Linear Programming Relaxations and Rounding 1 Approximation Algorithms and Linear Relaxations For the time being, suppose we have a minimization problem. Many times, the problem at hand can

### OPRE 6201 : 2. Simplex Method

OPRE 6201 : 2. Simplex Method 1 The Graphical Method: An Example Consider the following linear program: Max 4x 1 +3x 2 Subject to: 2x 1 +3x 2 6 (1) 3x 1 +2x 2 3 (2) 2x 2 5 (3) 2x 1 +x 2 4 (4) x 1, x 2

### On Correlating Performance Metrics

On Correlating Performance Metrics Yiping Ding and Chris Thornley BMC Software, Inc. Kenneth Newman BMC Software, Inc. University of Massachusetts, Boston Performance metrics and their measurements are

### Min-cost flow problems and network simplex algorithm

Min-cost flow problems and network simplex algorithm The particular structure of some LP problems can be sometimes used for the design of solution techniques more efficient than the simplex algorithm.

### KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,

### Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

### Why the Normal Distribution?

Why the Normal Distribution? Raul Rojas Freie Universität Berlin Februar 2010 Abstract This short note explains in simple terms why the normal distribution is so ubiquitous in pattern recognition applications.

### Simulation-Based Security with Inexhaustible Interactive Turing Machines

Simulation-Based Security with Inexhaustible Interactive Turing Machines Ralf Küsters Institut für Informatik Christian-Albrechts-Universität zu Kiel 24098 Kiel, Germany kuesters@ti.informatik.uni-kiel.de

### Approximation Algorithms

Approximation Algorithms or: How I Learned to Stop Worrying and Deal with NP-Completeness Ong Jit Sheng, Jonathan (A0073924B) March, 2012 Overview Key Results (I) General techniques: Greedy algorithms

### Database security. André Zúquete Security 1. Advantages of using databases. Shared access Many users use one common, centralized data set

Database security André Zúquete Security 1 Advantages of using databases Shared access Many users use one common, centralized data set Minimal redundancy Individual users do not have to collect and maintain

### What mathematical optimization can, and cannot, do for biologists. Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL

What mathematical optimization can, and cannot, do for biologists Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL Introduction There is no shortage of literature about the

Systems Dynamics Using Personal Learning Edition (PLE) Download PLE at http://vensim.com/freedownload.html Quick Start Tutorial Preliminaries PLE is software designed for modeling one or more quantities

### ! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. #-approximation algorithm.

Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of three

### A Basic Introduction to Missing Data

John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

### Tomislav Križan Consultancy Poslovna Inteligencija d.o.o

In-Situ Anonymization of Big Data Tomislav Križan Consultancy Director @ Poslovna Inteligencija d.o.o Abstract Vast amount of data is being generated from versatile sources and organizations are primarily

### Attacking Anonymized Social Network

Attacking Anonymized Social Network From: Wherefore Art Thou RX3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography Presented By: Machigar Ongtang (Ongtang@cse.psu.edu ) Social

Policy-based Pre-Processing in Hadoop Yi Cheng, Christian Schaefer Ericsson Research Stockholm, Sweden yi.cheng@ericsson.com, christian.schaefer@ericsson.com Abstract While big data analytics provides

### Robust Blind Watermarking Mechanism For Point Sampled Geometry

Robust Blind Watermarking Mechanism For Point Sampled Geometry Parag Agarwal Balakrishnan Prabhakaran Department of Computer Science, University of Texas at Dallas MS EC 31, PO Box 830688, Richardson,

### Offline sorting buffers on Line

Offline sorting buffers on Line Rohit Khandekar 1 and Vinayaka Pandit 2 1 University of Waterloo, ON, Canada. email: rkhandekar@gmail.com 2 IBM India Research Lab, New Delhi. email: pvinayak@in.ibm.com

### t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

### DATA ANALYSIS IN PUBLIC SOCIAL NETWORKS

International Scientific Conference & International Workshop Present Day Trends of Innovations 2012 28 th 29 th May 2012 Łomża, Poland DATA ANALYSIS IN PUBLIC SOCIAL NETWORKS Lubos Takac 1 Michal Zabovsky

### Differentially Private Analysis of

Title: Name: Affil./Addr. Keywords: SumOriWork: Differentially Private Analysis of Graphs Sofya Raskhodnikova, Adam Smith Pennsylvania State University Graphs, privacy, subgraph counts, degree distribution

### Using simulation to calculate the NPV of a project

Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial

### Factoring & Primality

Factoring & Primality Lecturer: Dimitris Papadopoulos In this lecture we will discuss the problem of integer factorization and primality testing, two problems that have been the focus of a great amount

### MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

### Non-Black-Box Techniques In Crytpography. Thesis for the Ph.D degree Boaz Barak

Non-Black-Box Techniques In Crytpography Introduction Thesis for the Ph.D degree Boaz Barak A computer program (or equivalently, an algorithm) is a list of symbols a finite string. When we interpret a

### A Bayesian Approach for on-line max auditing of Dynamic Statistical Databases

A Bayesian Approach for on-line max auditing of Dynamic Statistical Databases Gerardo Canfora Bice Cavallo University of Sannio, Benevento, Italy, {gerardo.canfora,bice.cavallo}@unisannio.it ABSTRACT In

### Information Leakage in Encrypted Network Traffic

Information Leakage in Encrypted Network Traffic Attacks and Countermeasures Scott Coull RedJack Joint work with: Charles Wright (MIT LL) Lucas Ballard (Google) Fabian Monrose (UNC) Gerald Masson (JHU)

### ! Solve problem to optimality. ! Solve problem in poly-time. ! Solve arbitrary instances of the problem. !-approximation algorithm.

Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NP-hard problem What should I do? A Theory says you're unlikely to find a poly-time algorithm Must sacrifice one of

### Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration

Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the

### Computational Soundness of Symbolic Security and Implicit Complexity

Computational Soundness of Symbolic Security and Implicit Complexity Bruce Kapron Computer Science Department University of Victoria Victoria, British Columbia NII Shonan Meeting, November 3-7, 2013 Overview

### Database and Data Mining Security

Database and Data Mining Security 1 Threats/Protections to the System 1. External procedures security clearance of personnel password protection controlling application programs Audit 2. Physical environment

### Lecture 7: Approximation via Randomized Rounding

Lecture 7: Approximation via Randomized Rounding Often LPs return a fractional solution where the solution x, which is supposed to be in {0, } n, is in [0, ] n instead. There is a generic way of obtaining

### A Practical Differentially Private Random Decision Tree Classifier

273 295 A Practical Differentially Private Random Decision Tree Classifier Geetha Jagannathan, Krishnan Pillaipakkamnatt, Rebecca N. Wright Department of Computer Science, Columbia University, NY, USA.

### 136 CHAPTER 4. INDUCTION, GRAPHS AND TREES

136 TER 4. INDUCTION, GRHS ND TREES 4.3 Graphs In this chapter we introduce a fundamental structural idea of discrete mathematics, that of a graph. Many situations in the applications of discrete mathematics

### Globally Optimal Crowdsourcing Quality Management

Globally Optimal Crowdsourcing Quality Management Akash Das Sarma Stanford University akashds@stanford.edu Aditya G. Parameswaran University of Illinois (UIUC) adityagp@illinois.edu Jennifer Widom Stanford

### Life Insurance Modelling: Notes for teachers. Overview

Life Insurance Modelling: Notes for teachers In this mathematics activity, students model the following situation: An investment fund is set up Investors pay a set amount at the beginning of 20 years.

### max cx s.t. Ax c where the matrix A, cost vector c and right hand side b are given and x is a vector of variables. For this example we have x

Linear Programming Linear programming refers to problems stated as maximization or minimization of a linear function subject to constraints that are linear equalities and inequalities. Although the study

### STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

### The Relative Worst Order Ratio for On-Line Algorithms

The Relative Worst Order Ratio for On-Line Algorithms Joan Boyar 1 and Lene M. Favrholdt 2 1 Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark, joan@imada.sdu.dk

### Victor Shoup Avi Rubin. fshoup,rubing@bellcore.com. Abstract

Session Key Distribution Using Smart Cards Victor Shoup Avi Rubin Bellcore, 445 South St., Morristown, NJ 07960 fshoup,rubing@bellcore.com Abstract In this paper, we investigate a method by which smart

### Organizing Your Approach to a Data Analysis

Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize

### A Model For Revelation Of Data Leakage In Data Distribution

A Model For Revelation Of Data Leakage In Data Distribution Saranya.R Assistant Professor, Department Of Computer Science and Engineering Lord Jegannath college of Engineering and Technology Nagercoil,

### Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations

Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns

### Max Flow, Min Cut, and Matchings (Solution)

Max Flow, Min Cut, and Matchings (Solution) 1. The figure below shows a flow network on which an s-t flow is shown. The capacity of each edge appears as a label next to the edge, and the numbers in boxes

Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NP-hard problem. What should I do? A. Theory says you're unlikely to find a poly-time algorithm. Must sacrifice one

### Data Sanitization Techniques

Abstract Data Sanitization is the process of making sensitive information in non-production databases safe for wider visibility. This White Paper is an overview of various techniques which can be used

### Three Effective Top-Down Clustering Algorithms for Location Database Systems

Three Effective Top-Down Clustering Algorithms for Location Database Systems Kwang-Jo Lee and Sung-Bong Yang Department of Computer Science, Yonsei University, Seoul, Republic of Korea {kjlee5435, yang}@cs.yonsei.ac.kr

### Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases. Andreas Züfle

Similarity Search and Mining in Uncertain Spatial and Spatio Temporal Databases Andreas Züfle Geo Spatial Data Huge flood of geo spatial data Modern technology New user mentality Great research potential

### Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities

Another Look at Sensitivity of Bayesian Networks to Imprecise Probabilities Oscar Kipersztok Mathematics and Computing Technology Phantom Works, The Boeing Company P.O.Box 3707, MC: 7L-44 Seattle, WA 98124

### Part 1: Link Analysis & Page Rank

Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Exam on the 5th of February, 216, 14. to 16. If you wish to attend, please

MACs Message authentication and integrity Foundations of Cryptography Computer Science Department Wellesley College Table of contents Introduction MACs Constructing Secure MACs Secure communication and

### Solving for Voltage and Current

Chapter 3 Solving for Voltage and Current Nodal Analysis If you know Ohm s Law, you can solve for all the voltages and currents in simple resistor circuits, like the one shown below. In this chapter, we

### Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

### Bootstrapping Big Data

Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

### Handling missing data in large data sets. Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza

Handling missing data in large data sets Agostino Di Ciaccio Dept. of Statistics University of Rome La Sapienza The problem Often in official statistics we have large data sets with many variables and

### Quiggin, J. (1991), Too many proposals pass the benefit cost test: comment, TOO MANY PROPOSALS PASS THE BENEFIT COST TEST - COMMENT.

Quiggin, J. (1991), Too many proposals pass the benefit cost test: comment, American Economic Review 81(5), 1446 9. TOO MANY PROPOSALS PASS THE BENEFIT COST TEST - COMMENT by John Quiggin University of

### Load Balancing and Switch Scheduling

EE384Y Project Final Report Load Balancing and Switch Scheduling Xiangheng Liu Department of Electrical Engineering Stanford University, Stanford CA 94305 Email: liuxh@systems.stanford.edu Abstract Load

### Section Notes 3. The Simplex Algorithm. Applied Math 121. Week of February 14, 2011

Section Notes 3 The Simplex Algorithm Applied Math 121 Week of February 14, 2011 Goals for the week understand how to get from an LP to a simplex tableau. be familiar with reduced costs, optimal solutions,

### 7. Tests of association and Linear Regression

7. Tests of association and Linear Regression In this chapter we consider 1. Tests of Association for 2 qualitative variables. 2. Measures of the strength of linear association between 2 quantitative variables.

### Introduction to Flocking {Stochastic Matrices}

Supelec EECI Graduate School in Control Introduction to Flocking {Stochastic Matrices} A. S. Morse Yale University Gif sur - Yvette May 21, 2012 CRAIG REYNOLDS - 1987 BOIDS The Lion King CRAIG REYNOLDS

### arxiv:1112.0829v1 [math.pr] 5 Dec 2011

How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly

### Persuasion by Cheap Talk - Online Appendix

Persuasion by Cheap Talk - Online Appendix By ARCHISHMAN CHAKRABORTY AND RICK HARBAUGH Online appendix to Persuasion by Cheap Talk, American Economic Review Our results in the main text concern the case