On the Performance Measurements for Privacy Preserving Data Mining


Nan Zhang, Wei Zhao, and Jianer Chen
Department of Computer Science, Texas A&M University, College Station, TX 77843, USA
{nzhang, zhao,

Abstract. This paper establishes the foundation for the performance measurement of privacy preserving data mining techniques. The performance is measured in terms of the accuracy of data mining results and the privacy protection of sensitive data. On the accuracy side, we address the problem of previous measures and propose a new measure, named effective sample size, to solve this problem. We show that our new measure can be bounded without any knowledge of the data being mined and discuss when the bound can be met. On the privacy side, we identify a tacit assumption made by previous measures and show that the assumption is unrealistic in many situations. To solve the problem, we introduce a game theoretic framework for the measurement of privacy.

1 Introduction

In this paper, we address issues related to the performance measurement of privacy preserving data mining techniques. The purpose of data mining is to discover patterns and extract knowledge from large amounts of data. The objective of privacy preserving data mining is to enable data mining without invading the privacy of the data being mined. We consider a distributed environment where the data being mined are stored in multiple autonomous entities. Based on their infrastructures, we classify privacy preserving data mining systems into two categories: Server-to-Server (S2S) and Client-to-Server (C2S). In the first category (S2S), the data being mined are distributed across several servers. Each server holds numerous private data points. The servers collaborate with each other to enable data mining across all servers without letting any server know the private data of the other servers.
Since the number of servers in a system is usually small, the problem is often modeled as a variation of the secure multi-party computation problem, which has been extensively studied in cryptography [12]. Existing privacy preserving algorithms in this category serve a wide variety of data mining tasks including data classification [7, 14, 15, 20], association rule mining [13, 19], and statistical analysis [6]. In the second category (C2S), a system usually consists of a data miner (server) and numerous data providers (clients). Each data provider holds only one data point. The data miner performs data mining tasks on the aggregated (possibly perturbed) data provided by the data providers. A typical example of this kind of system is an online survey,

as the survey analyzer (data miner) collects data from thousands of survey respondents (data providers). Most existing privacy preserving algorithms in C2S systems use a randomization approach, which randomizes the original data to protect the privacy of data providers [1, 2, 5, 8-10, 18]. Both S2S and C2S systems have a broad range of applications. Nevertheless, we focus on studying C2S systems where the randomization approach is used. In particular, we establish the foundation for analyzing the tradeoff between the accuracy of data mining results and the privacy protection of sensitive data. Our contributions in this paper are summarized as follows. On the accuracy side, we address the problem of previous measures and propose a new accuracy measure named effective sample size to solve this problem. We show that our new measure can be upper bounded without any knowledge of the data being mined and discuss when the bound can be met. On the privacy protection side, we show that a tacit assumption made by previous measures is that all adversaries use the same intrusion technique to invade privacy. We address the problems of this assumption and propose a game theoretic formulation which takes the adversary behavior into consideration. The rest of the paper is organized as follows. In Section 2, we introduce our models of data, data providers, and data miners. Based on these models, we briefly review the literature in Section 3. In Section 4, we propose our new accuracy measure. An analytical bound on the new measure is derived in this section. In Section 5, we propose a game theoretic formulation on the measurement of privacy and define our new privacy measure. Section 6 concludes the paper with some final remarks.

2 System Model

Let there be n data providers (clients) C_1, ..., C_n and one data miner (server) S in the system.
Each client C_i has a private data point (e.g., transaction, data tuple, etc.) x_i. We view the original data values x_1, ..., x_n as n independent and identically distributed (i.i.d.) variables that have the same distribution as a random variable X. Let the domain of X (i.e., the set of all possible values of X) be V_X and the distribution of X be p_X. Each data point x_i is i.i.d. on V_X with distribution p_X. Due to the privacy concerns of data providers, we classify the data miners into two categories. One category is honest data miners. These data miners always act honestly in that they only perform regular data mining tasks and have no intention to invade privacy. The other category is malicious data miners. These data miners would purposely compromise the privacy of data providers.

3 Related Work

To protect the data providers from privacy invasion, countermeasures must be implemented in the data mining system. Randomization is a commonly used approach. We briefly review it as follows.

The randomization approach is based on the assumption that accurate data mining results can be obtained from a robust estimation of the data distribution. Previous work showed that this assumption is reasonable in many situations [2]. Thus, the basic idea of the randomization approach is to distort the individual data values but keep a (statistically) accurate estimation of the original data distribution. Based on the randomization approach, the privacy preserving data mining process can be considered as a two-step process. In the first step, each data provider C_i perturbs its data x_i by applying a predetermined randomization operator R(·) on x_i, and then transfers the randomized data R(x_i) to the data miner. We note that the randomization operator is known by both the data providers and the data miner. Let the domain of R(x_i) be V_Y. The randomization operator R(·) is a function from V_X to V_Y with transition probability p[x → y]. In previous studies, several randomization operators have been proposed, including the random perturbation operator [2], the random response operator [8], the MASK distortion operator [18], and the select-a-size operator [10]. For example, the random perturbation operator and the random response operator are listed in (1) and (2), respectively.

    R(x_i) = x_i + r_i.    (1)

    R(x_i) = { x̄_i, if r_i ≥ θ_i;
               x_i, if r_i < θ_i.    (2)

Here, x_i is the original data value, x̄_i is its complement, r_i is noise randomly generated from a predetermined distribution, and θ_i is a parameter set by each data provider individually. As we can see, the random response operator only applies to binary data. In the second step, the honest data miner first employs a distribution reconstruction algorithm on the aggregate data, which intends to recover the original data distribution from the randomized data. Then, the honest data miner performs the data mining task on the reconstructed distribution. Several distribution reconstruction algorithms have been proposed [1, 2, 8, 10, 18].
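As an illustration, the two operators in (1) and (2) can be sketched in Python. This is a minimal sketch of our own, not code from the paper; the Gaussian noise distribution for r_i in (1) and the function names are assumptions.

```python
import random

def random_perturbation(x, noise_std=1.0):
    # Operator (1): add noise r_i drawn from a predetermined distribution.
    # The Gaussian choice here is an assumption; any fixed distribution works.
    return x + random.gauss(0.0, noise_std)

def random_response(x, theta=0.3):
    # Operator (2), binary data only: keep x when r_i < theta_i,
    # flip it to its complement when r_i >= theta_i.
    r = random.random()
    return x if r < theta else 1 - x
```

With theta = 0.3, a data point is transmitted truthfully only 30% of the time, which matches the parameter used in the reconstruction example of (3) and (4).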
In particular, the expectation maximization (EM) algorithm [1] reconstructs the distribution to converge to the maximum likelihood estimate of the original data distribution. For example, suppose that the data providers randomize their data using the random response operator in (2). Let r_i be random variables uniformly distributed on [0, 1]. Let θ_i be 0.3. The distribution reconstructed by the EM algorithm is stated as follows.

    Pr{x_i = 0} = (7/4) Pr{R(x_i) = 1} − (3/4) Pr{R(x_i) = 0},    (3)

    Pr{x_i = 1} = (7/4) Pr{R(x_i) = 0} − (3/4) Pr{R(x_i) = 1}.    (4)

Also in the second step, a malicious data miner may invade privacy by using a private data recovery algorithm. This algorithm is used to recover individual data values from the randomized data supplied by the data providers. Figure 1 depicts the architecture of the system. Clearly, any privacy preserving data mining system should be measured by its capacity for both constructing accurate data mining results and protecting individual data values from being compromised by the malicious data miners.
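The inversion implied by (3) and (4) can be checked numerically. The sketch below is our own illustration (function name assumed); it applies the closed-form inversion to the exact randomized frequencies rather than running the full EM iteration.

```python
def reconstruct_binary(freq_r1, freq_r0):
    # Invert the theta = 0.3 random response operator, per eqs. (3)-(4):
    #   Pr{x=0} = (7/4) Pr{R=1} - (3/4) Pr{R=0}
    #   Pr{x=1} = (7/4) Pr{R=0} - (3/4) Pr{R=1}
    p0 = 1.75 * freq_r1 - 0.75 * freq_r0
    p1 = 1.75 * freq_r0 - 0.75 * freq_r1
    return p0, p1

# Sanity check with p_X = (0.8, 0.2) and theta = 0.3 (keep w.p. 0.3, flip w.p. 0.7):
# Pr{R=1} = 0.7*0.8 + 0.3*0.2 = 0.62 and Pr{R=0} = 0.38.
print(reconstruct_binary(0.62, 0.38))  # recovers (0.8, 0.2) up to float rounding
```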

Fig. 1. System Model

4 Quantification of Accuracy

In this section, we study the measurement of the accuracy of data mining results. First, we briefly review previous accuracy measures and address their problem. Then, we propose a new accuracy measure named effective sample size and derive an analytical bound on it.

4.1 Previous Measures

In previous studies, several accuracy measures have been proposed. We classify these measures into two categories. One category is application-specific accuracy measures. Measures in this category are tied to particular data mining applications. For example, in the MASK system [18] for privacy preserving association rule mining, the measurement of accuracy includes two measures, named support error and identity error, respectively. Support error is the average error on the support of identified frequent itemsets. Identity error measures the average probability that a frequent itemset is not identified. These measures are specific to association rule mining and cannot be applied to other data mining applications (e.g., data classification). The other category is general accuracy measures. Measures in this category can be applied to any privacy preserving data mining system based on the randomization approach. An existing measure in this category is the information loss measure [1]. Let p̂ be the reconstructed distribution. The information loss measure I(p_X, p̂) is defined as

    I(p_X, p̂) = (1/2) E[ ∫_{V_X} |p_X(x) − p̂(x)| dx ],    (5)

which is in proportion to the expected error of the reconstructed distribution.
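For discrete domains, the information loss measure in (5) reduces to half the L1 distance between the two probability vectors. A minimal sketch of our own (the function name is an assumption, and the expectation over reconstructions is omitted):

```python
def information_loss(p, q):
    # Discrete analogue of eq. (5): half the L1 distance between the
    # original distribution p and the reconstructed distribution q.
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# A reconstruction that misestimates a binary distribution by 0.18 per value:
print(information_loss([0.8, 0.2], [0.62, 0.38]))  # 0.18 (up to float rounding)
```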

4.2 Problem of Previous Measures

We remark that the ultimate goal of the performance measurements is to help the system designers choose the optimal randomization operator. As we can see from the privacy preserving data mining process in Section 3, the randomization operator has to be determined before any data is transferred from the data providers to the data miner. Thus, in order to reach this goal, a performance measure must be estimated or bounded without any knowledge of the data being mined. As we can see, the application-specific accuracy measures depend on both the reconstructed data distribution and the performance of the data mining algorithm. The information loss measure depends on both the original distribution and the reconstructed distribution. Neither measure can be estimated or bounded when the data distribution is not known. Thus, previous measures cannot be used by the system designers to choose the optimal randomization operator.

4.3 Effective Sample Size

We now propose effective sample size as our new accuracy measure. Roughly speaking, given the number of randomized data points, the effective sample size is in proportion to the minimum number of original data points that can make an estimate of the data distribution as accurate as the distribution reconstructed from the randomized data points. The formal definition is stated as follows.

Definition 1. Suppose that the system consists of n data providers and one data miner. Given randomization operator R : V_X → V_Y, let p̂ be the maximum likelihood estimate of the distribution of x_i reconstructed from R(x_1), ..., R(x_n). Recall that p_X is the original distribution of x_i. Let p̂_0(k) be the maximum likelihood estimate of the distribution based on k random variables generated from distribution p_X.
We define the effective sample size r as the minimum value of k/n such that

    D_Kol(p̂_0(k), p_X) ≤ D_Kol(p̂, p_X),    (6)

where D_Kol is the Kolmogorov distance [16], which measures the distance between an estimated distribution and the theoretical distribution.¹ As we can see, effective sample size is a general accuracy measure which measures the accuracy of the reconstructed distribution. Effective sample size is a function of three parameters: n, R, and p_X. As we can see from the simulation result in Figure 2, the minimum value of k is (almost) in proportion to n. Thus, we can reduce the effective sample size to a function of R and p_X. We now show that the effective sample size can be strictly bounded without any knowledge of p_X.

Theorem 1. Recall that p[x → y] is the probability transition function of R : V_X → V_Y. An upper bound on the effective sample size r is given as follows.

    r ≤ 1 − Σ_{y ∈ V_Y} min_{x ∈ V_X} p[x → y].    (7)

¹ Other measures of such distance (e.g., Kuiper distance, Anderson-Darling distance, etc.) can also be used to define the effective sample size. The use of other measures does not influence the results in this paper.
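The bound in (7) depends only on the transition matrix of R, so it can be computed before any data is collected. A minimal sketch of our own (the function name is an assumption):

```python
def effective_sample_bound(P):
    # P[x][y] is the transition probability p[x -> y] of operator R.
    # Theorem 1: r <= 1 - sum over y of (min over x of p[x -> y]).
    num_y = len(P[0])
    return 1.0 - sum(min(row[y] for row in P) for y in range(num_y))

# theta = 0.3 random response: x=0 -> R=1 w.p. 0.7; x=1 -> R=0 w.p. 0.7.
P = [[0.3, 0.7],
     [0.7, 0.3]]
print(effective_sample_bound(P))  # 0.4: at most 40% of the sample is effective
```

A lossless operator (the identity) gives a bound of 1, i.e., every data point counts.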

Fig. 2. Relationship between min k and n

Proof. We denote Pr{x_i = x} and Pr{R(x_i) = y} by p(x) and p(y), respectively. We have

    p(y) = Σ_{x ∈ V_X} p(x) p[x → y]    (8)
         = min_{x ∈ V_X} p[x → y] + Σ_{x ∈ V_X} p(x) (p[x → y] − min_{x ∈ V_X} p[x → y]).    (9)

We separate R into two operators, R_1 and R_2, such that R(·) = R_2(R_1(·)). Let p_0 = Σ_{y ∈ V_Y} min_{x ∈ V_X} p[x → y]. Note that p_0 ≤ 1. Let e ∉ V_X ∪ V_Y be a symbol which represents a denial-of-service. Note that no private information can be inferred from e. R_1 and R_2 are stated as follows.

    R_1(x) = { e, with probability p_0;
               y_1, with probability p[x → y_1] − min_{x ∈ V_X} p[x → y_1];
               ...
               y_{|V_Y|}, with probability p[x → y_{|V_Y|}] − min_{x ∈ V_X} p[x → y_{|V_Y|}].    (10)

    R_2(z) = { z, if z ≠ e;
               y_1, if z = e, with probability (min_{x ∈ V_X} p[x → y_1]) / p_0;
               ...
               y_{|V_Y|}, if z = e, with probability (min_{x ∈ V_X} p[x → y_{|V_Y|}]) / p_0.    (11)

Here, y_1, ..., y_{|V_Y|} are all possible values in V_Y. That is, V_Y = {y_1, ..., y_{|V_Y|}}. We now show the equivalence between R(·) and R_2(R_1(·)). For all x ∈ V_X, y ∈ V_Y, we have

    Pr{R_2(R_1(x)) = y}    (12)
    = Pr{R_1(x) = e} · Pr{R_2(R_1(x)) = y | R_1(x) = e}
      + Pr{R_1(x) = y} · Pr{R_2(R_1(x)) = y | R_1(x) = y}    (13)
    = p_0 · (min_{x ∈ V_X} p[x → y]) / p_0 + p[x → y] − min_{x ∈ V_X} p[x → y]    (14)
    = p[x → y].    (15)

Note that R_2 is only determined by p[x → y], which is the probability transition function of R. Suppose that the data providers use R_1 to randomize their data. The data miner can always construct R(x_i) from R_1(x_i) using its knowledge of R. Thus, the effective sample size when R is used is always less than or equal to the effective sample size when R_1 is used. Since R_1 outputs the denial-of-service symbol e with probability p_0, at most a fraction 1 − p_0 of the data points carry any information about the original values. That is,

    r ≤ 1 − p_0 = 1 − Σ_{y ∈ V_Y} min_{x ∈ V_X} p[x → y].    (16)

This bound only depends on the randomization operator R. It is independent of the number of data providers n and the original data distribution p_X. As we can see, the bound can be met if and only if for any given x ∈ V_X, there exists no more than one y_i ∈ V_Y such that

    p[x → y_i] > p_0 / |V_Y|.    (17)

5 Quantification of Privacy Protection

In this section, we address issues related to the measurement of privacy protection in privacy preserving data mining. First, we briefly review the previous measures of privacy protection. Then, we identify a tacit assumption made by previous measures which is unrealistic in practice. To solve the problem, we propose a new privacy measure based on a game theoretic framework.

5.1 Previous Measures

In previous studies, two kinds of privacy measures have been proposed. One kind is the information theoretic measure [1], which measures privacy by the mutual information between the original data x_i and the randomized data R(x_i) (i.e., I(x_i; R(x_i))). This measure is a statistical measurement of the privacy disclosure. In [9], the authors challenge the information theoretic measure and remark that there exist certain kinds of privacy disclosure that cannot be captured by this measure. For example, suppose that for a certain y ∈ V_Y, a data miner can almost certainly infer that x_i = y from R(x_i) = y (i.e., Pr{x_i = y | R(x_i) = y} ≈ 1). This privacy disclosure is serious because if a data provider knows of the disclosure, it will purposely change its randomized data if the randomized data value happens to be y.
However, the information theoretic measure cannot capture this privacy disclosure if the occurrence of y has a fairly low probability (i.e., Pr{R(x_i) = y} ≈ 0). The reason is that the mutual information only measures the average information that is disclosed to the data miner. The other kind of privacy measure is proposed to solve the problem of the information theoretic measure. Privacy measures of this kind include the privacy breach measure [9] and interval-based privacy measures [3, 21]. We use the privacy breach measure as an example. Under the privacy breach measure, the level of privacy protection is determined by

    max_{x, x' ∈ V_X} p[x → y] / p[x' → y]    (18)

for any given y ∈ V_Y. This measure captures the worst case privacy disclosure and can guarantee a bound on the level of privacy protection without any knowledge of the original data distribution. However, we remark that this measure solves the problem of the information theoretic measure by going to the opposite extreme. That is, the privacy breach measure is (almost) independent of the average information disclosure and only depends on the privacy disclosure in the worst case. We show the problem of previous measures as follows.

5.2 Problem of Previous Measures

For the measurement of privacy, we need to define the privacy of data providers first. In the dictionary, privacy is defined as the capacity of the data providers to be free from unauthorized intrusion [17]. As we can see from the definition, the effectiveness of privacy protection depends on whether a malicious data miner can perform unauthorized intrusion on the data providers. The privacy loss of the data providers is measured by the gain of the data miner from unauthorized intrusions. Thus, the privacy protection measure depends on two important factors: a) the privacy protection mechanism of the data providers, and b) the unauthorized intrusion technique of the data miner. The data miner has the freedom to choose different intrusion techniques in different circumstances. Thus, the intrusion technique of the data miner should always be considered in the measurement of privacy. However, previous measures do not follow this principle. Neither the information theoretic measure nor the privacy breach measure addresses the variety of intrusion techniques. Instead, they make a tacit assumption that all data miners will use the same intrusion technique. This assumption seems reasonable, as a (rational) data miner will always choose the intrusion technique that compromises the most private information. However, as we will show below, the optimal intrusion technique varies in different circumstances.
As a result, the absence of consideration of intrusion techniques leads to problems in the privacy measurement.

Example 1. Suppose that V_X = {0, 1}. The original data x_i is uniformly distributed on V_X. The system designer needs to determine which of the following two randomization operators, R_1 and R_2, discloses less private information.

    R_1(x) = { x, with probability 0.70;
               x̄, with probability 0.30.    (19)

    R_2(x) = { 0, if x = 0;
               1, if x = 1, with probability 0.01;
               0, if x = 1, with probability 0.99.    (20)

In the example, the mutual information I(x; R_1(x)) is much greater than I(x; R_2(x)). That is, the average amount of private information disclosed by R_1 is much greater than that of R_2. According to the information theoretic measure, R_2 is better than R_1 from the privacy protection perspective. The result is different when the privacy breach measure is used. As we can see, if the data miner receives R_2(x_i) = 1, then it can always infer that x_i = 1 with probability 1. Thus, the worst-case privacy loss of R_2 is much greater than that of R_1. According to the privacy breach measure, R_1 is better than R_2 from the privacy protection perspective.
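The disagreement between the two measures in Example 1 can be reproduced numerically. A sketch of our own under the stated uniform prior (function names are assumptions); mutual information is computed from the joint distribution, and the worst-case ratio follows (18):

```python
from math import log2

R1 = [[0.70, 0.30], [0.30, 0.70]]  # eq. (19): keep x w.p. 0.70, flip w.p. 0.30
R2 = [[1.00, 0.00], [0.99, 0.01]]  # eq. (20): 0 always maps to 0; 1 maps to 1 w.p. 0.01
prior = [0.5, 0.5]                 # x_i uniform on {0, 1}

def mutual_info(P, prior):
    # I(x; R(x)) = sum over x, y of Pr{x} p[x->y] log2(p[x->y] / Pr{R(x)=y})
    py = [sum(prior[x] * P[x][y] for x in range(2)) for y in range(2)]
    return sum(prior[x] * P[x][y] * log2(P[x][y] / py[y])
               for x in range(2) for y in range(2) if P[x][y] > 0)

def worst_breach(P):
    # Worst-case ratio max_{x,x'} p[x->y] / p[x'->y] over all y, per eq. (18).
    worst = 0.0
    for y in range(2):
        col = [row[y] for row in P]
        worst = max(worst, float('inf') if min(col) == 0 else max(col) / min(col))
    return worst

print(mutual_info(R1, prior) > mutual_info(R2, prior))  # True: R1 leaks more on average
print(worst_breach(R2) > worst_breach(R1))              # True: R2 is worse in the worst case
```

For R2, receiving y = 1 identifies x = 1 with certainty, so the breach ratio is unbounded even though the average leakage is tiny.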

We now show that whether R_1 or R_2 is better actually depends on the system setting. In particular, we consider the following two system settings.

1. The system is an online survey system where the survey analyzer and the survey respondents are the data miner and the data providers, respectively. The value of x_i indicates whether a survey respondent is interested in buying certain merchandise. The intrusion performed by a malicious data miner is to send unauthorized advertisements to data providers with such interest.

2. The system consists of n companies as the data providers and a management consulting firm as the data miner. The consulting firm performs statistical analysis on the financial data of the companies. The original data x_i contains the expected profit of the company, which has not been published yet. As the unauthorized intrusion, a malicious data miner may use x_i to make an investment on a high-risk stock market. The profit from a successful investment is great. However, a failed investment results in a loss five times greater than the profit the data miner may obtain from a successful investment.

In the first case, an advertisement to a wrong person costs the data miner little. A reasonable strategy for the data miner is to advertise to all data providers. In fact, if the expected loss from an incorrect estimate (i.e., an advertisement to a person without interest) is equal to 0, this is the optimal intrusion technique for the data miner. Comparing the two randomization operators, R_1 discloses the original data value with probability 0.7, which is greater than that of R_2 (0.501). Thus, R_2 is better than R_1 from the privacy protection perspective. In the second case, the data miner will not perform the intrusion when R_1 is used by the data providers. The reason is that the loss from a failed investment (i.e., an incorrect estimate of x_i) is unaffordable.
Even if the profit from a successful investment is fairly high, the loss from a wrong decision is too high to risk. That is, for the data miner, the expected net benefit from an unauthorized intrusion is less than 0. However, the data miner will perform the intrusion if the randomized data R_2(x_i) = 1 is received by the data miner. The reason is that the data miner has a fairly high probability (99%) of making a successful investment. If a randomized data value R_2(x_i) = 0 is received, the data miner will simply ignore it. Thus, in this case, R_1 is better than R_2 from the privacy protection perspective. As we can see from the example, the data miner will choose different privacy intrusion techniques in different system settings. This results in different performance of the randomization operators. Thus, the system setting and the privacy intrusion technique have to be considered in the measurement of privacy.

5.3 A Game Theoretic Framework

In order to introduce the system setting and the privacy intrusion technique into our privacy measure, we first propose a game theoretic framework to analyze the strategies of the data miner (i.e., the privacy intrusion technique). Since we are studying the privacy protection performance of the randomization operator, we consider the randomization operator as the strategy of the data providers.
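The cost-benefit reasoning in the two settings can be made explicit with a small sketch. This is a hypothetical illustration of our own: the unit gain, the five-fold loss, and the function name are assumed stand-ins for the stock market setting.

```python
def expected_intrusion_payoff(p_correct, gain, loss):
    # Net benefit of acting on an inferred value that is correct w.p. p_correct.
    return p_correct * gain - (1 - p_correct) * loss

gain, loss = 1.0, 5.0  # stock market setting: loss is five times the gain

# Under R1, an inferred value is correct w.p. 0.7 -> negative payoff, so abstain.
print(expected_intrusion_payoff(0.70, gain, loss))  # negative: the miner abstains
# Under R2, receiving R2(x_i) = 1 implies x_i = 1 w.p. 0.99 -> positive payoff.
print(expected_intrusion_payoff(0.99, gain, loss))  # positive: the miner intrudes
```

In the online survey setting, loss is (nearly) 0, so the payoff is positive for any p_correct > 0 and the miner always intrudes, matching the argument in the text.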

We model the privacy preserving data mining process as a non-cooperative game between the data providers and the data miner. There are two players in the game. One is the data providers. The other is the data miner. Since we only consider the privacy measure, the game is zero-sum in that the benefit obtained by the server from unauthorized intrusions always results in an invasion of the privacy of the data providers. Let S_c be the set of randomization operators that the data providers can choose from. Let S_s be the set of intrusion techniques that the data miner can choose from. Let u_c and u_s be the payoffs (i.e., expected benefits) of the data providers and the data miner, respectively. Since the game is zero-sum, we have u_c + u_s = 0. We remark that the payoffs depend on both the strategies of the players and the system setting. We assume that both the data providers and the data miner are rational. That is, given a certain randomization operator, the data miner always chooses the optimal privacy intrusion technique that can maximize its payoff u_s. Given a certain privacy intrusion technique, the data providers always choose the optimal randomization operator that can maximize u_c. By game theory, if a Nash equilibrium² exists in the game, it contains the optimal strategies for both the data providers and the data miner [11].

5.4 Our Privacy Measure

Now we define our privacy measure based on the game theoretic formulation.

Definition 2. Given a privacy preserving data mining system G = ⟨S_s, S_c, u_s, u_c⟩, we define the privacy measure l_p of a randomization operator R as

    l_p(R) = u_c(R, L_0),    (21)

where L_0 is the optimal privacy intrusion technique for the data miner when R is used by the data providers, and u_c is the payoff of the data providers when R and L_0 are used. As we can see, the smaller l_p is, the more benefit is obtained by the data miner from the unauthorized intrusion. We now use an example to illustrate the definition.

Example 2.
Let V_X be {0, 1}. Suppose that the original data x_i is uniformly distributed on V_X. A system designer wants to make a comparison between the privacy preserving capacities of randomization operators R_1 and R_2, which are shown as follows.

    R_1(x_i) = { x_i, with probability 0.60;
                 x̄_i, with probability 0.40.    (22)

    R_2(x_i) = { x_i, with probability 0.01;
                 e, with probability 0.99.    (23)

Here, e is a denial-of-service signal which satisfies e ∉ {0, 1}. As we can see, no private information can be inferred from e. Thus, without loss of generality, we suppose that the data miner ignores a data point if it has a value of e. According to the information theoretic measure, R_2 is better than R_1. According to the privacy breach measure, R_1 is better

² Roughly speaking, a Nash equilibrium is a condition where no player can benefit by changing its own strategy unilaterally while the other player keeps its current strategy.

than R_2. We now analyze the problem based on our privacy measure in a game theoretic formulation. Since the comparison is between R_1 and R_2, we assume that the data providers can only choose the randomization operator from {R_1, R_2}. That is, S_c = {R_1, R_2}. For a given system setting, let the optimal intrusion technique for the data miner be L_0. We now propose a specific intrusion technique L_1. Roughly speaking, L_1 represents an intrusion technique that infers x_i = R(x_i) if and only if R(x_i) ≠ e. We have {L_0, L_1} ⊆ S_s. Since Pr{R_2(x_i) = x̄_i} = 0, L_1 is the optimal intrusion technique for the data miner when R_2 is the randomization operator. That is, L_0 = L_1 when R_2 is used by the data providers. The strategies and payoffs are listed in Table 1, where u_0, u_1, and u_2 are the payoffs of the data miner in different circumstances.

Table 1. Strategies and Payoffs

          L_0           L_1
    R_1   u_0 / −u_0    u_1 / −u_1
    R_2   u_2 / −u_2    u_2 / −u_2

Due to the assumption that L_0 is the optimal intrusion technique, we always have u_0 ≥ u_1. The comparison between u_1 and u_2 depends on the system setting. Recall the two system settings in Example 1. In the online survey example, we have u_1 > u_2. In the stock market example, a reasonable estimation is u_1 ≤ u_0 < u_2. Let ⟨C, S⟩ be the strategies of the data providers and the data miner, respectively. We consider the comparison between R_1 and R_2 in the following cases.

1. u_1 > u_2: There are two Nash equilibria in the game: ⟨C, S⟩ = ⟨R_2, L_0⟩ and ⟨C, S⟩ = ⟨R_2, L_1⟩. Thus, R_2 is the better choice for the data providers from the privacy protection perspective.

2. u_1 < u_2, u_0 > u_2: Only one Nash equilibrium, ⟨C, S⟩ = ⟨R_2, L_0⟩, exists in the game. Thus, R_2 is the better choice for the data providers from the privacy protection perspective.

3. u_1 < u_2, u_0 < u_2: Only one Nash equilibrium, ⟨C, S⟩ = ⟨R_1, L_0⟩, exists in the game. Thus, R_1 is the better choice for the data providers from the privacy protection perspective.
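The case analysis above can be reproduced by enumerating pure-strategy Nash equilibria of the 2x2 zero-sum game. The concrete payoff values below are hypothetical numbers of our own, chosen only to satisfy each case's inequalities, and the function name is assumed.

```python
def pure_nash(payoff):
    # payoff[row][col] is the data miner's payoff u_s; the providers get -u_s.
    # In a zero-sum game, (row, col) is a pure equilibrium iff it is a saddle
    # point: col maximizes the miner's payoff within the row, and row
    # minimizes it within the column.
    return [(r, c)
            for r, row in enumerate(payoff)
            for c, u in enumerate(row)
            if u == max(row) and u == min(p[c] for p in payoff)]

# Rows are (R1, R2); columns are (L0, L1).
print(pure_nash([[3.0, 2.0], [1.0, 1.0]]))  # case 1, u1 > u2: [(1, 0), (1, 1)]
print(pure_nash([[3.0, 1.0], [2.0, 2.0]]))  # case 2, u1 < u2 < u0: [(1, 0)]
print(pure_nash([[1.0, 0.5], [2.0, 2.0]]))  # case 3, u0 < u2: [(0, 0)]
```

In every equilibrium found, the provider's row matches the text: R_2 in cases 1 and 2, R_1 in case 3.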
As we can see, the comparison between the privacy preserving capacities of R_1 and R_2 depends on the comparison between u_1 and u_2, which is determined by the ratio between the benefit from a correct estimate and the loss from an incorrect estimate. Let the ratio be

    σ = (gain from a correct estimate) / (loss from an incorrect estimate).

In the above case, we have

    σ = 40u_2 / (60u_2 − u_1).    (24)

A useful theorem is provided as follows.

Theorem 2. Suppose that in the original data distribution, we have

    max_{x_0 ∈ V_X} Pr{x_i = x_0} = p_m.    (25)

If the randomization operator R : V_X → V_Y satisfies

    max_{y ∈ V_Y} ( max_{x ∈ V_X} p[x → y] / min_{x ∈ V_X} p[x → y] ) ≤ (1 − p_m) / (σ p_m),    (26)

then the privacy measure l_p(R) = 0.

The proof of Theorem 2 is omitted due to the space limit.

6 Conclusion

In this paper, we establish the foundation for the measurement of accuracy and privacy protection in privacy preserving data mining. On the accuracy side, we address the problem of previous accuracy measures and solve it by introducing the effective sample size measure. On the privacy protection side, we first identify an unrealistic assumption tacitly made by previous measures. After that, we present a game theoretic formulation of the system and propose a privacy protection measure based on the formulation. We conclude this paper with some future research directions.

- Design the optimal randomization operator based on the new accuracy and privacy protection measures.
- Further analysis of the performance of data mining algorithms. Most existing theoretical analysis of the performance of privacy preserving data mining techniques is based on the assumption of an ideal data mining algorithm. The performance of practical data mining algorithms has only been analyzed through heuristic results. However, as shown in [4], the difference between practical and ideal data mining algorithms can be nontrivial. Further analysis on this issue is needed to measure the performance of randomization operators more precisely.

References

1. D. Agrawal and C. C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In Proceedings of the 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM Press.
2. R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proceedings of the 26th ACM SIGMOD International Conference on Management of Data. ACM Press.
3. R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proceedings of the 26th ACM SIGMOD Conference on Management of Data. ACM Press.
4. C. Clifton. Using sample size to limit exposure to data mining. Journal of Computer Security, 8(4).
5. W. Du and M. Atallah. Privacy-preserving cooperative statistical analysis. In Proceedings of the 17th Annual Computer Security Applications Conference, page 102. IEEE Computer Society, 2001.

6. W. Du, Y. S. Han, and S. Chen. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In Proceedings of the 4th SIAM International Conference on Data Mining. SIAM Press.
7. W. Du and Z. Zhan. Building decision tree classifier on private data. In Proceedings of the IEEE International Conference on Privacy, Security and Data Mining, pages 1-8. Australian Computer Society, Inc.
8. W. Du and Z. Zhan. Using randomized response techniques for privacy-preserving data mining. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press.
9. A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM Press.
10. A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke. Privacy preserving mining of association rules. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press.
11. R. Gibbons. A Primer in Game Theory. Harvester Wheatsheaf, New York.
12. O. Goldreich. Secure Multi-Party Computation. Cambridge University Press.
13. M. Kantarcioglu and C. Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Transactions on Knowledge and Data Engineering, 16(9), 2004.
14. M. Kantarcioglu and J. Vaidya. Privacy preserving naïve bayes classifier for horizontally partitioned data. In Workshop on Privacy Preserving Data Mining, held in association with the 3rd IEEE International Conference on Data Mining.
15. Y. Lindell and B. Pinkas. Privacy preserving data mining. In Proceedings of the 20th Annual International Cryptology Conference on Advances in Cryptology. Springer Verlag.
16. F. J. Massey. The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46(253):68-78.
17. Merriam-Webster. Merriam-Webster's Collegiate Dictionary. Merriam-Webster, Inc.
18. S. J. Rizvi and J. R. Haritsa. Maintaining data privacy in association rule mining. In Proceedings of the 28th International Conference on Very Large Data Bases. Morgan Kaufmann.
19. J. Vaidya and C. Clifton. Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press.
20. J. Vaidya and C. Clifton. Privacy preserving naïve bayes classifier for vertically partitioned data. In Proceedings of the 4th SIAM Conference on Data Mining. SIAM Press.
21. Y. Zhu and L. Liu. Optimal randomization for privacy preserving data mining. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 2004.

College information system research based on data mining 2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

A Three-Dimensional Conceptual Framework for Database Privacy

A Three-Dimensional Conceptual Framework for Database Privacy A Three-Dimensional Conceptual Framework for Database Privacy Josep Domingo-Ferrer Rovira i Virgili University UNESCO Chair in Data Privacy Department of Computer Engineering and Mathematics Av. Països

More information

Online Appendix to Stochastic Imitative Game Dynamics with Committed Agents

Online Appendix to Stochastic Imitative Game Dynamics with Committed Agents Online Appendix to Stochastic Imitative Game Dynamics with Committed Agents William H. Sandholm January 6, 22 O.. Imitative protocols, mean dynamics, and equilibrium selection In this section, we consider

More information

Purchase Conversions and Attribution Modeling in Online Advertising: An Empirical Investigation

Purchase Conversions and Attribution Modeling in Online Advertising: An Empirical Investigation Purchase Conversions and Attribution Modeling in Online Advertising: An Empirical Investigation Author: TAHIR NISAR - Email: t.m.nisar@soton.ac.uk University: SOUTHAMPTON UNIVERSITY BUSINESS SCHOOL Track:

More information

Weighted Congestion Games

Weighted Congestion Games Players have different weights. A weighted congestion game is a tuple Γ = (N, R, (Σ i ) i N, (d r ) r R, (w i ) i inn ) with N = {1,..., n}, set of players R = {1,..., m}, set of resources Σ i 2 R, strategy

More information

A Network Flow Approach in Cloud Computing

A Network Flow Approach in Cloud Computing 1 A Network Flow Approach in Cloud Computing Soheil Feizi, Amy Zhang, Muriel Médard RLE at MIT Abstract In this paper, by using network flow principles, we propose algorithms to address various challenges

More information

An Attacker s View of Distance Preserving Maps For Privacy Preserving Data Mining

An Attacker s View of Distance Preserving Maps For Privacy Preserving Data Mining An Attacker s View of Distance Preserving Maps For Privacy Preserving Data Mining Kun Liu, Chris Giannella, and Hillol Kargupta Department of Computer Science and Electrical Engineering, University of

More information