On the Performance Measurements for Privacy Preserving Data Mining


Nan Zhang, Wei Zhao, and Jianer Chen
Department of Computer Science, Texas A&M University, College Station, TX 77843, USA

Abstract. This paper establishes the foundation for the performance measurements of privacy preserving data mining techniques. Performance is measured in terms of the accuracy of data mining results and the privacy protection of sensitive data. On the accuracy side, we identify the problem with previous measures and propose a new measure, named effective sample size, to solve it. We show that our new measure can be bounded without any knowledge of the data being mined, and we discuss when the bound can be met. On the privacy side, we identify a tacit assumption made by previous measures and show that the assumption is unrealistic in many situations. To solve this problem, we introduce a game theoretic framework for the measurement of privacy.

1 Introduction

In this paper, we address issues related to the performance measurements of privacy preserving data mining techniques. The purpose of data mining is to discover patterns and extract knowledge from large amounts of data. The objective of privacy preserving data mining is to enable data mining without invading the privacy of the data being mined. We consider a distributed environment where the data being mined are stored in multiple autonomous entities.

We classify privacy preserving data mining systems into two categories based on their infrastructures: Server-to-Server (S2S) and Client-to-Server (C2S). In the first category (S2S), the data being mined are distributed across several servers, each holding numerous private data points. The servers collaborate with each other to enable data mining across all servers without letting any server learn the private data held by the others.
Since the number of servers in a system is usually small, the problem is often modeled as a variation of the secure multi-party computation problem, which has been extensively studied in cryptography [12]. Existing privacy preserving algorithms in this category serve a wide variety of data mining tasks, including data classification [7, 14, 15, 20], association rule mining [13, 19], and statistical analysis [6].

In the second category (C2S), a system usually consists of a data miner (server) and numerous data providers (clients). Each data provider holds only one data point. The data miner performs data mining tasks on the aggregated (possibly perturbed) data supplied by the data providers. A typical example of this kind of system is an online survey,
as the survey analyzer (data miner) collects data from thousands of survey respondents (data providers). Most existing privacy preserving algorithms for C2S systems use a randomization approach, which randomizes the original data to protect the privacy of the data providers [1, 2, 5, 8-10, 18].

Both S2S and C2S systems have a broad range of applications. Nevertheless, we focus on C2S systems where the randomization approach is used. In particular, we establish the foundation for analyzing the tradeoff between the accuracy of data mining results and the privacy protection of sensitive data. Our contributions in this paper are summarized as follows. On the accuracy side, we identify the problem with previous measures and propose a new accuracy measure named effective sample size to solve it. We show that our new measure can be upper bounded without any knowledge of the data being mined and discuss when the bound can be met. On the privacy protection side, we show that previous measures make the tacit assumption that all adversaries use the same intrusion technique to invade privacy. We address the problems with this assumption and propose a game theoretic formulation that takes adversary behavior into consideration.

The rest of the paper is organized as follows. In Section 2, we introduce our models of data, data providers, and data miners. Based on these models, we briefly review the literature in Section 3. In Section 4, we propose our new accuracy measure and derive an analytical bound on it. In Section 5, we propose a game theoretic formulation of the measurement of privacy and define our new privacy measure. Section 6 concludes the paper with some final remarks.

2 System Model

Let there be n data providers (clients) C_1, ..., C_n and one data miner (server) S in the system.
Each client C_i has a private data point x_i (e.g., a transaction or a data tuple). We view the original data values x_1, ..., x_n as n independent and identically distributed (i.i.d.) samples of a random variable X. Let the domain of X (i.e., the set of all possible values of X) be V_X and the distribution of X be p_X. Each data point x_i is thus i.i.d. on V_X with distribution p_X.

Due to the privacy concerns of the data providers, we classify data miners into two categories. One category is honest data miners. These data miners always act honestly in that they only perform regular data mining tasks and have no intention to invade privacy. The other category is malicious data miners. These data miners purposely attempt to compromise the privacy of the data providers.

3 Related Work

To protect the data providers from privacy invasion, countermeasures must be implemented in the data mining system. Randomization is a commonly used approach. We briefly review it as follows.
The randomization approach is based on the assumption that accurate data mining results can be obtained from a robust estimation of the data distribution. Previous work showed that this assumption is reasonable in many situations [2]. Thus, the basic idea of the randomization approach is to distort the individual data values while keeping a (statistically) accurate estimation of the original data distribution.

Based on the randomization approach, the privacy preserving data mining process can be considered a two-step process. In the first step, each data provider C_i perturbs its data x_i by applying a predetermined randomization operator R(·) to x_i, and then transfers the randomized data R(x_i) to the data miner. We note that the randomization operator is known to both the data providers and the data miner. Let the domain of R(x_i) be V_Y. The randomization operator R(·) is a function from V_X to V_Y with transition probability p[x → y]. In previous studies, several randomization operators have been proposed, including the random perturbation operator [2], the random response operator [8], the MASK distortion operator [18], and the select-a-size operator [10]. For example, the random perturbation operator and the random response operator are given in (1) and (2), respectively:

    R(x_i) = x_i + r_i,  (1)

    R(x_i) = { \bar{x}_i, if r_i ≥ θ_i,
               x_i,       if r_i < θ_i.  (2)

Here, x_i is the original data value, \bar{x}_i is its complement, r_i is noise randomly generated from a predetermined distribution, and θ_i is a parameter set by each data provider individually. As we can see, the random response operator applies only to binary data.

In the second step, the honest data miner first runs a distribution reconstruction algorithm on the aggregated data, which is intended to recover the original data distribution from the randomized data. Then, the honest data miner performs the data mining task on the reconstructed distribution. Several distribution reconstruction algorithms have been proposed [1, 2, 8, 10, 18].
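As an illustration, the two operators in (1) and (2) can be sketched as follows. This is a minimal sketch of our own, not code from the cited papers; the Gaussian noise in the perturbation operator is an assumption (any predetermined noise distribution qualifies).

```python
import random

def random_perturbation(x, noise_std=1.0):
    # Eq. (1): add noise r_i drawn from a predetermined distribution.
    # The Gaussian distribution here is our assumption for illustration.
    return x + random.gauss(0.0, noise_std)

def random_response(x, theta):
    # Eq. (2): for a binary value x, keep it if r_i < theta,
    # otherwise report its complement 1 - x.
    return x if random.random() < theta else 1 - x

# A data provider randomizes its private bit before sending it:
randomized = random_response(1, theta=0.3)
```

With θ_i = 0.3, a provider's bit is kept with probability 0.3 and flipped with probability 0.7 before it leaves the client.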
In particular, the expectation maximization (EM) algorithm [1] reconstructs a distribution that converges to the maximum likelihood estimate of the original data distribution. For example, suppose that the data providers randomize their data using the random response operator in (2), with each r_i uniformly distributed on [0, 1] and θ_i = 0.3. The distribution reconstructed by the EM algorithm is

    Pr{x_i = 0} = (7/4) Pr{R(x_i) = 1} − (3/4) Pr{R(x_i) = 0},  (3)

    Pr{x_i = 1} = (7/4) Pr{R(x_i) = 0} − (3/4) Pr{R(x_i) = 1}.  (4)

Also in the second step, a malicious data miner may invade privacy by using a private data recovery algorithm, which attempts to recover individual data values from the randomized data supplied by the data providers. Figure 1 depicts the architecture of the system. Clearly, any privacy preserving data mining system should be measured by its capacity for both constructing accurate data mining results and protecting individual data values from being compromised by malicious data miners.
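The inversion formulas (3) and (4) can be checked by simulation. A sketch (the sample size, seed, and true distribution below are arbitrary choices of ours):

```python
import random

def reconstruct_p0(randomized):
    # Invert the theta = 0.3 random response channel via eq. (3):
    # Pr{x=0} = (7/4) Pr{R(x)=1} - (3/4) Pr{R(x)=0}.
    q1 = sum(randomized) / len(randomized)  # empirical Pr{R(x)=1}
    return 1.75 * q1 - 0.75 * (1.0 - q1)

random.seed(0)
true_p0, n, theta = 0.8, 200_000, 0.3
original = [0 if random.random() < true_p0 else 1 for _ in range(n)]
# keep with probability theta, flip otherwise, as in eq. (2)
randomized = [x if random.random() < theta else 1 - x for x in original]
print(reconstruct_p0(randomized))  # close to the true value 0.8
```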
Fig. 1. System Model

4 Quantification of Accuracy

In this section, we study the measurement of the accuracy of data mining results. First, we briefly review previous accuracy measures and identify their problem. Then, we propose a new accuracy measure named effective sample size and derive an analytical bound on it.

4.1 Previous Measures

In previous studies, several accuracy measures have been proposed. We classify these measures into two categories. One category is application-specific accuracy measures. Measures in this category are tied to particular data mining applications. For example, in the MASK system [18] for privacy preserving association rule mining, the measurement of accuracy comprises two measures, named support error and identity error. Support error is the average error in the support of the identified frequent itemsets. Identity error measures the average probability that a frequent itemset is not identified. These measures are specific to association rule mining and cannot be applied to other data mining applications (e.g., data classification).

The other category is general accuracy measures. Measures in this category can be applied to any privacy preserving data mining system based on the randomization approach. An existing measure in this category is the information loss measure [1]. Let \hat{p} be the reconstructed distribution. The information loss measure I(p_X, \hat{p}) is defined as

    I(p_X, \hat{p}) = (1/2) E[ ∫_{V_X} |p_X(x) − \hat{p}(x)| dx ],  (5)

which is proportional to the expected error of the reconstructed distribution.
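On a discrete domain, the integral in (5) reduces to a sum over V_X. A minimal sketch of our own, with made-up example distributions; for a single reconstruction it computes the half-L1 distance, while the expectation in (5) would average that quantity over repeated reconstructions:

```python
def information_loss(p_true, p_est):
    # Eq. (5) on a discrete domain: half the L1 distance between the
    # original distribution and one reconstructed estimate.
    xs = set(p_true) | set(p_est)
    return 0.5 * sum(abs(p_true.get(x, 0.0) - p_est.get(x, 0.0)) for x in xs)

loss = information_loss({0: 0.8, 1: 0.2}, {0: 0.75, 1: 0.25})
print(loss)  # approximately 0.05
```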
4.2 Problem of Previous Measures

We remark that the ultimate goal of performance measurement is to help system designers choose the optimal randomization operator. As we can see from the privacy preserving data mining process in Section 3, the randomization operator has to be determined before any data is transferred from the data providers to the data miner. Thus, to serve its goal, a performance measure must be estimable or boundable without any knowledge of the data being mined. The application-specific accuracy measures depend on both the reconstructed data distribution and the performance of the data mining algorithm. The information loss measure depends on both the original distribution and the reconstructed distribution. Neither kind of measure can be estimated or bounded when the data distribution is unknown. Thus, previous measures cannot be used by system designers to choose the optimal randomization operator.

4.3 Effective Sample Size

We now propose effective sample size as our new accuracy measure. Roughly speaking, given the number of randomized data points, the effective sample size is proportional to the minimum number of original data points that yields an estimate of the data distribution as accurate as the distribution reconstructed from the randomized data points. The formal definition is stated as follows.

Definition 1. Suppose that the system consists of n data providers and one data miner. Given a randomization operator R : V_X → V_Y, let \hat{p} be the maximum likelihood estimate of the distribution of x_i reconstructed from R(x_1), ..., R(x_n). Recall that p_X is the original distribution of x_i. Let \hat{p}_0(k) be the maximum likelihood estimate of the distribution based on k random variables generated from distribution p_X.
We define the effective sample size r as the minimum value of k/n such that

    D_Kol(\hat{p}_0(k), p_X) ≤ D_Kol(\hat{p}, p_X),  (6)

where D_Kol is the Kolmogorov distance [16], which measures the distance between an estimated distribution and the theoretical distribution.[1]

As we can see, effective sample size is a general accuracy measure that measures the accuracy of the reconstructed distribution. Effective sample size is a function of three parameters: n, R, and p_X. As the simulation result in Figure 2 shows, the minimum value of k is (almost) proportional to n. Thus, we can reduce the effective sample size to a function of R and p_X. We now show that the effective sample size can be strictly bounded without any knowledge of p_X.

Theorem 1. Recall that p[x → y] is the probability transition function of R : V_X → V_Y. An upper bound on the effective sample size r is given by

    r ≤ 1 − Σ_{y ∈ V_Y} min_{x ∈ V_X} p[x → y].  (7)

[1] Other measures of such distance (e.g., the Kuiper distance and the Anderson-Darling distance) can also be used to define the effective sample size. The use of other measures does not influence the results in this paper.
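Because the bound in (7) depends only on the transition probabilities of R, a system designer can evaluate it before any data is collected. A sketch of our own for a finite domain; the example transition matrix is the θ = 0.3 random response operator from Section 3:

```python
def ess_upper_bound(p):
    # Theorem 1, eq. (7): r <= 1 - sum_y min_x p[x -> y].
    # `p` maps each x in V_X to a dict {y: Pr{R(x) = y}}.
    ys = set()
    for row in p.values():
        ys.update(row)
    return 1.0 - sum(min(row.get(y, 0.0) for row in p.values()) for y in ys)

# theta = 0.3 random response: keep with probability 0.3, flip with 0.7.
p = {0: {0: 0.3, 1: 0.7}, 1: {0: 0.7, 1: 0.3}}
print(ess_upper_bound(p))  # 1 - (0.3 + 0.3) = 0.4
```

An identity (non-randomizing) operator gives the trivial bound r ≤ 1, while a completely flat operator gives r ≤ 0, matching the intuition that heavier randomization leaves fewer effective samples.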
Fig. 2. Relationship between min k and n

Proof. We denote Pr{x_i = x} and Pr{R(x_i) = y} by p(x) and p(y), respectively. We have

    p(y) = Σ_{x ∈ V_X} p(x) p[x → y]  (8)
         = min_{x ∈ V_X} p[x → y] + Σ_{x ∈ V_X} p(x) (p[x → y] − min_{x ∈ V_X} p[x → y]).  (9)

We separate R into two operators, R_1 and R_2, such that R(·) = R_2(R_1(·)). Let p_0 = Σ_{y ∈ V_Y} min_{x ∈ V_X} p[x → y]. Note that p_0 ≤ 1. Let e ∉ V_X ∪ V_Y be a symbol representing a denial-of-service; note that no private information can be inferred from e. R_1 and R_2 are defined as follows:

    R_1(x) = { e,         with probability p_0,
               y_1,       with probability p[x → y_1] − min_{x ∈ V_X} p[x → y_1],
               ...,
               y_{|V_Y|}, with probability p[x → y_{|V_Y|}] − min_{x ∈ V_X} p[x → y_{|V_Y|}],  (10)

    R_2(z) = { z,         if z ≠ e,
               y_1,       if z = e, with probability (min_{x ∈ V_X} p[x → y_1]) / p_0,
               ...,
               y_{|V_Y|}, if z = e, with probability (min_{x ∈ V_X} p[x → y_{|V_Y|}]) / p_0.  (11)

Here, y_1, ..., y_{|V_Y|} are all possible values in V_Y; that is, V_Y = {y_1, ..., y_{|V_Y|}}. We now show the equivalence of R(·) and R_2(R_1(·)). For all x ∈ V_X, y ∈ V_Y, we have

    Pr{R_2(R_1(x)) = y}  (12)
      = Pr{R_1(x) = e} Pr{R_2(R_1(x)) = y | R_1(x) = e} + Pr{R_1(x) = y} Pr{R_2(R_1(x)) = y | R_1(x) = y}  (13)
      = p_0 · (min_{x ∈ V_X} p[x → y]) / p_0 + p[x → y] − min_{x ∈ V_X} p[x → y]  (14)
      = p[x → y].  (15)
Note that R_2 is determined only by p[x → y], the probability transition function of R. Suppose that the data providers use R_1 to randomize their data. The data miner can always construct R(x_i) from R_1(x_i) using its knowledge of R. Thus, the effective sample size when R is used is always less than or equal to the effective sample size when R_1 is used. That is,

    r ≤ 1 − p_0 = 1 − Σ_{y ∈ V_Y} min_{x ∈ V_X} p[x → y].  (16)

This bound depends only on the randomization operator R; it is independent of the number of data providers n and the original data distribution p_X. As we can see, the bound can be met if and only if for any given x ∈ V_X, there exists no more than one y_i ∈ V_Y such that

    p[x → y_i] > p_0 / |V_Y|.  (17)

5 Quantification of Privacy Protection

In this section, we address issues related to the measurement of privacy protection in privacy preserving data mining. First, we briefly review the previous measures of privacy protection. Then, we identify a tacit assumption made by previous measures that is unrealistic in practice. To solve the problem, we propose a new privacy measure based on a game theoretic framework.

5.1 Previous Measures

In previous studies, two kinds of privacy measures have been proposed. One kind is the information theoretic measure [1], which measures privacy by the mutual information between the original data x_i and the randomized data R(x_i) (i.e., I(x_i; R(x_i))). This measure is a statistical measurement of privacy disclosure. In [9], the authors challenge the information theoretic measure and remark that certain kinds of privacy disclosure cannot be captured by it. For example, suppose that for a certain y ∈ V_Y, a data miner can almost certainly infer that x_i = y from R(x_i) = y (i.e., Pr{x_i = y | R(x_i) = y} ≈ 1). This privacy disclosure is serious, because a data provider that knows of the disclosure will purposely change its randomized data whenever the randomized value happens to be y.
However, the information theoretic measure cannot capture this privacy disclosure if the occurrence of y has a fairly low probability (i.e., Pr{R(x_i) = y} ≈ 0). The reason is that the mutual information only measures the average information disclosed to the data miner.

The other kind of privacy measure was proposed to solve this problem of the information theoretic measure. Privacy measures of this kind include the privacy breach measure [9] and interval-based privacy measures [3, 21]. We use the privacy breach measure as an example. Under the privacy breach measure, the level of privacy protection is determined by

    max_{x, x' ∈ V_X} p[x → y] / p[x' → y]  (18)
for any given y ∈ V_Y. This measure captures the worst-case privacy disclosure and can guarantee a bound on the level of privacy protection without any knowledge of the original data distribution. However, we remark that this measure solves the problem of the information theoretic measure by going to the opposite extreme: the privacy breach measure is (almost) independent of the average information disclosure and depends only on the privacy disclosure in the worst case. We show the problem with previous measures as follows.

5.2 Problem of Previous Measures

To measure privacy, we first need to define the privacy of the data providers. The dictionary defines privacy as freedom from unauthorized intrusion [17]. By this definition, the effectiveness of privacy protection depends on whether a malicious data miner can perform unauthorized intrusion on the data providers. The privacy loss of the data providers is measured by the gain of the data miner from unauthorized intrusions. Thus, a privacy protection measure depends on two important factors: a) the privacy protection mechanism of the data providers, and b) the unauthorized intrusion technique of the data miner.

The data miner has the freedom to choose different intrusion techniques in different circumstances. Thus, the intrusion technique of the data miner should always be considered in the measurement of privacy. However, previous measures do not follow this principle. Neither the information theoretic measure nor the privacy breach measure addresses the variety of intrusion techniques. Instead, they make the tacit assumption that all data miners use the same intrusion technique. This assumption seems reasonable, as a (rational) data miner will always choose the intrusion technique that compromises the most private information. However, as we show below, the optimal intrusion technique varies with the circumstances.
Thereby, the absence of consideration of intrusion techniques causes problems for the privacy measurement.

Example 1. Suppose that V_X = {0, 1} and the original data x_i are uniformly distributed on V_X. The system designer needs to determine which of the following two randomization operators, R_1 and R_2, discloses less private information:

    R_1(x) = { x,        with probability 0.70,
               \bar{x},  with probability 0.30,  (19)

    R_2(x) = { 0, if x = 0,
               1, if x = 1, with probability 0.01,
               0, if x = 1, with probability 0.99.  (20)

In this example, the mutual information I(x; R_1(x)) is much greater than I(x; R_2(x)). That is, the average amount of private information disclosed by R_1 is much greater than that disclosed by R_2. Under the information theoretic measure, R_2 is therefore better than R_1 from the privacy protection perspective. The result is different when the privacy breach measure is used. As we can see, if the data miner receives R_2(x_i) = 1, it can infer that x_i = 1 with probability 1. Thus, the worst-case privacy loss of R_2 is much greater than that of R_1. Under the privacy breach measure, R_1 is better than R_2 from the privacy protection perspective.
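The disagreement in Example 1 can be reproduced numerically. The sketch below computes the mutual information I(x; R(x)) and the worst-case ratio from (18) for both operators; the helper names are ours, not the paper's:

```python
import math

def mutual_information(p_x, chan):
    # I(x; R(x)) in bits; chan[x][y] = p[x -> y].
    p_y = {}
    for x, px in p_x.items():
        for y, pyx in chan[x].items():
            p_y[y] = p_y.get(y, 0.0) + px * pyx
    return sum(px * pyx * math.log2(pyx / p_y[y])
               for x, px in p_x.items()
               for y, pyx in chan[x].items() if pyx > 0)

def breach_ratio(chan, y):
    # Worst-case ratio (18) at output y: max_{x,x'} p[x->y] / p[x'->y].
    probs = [row.get(y, 0.0) for row in chan.values()]
    return math.inf if min(probs) == 0.0 else max(probs) / min(probs)

p_x = {0: 0.5, 1: 0.5}
R1 = {0: {0: 0.70, 1: 0.30}, 1: {0: 0.30, 1: 0.70}}  # eq. (19)
R2 = {0: {0: 1.00}, 1: {0: 0.99, 1: 0.01}}           # eq. (20)

print(mutual_information(p_x, R1))   # about 0.119 bits
print(mutual_information(p_x, R2))   # about 0.005 bits
print(breach_ratio(R1, 1), breach_ratio(R2, 1))  # 7/3 versus inf
```

The two measures rank the operators in opposite orders, exactly as the example argues.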
We now show that whether R_1 or R_2 is better actually depends on the system setting. In particular, we consider the following two system settings.

1. The system is an online survey system where the survey analyzer and the survey respondents are the data miner and the data providers, respectively. The value of x_i indicates whether a survey respondent is interested in buying certain merchandise. The intrusion performed by a malicious data miner is to send unauthorized advertisements to data providers with such an interest.

2. The system consists of n companies as the data providers and a management consulting firm as the data miner. The consulting firm performs statistical analysis on the financial data of the companies. The original data x_i contains the expected profit of a company, which has not yet been published. As the unauthorized intrusion, a malicious data miner may use x_i to invest in a high-risk stock market. The profit from a successful investment is great; however, a failed investment results in a loss five times greater than the profit the data miner may obtain from a successful investment.

In the first setting, an advertisement to the wrong person costs the data miner little. A reasonable strategy for the data miner is to advertise to all data providers. In fact, if the expected loss from an incorrect estimate (i.e., an advertisement to a person without the interest) is equal to 0, this is the optimal intrusion technique for the data miner. Comparing the two randomization operators, R_1 discloses the original data value with probability 0.7, which is greater than that of R_2 (0.501). Thus, R_2 is better than R_1 from the privacy protection perspective.

In the second setting, the data miner will not perform the intrusion when R_1 is used by the data providers, because the loss from a failed investment (i.e., an incorrect estimate of x_i) is unaffordable.
Even if the profit from a successful investment is fairly high, the loss from a wrong decision is too great to risk. That is, for the data miner, the expected net benefit from an unauthorized intrusion is less than 0. However, the data miner will perform the intrusion if a randomized data value R_2(x_i) = 1 is received by the data miner, because the data miner then has a fairly high probability (99%) of making a successful investment. If a randomized data value R_2(x_i) = 0 is received, the data miner will simply ignore it. Thus, in this setting, R_1 is better than R_2 from the privacy protection perspective.

As we can see from the example, the data miner will choose different privacy intrusion techniques in different system settings, which results in different performance of the randomization operators. Thus, the system setting and the privacy intrusion technique have to be considered in the measurement of privacy.

5.3 A Game Theoretic Framework

In order to bring the system setting and the privacy intrusion technique into our privacy measure, we first propose a game theoretic framework to analyze the strategies of the data miner (i.e., the privacy intrusion techniques). Since we are studying the privacy protection performance of the randomization operator, we consider the randomization operator to be the strategy of the data providers.
We model the privacy preserving data mining process as a non-cooperative game between the data providers and the data miner. There are two players in the game: the data providers and the data miner. Since we only consider the privacy measure, the game is zero-sum, in that the benefit obtained by the server from unauthorized intrusions always corresponds to an invasion of the privacy of the data providers. Let S_c be the set of randomization operators that the data providers can choose from, and S_s the set of intrusion techniques that the data miner can choose from. Let u_c and u_s be the payoffs (i.e., expected benefits) of the data providers and the data miner, respectively. Since the game is zero-sum, we have u_c + u_s = 0. We remark that the payoffs depend on both the strategies of the players and the system setting.

We assume that both the data providers and the data miner are rational. That is, given a certain randomization operator, the data miner always chooses the privacy intrusion technique that maximizes its payoff u_s; given a certain privacy intrusion technique, the data providers always choose the randomization operator that maximizes u_c. By game theory, if a Nash equilibrium[2] exists in the game, it contains the optimal strategies for both the data providers and the data miner [11].

5.4 Our Privacy Measure

We now define our privacy measure based on the game theoretic formulation.

Definition 2. Given a privacy preserving data mining system G = (S_s, S_c, u_s, u_c), we define the privacy measure l_p of a randomization operator R as

    l_p(R) = u_c(R, L_0),  (21)

where L_0 is the optimal privacy intrusion technique for the data miner when R is used by the data providers, and u_c is the payoff of the data providers when R and L_0 are used. As we can see, the smaller l_p is, the more benefit the data miner obtains from unauthorized intrusion. We now use an example to illustrate the definition.

Example 2.
Let V_X = {0, 1}, and suppose that the original data x_i are uniformly distributed on V_X. A system designer wants to compare the privacy preserving capacities of the randomization operators R_1 and R_2, which are as follows:

    R_1(x_i) = { x_i,        with probability 0.60,
                 \bar{x}_i,  with probability 0.40,  (22)

    R_2(x_i) = { x_i, with probability 0.01,
                 e,   with probability 0.99,  (23)

where e is a denial-of-service signal satisfying e ∉ {0, 1}. As we can see, no private information can be inferred from e. Thus, without loss of generality, we suppose that the data miner ignores a data point if it has the value e. Under the information theoretic measure, R_2 is better than R_1; under the privacy breach measure, R_1 is better than R_2. We now analyze the problem based on our privacy measure in the game theoretic formulation.

Since the comparison is between R_1 and R_2, we assume that the data providers can only choose the randomization operator from {R_1, R_2}; that is, S_c = {R_1, R_2}. For a given system setting, let the optimal intrusion technique for the data miner be L_0. We now introduce a specific intrusion technique L_1. Roughly speaking, L_1 infers x_i = R(x_i) if and only if R(x_i) ≠ e. We have {L_0, L_1} ⊆ S_s. Since Pr{R_2(x_i) = \bar{x}_i} = 0, L_1 is the optimal intrusion technique for the data miner when R_2 is the randomization operator; that is, L_0 = L_1 when R_2 is used by the data providers. The strategies and payoffs are listed in Table 1, where u_0, u_1, and u_2 are the payoffs of the data miner in the different circumstances.

Table 1. Strategies and Payoffs (data providers' payoff / data miner's payoff)

             L_0            L_1
    R_1    -u_0 / u_0     -u_1 / u_1
    R_2    -u_2 / u_2     -u_2 / u_2

By the assumption that L_0 is the optimal intrusion technique, we always have u_0 ≥ u_1. The comparison between u_1 and u_2 depends on the system setting. Recall the two system settings in Example 1. In the online survey example, we have u_1 > u_2. In the stock market example, a reasonable estimation is u_1 < u_0 < u_2.

Let ⟨C, S⟩ be the strategies of the data providers and the data miner, respectively. We consider the comparison between R_1 and R_2 in the following cases.

1. u_1 > u_2. There are two Nash equilibria in the game: ⟨C, S⟩ = ⟨R_2, L_0⟩ and ⟨C, S⟩ = ⟨R_2, L_1⟩. Thus, R_2 is the better choice for the data providers from the privacy protection perspective.
2. u_1 < u_2, u_0 > u_2. Only one Nash equilibrium, ⟨C, S⟩ = ⟨R_2, L_0⟩, exists in the game. Thus, R_2 is the better choice for the data providers from the privacy protection perspective.
3. u_1 < u_2, u_0 < u_2. Only one Nash equilibrium, ⟨C, S⟩ = ⟨R_1, L_0⟩, exists in the game. Thus, R_1 is the better choice for the data providers from the privacy protection perspective.

[2] Roughly speaking, a Nash equilibrium is a condition in which no player can benefit by changing its own strategy unilaterally while the other player keeps its current strategy.
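The three cases above can be verified mechanically by enumerating the pure-strategy Nash equilibria of the 2x2 game in Table 1. A sketch of our own, with arbitrary numeric payoffs chosen to satisfy each case's inequalities (and the standing constraint u_0 ≥ u_1):

```python
def pure_nash_equilibria(payoffs):
    # payoffs[(c, s)] = (u_c, u_s); a profile is a pure Nash equilibrium
    # when neither player gains by a unilateral deviation.
    cs = sorted({c for c, _ in payoffs})
    ss = sorted({s for _, s in payoffs})
    return [(c, s) for c in cs for s in ss
            if all(payoffs[(c, s)][0] >= payoffs[(c2, s)][0] for c2 in cs)
            and all(payoffs[(c, s)][1] >= payoffs[(c, s2)][1] for s2 in ss)]

def table1(u0, u1, u2):
    # Zero-sum: the data providers' payoff is the negative of the miner's.
    return {('R1', 'L0'): (-u0, u0), ('R1', 'L1'): (-u1, u1),
            ('R2', 'L0'): (-u2, u2), ('R2', 'L1'): (-u2, u2)}

print(pure_nash_equilibria(table1(3, 2, 1)))    # case 1: u1 > u2
print(pure_nash_equilibria(table1(3, 1, 2)))    # case 2: u1 < u2 < u0
print(pure_nash_equilibria(table1(1.5, 1, 2)))  # case 3: u1 <= u0 < u2
```

The enumeration reproduces the equilibria of the three cases: ⟨R_2, L_0⟩ and ⟨R_2, L_1⟩ in case 1, ⟨R_2, L_0⟩ alone in case 2, and ⟨R_1, L_0⟩ alone in case 3.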
As we can see, the comparison between the privacy preserving capacities of R_1 and R_2 depends on the comparison between u_1 and u_2, which is determined by the ratio between the benefit from a correct estimate and the loss from an incorrect estimate. Let this ratio be σ. In the above case, we have

    σ = (gain from a correct estimate) / (loss from an incorrect estimate) = 40 u_2 / (60 u_2 − u_1).  (24)

A useful theorem is provided as follows.
Theorem 2. Suppose that in the original data distribution, we have

    max_{x_0 ∈ V_X} Pr{x_i = x_0} = p_m.  (25)

If the randomization operator R : V_X → V_Y satisfies

    max_{y ∈ V_Y} (max_{x ∈ V_X} p[x → y]) / (min_{x ∈ V_X} p[x → y]) ≤ (1 − p_m) / (σ p_m),  (26)

then the privacy measure l_p(R) = 0.

The proof of Theorem 2 is omitted due to the space limit.

6 Conclusion

In this paper, we establish the foundation for the measurement of accuracy and privacy protection in privacy preserving data mining. On the accuracy side, we identify the problem with previous accuracy measures and solve it by introducing the effective sample size measure. On the privacy protection side, we first identify an unrealistic assumption tacitly made by previous measures; we then present a game theoretic formulation of the system and propose a privacy protection measure based on it. We conclude this paper with some future research directions.

- Design of the optimal randomization operator based on the new accuracy and privacy protection measures.
- Further analysis of the performance of data mining algorithms. Most existing theoretical analysis of the performance of privacy preserving data mining techniques is based on the assumption of an ideal data mining algorithm. The performance of practical data mining algorithms has only been analyzed through heuristic results. However, as shown in [4], the difference between practical and ideal data mining algorithms can be nontrivial. Further analysis of this issue is needed to measure the performance of randomization operators more precisely.

References

1. D. Agrawal and C. C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In Proceedings of the 20th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM Press, 2001.
2. R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proceedings of the ACM SIGMOD International Conference on Management of Data. ACM Press, 2000.
3. R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proceedings of the ACM SIGMOD Conference on Management of Data. ACM Press, 2000.
4. C. Clifton. Using sample size to limit exposure to data mining. Journal of Computer Security, 8(4), 2000.
5. W. Du and M. Atallah. Privacy-preserving cooperative statistical analysis. In Proceedings of the 17th Annual Computer Security Applications Conference, page 102. IEEE Computer Society, 2001.
6. W. Du, Y. S. Han, and S. Chen. Privacy-preserving multivariate statistical analysis: Linear regression and classification. In Proceedings of the 4th SIAM International Conference on Data Mining. SIAM Press, 2004.
7. W. Du and Z. Zhan. Building decision tree classifier on private data. In Proceedings of the IEEE International Conference on Privacy, Security and Data Mining, pages 1-8. Australian Computer Society, Inc., 2002.
8. W. Du and Z. Zhan. Using randomized response techniques for privacy-preserving data mining. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 2003.
9. A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacy breaches in privacy preserving data mining. In Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. ACM Press, 2003.
10. A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke. Privacy preserving mining of association rules. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 2002.
11. R. Gibbons. A Primer in Game Theory. Harvester Wheatsheaf, New York, 1992.
12. O. Goldreich. Secure Multi-Party Computation. Cambridge University Press.
13. M. Kantarcioglu and C. Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. IEEE Transactions on Knowledge and Data Engineering, 16(9), 2004.
14. M. Kantarcioglu and J. Vaidya. Privacy preserving naive Bayes classifier for horizontally partitioned data. In Workshop on Privacy Preserving Data Mining, held in association with the 3rd IEEE International Conference on Data Mining, 2003.
15. Y. Lindell and B. Pinkas. Privacy preserving data mining. In Proceedings of the 20th Annual International Cryptology Conference on Advances in Cryptology. Springer-Verlag, 2000.
16. F. J. Massey. The Kolmogorov-Smirnov test for goodness of fit. Journal of the American Statistical Association, 46(253):68-78, 1951.
17. Merriam-Webster. Merriam-Webster's Collegiate Dictionary. Merriam-Webster, Inc.
18. S. J. Rizvi and J. R. Haritsa. Maintaining data privacy in association rule mining. In Proceedings of the 28th International Conference on Very Large Data Bases. Morgan Kaufmann, 2002.
19. J. Vaidya and C. Clifton. Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 2002.
20. J. Vaidya and C. Clifton. Privacy preserving naive Bayes classifier for vertically partitioned data. In Proceedings of the 4th SIAM Conference on Data Mining. SIAM Press, 2004.
21. Y. Zhu and L. Liu. Optimal randomization for privacy preserving data mining. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM Press, 2004.