Preserving Privacy in Data Preparation for. Association Rule Mining

Size: px
Start display at page:

Download "Preserving Privacy in Data Preparation for. Association Rule Mining"

Transcription

1 Preserving Privacy in Data Preparation for Association Rule Mining Nan Zhang, Shengquan Wang, and Wei Zhao, Fellow, IEEE Abstract We address the privacy preserving association rule mining problem in a distributed system with one data miner and multiple data providers, each holds one transaction. The literature has tacitly assumed that randomization is the only effective approach to preserve privacy in such circumstances. We challenge this assumption by introducing an algebraic techniques based scheme in the data preparation phase. Compared to previous approaches, our new scheme can identify association rules more accurately but disclose less private information. Furthermore, our new scheme can be readily integrated as a middleware with existing systems. Index Terms Data mining; clustering, classification, and association rules; privacy; singular value decomposition. I. INTRODUCTION The goal of data mining is to extract interesting knowledge from large amounts of data [1]. Traditional data mining algorithms deal with centralized data. Recently, a number of applications on the Internet lead to a need for mining distributed data. In this circumstance, a privacy concern arises from the distributed The authors are with the Department of Computer Science, Texas A&M University, College Station, TX {nzhang, swang, zhao}@cs.tamu.edu. A preliminary version of this paper is to be presented at the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), September 2004.

2 2 data providers. In this paper, we address issues related to production of accurate data mining results, while preserving the private information in the data being mined. We will focus on association rule mining, which will be briefly reviewed in the next section. Since Agrawal, Imielinski, and Swami addressed this problem in [2], association rule mining has been an active research area due to its wide applications and the challenges it presents. Many algorithms have been proposed and analyzed [3] [5]. However, few of them have addressed the issue of privacy protection. We can classify privacy preserving association rule mining systems into two classes based on their infrastructures, named Server to Server (S2S) and Client-to-Server (C2S), respectively. In the first category (S2S), data are distributed across several autonomous entities (servers) [6], [7]. Each server holds a private database that contains numerous data points (i.e., transactions). The servers collaborate with each other to identify association rules spanning multiple databases. Since usually only a few servers are involved in a system (e.g., less than 10), the problem can be modeled as a variation of the secured multi-party computation [8], which has been extensively studied in cryptography [9]. In the second category (C2S), a system consists of one data miner (server) and large amounts of data providers (clients) [10], [11]. Each data provider holds only one transaction. Association rule mining is performed by the data miner on the aggregate transactions provided by the data providers. Online survey is a typical example of this kind of system, as the system can be modeled as consisting of one data miner (i.e., the survey analyzer) and thousands of data providers (i.e., the survey respondents). To ensure the effectiveness of the survey results (e.g., to block multiple votes from a unique IP), the identity of the data providers cannot be hidden from the data miner. Thus, privacy is of particular concern in this kind of system. In fact, there has been wide media coverage of the public debate of protecting privacy in online surveys [12]. Both S2S and C2S systems have wide applications. Nevertheless, we will focus on studying C2S systems in this paper. Several studies have been carried out on privacy preserving association rule mining in C2S systems. Most of them have tacitly assumed that an effective approach to preserving privacy is

3 3 randomization. If we consider the data mining as a two-phase process, which consists of the the data prepartion phase and the data mining phase, the randomization approach is involved in both phases. We Fig phase Data Mining challenge this assumption by introducing a new scheme that only occurs in the data preparation phase. Our new scheme integrates algebraic techniques with random noise perturbation. It has the following important features that distinguish it from previous approaches: Our scheme is easy to implement and flexible. Our scheme is involved only in the data preparation phase and does not require a support recovery procedure in the data mining phase. Thus, our new scheme is transparent to the data mining process. It can be readily integrated as a middleware with existing systems. Our scheme can identify association rules more accurately but disclose less private information. Roughly speaking, our scheme selects the most useful features for asssociation rule mining from the original data and transmits only these features to the data miner. Our simulation data show that for the same level of accuracy, our system discloses private information about five times less than the randomization approach. We allow explicit negotiation between the data providers and the data miner in terms of the tradeoff between the accuracy of data mining results and the privacy of data providers. Instead of following the rules set by the data miner, a data provider can play a role in determining the tradeoff between accuracy and privacy. This is an important feature because people have a wide variety of attitudes

4 4 towards privacy. Due to survey results in [13], the net users attritudes towards privacy can be divided into three categories: 7% privacy fundamentalists who are extremely concerned about the use of their private data, 56% pragmatic majority who are generally willing to provide data if privacy protection measures can be offered; and 27% marginally concerned who barely care about privacy. The negotiation feature in our scheme can help the data miner to collaborate with both hard-core privacy fundamentalists and people comfortable with limited privacy disclosure. The rest of this paper is organized as follows: In Section II, we brief review previous approaches. We present our models and introduce our new scheme in Section III. The communication protocol of our new scheme and its basic components are also provided in this section. In Section IV, we present the theoretical analysis of the tradeoff between accuracy and privacy in our scheme. The simulation results are presented in Section V. In Section VI, we present the experimental results on the performance evaluation of our scheme. Implementation and overhead issues are discussed in Section VII, followed by a final remark in Section VIII. II. APPROACHES In this section, we first overview the definition of association rule mining. Then, we introduce our models of the data miners. Based on the model, we review the randomization approach, which has been widely used in the literature of privacy preserving data mining. We will address the problems associated with the randomization approach, which motivates us to design a new privacy preserving scheme. A. Association Rule Mining A motivating example for association rule mining is a survey of hobbies. Each survey respondent chooses arbitrary number of hobbies from five options: football, soccer, beauty, video games and PC games. These options are called items. For a survey respondent, the answer to the survey is called a transaction. As we can see, a transaction is a set of items (e.g., t ={football, soccer}). Given a number of transactions, association rule mining finds interesting correlation (relationship) between items [1]. For

5 5 example, suppose that there are 5, 000 survey respondents. The survey analyzer finds that within the 1, 000 respondents who choose video games as their hobbies, there are 800 respondents who also choose PC games as their hobbies. Based on the survey results, the survey analyzer may infer an association rule that can be represented as follows. video games PC games [support = 16%, confidence = 80%]. (1) The support of this rule is the percentage of the respondents who choose both video games and PC games. Roughly speaking, the confidence is the probability that a respondent who chooses video games also chooses PC games. Both support and confidence are measures of the validity and trustworthiness of the association rules. Technically speaking, the main task of association rule mining is to find all frequent itemsets, which are itemsets that have support larger than a threshold determined by the data miner. B. Model of Data Miners Due to the privacy concern introduced to the system, we classify the data miners into two categories. One category is legal data miners. These data miners always act legally in that they only perform regular data mining tasks and would never intentionally invade the privacy of the data providers. The other category is illegal data miners. These data miners would purposely compromise the privacy of the data providers. Like adversaries in distributed systems, illegal data miners come in many forms. In most forms, their behavior is restricted from arbitrarily deviating from the protocol. In this paper, we focus on a particular sub-class of illegal miners. That is, in our system, illegal data miners are honest but curious: they follow proper protocols (i.e., they are honest), but they analyze all intermediate communications and received transactions (i.e., they are curious) to discover private information [9]. Even though it is a relaxation from the Byzantine behavior, this kind of honest but curious (nevertheless illegal) behavior is common and has been widely used as the adversary model in the literature.

6 6 C. Randomization Approach To prevent invasion of privacy due to the existence of illegal data miners, countermeasures must be implemented in the data mining system. We briefly review the randomization approach, which is currently used to preserve privacy in association rule mining. Based on the randomization approach, the entire data mining process is a two-step process. The first step is in the data preparation phase. In this step, a data provider first applies the randomization algorithm to the transaction it holds. Then, the data provider transforms the randomized transaction to the data miner. In previous studies, several randomization algorithms have been proposed including the cut-andpaste operator [10] and MASK operator [11]. For example, when the cut-and-paste operator is used, the data provider first randomly choose an integer j as the number of items that occur in both the original transaction t and the randomized transaction R(t). After that, the data provider randomly choose j items from the t and place these items into R(t). Then, for every item a t, the data provider tosses a coin with probability ρ to place a into R(t). In the second step, the data miner performs association rule mining on the aggregate data. With the randomization approach, the data miner must first employ a support recovery algorithm which intends to reconstruct the support of candidate itemsets. Also in the second step, an illegal data miner may invade privacy by using a privacy data recovery algorithm on the randomized transactions supplied by the data providers. Figure 1 depicts the privacy preserving association rule mining system with the randomization approach. Clearly, any such system should be measured by its capability of both generating accurate association rules and preventing invasion of privacy. D. Problems of the Randomization Approach While the randomization approach is intuitive, researchers have recently identified some problems of the randomization approach as follows.

7 7 Fig. 2. randomization approach In [10], the authors remarked that if the cut-and-paste operator is applied to a transaction with 10 or more items, it is difficult, if not impossible, for the data provider to contribute to the association rule mining with its privacy preserved. Furthermore, large itemsets have exceedingly high variances on the recovered support. Similar problems exist with other randomization operators as they share the similar scheme on the randomization of the original data. An approach was proposed in [10] to solve this problem. In the approach, all data providers that hold transactions with 10 or more items do not transfer the randomized transaction to the data miner. Unfortunately, this approach prevents many frequent itemsets that contain 4 or more items from being discovered by the data miner. In [14], the authors showed that the spectral properties of randomized data could help curious data miners to separate noise from private data. Based on random matrix theory, they proposed a filtering method to reconstruct private data from the randomized data set. They demonstrated that the randomization approach preserves very little privacy in many cases. Although their work is based

8 8 on the randomization approach for privacy preserving data classification, we believe that the similarity between randomization operators in association rule mining and data classification makes the problem inherent in the randomization approach. The randomization approach also suffers in efficiency. Since the privacy preserving mechanism is not restricted to the data preparation phase, it puts a heavy load on (legal) data miners at run time (because of the support recovery) [15]. It is shown that the cost of mining randomized data set is well within an order of magnitude in respect to that of mining original data set. We explore the reasons behind these problems as given below. We note that previous randomization approaches are transaction-invariant. That is, the same perturbation algorithm is applied to all data. Since more items in the original transaction always result in more real items to be included in the randomized transaction, privacy protection on transactions with a large size (e.g., t > 10) are doomed to failure. Previous randomization approaches are item-invariant. All items in the original transaction have the same probability of being included in the perturbed transaction. No specific operation is performed to preserve the correlation between different items. Thus, a lot of real items in the perturbed transactions may not appear in any frequent itemset. That is, the disclosure of these items does not contribute to the mining of association rules. We remark that the transaction-invariant and item-invariant properties are inherent in the randomization approach. The reason is that in a system using randomization approach, the communication is one-way: from the data providers to the data miner. As such, a data provider cannot obtain any specific guidance on the perturbation of its transaction from the (legal) data miner, nor can the data providers learn the correlation between the items. Thus, a data provider has no choice but to use a transaction-invariant and item-invariant approach. This observation motivates us to develop a new scheme that allows two-way communication between the data miner and the data providers. The two-way communication helps preserving privacy while not

9 9 incurring too much overhead. Thereby, we significantly improve the performance in terms of accuracy, privacy, and efficiency. We describe the new scheme in the next section. III. COMMUNICATION PROTOCOL AND RELATED COMPONENTS In this section, we introduce our new scheme including the communication protocol and its basic components. A. Description of Our New Scheme Fig. 3. our new scheme Figure 2 depicts the infrastructure of a system using our new scheme. Our scheme has two key components, perturbation guidance (PG) in the data miner server side and perturbation in the data provider client side. Compared to the randomization approach, our scheme does not have the support recovery component. Instead, the association rule mining is performed on the perturbed transactions (R(t)) directly. Thus, our scheme is restricted to the data preparation stage and does not put a heavy load on the data miner by recovering the support at run time.

10 10 Our scheme is a three-step process. In the first step, the data miner negotiates different perturbation level k with different data providers. The larger k is, the more contribution R(t) will make to the association rule mining task. The smaller k is, the more private information is preserved. Thus, a privacy fundamentalist can choose a small k to preserve its privacy while a privacy unconcerned data provider can choose a large k to contribute to the association rule mining. The second step is to transmit the perturbed transactions from the data providers to the data miner. Since each data provider comes at a different time (e.g., different survey respondents take the survey at different time), this step can be considered as an iterative process. In each stage, the data miner dispatches a reference (perturbation guidance) V k to a data provider P i. Here V k depends on the perturbation level kthat is negotiated by the data miner and the data provider P i in the first step. Based on the received V k, the perturbation component of P i computes the perturbed transaction R(t) from its original transaction t. Then, P i transmits R(t) to the perturbation guidance (PG) component of the data miner. The PG component then updates V k based on R(t) and forwards R(t) to the association rule mining process. A curious data miner can also obtain R(t). In this case, the curious data miner uses private data recovery algorithm to compromise privacy in R(t). In the third step, the perturbed transactions received by the data miner are used by the association rule mining process. Association rules are identified and delivered to the data miner. The key here is to properly design V k so that correct guidance can be given to the data providers on how to perturb the transactions. In our scheme, we let V k be an algebraic quantity derived from the currently received, yet perturbed transactions. The details of computing V k will be presented as a basic component of our scheme. B. Notions of Transactions Before presenting the details of the communication protocol and its basic components, we first introduce some notions of the data set. Let I be a set of n items (i.e., I = {a 1,..., a n }. Suppose that there are m data providers in the system. Each data provider C i holds a transaction t i, which is a subset of I. We

11 11 represent the data set by an m n matrix T = [a 1,..., a n ] = [t 1 ;... ; t m ] 1. An example of the transaction matrix T is shown in Table I. Let T ij be the element of T with indices i and j. The element T ij represents whether item j is in transaction t i. Suppose that transaction t 1 contains items a 1 and a 2. The first row of the matrix has T 1,1 = T 1,2 = 1. TABLE I AN EXAMPLE OF A TRANSACTION MATRIX a 1 a 2 a n t t m An itemset B I is an h-itemset if and only if B contains h items (i.e., B = h). The support of B is the percentage of the transactions in the data set that contain B. That is, supp(b) = {t T B t}. (2) m An h-itemset B is frequent if supp(b) min supp, where min supp is a predefined minimum threshold of support. Refer to the survey of hobbies example in the above subsection, {video games, PC games} is a frequent 2-itemset with support The set of frequent h-itemsets is denoted by L h. C. The Communication Protocol We now describe the communication protocol of our new scheme. The negotiation between the data miner and data providers is shown in Protocol 1. On the side of data miner server, there are two threads that perform the operations in Protocol 2 and Protocol 3 iteratively after the negotiation. For a data provider, it performs the operations in Protocol 4 to perturb and transmit its transaction to the data miner. 1 Here t i and a i are used somewhat ambiguously. In the context of association rule mining, t i is a transaction and a i is an item. In the context of matrix, t i represents a row vector in T and a i represents a column vector in T.

12 12 Protocol 1 Negotiation NM1. Based on the SVD of T (T = U Σ V ), the data miner calculates S = Σ Σ 2 nn; NM2. Find the smallest k [1, n] such that k i=1 Σ 2 ii µ S; NM3. The data miner dispatches k to registered data providers; NP1. For a data provider C i, if C i receives k K t (K t is the threshold of truncation level set by C i ) then C i sends ready message to the data miner; end if Protocol 2 Thread of registering data provider R1. Negotiate on the truncation level k with a data provider; R2. Wait for a ready message from a data provider; R3. Upon receiving the ready message from a data provider, Register the data provider; Send the data provider current V k ; R4. Go to Step R1; D. Basic Components There are three key components in the communication protocol of our scheme: (a) the method of computing V k, (b) the perturbation function R( ), and (c) the negotiation on truncation level k. 1) Computation of V k : Recall that V k carries information from the data miner to data providers on how to perturb the original transactions to preserve privacy. In our scheme, V k is an estimate of the eigenvectors of A = T T (i.e., the right singular vectors of T ). The justification of V k on providing accurate mining results and preserving privacy is presented in Appendix I. As we are considering dynamic cases where the perturbed transactions are dynamically fed to the data miner, the data miner keeps a copy of all received (perturbed) transactions and updates it when a new perturbed transaction is received. Assume that the initial set of received (perturbed) transactions T is

13 13 Protocol 3 Thread of receiving transaction T1. Wait for a perturbed transaction R(t) from a data provider; T2. Upon receiving the transaction from a registered data provider, Update V k based on the recently received perturbed transaction; Deregister the data provider; T3. Go to Step T1; Protocol 4 Transaction perturbation P1. Negotiate on the truncation level k with the data miner; P2. Send the data miner a ready message indicating that this provider is ready to contribute to the mining process; P3. Wait for a message that contains V k from the data miner; P4. Upon receiving the message from the data miner, Compute R(t) based on t and V k ; P5. Transfer R(t) to the data miner; empty 2. Every time when a perturbed transaction R(t) is received, T is updated by appending R(t) to the bottom of T. Thus, T is the matrix of currently received (perturbed) transactions. We derive V k from T. In particular, the computation of V k is done in the following steps. Using singular value decomposition (SVD) [16], we decompose T as (3), where Σ = diag(s 1,..., s n ) is a diagonal matrix with s 1 s n. T = U Σ V. (3) The numbers s 2 i make up the eigenvalues of A = T T. V is an n n unitary matrix composed of the eigenvectors of A. V k is composed of the k eigenvectors of A that are associated with the largest k eigenvalues of A. If 2 T may also contain some transactions provided by privacy-careless data providers

14 14 V = [v 1,..., v n ], we have V k = [v 1,..., v k ]. (4) Thus, we call V k as the k-truncation of V. Several incremental algorithms have been proposed to update V k when a perturbed transaction is received by the data miner [17], [18]. The computing cost of updating V k is addressed in Sect. VII. As we will see in Sect. IV, k plays a critical role in balancing accuracy and privacy. We will also show that by using V k in conjunction with R( ), which is to be discussed next, we can achieve both desired accuracy and privacy protection. 2) Perturbation function: Recall that once a data provider receives V k from the data miner, the data provider applies a perturbation function R( ) to its transaction t. The result is a perturbed transaction R(t) that will be transmitted to the data miner. The computation of R(t) is defined as follows. First, for a given V k, the transaction t is transformed by t = tv k V k. Note that the elements of the vector t may not be integers. Algorithm 5 is employed to round t to 0 or 1. In this algorithm, ρ t is a pre-defined parameter. Finally, for the completeness of our work, we introduce an optional procedure to enhance the privacy preserving capability of the system, which is shown in Algorithm 6. A data provider may use the protocol to insert additional noise into R(t). In the protocol, ρ m is a parameter determined by the data provider. The higher ρ m is, the more noise is inserted into the perturbed transaction. We remark that this protocol is optional and is only needed by privacy fundamentalists. An example of the perturbation process is provided in Appendix II. 3) Negociation on truncation level: In order to retain enough information for association rule mining after the transformation from t to R(t), a textbook heuristic is to make the sum of the k eigenvalues associated with the retained eigenvectors of A larger than 85% of the sum of the eigenvalues of A (i.e., µ = 85%) [16], [19]. The perturbation level k is usually large at the beginning but decreases and stabilizes to be fairly small (e.g., less than 1% of n) soon. Thus, in the theoretical analysis of our scheme, we consider the perturbation level k as a predetermined parameter rather than a variable updated throughout the data

15 15 Algorithm 5 Mapping Let t i be the element of vector t with index i. Similar notations apply to other vectors. for every element t i in t do if t i 1 ρ t then else R(t) i = 1 R(t) i = 0 end if end for Algorithm 6 Random-noise perturbation for every item a i / t do Choose a real number j uniformly at random on [0, 1] if j 1 ρ m then R(t) i = 1 end if end for preparation process. We have described the communication protocol of our scheme and its key components. We now discuss the accuracy and privacy measures of our scheme. IV. ANALYSIS ON ACCURACY AND PRIVACY In this section, we analyze our new scheme. We define measures for accuracy and privacy and derive their bounds, in order to provide guidelines for the tradeoff between these two measures and hence help system managers setting parameter. We also show the simulation and experimental results of our scheme on real datasets. For the simplicity of our discussion, we do not consider the optional random noise insertion procedure in Algorithm 6.

16 16 A. Accuracy Measure An accuracy measure should reflect the capability of the system that can correctly identify association rules in a given dataset. We define the accuracy measure as the error of the support of frequent itemsets. This is because that the main task of association rule mining is to identify frequent itemsets with support larger than a threshold min supp. There are two kinds of errors: a) false drops, which are unidentified frequent itemsets, and b) false positives, which are itemsets incorrectly identified to be frequent. We now formally define our accuracy measure. Given itemset I j, let supp(i j ) and supp (I j ) be the support of I j in the original transactions T and the perturbed transactions R(T ), respectively. Recall that the set of frequent h-itemsets in T is L h. We define the errors on false drops and false positives as follows. Definition 4.1: Given itemset size h, the error on false drops, ρ h 1, and the error on false positives, ρh 2, are defined as ρ h 1 = max I j L h (supp(i j ) supp (I j )), (5) ρ h 2 = max I j L h (supp (I j ) supp(i j )). (6) We define our accuracy measure, degree of accuracy, as the maximum value of ρ h 1 and ρh 2 itemsets. on all sizes of Definition 4.2: The degree of accuracy in a privacy preserving association rule mining system is defined as γ = max h 1 max(ρ h 1, ρh 2 ). Based on the definition, we derive an upper bound on the accuracy measure. Theorem 4.3: In our system, the degree of accuracy γ satisfies γ σ2 k+1 m ( { 1 (1 ρt ) max, 1 ρ }) t, (7) (1 ρ t ) 2 ρ t where σ 2 k+1 is the (k +1)th largest eigenvalue of A = T T. In particular, when ρ t = (3 5)/2, γ reaches its lowest upper bound at γ 2.618σ 2 k+1 /m. The proof of Theorem 4.3 can be found in Appendix III. Our bound on accuracy measure is fairly small when the number of transactions (m) is sufficiently large. This is usually the case in reality. Actually, our

17 17 scheme tends to enlarge the support of frequent itemsets and reduce the support of infrequent itemsets. Thus, the upper bound is not always tight. We can observe this trend from the experimental results, which will be presented in Section VI. B. Privacy Measure In our system, the data miner cannot infer the original transaction t from a perturbed transaction R(t) deterministically because V k V k is a singular matrix with its determinant det(v kv k ) = 0 (i.e., it does not have an inverse matrix). To measure the probability that an item in t can be identified from R(t), we need a privacy measure. Due to survey results in [12], a data provider (e.g., net user) always has a strong will of filtering out unwanted data (i.e., data not contribute to the mining of association rules) before transmitting its private data to the data miner. Given transaction t, an item a i in t is unwanted if a i does not appear in any frequent itemset (i.e., h 1, a i {L h L h t}). That is, the disclosure of a i (i.e., a i R(t)) does not contribute to the mining of association rules. We measure privacy by the probability that an unwanted item is included in the perturbed transaction. Formally speaking, our privacy measure, named level of privacy, is defined as follows. Definition 4.4: Given transaction t, an item a i t is unwanted if and only if there does not exist any frequent itemset I t such that a i I. We define the level of privacy as the probability that an infrequent item in t is included in the perturbed transaction R(t). That is, the level of privacy is defined as δ = Pr{a i R(t) a i is unwanted in t}. (8) As we can see from the definition, a higher level of privacy results in a higher probability of privacy invasion. With an approximation, we derive an upper bound on the level of privacy in our scheme. Theorem 4.5: With properly set parameters, the level of privacy in our system is bounded by σk+1 δ< σ2 n, (9) σ σn 2 where σ 2 i is the ith largest eigenvalue of A = T T.

18 18 The proof of Theorem 4.5 can be found in Appendix IV. Because of the uncertainty on the rounding off operation in Algorithm 5, this bound is not tight. As we can see from Theorem 4.3 and Theorem 4.5, there is a tradeoff between accuracy and privacy in our system. The upper bound of the degree of accuracy is in proportion to σk+1 2. The level of privacy is a decreasing function of σk+1 2. The larger the perturbation level k is, the more unwanted items are disclosed to the data miner, and the more frequent itemsets can be correctly identified. V. SIMULATION RESULTS In this section, we present the simulation results of our scheme on a randomly generated dataset. The experimental results on a real dataset will be presented in the next section. The randomly generated dataset consists of 2, 000 transactions. Let there are 20 items in the dataset. Each transaction is a subset of the 20 items. We represent the dataset by a 2, matrix T. The first 19 columns (items) of T, a 1,..., a 19, are independently generated. The 20th column is set to be the same as the 10th column (i.e., a 10 = a 20 ). The first item a 1 has a probability of 0.6 to be included in a transaction. All other columns have probability of 0.1 to be included in a transaction. Thus, the expected support of a 1 is 0.6. The expected support of every other item is 0.1. Since a 10 = a 20, the expected support of {a 10, a 20 } is 0.1. We set min supp = 0.09 as the threshold (lower bound) on the support of a frequent itemset. In the randomly generated dataset, the support of {a 1 } and {a 10, a 20 } are listed as follows. support of {a 1 } = , support of {a 10, a 20 } = (10) As we can see, these two itemsets have support much higher than other 1-itemsets and 2-itemsets, respectively. The left part of Figure 3 shows the original dataset. In the original dataset, the total number of appearance of all items is 4929 times including 1217 times of a 1, 197 times of a 10, 197 times of a 20, and 3318 times of the other items.

19 19 In our simulation, we set the parameters as µ = 0.85 (in Protocol 1), ρ t = 0.8 (in Algorithm 5), and µ m = 0 (in Algorithm 6). The data miner updates V k when every 10 transactions are received. The truncation level k calculated from Protocol 1 is listed as follows. The right part of Fig. 3 shows the TABLE II TRUNCATION LEVEL k k Transaction No dataset after perturbation. In the perturbed dataset, the total number of appearance of all items is 1646 times including 1217 times of a 1, 197 times of a 10, 197 times of a 20, and 35 times of the other items. The association rule mining results are stated as follows. {a 1 }, with support and {a 10, a 20 }, with support (11) As we can see, the support of interesting itemsets are perfectly recovered with the degree of accuracy γ = 0. The privacy is well preserved with the level of privacy δ = 1.05%. Fig. 4. comparison of between original and perturbed transactions

20 20 VI. EXPERIMENTAL RESULTS ON A REAL DATASET In this section, we make a comparison between our approach and the cut-and-paste randomization approach [10] by experimental results on a real dataset. We use a real world dataset named BMS Webview 1 [20], which contains web click stream data of several months from the e-commerce website of a leg-care company. The dataset contains 59, 602 transactions and 497 items. We randomly choose 10, 871 transactions from the dataset as our test band. The maximum transaction size is 181. The average transaction size is There are 325 transactions (2.74%) that contains 10 or more items. We set the support threshold min supp to be 0.2%. There are 798 frequent itemsets including itemset, itemsets, itemsets, 37 4-itemsets and two 5-itemsets. As a compromise between privacy and accuracy, the cutoff parameter K m of cut-and-paste randomization operator is set to 7. The truncation level k of our approach is set to 6. Since both our approach and the cut-and-paste operator use the same method to add random noise, we compare the results before noise is added. Thus we set ρ m = 0 for both our approach and the cut-and-paste randomization operator Our perturbation algorithm Previous approach: cut and paste degree of accuracy (%) ρ t Fig. 5. accuracy The solid line in Figure 4 shows the change of degree of accuracy (max{ρ 1, ρ 2 }) of our approach with the parameter ρ t. The dotted line shows the degree of accuracy when the cut-and-paste randomization operator is used. As we can see, our approach always identifies association rules more accurately than

21 21 the cut-and-paste approach Our perturbation algorithm Previous approach: cut and paste 0.6 Degree of Privacy ρ t Fig. 6. privacy The level of privacy in the same setting is presented in Fig. 5. As we can see, our approach preserves privacy better than the cut-and-paste operator when ρ t > 0.1. Thus, our approach is better on both privacy and accuracy issues when 0.1 ρ t 1. From the figures, we observe a recommendation for the data providers as that ρ t [0.7, 0.8] is suitable for hard-core privacy fundamentalists and ρ t [0.2, 0.3] is recommended to privacy marginally concerned people. In particular, we analyze all 2-itemsets in the real dataset to show that our scheme tends to enlarge the support of frequent itemsets and reduce the support of infrequent itemsets. The result is shown in Fig. 6. The x-axis is the support of 2-itemsets in the original dataset. The y-axis is the support of itemsets in the perturbed dataset. The figure intends to show how effectively our system blocks the unwanted items from being divulged. If a system preserves privacy perfectly, we should have y equal to zero when x is less than min supp. The data in Fig. 6 shows that almost all 2-itemsets with support less than 0.2% (233, 368 unwanted 2-itemsets) have been blocked. That is, the privacy has been successfully protected. Meanwhile, the supports of frequent 2-itemsets are enlarged. This should help the data miner to identify frequent itemsets from additional noises.

22 22 Support in Perturbed Transactions k = 6 out of 497, ρ t = 0.4 Support in R(T) ( 10 4 ) Support in T ( 10 4 ) 233,368 2 itemsets itemsets Support in Original Transactions Fig. 7. comparison of 2-itemsets supports between original and perturbed transactions VII. IMPLEMENTATION A prototypical system for privacy preserving mining of association rules has been implemented using our new scheme. The system is designed on web browsers and servers for the application of online surveys. Visitors taking surveys through web browsers are considered to be data providers. The web server conducting surveys is considered to be the data miner. We implement the data perturbation algorithm as custom code on the web browsers. The PG (perturbation guidance) part of the data miner is implemented as custom code on the web server. All custom codes are component-based plug-ins that one can easily install to existing systems. The components required for building the system is shown in Figure 7. Fig. 8. system implementation The infrastructure of our scheme is shown in Figure 8. There are three separate layers in our system: user interface layer, perturbation layer, and web layer. The top layer, named user interface layer, provides

23 23 interface to data providers and the data miner. The middle layer, named perturbation layer, realizes our privacy preserving scheme and exploits the bottom layer to transfer information. The bottom layer, named web layer, consists of web servers and web browsers. As an important feature of our system, the details of data perturbation on the middle layer are transparent to both data providers and the data miner. The Fig. 9. system infrastructure overhead of our implementation is substantially smaller than the randomization approach in the context of online survey. The time-consuming part of the cut-and-paste randomization approach, which is the support recovery procedure, occurs in the association rule mining process. The support recovery algorithm needs to compute the partial support of all candidate itemsets for each transaction size, which will result in a significant overhead. In our system, the only overhead (possibly) incurred on the data miner is to update the perturbation guidance V k. Many SVD updating algorithms have been proposed including SVD-updating, folding-in and recomputing the SVD [17], [18]. Since T is usually a sparse matrix, the complexity of updating SVD can be considerably reduced to O(n). Besides, this overhead is not on the critical time path of the mining process. It occurs during data collection instead of data mining process. Note that the transmitted perturbation guidance V k is of length k n. Since k is always a small number (e.g., k 10), the communication overhead incurred by our introduction of two-way communication is not significant. VIII. FINAL REMARKS In this paper, we propose a new scheme on privacy preserving mining of association rules. Compared with previous approaches, we introduce a two-way communication mechanism between the data miner

24 24 and data providers with little overhead. In particular, we let the data miner send a perturbation guidance to the data providers. Using this intelligence, the data providers distort the data transactions to be transmitted to the miner. As a result, our scheme identifies association rules more precisely than previous approaches and at the same time protects privacy more effectively. Our work is preliminary and many extensions can be made. In addition to using a similar approach in data classification [21], we are currently investigating how to apply the approach to clustering problems. We would like to investigate a new behavior model that is stronger than the honest-but-curious model, and can be dealt with by our scheme. REFERENCES [1] J. Han and M. Kamber, Data Mining Concepts and Techniques. Morgan Kaufmann, [2] R. Agrawal, T. Imielinski, and A. Swami, Mining association rules between sets of items in large databases, in Proc. ACM SIGMOD Int. Conf. on Management of Data, 1993, pp [3] R. Agrawal and R. Srikant, Fast algorithms for mining association rules in large databases, in Proc. Int. Conf. on Very Large Data Bases, 1994, pp [4] J. S. Park, M. S. Chen, and P. S. Yu, An effective hash-based algorithm for mining association rules, in Proc. ACM SIGMOD Int. Conf. on Management of Data, 1995, pp [5] M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman, Computing Iceberg queries efficiently, in Proc. Int. Conf. on Very Large Data Bases, 1998, pp [6] J. Vaidya and C. Clifton, Privacy preserving association rule mining in vertically partitioned data, in Proc. ACM SIGKDD Int. Conf. on Knowledge discovery and data mining, 2002, pp [7] M. Kantarcioglu and C. Clifton, Privacy-preserving distributed mining of association rules on horizontally partitioned data, in Proc. ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery, 2002, pp [8] Y. Lindell and B. Pinkas, Privacy preserving data mining, Advances in Cryptology, vol. 1880, pp , [9] O. Goldreich, Secure Multi-Party Computation. Cambridge Univeristy Press, [10] A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke, Privacy preserving mining of association rules, in Proc. ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2002, pp [11] S. J. Rizvi and J. R. Haritsa, Maintaining data privacy in association rule mining, in Proc. Int. Conf. on Very Large Data Bases, 2002, pp [12] J. Hagel and M. Singer, Net Worth. Harvard Business School Press, 1999.

25 25 [13] L. F. Cranor, J. Reagle, and M. S. Ackerman, Beyond concern: Understanding net users attitudes about online privacy, AT&T Labs-Research, Tech. Rep. TR , [14] H. Kargupta, S. Datta, Q. Wang, and K. Sivakumar, On the privacy preserving properties of random data perturbation techniques, in Proceedings of the 3rd IEEE International Conference on Data Mining. IEEE Press, 2003, pp [15] S. Agrawal, V. Krishnan, and J. R. Haritsa, On addressing efficiency concerns in privacy-preserving mining, in Proceedings of the 9th International Conference on Database Systems for Advanced Applications. Springer Verlag, 2004, pp [16] G. H. Golub and C. F. V. Loan, Matrix Computations. Johns Hopkins University Press, [17] J. R. Bunch and C. P. Nielsen, Updating the singular value decomposition, Numerische Mathematik, vol. 31, pp , [18] M. Gu and S. C. Eisenstat, A stable and fast algorithm for updating the singular value decomposition, Yale University, Tech. Rep. YALEU/DCS/RR-966, [19] I. T. Jolliffe, Principle Component Analysis. Springer Verlag, [20] Z. Zheng, R. Kohavi, and L. Mason, Real world performance of association rule algorithms, in Proc. ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2001, pp [21] N. Zhang, S. Wang, and W. Zhao, On a new scheme on privacy preserving data classification, Texas A&M University, Tech. Rep. TAMU/CS , [22] B. Nobel and J. W. Daniel, Applied linear algebra. Prentice-Hall, 1988.

26 26 APPENDIX I APPENDIX: JUSTIFICATION OF V k The main part of a privacy preserving association rule mining system is the data perturbation mechanism employed by the data providers. In current techniques, randomization operator is used to perturb original transactions. As we described in Section II, an item-invariant randomization operator leaves the correlation between different items unconsidered. We investigate the item correlation to improve accuracy of mining results. Specifically, our data perturbation algorithm preserves the support of 2-itemsets. The intuition behind our approach can be stated as follows. The PG part of the data miner maintains an estimation of the support of all 1-itemsets and 2-itemsets. Based on the estimation, PG tells the data providers which itemsets are more likely to be frequent itemsets. We know that no superset of an infrequent itemset is frequent (anti-monotone). A data provider may safely remove all infrequent 1-itemsets and 2-itemsets because they cannot appear in any frequent itemset 3 or more items either. Readers may raise a question that why we choose to maintain the support of 1-itemsets and 2-itemsets instead of 1, 2, and 3-itemsets or 1-itemsets only? For the first question, the main reason is that the large number of 3-itemsets (n 3 ) could impose serious overhead on the system. Besides, as we can see in Appendix III, our algorithm also preserves the support of itemsets with 3 or more items. On the other hand, if we choose to maintain the support of 1-itemsets only, we cannot remove many items from original transactions to protect privacy. Roughly speaking, if {a i, a j }, {a j, a k }, and {a i, a k } are all frequent, we have a fairly high probability that {a i, a j, a k } is also frequent. However, if both a i and a j are frequent, the probability that {a i, a j } is frequent is much smaller. Thus, in order to guarantee that all frequent 2-itemsets can be preserved, we cannot remove many items from the original transactions. Now we will show that our data perturbation algorithm successfully preserves the support of 2-itemsets. Here we consider the case when V k is the k-truncation of the accurate eigenvectors of T T. In reality, the value of V k has to be estimated from the current copy of T. Fortunately, the value of V k converges well

27 27 to its accurate value. Besides, the convergence is fairly fast in most cases. The proof of convergence is presented in Appendix V. Consider A = T T a 1 a 1 a 1 a n A = T T =. a i a j. a n a 1 a n a n (I.12) where a i a j is the dot product of a i and a j. Note that a i a j = m a i h a j h (I.13) h=1 a i a j m = support of {a i, a j }. (I.14) Thus, we may preserve the support of 2-itemsets by a precise approximation of A. In particular, we use truncated eigenvectors of A to perturb the original transactions T into T. Given transaction matrix T, define the perturbed transaction T = T V k V k. T = [t 1, t 2,... t m ] V k = [v 1, v 2,..., v k ] T = T V k V k = [t 1V k V k, t 2V k V k,..., t mv k V k ]. (I.15) (I.16) (I.17) With some algebriac manupulation we have à = T T = Vk V k T T V k V k = V k V k V (Σ Σ)V V k V k = V k Σ ΣV k (I.18) (I.19) (I.20) = σ 2 1 v 1v σ2 k v kv k (I.21) Thus, à is the optimal rank-k approximation of A [16]. In other words, we preserve the support of 2-itemsets while cutting off its eigenvectors.

28 28 Our data perturbation algorithm also preserves the privacy of data providers. The data miner cannot deduce the original t from t because V k V k is a singular matrix with det(v kv k ) = 0, which means that V k V k does not have an inverse matrix. APPENDIX II APPENDIX: EXAMPLE OF TRANSACTION PERTUBATION As we have shown in Appendix I, we can reach a precise approximation of A = T T by perturbing t to t = tv k V k. Besides this truncated eigenvectors perturbation, we need a function to map the values of t [0, 1] to integer values R(t) {0, 1}. Although the truncation of eigenvectors has already inserted some noise items into the transactions, we still need to add more noise items. These tasks are done by the Alg. 5 and Alg. 6. Here we use an example to illustrate the algorithms involved in the perturbation process. Example 2.1: A data provider C i holds a transaction t = [1, 1, 0, 1, 1, 1, 0, 0]. After negotiating with the data miner, C i receives V k (k = 2) from the data miner as follows. V k = (II.22) Using truncated eigenvectors V k as perturbation guidance, C i first calculates t by t = tv k V k. Now the perturbed transaction t is t = [0.54, 0.75, 0.67, 0.48, 0.56, 0.76, 0.63, 0.46]. (II.23) The step of mapping t to integer values can be stated as: 1) As stated in Alg. 5, C i rounds off t to integer. Let ρ t be 0.3. C i transforms t to R(t) = [1, 0, 0, 1, 0, 0, 0, 0] = {a 1, a 4 }; (II.24) 2) After that, as stated in Alg. 6, C i inserts some items into R(t) as random noise. ρ m is the probability of a false item to be placed into R(t). Now R(t) may become R(t) = [1, 0, 0, 1, 0, 0, 1, 0] = {a 1, a 4, a 7 }. (II.25)

29 29 TABLE III MAPPING FUNCTION t 1 i t 2 j t 1 i t 2 j R(t 1) i R(t 2) j 0 0 (1 ρ t) (1 ρ t) (1 ρ t) (1 ρ t) As shown in the above example, there are three parameters, k, ρ t, and ρ m, in the perturbation process. They all supply data providers controls to preserve privacy. The first parameter, k, is determined by the negotiation between the data provider and the data miner. The other parameters are determined by the data provider due to its sensitivity of privacy. APPENDIX III APPENDIX: PROOF OF THEOREM 4.3 We first consider the error on support of 2-itemsets. Then we will generalize the result to itemsets with other sizes. First, we consider the error introduced by the truncated eigenvectors perturbation (i.e., t = tv k V k ). As shown in (I.21) in Appendix I, à = T T is a rank-k approximation of A = T T. Since no entry of a matrix can exceeds the 2-norm of the matrix [22], we have max à A ij à A 2 = σk+1 2 i,j (III.26) For 1-itemsets, assume that the error on support of itemset {a i } introduced by the perturbation is ɛ T i. We have ɛ T i = à A ii σ 2 k+1 /m. For 2-itemsets, assume that the error on support of itemset {a i, a j } introduced by the perturbation is ɛ T ij. We have ɛt ij = à A ij σ 2 k+1 /m. Second, we consider the error introduced by the rounding off operation in Alg. 5. Let ɛ M ij be the error on support of {a i, a j } introduced by the rounding off operation. As we may can from Tab. III, for every

On the Performance Measurements for Privacy Preserving Data Mining

On the Performance Measurements for Privacy Preserving Data Mining On the Performance Measurements for Privacy Preserving Data Mining Nan Zhang, Wei Zhao, and Jianer Chen Department of Computer Science, Texas A&M University College Station, TX 77843, USA {nzhang, zhao,

More information

Data mining successfully extracts knowledge to

Data mining successfully extracts knowledge to C O V E R F E A T U R E Privacy-Preserving Data Mining Systems Nan Zhang University of Texas at Arlington Wei Zhao Rensselaer Polytechnic Institute Although successful in many applications, data mining

More information

Privacy Preserved Association Rule Mining For Attack Detection and Prevention

Privacy Preserved Association Rule Mining For Attack Detection and Prevention Privacy Preserved Association Rule Mining For Attack Detection and Prevention V.Ragunath 1, C.R.Dhivya 2 P.G Scholar, Department of Computer Science and Engineering, Nandha College of Technology, Erode,

More information

A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment

A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment www.ijcsi.org 434 A generalized Framework of Privacy Preservation in Distributed Data mining for Unstructured Data Environment V.THAVAVEL and S.SIVAKUMAR* Department of Computer Applications, Karunya University,

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October-2013 ISSN 2229-5518 1582

International Journal of Scientific & Engineering Research, Volume 4, Issue 10, October-2013 ISSN 2229-5518 1582 1582 AN EFFICIENT CRYPTOGRAPHIC APPROACH FOR PRESERVING PRIVACY IN DATA MINING T.Sujitha 1, V.Saravanakumar 2, C.Saravanabhavan 3 1. M.E. Student, Sujiraj.me@gmail.com 2. Assistant Professor, visaranams@yahoo.co.in

More information

International Journal of Advanced Computer Technology (IJACT) ISSN:2319-7900 PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS

International Journal of Advanced Computer Technology (IJACT) ISSN:2319-7900 PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS PRIVACY PRESERVING DATA MINING IN HEALTH CARE APPLICATIONS First A. Dr. D. Aruna Kumari, Ph.d, ; Second B. Ch.Mounika, Student, Department Of ECM, K L University, chittiprolumounika@gmail.com; Third C.

More information

Privacy Preserving Outsourcing for Frequent Itemset Mining

Privacy Preserving Outsourcing for Frequent Itemset Mining Privacy Preserving Outsourcing for Frequent Itemset Mining M. Arunadevi 1, R. Anuradha 2 PG Scholar, Department of Software Engineering, Sri Ramakrishna Engineering College, Coimbatore, India 1 Assistant

More information

Homomorphic Encryption Schema for Privacy Preserving Mining of Association Rules

Homomorphic Encryption Schema for Privacy Preserving Mining of Association Rules Homomorphic Encryption Schema for Privacy Preserving Mining of Association Rules M.Sangeetha 1, P. Anishprabu 2, S. Shanmathi 3 Department of Computer Science and Engineering SriGuru Institute of Technology

More information

Privacy Preserving Mining of Transaction Databases Sunil R 1 Dr. N P Kavya 2

Privacy Preserving Mining of Transaction Databases Sunil R 1 Dr. N P Kavya 2 IJSRD - International Journal for Scientific Research & Development Vol. 2, Issue 04, 2014 ISSN (online): 2321-0613 Privacy Preserving Mining of Transaction Databases Sunil R 1 Dr. N P Kavya 2 1 M.Tech

More information

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive

More information

IMPROVED MASK ALGORITHM FOR MINING PRIVACY PRESERVING ASSOCIATION RULES IN BIG DATA

IMPROVED MASK ALGORITHM FOR MINING PRIVACY PRESERVING ASSOCIATION RULES IN BIG DATA International Conference on Computer Science, Electronics & Electrical Engineering-0 IMPROVED MASK ALGORITHM FOR MINING PRIVACY PRESERVING ASSOCIATION RULES IN BIG DATA Pavan M N, Manjula G Dept Of ISE,

More information

DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE

DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India sk.obaidullah@gmail.com

More information

Data Outsourcing based on Secure Association Rule Mining Processes

Data Outsourcing based on Secure Association Rule Mining Processes , pp. 41-48 http://dx.doi.org/10.14257/ijsia.2015.9.3.05 Data Outsourcing based on Secure Association Rule Mining Processes V. Sujatha 1, Debnath Bhattacharyya 2, P. Silpa Chaitanya 3 and Tai-hoon Kim

More information

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

More information

Security in Outsourcing of Association Rule Mining

Security in Outsourcing of Association Rule Mining Security in Outsourcing of Association Rule Mining W. K. Wong The University of Hong Kong wkwong2@cs.hku.hk David W. Cheung The University of Hong Kong dcheung@cs.hku.hk Ben Kao The University of Hong

More information

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*

More information

A Novel Technique of Privacy Protection. Mining of Association Rules from Outsourced. Transaction Databases

A Novel Technique of Privacy Protection. Mining of Association Rules from Outsourced. Transaction Databases A Novel Technique of Privacy Protection Mining of Association Rules from Outsource Transaction Databases 1 Dhananjay D. Wadkar, 2 Santosh N. Shelke 1 Computer Engineering, Sinhgad Academy of Engineering

More information

A Secure Model for Medical Data Sharing

A Secure Model for Medical Data Sharing International Journal of Database Theory and Application 45 A Secure Model for Medical Data Sharing Wong Kok Seng 1,1,Myung Ho Kim 1, Rosli Besar 2, Fazly Salleh 2 1 Department of Computer, Soongsil University,

More information

Information Security in Big Data using Encryption and Decryption

Information Security in Big Data using Encryption and Decryption International Research Journal of Computer Science (IRJCS) ISSN: 2393-9842 Information Security in Big Data using Encryption and Decryption SHASHANK -PG Student II year MCA S.K.Saravanan, Assistant Professor

More information

Prediction of DDoS Attack Scheme

Prediction of DDoS Attack Scheme Chapter 5 Prediction of DDoS Attack Scheme Distributed denial of service attack can be launched by malicious nodes participating in the attack, exploit the lack of entry point in a wireless network, and

More information

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan

More information

A RESEARCH STUDY ON DATA MINING TECHNIQUES AND ALGORTHMS

A RESEARCH STUDY ON DATA MINING TECHNIQUES AND ALGORTHMS A RESEARCH STUDY ON DATA MINING TECHNIQUES AND ALGORTHMS Nitin Trivedi, Research Scholar, Manav Bharti University, Solan HP ABSTRACT The purpose of this study is not to delve deeply into the technical

More information

Experimental Analysis of Privacy-Preserving Statistics Computation

Experimental Analysis of Privacy-Preserving Statistics Computation Experimental Analysis of Privacy-Preserving Statistics Computation Hiranmayee Subramaniam 1, Rebecca N. Wright 2, and Zhiqiang Yang 2 1 Stevens Institute of Technology graduate, hiran@polypaths.com. 2

More information

Integrating Pattern Mining in Relational Databases

Integrating Pattern Mining in Relational Databases Integrating Pattern Mining in Relational Databases Toon Calders, Bart Goethals, and Adriana Prado University of Antwerp, Belgium {toon.calders, bart.goethals, adriana.prado}@ua.ac.be Abstract. Almost a

More information

College information system research based on data mining

College information system research based on data mining 2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei

More information

Preventing Denial-of-request Inference Attacks in Location-sharing Services

Preventing Denial-of-request Inference Attacks in Location-sharing Services Preventing Denial-of-request Inference Attacks in Location-sharing Services Kazuhiro Minami Institute of Statistical Mathematics, Tokyo, Japan Email: kminami@ism.ac.jp Abstract Location-sharing services

More information

Similarity and Diagonalization. Similar Matrices

Similarity and Diagonalization. Similar Matrices MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

More information

A Spectral Clustering Approach to Validating Sensors via Their Peers in Distributed Sensor Networks

A Spectral Clustering Approach to Validating Sensors via Their Peers in Distributed Sensor Networks A Spectral Clustering Approach to Validating Sensors via Their Peers in Distributed Sensor Networks H. T. Kung Dario Vlah {htk, dario}@eecs.harvard.edu Harvard School of Engineering and Applied Sciences

More information

To Enhance The Security In Data Mining Using Integration Of Cryptograhic And Data Mining Algorithms

To Enhance The Security In Data Mining Using Integration Of Cryptograhic And Data Mining Algorithms IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 06 (June. 2014), V2 PP 34-38 www.iosrjen.org To Enhance The Security In Data Mining Using Integration Of Cryptograhic

More information

Near Sheltered and Loyal storage Space Navigating in Cloud

Near Sheltered and Loyal storage Space Navigating in Cloud IOSR Journal of Engineering (IOSRJEN) e-issn: 2250-3021, p-issn: 2278-8719 Vol. 3, Issue 8 (August. 2013), V2 PP 01-05 Near Sheltered and Loyal storage Space Navigating in Cloud N.Venkata Krishna, M.Venkata

More information

Lecture 3: Finding integer solutions to systems of linear equations

Lecture 3: Finding integer solutions to systems of linear equations Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture

More information

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS

USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS USING SPECTRAL RADIUS RATIO FOR NODE DEGREE TO ANALYZE THE EVOLUTION OF SCALE- FREE NETWORKS AND SMALL-WORLD NETWORKS Natarajan Meghanathan Jackson State University, 1400 Lynch St, Jackson, MS, USA natarajan.meghanathan@jsums.edu

More information

CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS

CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS CREATING MINIMIZED DATA SETS BY USING HORIZONTAL AGGREGATIONS IN SQL FOR DATA MINING ANALYSIS Subbarao Jasti #1, Dr.D.Vasumathi *2 1 Student & Department of CS & JNTU, AP, India 2 Professor & Department

More information

OLAP Online Privacy Control

OLAP Online Privacy Control OLAP Online Privacy Control M. Ragul Vignesh and C. Senthil Kumar Abstract--- The major issue related to the protection of private information in online analytical processing system (OLAP), is the privacy

More information

Random Projection-based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining

Random Projection-based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining Random Projection-based Multiplicative Data Perturbation for Privacy Preserving Distributed Data Mining Kun Liu Hillol Kargupta and Jessica Ryan Abstract This paper explores the possibility of using multiplicative

More information

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand

More information

Lecture Topic: Low-Rank Approximations

Lecture Topic: Low-Rank Approximations Lecture Topic: Low-Rank Approximations Low-Rank Approximations We have seen principal component analysis. The extraction of the first principle eigenvalue could be seen as an approximation of the original

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

Victor Shoup Avi Rubin. fshoup,rubing@bellcore.com. Abstract

Victor Shoup Avi Rubin. fshoup,rubing@bellcore.com. Abstract Session Key Distribution Using Smart Cards Victor Shoup Avi Rubin Bellcore, 445 South St., Morristown, NJ 07960 fshoup,rubing@bellcore.com Abstract In this paper, we investigate a method by which smart

More information

A Direct Numerical Method for Observability Analysis

A Direct Numerical Method for Observability Analysis IEEE TRANSACTIONS ON POWER SYSTEMS, VOL 15, NO 2, MAY 2000 625 A Direct Numerical Method for Observability Analysis Bei Gou and Ali Abur, Senior Member, IEEE Abstract This paper presents an algebraic method

More information

Using Data Mining Methods to Predict Personally Identifiable Information in Emails

Using Data Mining Methods to Predict Personally Identifiable Information in Emails Using Data Mining Methods to Predict Personally Identifiable Information in Emails Liqiang Geng 1, Larry Korba 1, Xin Wang, Yunli Wang 1, Hongyu Liu 1, Yonghua You 1 1 Institute of Information Technology,

More information

SYMMETRIC EIGENFACES MILI I. SHAH

SYMMETRIC EIGENFACES MILI I. SHAH SYMMETRIC EIGENFACES MILI I. SHAH Abstract. Over the years, mathematicians and computer scientists have produced an extensive body of work in the area of facial analysis. Several facial analysis algorithms

More information

Building A Smart Academic Advising System Using Association Rule Mining

Building A Smart Academic Advising System Using Association Rule Mining Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 raedamin@just.edu.jo Qutaibah Althebyan +962796536277 qaalthebyan@just.edu.jo Baraq Ghalib & Mohammed

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Risk Management for IT Security: When Theory Meets Practice

Risk Management for IT Security: When Theory Meets Practice Risk Management for IT Security: When Theory Meets Practice Anil Kumar Chorppath Technical University of Munich Munich, Germany Email: anil.chorppath@tum.de Tansu Alpcan The University of Melbourne Melbourne,

More information

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, rajalakshmi@cit.edu.in

More information

Adaptive Online Gradient Descent

Adaptive Online Gradient Descent Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations

Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Dataset Preparation and Indexing for Data Mining Analysis Using Horizontal Aggregations Binomol George, Ambily Balaram Abstract To analyze data efficiently, data mining systems are widely using datasets

More information

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner

Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner 24 Horizontal Aggregations In SQL To Generate Data Sets For Data Mining Analysis In An Optimized Manner Rekha S. Nyaykhor M. Tech, Dept. Of CSE, Priyadarshini Bhagwati College of Engineering, Nagpur, India

More information

On the Interaction and Competition among Internet Service Providers

On the Interaction and Competition among Internet Service Providers On the Interaction and Competition among Internet Service Providers Sam C.M. Lee John C.S. Lui + Abstract The current Internet architecture comprises of different privately owned Internet service providers

More information

Functional-Repair-by-Transfer Regenerating Codes

Functional-Repair-by-Transfer Regenerating Codes Functional-Repair-by-Transfer Regenerating Codes Kenneth W Shum and Yuchong Hu Abstract In a distributed storage system a data file is distributed to several storage nodes such that the original file can

More information

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph

MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph MALLET-Privacy Preserving Influencer Mining in Social Media Networks via Hypergraph Janani K 1, Narmatha S 2 Assistant Professor, Department of Computer Science and Engineering, Sri Shakthi Institute of

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION ANALYSIS. email paul@esru.strath.ac.uk

APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION ANALYSIS. email paul@esru.strath.ac.uk Eighth International IBPSA Conference Eindhoven, Netherlands August -4, 2003 APPLICATION OF DATA MINING TECHNIQUES FOR BUILDING SIMULATION PERFORMANCE PREDICTION Christoph Morbitzer, Paul Strachan 2 and

More information

How to Design and Interpret a Multiple-Choice-Question Test: A Probabilistic Approach*

How to Design and Interpret a Multiple-Choice-Question Test: A Probabilistic Approach* Int. J. Engng Ed. Vol. 22, No. 6, pp. 1281±1286, 2006 0949-149X/91 $3.00+0.00 Printed in Great Britain. # 2006 TEMPUS Publications. How to Design and Interpret a Multiple-Choice-Question Test: A Probabilistic

More information

PRIVACY PRESERVING ASSOCIATION RULE MINING

PRIVACY PRESERVING ASSOCIATION RULE MINING Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 10, October 2014,

More information

Selection of Optimal Discount of Retail Assortments with Data Mining Approach

Selection of Optimal Discount of Retail Assortments with Data Mining Approach Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.

More information

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS Biswajit Biswal Oracle Corporation biswajit.biswal@oracle.com ABSTRACT With the World Wide Web (www) s ubiquity increase and the rapid development

More information

Subspace Analysis and Optimization for AAM Based Face Alignment

Subspace Analysis and Optimization for AAM Based Face Alignment Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft

More information

Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR

More information

Mining an Online Auctions Data Warehouse

Mining an Online Auctions Data Warehouse Proceedings of MASPLAS'02 The Mid-Atlantic Student Workshop on Programming Languages and Systems Pace University, April 19, 2002 Mining an Online Auctions Data Warehouse David Ulmer Under the guidance

More information

Privacy-Preserving Outsourcing Support Vector Machines with Random Transformation

Privacy-Preserving Outsourcing Support Vector Machines with Random Transformation Privacy-Preserving Outsourcing Support Vector Machines with Random Transformation Keng-Pei Lin Ming-Syan Chen Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan Research Center

More information

The Trip Scheduling Problem

The Trip Scheduling Problem The Trip Scheduling Problem Claudia Archetti Department of Quantitative Methods, University of Brescia Contrada Santa Chiara 50, 25122 Brescia, Italy Martin Savelsbergh School of Industrial and Systems

More information

Improving data integrity on cloud storage services

Improving data integrity on cloud storage services International Journal of Engineering Science Invention ISSN (Online): 2319 6734, ISSN (Print): 2319 6726 Volume 2 Issue 2 ǁ February. 2013 ǁ PP.49-55 Improving data integrity on cloud storage services

More information

by the matrix A results in a vector which is a reflection of the given

by the matrix A results in a vector which is a reflection of the given Eigenvalues & Eigenvectors Example Suppose Then So, geometrically, multiplying a vector in by the matrix A results in a vector which is a reflection of the given vector about the y-axis We observe that

More information

Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets

Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets Pramod S. Reader, Information Technology, M.P.Christian College of Engineering, Bhilai,C.G. INDIA.

More information

Privacy-preserving Data Mining: current research and trends

Privacy-preserving Data Mining: current research and trends Privacy-preserving Data Mining: current research and trends Stan Matwin School of Information Technology and Engineering University of Ottawa, Canada stan@site.uottawa.ca Few words about our research Universit[é

More information

1 Example of Time Series Analysis by SSA 1

1 Example of Time Series Analysis by SSA 1 1 Example of Time Series Analysis by SSA 1 Let us illustrate the 'Caterpillar'-SSA technique [1] by the example of time series analysis. Consider the time series FORT (monthly volumes of fortied wine sales

More information

Enhancement of Security in Distributed Data Mining

Enhancement of Security in Distributed Data Mining Enhancement of Security in Distributed Data Mining Sharda Darekar 1, Prof.D.K.Chitre, 2 1,2 Department Of Computer Engineering, Terna Engineering College,Nerul,Navi Mumbai. 1 sharda.darekar@gmail.com,

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

A Game Theoretical Framework for Adversarial Learning

A Game Theoretical Framework for Adversarial Learning A Game Theoretical Framework for Adversarial Learning Murat Kantarcioglu University of Texas at Dallas Richardson, TX 75083, USA muratk@utdallas Chris Clifton Purdue University West Lafayette, IN 47907,

More information

Data quality in Accounting Information Systems

Data quality in Accounting Information Systems Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania

More information

Introduction to Matrix Algebra

Introduction to Matrix Algebra Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

More information

Business Lead Generation for Online Real Estate Services: A Case Study

Business Lead Generation for Online Real Estate Services: A Case Study Business Lead Generation for Online Real Estate Services: A Case Study Md. Abdur Rahman, Xinghui Zhao, Maria Gabriella Mosquera, Qigang Gao and Vlado Keselj Faculty Of Computer Science Dalhousie University

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

An Efficient Multi-Keyword Ranked Secure Search On Crypto Drive With Privacy Retaining

An Efficient Multi-Keyword Ranked Secure Search On Crypto Drive With Privacy Retaining An Efficient Multi-Keyword Ranked Secure Search On Crypto Drive With Privacy Retaining 1 B.Sahaya Emelda and 2 Mrs. P. Maria Jesi M.E.,Ph.D., 1 PG Student and 2 Associate Professor, Department of Computer

More information

International Journal of Advanced Research in Computer Science and Software Engineering

International Journal of Advanced Research in Computer Science and Software Engineering Volume, Issue, March 201 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Efficient Approach

More information

A Study of Data Perturbation Techniques For Privacy Preserving Data Mining

A Study of Data Perturbation Techniques For Privacy Preserving Data Mining A Study of Data Perturbation Techniques For Privacy Preserving Data Mining Aniket Patel 1, HirvaDivecha 2 Assistant Professor Department of Computer Engineering U V Patel College of Engineering Kherva-Mehsana,

More information

Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL

Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL Jasna S MTech Student TKM College of engineering Kollam Manu J Pillai Assistant Professor

More information

Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis

Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis , 23-25 October, 2013, San Francisco, USA Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis John David Elijah Sandig, Ruby Mae Somoba, Ma. Beth Concepcion and Bobby D. Gerardo,

More information

Impact of Boolean factorization as preprocessing methods for classification of Boolean data

Impact of Boolean factorization as preprocessing methods for classification of Boolean data Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

A NOVEL APPROACH FOR MULTI-KEYWORD SEARCH WITH ANONYMOUS ID ASSIGNMENT OVER ENCRYPTED CLOUD DATA

A NOVEL APPROACH FOR MULTI-KEYWORD SEARCH WITH ANONYMOUS ID ASSIGNMENT OVER ENCRYPTED CLOUD DATA A NOVEL APPROACH FOR MULTI-KEYWORD SEARCH WITH ANONYMOUS ID ASSIGNMENT OVER ENCRYPTED CLOUD DATA U.Pandi Priya 1, R.Padma Priya 2 1 Research Scholar, Department of Computer Science and Information Technology,

More information

DATA MINING - 1DL360

DATA MINING - 1DL360 DATA MINING - 1DL360 Fall 2013" An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/per1ht13 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,

More information

Privacy-Preserving in Outsourced Transaction Databases from Association Rules Mining

Privacy-Preserving in Outsourced Transaction Databases from Association Rules Mining Privacy-Preserving in Outsourced Transaction Databases from Association Rules Mining Ms. Deokate Pallavi B., Prof. M.M. Waghmare 2 Student of ME (Information Technology), DGOFFE, University of Pune, Pune

More information

Secure Framework and Sparsity Structure of Linear Programming in Cloud Computing P.Shabana 1 *, P Praneel Kumar 2, K Jayachandra Reddy 3

Secure Framework and Sparsity Structure of Linear Programming in Cloud Computing P.Shabana 1 *, P Praneel Kumar 2, K Jayachandra Reddy 3 Proceedings of International Conference on Emerging Trends in Electrical, Communication and Information Technologies ICECIT, 2012 Secure Framework and Sparsity Structure of Linear Programming in Cloud

More information

Chapter 6: Episode discovery process

Chapter 6: Episode discovery process Chapter 6: Episode discovery process Algorithmic Methods of Data Mining, Fall 2005, Chapter 6: Episode discovery process 1 6. Episode discovery process The knowledge discovery process KDD process of analyzing

More information

Privacy-Preserving Models for Comparing Survival Curves Using the Logrank Test

Privacy-Preserving Models for Comparing Survival Curves Using the Logrank Test Privacy-Preserving Models for Comparing Survival Curves Using the Logrank Test Tingting Chen Sheng Zhong Computer Science and Engineering Department State University of New york at Buffalo Amherst, NY

More information

International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6

International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6 International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: editor@ijermt.org November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering

More information

A Survey on Outlier Detection Techniques for Credit Card Fraud Detection

A Survey on Outlier Detection Techniques for Credit Card Fraud Detection IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. VI (Mar-Apr. 2014), PP 44-48 A Survey on Outlier Detection Techniques for Credit Card Fraud

More information

P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition

P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition P164 Tomographic Velocity Model Building Using Iterative Eigendecomposition K. Osypov* (WesternGeco), D. Nichols (WesternGeco), M. Woodward (WesternGeco) & C.E. Yarman (WesternGeco) SUMMARY Tomographic

More information

SECRET sharing schemes were introduced by Blakley [5]

SECRET sharing schemes were introduced by Blakley [5] 206 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 52, NO. 1, JANUARY 2006 Secret Sharing Schemes From Three Classes of Linear Codes Jin Yuan Cunsheng Ding, Senior Member, IEEE Abstract Secret sharing has

More information

A Network Flow Approach in Cloud Computing

A Network Flow Approach in Cloud Computing 1 A Network Flow Approach in Cloud Computing Soheil Feizi, Amy Zhang, Muriel Médard RLE at MIT Abstract In this paper, by using network flow principles, we propose algorithms to address various challenges

More information

Concepts of digital forensics

Concepts of digital forensics Chapter 3 Concepts of digital forensics Digital forensics is a branch of forensic science concerned with the use of digital information (produced, stored and transmitted by computers) as source of evidence

More information

A Practical Application of Differential Privacy to Personalized Online Advertising

A Practical Application of Differential Privacy to Personalized Online Advertising A Practical Application of Differential Privacy to Personalized Online Advertising Yehuda Lindell Eran Omri Department of Computer Science Bar-Ilan University, Israel. lindell@cs.biu.ac.il,omrier@gmail.com

More information

DWMiner : A tool for mining frequent item sets efficiently in data warehouses

DWMiner : A tool for mining frequent item sets efficiently in data warehouses DWMiner : A tool for mining frequent item sets efficiently in data warehouses Bruno Kinder Almentero, Alexandre Gonçalves Evsukoff and Marta Mattoso COPPE/Federal University of Rio de Janeiro, P.O.Box

More information

Web Database Integration

Web Database Integration Web Database Integration Wei Liu School of Information Renmin University of China Beijing, 100872, China gue2@ruc.edu.cn Xiaofeng Meng School of Information Renmin University of China Beijing, 100872,

More information