A Novel Technique of Privacy Protection Mining of Association Rules from Outsource Transaction Databases

1 Dhananjay D. Wadkar, 2 Santosh N. Shelke
1 Computer Engineering, Sinhgad Academy of Engineering, Pune, India
2 Assistant Professor, Sinhgad Academy of Engineering, Pune, India

Abstract - With developments such as cloud computing, there has been considerable recent interest in the paradigm of data mining-as-a-service. A company (data owner) lacking computational resources can outsource its mining needs to a third-party service provider (server). In that case, the items and the association rules of the outsourced database are considered the private property of the corporation. To protect this corporate privacy, the data owner transforms its data and ships it to the server, sends mining queries to the server, and recovers the true patterns from the extracted patterns received from the server. In this paper we study the problem of privacy-preserving mining of association rules from outsourced transaction databases. We work with a single dataset and use the k-means clustering algorithm for its representation. We propose an attack model based on background knowledge and devise a scheme for privacy-preserving outsourced data mining. The proposed system ensures that each transformed item is indistinguishable, with respect to the attacker's background knowledge, from at least k - 1 other transformed items in the outsourced transaction data.

Keywords - Privacy Preserving Mining, Association Rule Mining, Data Perturbation, Encryption/Decryption module.

1. Introduction

Data mining extracts information from large databases, such as corporate data. It is the process of discovering new patterns in large data sets, with applications in marketing analysis, medical diagnosis, etc.
Data mining is under attack from privacy advocates because of a misunderstanding about what it actually is and a valid concern about how it actually works. This has caused concerns that personal data may be used for a variety of intrusive purposes. Privacy-preserving data mining helps to achieve data mining goals without sacrificing the privacy of individuals and without revealing the underlying original data values. Association rule mining is a data mining technique that identifies regularities in large volumes of data [2]. It can be compromised when a third party is allowed to identify and reveal hidden information that is private to an individual or a corporation. Privacy-protecting association rule mining refers to the area of data mining that seeks to safeguard sensitive information from unsanctioned disclosure.

When people talk of privacy protection, they mean "keep information about me from being available to others or to third parties." The real concern is that personal or corporate information not be misused. Once information is released, it is impossible to prevent misuse. Utilizing this distinction, that is, ensuring that data mining won't enable misuse of personal information, opens opportunities that complete privacy would prevent. We need technical and social solutions that ensure data will not be released without protection [5].

The existing method adopts a conservative frequency-based attack model in which the server knows the exact set of items in the owner's data as well as the exact support of every item in the original data. It introduced the idea of using fake items to defend against the frequency-based attack, but it lacked a formal theoretical analysis of privacy guarantees and has recently been shown to be flawed: a method for breaking the proposed encryption has been given.
Therefore, in our previous and preliminary work, we proposed to solve this problem by using k-privacy: each item in the outsourced dataset should be indistinguishable from at least k - 1 other items regarding their support.
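To make the k-privacy condition concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that two items are indistinguishable exactly when their support counts are equal; the function names `item_supports` and `satisfies_k_privacy` and the toy database are hypothetical and used only for illustration.

```python
from collections import Counter


def item_supports(transactions):
    """Count, for each item, the number of transactions that contain it."""
    counts = Counter()
    for transaction in transactions:
        counts.update(set(transaction))
    return counts


def satisfies_k_privacy(transactions, k):
    """Return True if every item shares its support with at least k - 1 other items.

    Assumption of this sketch: "indistinguishable" means "equal support count".
    """
    supports = item_supports(transactions)
    # How many items have each distinct support value.
    group_sizes = Counter(supports.values())
    return all(group_sizes[s] >= k for s in supports.values())


# Toy outsourced database: e1 and e2 have support 2, e3 and e4 have support 1.
toy_db = [{"e1", "e2"}, {"e1", "e2"}, {"e3", "e4"}]
print(satisfies_k_privacy(toy_db, 2))  # True
print(satisfies_k_privacy(toy_db, 3))  # False
```

In the toy database every support value is shared by exactly two items, so the check passes for k = 2 and fails for k = 3.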
Fig. 1. Architecture of data mining as a service.

Fig. 1 shows the architecture of mining as a service [1]. The client encrypts its data using an encrypt/decrypt (E/D) module before outsourcing the transaction database. The third party conducts data mining and sends the encrypted patterns to the owner. In the existing system, the encryption scheme has the property that the returned supports are not the true supports. In the proposed system, the E/D module recovers the true identity of the returned patterns as well as their true supports. It is trivial to show that if the data is encrypted using only 1-1 substitution ciphers, many ciphers, and hence the transactions and patterns, can be broken by the server with high probability by launching a frequency-based attack. In the proposed method we devise encryption schemes such that formal privacy guarantees can be proven against attacks conducted by the server using background knowledge [1].

First, we define an attack model for the adversary and make precise the background knowledge the adversary may possess. Our notion of privacy requires that, for each cipher item, there are at least k - 1 distinct cipher items that are indistinguishable from it regarding their supports. Second, we use an encryption scheme, called Rob Frugal, that the E/D module can employ to transform client data before it is shipped to the server. Third, to allow the E/D module to recover the true patterns along with their correct support and confidence values, we propose that it creates and maintains a compact structure, called a synopsis [1].

2. Related Work

There are several fronts on which related work is occurring. Previous work in privacy-preserving data mining has addressed two issues. The first aims at preserving customer privacy by distorting the data values [4].
The idea is that the distorted data does not reveal private information and is thus safe to use for mining. The key result is that the distorted data, together with information on the distribution of the random data used to distort it, can be used to generate an approximation to the original data distribution without revealing the original data values. The distribution is used to improve mining results over mining the distorted data directly [5]. The data distortion approach has also been applied to association rules [4], [5]. The idea is to modify data values such that reconstruction of the values for any individual transaction is difficult, but the rules learned on the distorted data are still valid. One interesting feature of this work is a flexible definition of privacy: for example, the ability to correctly guess a value of 1 from the distorted data can be considered a greater threat to privacy than correctly learning a 0. The data distortion approach addresses a different problem from our work [5].

Research on privacy-preserving data mining (PPDM) has attracted much attention recently. The main model here is that private data is collected from a number of sources by a collector for the purpose of consolidating the data and conducting mining. The collector is not trusted with protecting privacy, so the data is subjected to a random perturbation as it is collected. Techniques have been developed for perturbing the data so as to preserve privacy while ensuring that the mined
patterns or other analytical properties are sufficiently close to the patterns mined from the original data [1], [2]. We study the problem of outsourcing the association rule mining task within a corporate privacy-preserving framework. A significant body of work has been done on privacy-preserving data mining in a variety of contexts. A common attribute of most of the previously studied frameworks is that the patterns mined from the data are intended to be shared with parties other than the data owner. The key difference between such bodies of work and our problem is that both the underlying data and the mined results are not intended for sharing and must remain private to the data owner [3].

We adopt a conservative frequency-based attack model in which the server knows the exact set of items in the owner's data as well as the exact support of every item in the original data. Earlier work introduced the idea of using fake items to defend against the frequency-based attack, but it lacked a formal theoretical analysis of privacy guarantees and has recently been shown to be flawed: a method for breaking the proposed encryption has been given. We proposed to solve this problem by using k-privacy, in which each item in the outsourced dataset should be indistinguishable from at least k - 1 other items regarding their support.

Existing techniques can be classified into cryptography-based techniques and generative-based techniques.

1. Cryptography-Based Techniques: In the context of PPDM over distributed data, cryptography-based techniques have been developed to solve problems of the following nature: two or more parties want to conduct a computation based on their private inputs. The issue is how to conduct such a computation so that no party learns anything except its own input and the result.
This problem is referred to as the Secure Multi-Party Computation (SMC) problem. Techniques of this kind have been proposed for privacy-preserving classification, privacy-preserving association rule mining, and privacy-preserving clustering [6].

2. Generative-Based Techniques: These techniques are designed to perform distributed mining tasks. Each party shares just a small portion of its local model, which is used to construct the global model. The existing solutions are built over horizontally partitioned data. One such solution addresses privacy-preserving mining of frequent item sets in distributed databases [6].

3. Proposed System

The problem addressed in this paper is the outsourcing of pattern mining within a corporate privacy-preserving framework. A key difference between this problem and the above-mentioned PPDM problems is that not only the underlying data but also the mined results are not intended for sharing and must remain private. When the server possesses background knowledge and conducts attacks on that basis, it should not be able to guess the correct candidate item or item set corresponding to a given cipher item or item set with a probability above a given threshold.

Fig. 2. General architecture of outsourced third-party association rule mining.

Fig. 2 shows the architecture of outsourced third-party association rule mining (refer to [4]). In this process the specialized terminals are semi-trusted in nature, which makes privacy and security leakage a real danger. The third-party server knows the support and confidence values of each item set in the transactions. Even when data encryption is applied, the process is susceptible to frequency attacks, which leads to a loss of data privacy. The server can make deductions as to the semantic meaning of each encrypted item set.
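The frequency attack mentioned above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the attacker is assumed to know the exact support of every plain item, the data is assumed to be protected by a plain 1-1 substitution only, and the names `frequency_attack`, `cipher_supports`, and the toy items are hypothetical.

```python
from collections import Counter


def cipher_supports(cipher_db):
    """Support count of every cipher item in the encrypted database."""
    counts = Counter()
    for transaction in cipher_db:
        counts.update(set(transaction))
    return counts


def frequency_attack(cipher_db, known_plain_supports):
    """Map each cipher item to the plain items that have the same support.

    A candidate list of length 1 means the cipher item is fully broken.
    """
    plain_by_support = {}
    for item, support in known_plain_supports.items():
        plain_by_support.setdefault(support, []).append(item)
    return {
        cipher: sorted(plain_by_support.get(support, []))
        for cipher, support in cipher_supports(cipher_db).items()
    }


# Background knowledge assumed for the attacker: true supports of plain items.
known = {"bread": 3, "milk": 1, "beer": 1}
# Database after a 1-1 substitution (bread -> e1, milk -> e2, beer -> e3).
encrypted = [{"e1", "e2"}, {"e1", "e3"}, {"e1"}]

guesses = frequency_attack(encrypted, known)
print(guesses["e1"])  # ['bread']
print(guesses["e2"])  # ['beer', 'milk']
```

Here `e1` is identified with certainty because no other item has support 3, while `e2` and `e3` each keep two candidates, which is the effect that a k-privacy requirement aims to guarantee for every item.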
Clearly, privacy and security mechanisms are required to be in place in such outsourcing activities. How secure are these mechanisms, and what security notions do they satisfy? The proposed system attempts to contribute to the body of knowledge by making a case for the distribution of outsourced association rule mining and its application scenarios, as well as by defining a security notion for an implementation.

There are four main modules in the proposed system:
1. Rob Frugal encryption scheme
2. Data preprocessing and K-means clustering algorithm
3. Support and confidence calculation using the Apriori algorithm
4. Rob Frugal decryption scheme

1. Encryption Scheme of Rob Frugal: The server, or an intruder who gains access to it, may possess some background knowledge with which they can conduct attacks on the encrypted database D*. We refer to any of these agents as an attacker. We adopt a conservative model and assume that the attacker knows exactly the set of (plain) items I in the original transaction database D and their true supports. We assume the service provider (who can be an attacker) is semi-honest in the sense that, although he does not know the details of our encryption algorithm, he can be curious and thus can use his background knowledge to make inferences on the encrypted transactions. We also assume that the attacker always returns (encrypted) item sets together with their exact support. The data owner considers the true identity of (1) every cipher item, (2) every cipher transaction, and (3) every cipher frequent pattern as intellectual property that should be protected.

In this section, we introduce the encryption scheme, which transforms a transaction database (TDB) D into its encrypted version D*. The proposed scheme is parametric with respect to k > 0 and consists of three main steps: (1) using 1-1 substitution ciphers for each plain item; (2) using a specific item k-grouping method; (3) using a method for adding new fake transactions to achieve k-privacy.
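The three steps can be sketched in Python as follows. This is a deliberately simplified illustration and not the Rob Frugal construction itself: the grouping rule (sort items by support and cut into consecutive groups of k), the way fake transactions are assembled, and all names (`encrypt_tdb`, the `e0`, `e1`, ... labels) are assumptions made for this sketch. It also assumes the database contains at least k distinct items.

```python
import secrets
from collections import Counter


def encrypt_tdb(transactions, k):
    """Return (encrypted_db, cipher_map, fake_transactions) for a toy TDB."""
    support = Counter()
    for t in transactions:
        support.update(set(t))

    # Step 1: 1-1 substitution cipher, with labels shuffled so that the
    # label order does not leak the alphabetical order of the plain items.
    items = sorted(support)
    labels = [f"e{i}" for i in range(len(items))]
    secrets.SystemRandom().shuffle(labels)
    cipher = dict(zip(items, labels))

    # Step 2 (simplified k-grouping): rank items by decreasing support and
    # cut the ranking into groups of k; a short last group joins the previous one.
    ranked = sorted(items, key=lambda it: (-support[it], it))
    groups = [ranked[i:i + k] for i in range(0, len(ranked), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())

    # Noise = occurrences each item needs to reach the top support of its group.
    remaining = {}
    for group in groups:
        top = support[group[0]]
        for it in group:
            remaining[it] = top - support[it]

    # Step 3 (simplified): each fake transaction holds every item that still
    # needs extra occurrences, until all noise is used up.
    fake = []
    while any(remaining.values()):
        fake.append({cipher[it] for it, r in remaining.items() if r > 0})
        for it in remaining:
            if remaining[it] > 0:
                remaining[it] -= 1

    encrypted = [{cipher[it] for it in t} for t in transactions] + fake
    return encrypted, cipher, fake


db = [{"a", "b"}, {"a", "b", "c"}, {"a", "c", "d"}]
encrypted_db, cipher_map, fake_rows = encrypt_tdb(db, 2)
print(len(encrypted_db), len(fake_rows))
```

After the fake transactions are added, all items of a group have the same support in the encrypted database, so each cipher item shares its support with at least k - 1 others. The data owner keeps `cipher_map` and a record of the fake transactions so that true supports can be recovered later.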
The constructed fake transactions are added to D (once items are replaced by cipher items) to form D*, which is transmitted to the server.

2. K-Means Clustering Algorithm: Clustering is a method by which large sets of data are grouped into clusters of smaller sets of similar data. The k-means clustering algorithm classifies or groups objects, based on their attributes/features, into K groups, where K is a positive integer. The grouping is done by minimizing the sum of squared distances between the data and the corresponding cluster centroids.

Fig. 3. K-means clustering for finding centroids.

Given k, the k-means algorithm consists of four steps:
1. Select the initial centroids at random.
2. Assign each object to the cluster with the nearest centroid.
3. Compute each centroid as the mean of the objects assigned to it.
4. Repeat the previous two steps until there is no change.

Before the k-means clustering algorithm is applied, the dataset is cleaned by preprocessing techniques, which also remove unwanted data.

3. Apriori Algorithm: In data mining, Apriori is a classic algorithm for learning association rules. It is designed to operate on databases containing transactions. The algorithm attempts to find subsets that are common to at least a minimum number C (the minimum support threshold) of the item sets. Frequent subsets are extended one item at a time, a step known as candidate generation, and groups of candidates are tested against the data. The algorithm uses breadth-first search and a hash tree structure to count candidate item sets efficiently.

Support count: The support count of an item set X in a data set T, denoted X.count, is the number of transactions in T that contain X. Assume T has n transactions. Then the support of X is the fraction of transactions that contain it: $\mathrm{support}(X) = X.\mathrm{count} / n$.
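The level-wise search performed by Apriori can be sketched in Python as below. This is a minimal illustration rather than the implementation used in this work: it counts candidates with a plain scan of the transactions instead of a hash tree, it takes `minsup` as an absolute support count, and the function name `apriori` and the toy data are hypothetical.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {item set: support count} for all item sets with count >= minsup."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= minsup}
    frequent = dict(level)

    size = 2
    while level:
        previous = set(level)
        # Candidate generation: join two frequent (size-1)-item sets and keep
        # the union only if all of its (size-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(previous, 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in previous for sub in combinations(union, size - 1)
            ):
                candidates.add(union)

        # Count each candidate against the data.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        level = {s: c for s, c in counts.items() if c >= minsup}
        frequent.update(level)
        size += 1
    return frequent


toy = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
result = apriori(toy, minsup=2)
print(result[frozenset({"a", "b"})])  # 2
print(frozenset({"a", "b", "c"}) in result)  # False
```

Dividing a returned count by the number of transactions gives the support as a fraction, for example 2 / 4 = 0.5 for the item set {a, b} in the toy data.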
The Apriori algorithm finds all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf). Its frequent item set search consists of the following steps:
1: Find all large 1-itemsets (L1)
2: For (a = 2; while La-1 is non-empty; a++)
3: { Ca = apriori-gen(La-1)
4: For each c in Ca, initialize c.count to zero
5: For all records r in the database
6: { Cr = subset(Ca, r); For each c in Cr, c.count++ }
7: Set La := all c in Ca whose count >= minsup }

4. Result

We compared the results of related work on mining association rules from outsourced transaction databases with the results of the proposed method. We created a database of several power industries and, using the data set of the last five years, calculated the power currently used by those industries. This industry data is secured using the AES encryption/decryption algorithm. In the proposed method we also predict the next year's power usage from the current year's power usage.

4. Rob Frugal Decryption Scheme: When the client requests the execution of a pattern mining query from the server, it specifies a minimum support threshold σ, and the server returns the frequent patterns computed from D*. Clearly, for every item set S and its corresponding cipher item set E, we have $\mathrm{supp}_D(S) \le \mathrm{supp}_{D^*}(E)$. For each cipher pattern E returned by the server together with $\mathrm{supp}_{D^*}(E)$, the E/D module recovers the corresponding plain pattern S. It then has to reconstruct the exact support of S in D and decide on this basis whether S is a frequent pattern. To reach this goal, the E/D module adjusts the support of E by removing the effect of the fake transactions: $\mathrm{supp}_D(S) = \mathrm{supp}_{D^*}(E) - \mathrm{supp}_{D^* \setminus D}(E)$. This follows from the fact that the support of an item set is additive over a disjoint union of transaction sets.
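The support adjustment can be illustrated with a short Python sketch: the true support of a pattern is its support in the encrypted database minus the support contributed by the fake transactions. The sketch makes several assumptions that are not taken from this work: the E/D module is assumed to keep the full list of fake transactions rather than a compact synopsis, the threshold `sigma` is an absolute count, and the names `count_support` and `recover_patterns` are hypothetical.

```python
def count_support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t))


def recover_patterns(cipher_patterns, fake_transactions, decipher, sigma):
    """Recover plain patterns and their true supports.

    cipher_patterns: {frozenset of cipher items: support in the encrypted DB}
    decipher: mapping from cipher item to plain item
    sigma: minimum support threshold (absolute count, an assumption here)
    """
    recovered = {}
    for cipher_set, encrypted_support in cipher_patterns.items():
        # Remove the effect of the fake transactions.
        true_support = encrypted_support - count_support(cipher_set, fake_transactions)
        if true_support >= sigma:
            plain_set = frozenset(decipher[c] for c in cipher_set)
            recovered[plain_set] = true_support
    return recovered


fake = [{"e2", "e4"}]
returned = {
    frozenset({"e1"}): 3,
    frozenset({"e2"}): 3,
    frozenset({"e1", "e2"}): 2,
}
decipher = {"e1": "a", "e2": "b"}

print(recover_patterns(returned, fake, decipher, sigma=3))  # {frozenset({'a'}): 3}
```

In this toy case the cipher item `e2` has support 3 in the encrypted database, but one of those occurrences comes from a fake transaction, so its true support is 2 and it is dropped when the threshold is 3.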
The pattern S with adjusted support is kept in the output if $\mathrm{supp}_D(S) \ge \sigma$. The calculation of $\mathrm{supp}_{D^* \setminus D}(E)$ is performed by the E/D module using the synopsis of the fake transactions in $D^* \setminus D$.

Fig. 5. Accuracy of the proposed method.

The above graph shows the detection rate (false negatives, true positives). We also calculated the penalty and reward for the databases.

Fig. 4. Flow of the project.

Fig. 6. Predicted and used power details.
The above two graphs show the industry-wise predicted power and actual used power. When the used power is greater than the predicted power, a reward is shown for that day; when the used power is less than the predicted power, a penalty is given for the current date.

5. Conclusion

Protecting privacy in data mining activities is a very important issue in many applications. In this paper, a new approach to the problem of privacy-preserving data mining in the scenario of outsourced business transaction databases has been presented. This approach is efficient and performs better than many other perturbation techniques. Compared with previous work, the proposed algorithm reduces time complexity and space complexity as well as the false-rule problem. In future, we will try to make it more powerful for cloud and distributed databases.

References

[1] Fosca Giannotti, Laks V.S. Lakshmanan, Anna Monreale, Dino Pedreschi, and Hui (Wendy) Wang. Privacy-preserving Mining of Association Rules from Outsourced Transaction Databases. IEEE Systems Journal, vol. 7, no. 3, 2013.
[2] Vineet Richhariya and Prateek Chourey. A Robust Technique for Privacy Preservation of Outsourced Transaction Database. ISSN(E): 2321-8843; ISSN(P): 2347-4599, Vol. 2, Issue 6, Jun 2014, 51-58.
[3] Joseph Chan Joo Keng. Privacy Protection in Outsourced Association Rule Mining using Distributed Servers and Its Privacy Notions. IS752:
[4] Adsure Sharad S. and Prof. S. Pratap Singh. Preserving Data Privacy by Susceptible Association Rule Hiding Approach. International Journal of Computer Engineering and Applications, Volume VII, Issue I, July 14.
[5] Sunil Kumar Chintada and Jayanthi Rao Madina. A Privacy Preserving Association Rule Mining Over Unrealized Datasets. International Journal of Engineering Trends and Technology (IJETT), Volume 5, Number 4, Nov 2013. ISSN: 2231-5381.
[6] R. Natarajan, Dr. R. Sugumar, M. Mahendran, and K. Anbazhagan. A Survey on Privacy Preserving Data Mining. International Journal of Advanced Research in Computer and Communication Engineering, Vol. 1, Issue 1, March 2012. ISSN 2278-1021.
[7] Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules. In VLDB, pages 487-499, 1994.
[8] Rakesh Agrawal and Ramakrishnan Srikant. Privacy preserving data mining. In SIGMOD, pages 439-450, 2000.
[9] Gilburd B., Schuster A., and Wolff R. k-TTP: A new privacy model for large scale distributed environments. In VLDB, pages 563-568, 2005.
[10] Fosca Giannotti, Laks V.S. Lakshmanan, Anna Monreale, Dino Pedreschi, and Hui Wang. Privacy-preserving outsourcing of association rule mining. Tech Report 2009-TR-013, ISTI-CNR, Pisa, 2009.
[11] Fosca Giannotti, Laks V.S. Lakshmanan, Anna Monreale, Dino Pedreschi, and Hui Wang. Privacy-preserving data mining from outsourced databases. In SPCC2010, in conjunction with CPDP, 2010.
[12] Murat Kantarcioglu and Chris Clifton. Privacy-preserving distributed mining of association rules on horizontally partitioned data. TKDE, 16(9):1026-1037, 2004.
[13] P. Krishna Prasad and C. Pandu Rangan. Privacy preserving BIRCH algorithm for clustering over arbitrarily partitioned databases. In Advanced Data Mining and Applications, pages 146-157, 2007.