Data Mining and the Importance of Privacy Preservation

The International Arab Journal of Information Technology, Vol. 12, No. 1, January 2015

A Fuzzy Based Scheme for Sanitizing Sensitive Sequential Patterns

Faisal Shahzad 1, Sohail Asghar 2 and Khalid Usmani 2
1 Faculty of Computing, Mohammad Ali Jinnah University, Pakistan
2 Faculty of Computer Science, University Institute of Information Technology, Pakistan

Abstract: Rapid advances in technology have led to the generation and analysis of huge amounts of data in databases. Examples of such data are bank records, web logs, cell phone records and network traffic records. This has raised a new challenge: transforming this data into useful information. Data mining, whose aim is to extract knowledge from data, is a vital technique for this task. Sequential Pattern Mining (SPM) is an important area of data mining. Sequential data contains events, and events contain items. The events are ordered, but the order of items within an event does not matter. Whenever we extract sequential information, there is a threat that we may reveal sensitive sequential patterns, so these patterns need to be protected. Privacy Preservation Data Mining (PPDM) techniques are used to meet this need. The aim of privacy preservation techniques is to extract information from data without revealing sensitive information. In this research we propose a technique that is based on the FP growth approach and applies anti-monotone and monotone constraints to identify sensitive sequential patterns. For data modification we apply the concept of fuzzy sets.

Keywords: Data mining, PPDM, SPM, FP growth, anti-monotone, monotone, fuzzy logic.

Received July 6, 2012; accepted December 23, 2013; published online April 17.

1. Introduction

Data mining is the process used to extract information from large databases. Chen et al. [5, 10] also mention other terms used for data mining, e.g., knowledge extraction, data archaeology and pattern analysis. Some people view data mining as knowledge discovery from databases, while others consider it an essential step in the process of knowledge discovery. Data mining tasks can be divided into predictive and descriptive tasks. In predictive tasks we predict the class label of a new instance. Descriptive tasks specify the general characteristics of the instances. Data mining can be applied to any kind of data. Applications include medical data, spatial and multimedia databases, time stream data, temporal databases and sequence databases.

When we extract useful information from large repositories of data, a privacy threat arises, and privacy preservation becomes important. Clifton and Marks [6] illustrate this with a simple example. A supermarket has two milk suppliers, A and B. Suppose the supermarket releases its transaction database. There is a good chance that supplier B could learn the association rules of supplier A and run a discount scheme based on A's association rules. Gradually the sales of A would decrease and those of B would increase. This scenario shows that sensitive information should not be leaked to the outside world.

Sequential Pattern Mining (SPM) deals with sequence data. The problem of mining sequential patterns was first introduced by [4]. An example of a sequential pattern is a customer who first buys a computer, then a printer and then a scanner. The items bought clearly follow a sequence. Each element of a sequence is not a single item but a set of items. Sequence data consists of events that took place at different times. These events are ordered, as shown in the sequence $S(e_1, e_2, e_3, \ldots, e_n)$.
Sequence S indicates that $e_1$ took place before $e_2$, $e_2$ took place before $e_3$, and so on. An itemset is a non-empty set of items. Sequential patterns are closely related to association rule mining. The difference is that the former are ordered while the latter are not. In association rules we find which products are bought together, while in SPM we find the subsequent purchases made after the purchase of a particular item. Sequential patterns have applications in Web Usage Mining (WUM), customer purchases, gene and DNA sequences, and earthquake data.

The work carried out in [4, 6, 8] also highlights the importance of privacy preservation. The authors state that, as the volume of network data grows, the need for data privacy becomes undeniable. Agarwal and Srikant [4] raised the primary question: since the primary task of data mining is the development of models about aggregated data, can we develop accurate models without access to precise information in individual records?

Privacy preservation of sequential patterns is as important as that of association rules. Sensitive sequential patterns should be hidden from the outside world so that businesses can grow and gain the trust of their customers. The area of privacy preservation in sequential patterns is largely unexplored. Therefore, we propose a scheme based on the FP growth approach [11]. We apply anti-monotone and monotone constraints for hiding sensitive sequential patterns. In this paper, we address the problems faced in [2], where a matching set technique is used. To overcome those problems, we propose the FP growth technique and apply anti-monotone and monotone constraints to identify sensitive sequential patterns. We also propose a new technique based on fuzzy logic to sanitize sensitive sequential patterns.

The rest of the paper is organized as follows: Section 2 presents the related work. Section 3 describes the problem statement. Section 4 describes the proposed approach. Section 5 presents a running example of the proposed approach. Section 6 presents the validation of the proposed approach on different datasets. Section 7 presents the conclusion and future work.

2. Related Work

There are many approaches for preserving the privacy of sequential patterns. Abul et al. [2] gave a formal definition of the sequence hiding problem. They made two contributions: 1) they shifted attention from typical association rule mining to sequence data; 2) they proved the NP-hardness of hiding sequential patterns. The approach discussed in [1] can also be used for sanitization. The approach of Abul et al. for hiding sensitive sequential patterns is based on the matching set. From the matching set size they identify the position in a transaction for sanitization.
They first compute the matching set size for every transaction of the database and then sort the database in decreasing order of matching set size. The transactions that satisfy the threshold are considered for sanitization. They repeat this process until no matches remain, i.e., the matching set becomes empty. They also discuss possible extensions of their work, e.g., to frequent itemsets and spatio-temporal sequential patterns. The main problem with the approach of [2] is the computation of the matching set size, which takes exponential time in the worst case. For large datasets this approach is therefore inefficient.

Abul et al. [3] extended their idea to sanitize spatio-temporal locations. They use the same concept of matching set to hide sensitive trajectories, which are also presented in [3]. The problem is formulated as a background network, in which nodes represent spatio-temporal locations and edges represent paths. In essence, they hide these paths or trajectories.

A multi-objective scheme for hiding sensitive sequential patterns is presented in [22]. The authors analyze the sequences by constructing a candidate tree. The candidate tree contains the sensitive items found in a transaction. The first level of the tree contains length-1 candidate solutions, and subsequent levels contain length-2, length-3, etc., candidate solutions. When generating candidate solutions, the authors require that most of the sensitive patterns be hidden with few distortions and little effect on non-sensitive patterns. To balance these goals, they define a weighted sum called the objective function. The objective function is calculated for every node in the candidate tree, and its purpose is to choose the best candidate solution for a transaction.
In the same way the authors calculate the best solution for every transaction, and at the end an overall best solution is determined. The overall best solution is then applied to the whole database to sanitize it. The authors state that the original database and the released database should be as similar as possible.

Mhatra et al. [15] presented an approach that inserts fake elements into transactions to hide sequential patterns. The technique is somewhat similar to data perturbation. They applied it at the pre-processing level, i.e., before the data is made available to the data miner.

Another set of approaches for hiding sequential patterns is based on secure two-party computation [17, 18]. In [17] the approach is applied to a two-party scenario, while in [18] it is applied to a multi-party scenario. The authors use a homomorphic key encryption technique to achieve privacy of sequential patterns. The problem addressed is collaborative sequential pattern mining between two parties that hold vertically partitioned datasets $D_1$ and $D_2$. The approach first sorts the databases, then finds sequential patterns using the Apriori algorithm, and then applies homomorphic encryption. Homomorphic encryption generates a key pair for encrypting and decrypting the data. However, the authors give no example of a dataset on which the approach was applied, and no results are discussed regarding its effectiveness.

Some data modification techniques have also been used to achieve privacy preservation of sequential patterns. The major techniques include data swapping, data randomization and data perturbation. In [20] a data randomization technique is used for hiding sensitive patterns. Data randomization refers to adding fake items to patterns in transactions. The authors use a factor h to keep track of the number of items inserted in transactions.
They adapted the PrefixSpan algorithm and proposed the privacy preserving sequential mining (PPSpan) algorithm.

Another approach from the data modification category is used in [19]. The authors used a data perturbation technique to achieve privacy preservation of sequential patterns. Data perturbation is somewhat similar to data randomization. In this technique the data is distorted, i.e., noisy items are added to sequential patterns. In randomization the order of items in a transaction is changed, while in data perturbation noisy items are added to transactions. Ouyang et al. [19] use a factor h to keep track of the noisy items. They insert noisy items into transactions according to the h factor to perturb the data, and use the same factor to retrieve the original patterns. They also adopted the PrefixSpan algorithm and proposed privacy preserving sequential patterns (PPSpan).

Jin et al. [12] proposed a technique based on k-anonymity and α-dissociation to hide sensitive sequential patterns. To find the sensitive sequential patterns they divide a sequential pattern into positive and negative items. Positive items are those present in a sequence, while negative items do not occur in it. In the algorithm, the authors first pick the sequential patterns in order of decreasing length. In those patterns they look for length-1 negative items, i.e., one negative item in a sequence. If one negative item is present in a sequential pattern and its support is not greater than k or α, then the pattern is sensitive and they hide it. The values of k and α can be set to any level. The results could have been more accurate if negative items of length two or more had been considered. Furthermore, they state that SPAM is used for generating frequent sequential patterns, and they compare their results with SPAM, which is not a technique for hiding sensitive sequential patterns.
Pensa et al. [21] proposed a technique for hiding sensitive sequential patterns using k-anonymity. Their approach works in three steps. First, they construct the prefix tree of the sequences in dataset D. The tree is a triplet (N, E, Root), where Root is the root node, N is the finite set of labeled nodes and E is the set of edges. Every node in the tree except the root has one parent in the path. In the second step, all nodes whose support is less than k are pruned. The frequencies of all frequent nodes in the tree are updated by 1. The sequences that contain infrequent items are anonymized with their ancestors. The anonymized dataset is denoted by D′. The dataset used for the experiments comes from the city of Milan, Italy, and represents moving objects: the sequences are the paths visited by vehicles. One limitation of their approach is that the tree is constructed using PrefixSpan, which scans the database multiple times and is therefore time consuming.

The approach of Kapoor et al. [13] is applied to distributed databases, i.e., databases from different parties. The authors proposed the PRIPSEP (Privacy Preserving Sequential Pattern) algorithm, which is an extension of SPAM. There are three sites: the Data Miner, the Non Colluding site and the Processing Site. The Data Miner site acts as a collaborator between the original databases. The Non Colluding site collects noisy data from each database. The Processing Site performs the secure computation between the databases and is used by the Non Colluding site. The authors state that this approach is better than secure multiparty computation, in which all sites have to remain online until the process finishes, whereas in their approach the sites do not need to remain online.
However, they have not provided any comparison with secure multiparty computation.

Kim et al. [14] presented a technique for privacy preservation of sequential patterns in network traffic data. The authors mine frequent sequential patterns while preserving privacy. For this purpose they use an N-repository server model that acts as a single mining server. Every site partitions the network traffic into N groups and encrypts the data of each group. This encrypted information is then sent to one of the N servers. Each server determines frequent items by totaling the occurrences of each item received. To decrypt the frequent items discovered, they are sent to another server that holds the corresponding decryption key. At the end all servers decrypt the items they have received. One coordinating server totals the occurrences of frequent items and finds the original frequent items. Meta tables are also maintained at each site to quickly determine whether a frequent pattern has occurred.

3. Problem Statement

A sequence is an ordered list $S = \langle s_1, s_2, s_3, \ldots, s_n \rangle$, where each $s_i$ ($1 \le i \le n$) is an itemset, called an element and denoted $(x_1, x_2, \ldots, x_m)$, such that each $x_k \in I$ ($1 \le k \le m$), where $I$ is a finite set of distinct items. A sequence $\alpha = \langle a_1, a_2, \ldots, a_n \rangle$ is called a subsequence of another sequence $\beta = \langle b_1, b_2, \ldots, b_m \rangle$ if there exist integers $1 \le j_1 < j_2 < \cdots < j_n \le m$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \ldots, a_n \subseteq b_{j_n}$. A sequence database D contains a set of sequences. Given a sequence database D and constraints, the sequential pattern mining problem is to find the complete set of sequential patterns in the database.

The sequential pattern hiding problem is defined as follows. Let $Sp = \{S_1, S_2, \ldots, S_n\}$ be the set of sensitive sequential patterns that need to be sanitized in D, and let $\psi$ be the threshold. We need to transform D into D′ such that:

1. $\forall Sp_i \in Sp$: $supp_{D'}(Sp_i) \le \psi$.
2. $\sum_{S \notin Sp} |supp_D(S) - supp_{D'}(S)|$ is minimum.

Here D is the original database and D′ is the released database. The problem states two requirements for hiding sequential patterns. The first is to modify database D in such a way that sensitive sequential patterns are hidden. The second is to reduce the effects of sanitization on all sequential patterns that are not sensitive.

4. Proposed Methodology

The proposed approach is divided into two phases:
4.1. Identification of Sensitive Items
4.2. Sanitization of Sensitive Items

4.1. Identification of Sensitive Items

This phase consists of the following steps:

Generate FP Tree

In this step, we scan the dataset and generate the FP tree. We read the transactions one by one and place the items of each transaction as nodes of the FP tree. We increment the count of an item by one on every occurrence. Figure 1 summarizes the process for identification of sensitive items. First, we generate the FP tree for the dataset. Once the FP tree is generated, we apply monotone and anti-monotone constraints to identify the sensitive items and populate the released database D′.

Algorithm 1.
Input: D, α
Output: D′
1. D′ ← Null
2. Root ← Null
3. for each t ∈ D
   3.1. FPTree ← generatefptree()
4. for each Tr ∈ FPTree
   4.1. D′ ← IdentifySensitiveItems()

Definition 1: FP Tree: Let $D = \{t_1, t_2, \ldots, t_n\}$ be the transactional database of items, where $t_i$ is the i-th transaction containing a set of items from $I = \{a_1, a_2, \ldots, a_n\}$. Let $\xi$ be the threshold. A pattern p is frequent if its support is at least $\xi$ and p satisfies the monotone and anti-monotone constraints.

Anti-monotone and Monotone Constraints

Once we generate the FP tree, we apply anti-monotone and monotone constraints.

a. Anti-monotone Constraint

A constraint $C_a$ is anti-monotone if, whenever a pattern S does not satisfy $C_a$, none of the super-patterns of S satisfies $C_a$. Let $I = \{a_1, a_2, \ldots, a_n\}$ be the given itemset and P(I) be the power set of I. For itemsets A and B of I: $A \subseteq B \wedge \neg C_a(A) \Rightarrow \neg C_a(B)$. Table 1 gives an example with the following anti-monotone constraints:

1. Min(Profit) >= 50
2. Max(Profit) <= 30

Table 1. Anti-monotone constraint.
Item   Profit
A      40
B      0
C      -20
D      10
E      -30
F      30
G      20
H      -10

Let the transaction for Table 1 be (a, b, c, d, e). If we apply Min(Profit) >= 50 to the transaction, we see that item a (profit 40) does not satisfy the constraint. There is no need to check the rest of the transaction items, because adding items can only lower the minimum, so no super-pattern can satisfy the constraint either. The same reasoning holds for Max(Profit) <= 30.

b. Monotone Constraint

A constraint C is monotone if, whenever a pattern S satisfies C, every super-pattern of S also satisfies C. Let $I = \{a_1, a_2, \ldots, a_n\}$ be the given itemset and P(I) be the power set of I. For itemsets A and B of I: $A \subseteq B \wedge C(A) \Rightarrow C(B)$. Table 2 gives an example with the following monotone constraints:

1. Min(Profit) <= 15
2. Max(Profit) >= 30

Table 2. Monotone constraint.
Item   Profit
A      40
B      0
C      -20
D      10
E      -30
F      30
G      20
H      -10

Let the transaction for Table 2 be (a, b, c, d, e). If we apply Min(Profit) <= 15 to the transaction, we see that item a (profit 40) does not satisfy the constraint but item b (profit 0) does. Therefore, the whole transaction satisfies the constraint. The same reasoning holds for Max(Profit) >= 30.

Figure 1 represents the conceptual diagram for Phase 1.

Figure 1. Identification of sensitive items.

As can be seen from Figure 1, we first generate the FP tree from the dataset. We then apply anti-monotone and monotone constraints on the FP tree to identify sensitive sequential patterns.
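The two constraint families can be illustrated with a short Python sketch that uses the profit values of Table 1 and Table 2 and the example transaction (a, b, c, d, e). The function names and the prefix-growing helper `first_prefix_where` are illustrative assumptions; they are not part of the proposed algorithm, which applies the constraints on the FP tree.

```python
# Profit per item, copied from Table 1 / Table 2 (item names lower-cased
# to match the example transaction).
PROFIT = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}


def min_profit_ge(items, v):
    """Anti-monotone: adding items can only lower the minimum."""
    return min(PROFIT[i] for i in items) >= v


def max_profit_le(items, v):
    """Anti-monotone: adding items can only raise the maximum."""
    return max(PROFIT[i] for i in items) <= v


def min_profit_le(items, v):
    """Monotone: once the minimum is low enough, it stays low enough."""
    return min(PROFIT[i] for i in items) <= v


def max_profit_ge(items, v):
    """Monotone: once the maximum is high enough, it stays high enough."""
    return max(PROFIT[i] for i in items) >= v


def first_prefix_where(items, predicate):
    """Hypothetical helper: length of the shortest prefix for which
    predicate holds, or None if no prefix qualifies."""
    for k in range(1, len(items) + 1):
        if predicate(items[:k]):
            return k
    return None


transaction = ["a", "b", "c", "d", "e"]

# Anti-monotone: the first violating prefix lets us prune every longer pattern.
pruned_at = first_prefix_where(transaction, lambda p: not min_profit_ge(p, 50))

# Monotone: the first satisfying prefix guarantees every longer pattern satisfies it.
accepted_at = first_prefix_where(transaction, lambda p: min_profit_le(p, 15))

print(pruned_at, accepted_at)
```

With these values the anti-monotone check already fails on item a alone, and the monotone check succeeds as soon as item b is added, which matches the two worked examples in the text.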

4.2. Sanitization of Sensitive Items

Fuzzification of Data

Fuzzy sets were introduced by Lotfi Zadeh. They can be viewed as an extension of classical crisp sets. Crisp sets discriminate between members and non-members of a set by assigning 1 or 0 to each object of the universal set. Mathematically this can be represented as:

$\mu_A(x) = 1$ if $x \in A$ (1)
$\mu_A(x) = 0$ if $x \notin A$ (2)

Fuzzy sets generalize this function by assigning values that fall in the range 0 to 1. X is the crisp (rigid boundaries) universal set and the function $\mu_A$ is the membership function that defines set A. Formally, $\mu_A : X \to [0, 1]$. The membership function used in the proposed scheme is defined in Equation 3. This function assigns a membership degree to each element of the crisp set X. While fuzzifying data, it should be kept in mind that the support of fuzzy set A is the crisp set containing all elements whose membership degree in A is not 0. To fuzzify data, we used the following membership function:

$Y = (X - J) / I$ (3)

We apply Equation 3 to the dataset and find the fuzzy values of the transactions. In this equation Y is the output variable, X is the dataset, J is the lower limit and I is the upper limit of the dataset. We divide the fuzzified dataset into three sets: high sensitive, medium sensitive and low sensitive. This reflects that some items in the dataset are more sensitive than others.

Lemma 1: Maximum sanitization can be achieved by fuzzifying the values of sensitive items.

Consider a database D that consists of transactions $T_i$, i.e., $D = \{T_1, T_2, \ldots, T_n\} = \{T_i\}_{i=1}^{n}$, where n is the total number of transactions. Let each transaction be expressed as $T_i = \{a_1, a_2, \ldots, a_m\}$. For any $T_i \in D$ there exists a set $A_i \subseteq T_i$ such that, if A is monotone, then P(A) also satisfies the monotone constraint. It follows that A is sensitive, i.e., $A \in SI$, or equivalently: $A \in SI$ if A is monotone, and $A \notin SI$ if A is not monotone. Furthermore, consider a membership function $X_i = G(f_{T_i})$, where $f_{T_i}$ is the corresponding frequency of $a_i$. Replace $X_i$ with the corresponding sensitivity $SI_i$ of $A_i$.

Lemma 1 expresses the technique mathematically. It first states the identification of sensitive items and then the sanitization of those items. At the end we replace the values of sensitive items with the sanitized values. Figure 2 represents the conceptual diagram of Phase 2.

Figure 2. Sanitization of sensitive items.

Table 3 shows the notations used in the proposed algorithm.

Table 3. Notations used in algorithm.
D     Original Dataset
D′    Modified Dataset
N     Total Number of Transactions
F     Set of Fuzzy Values
α     User defined threshold
T     Set of Transactions

Algorithm 2 shows the proposed algorithm for sanitization of sensitive sequential patterns. The algorithm takes sensitive sequential patterns as input. It reads the data and applies the membership function defined in Equation 3 to generate fuzzy values for the dataset. Once the fuzzy values are generated, we divide them into three classes: High Sensitive, Medium Sensitive and Low Sensitive. At the end of the algorithm, we replace the original values with the fuzzy values and produce the released dataset. Fuzzy values are divided into classes according to the rules in step 3 of the algorithm.

Algorithm 2: Proposed algorithm for Sanitizing Sensitive Sequential Patterns.
Input:
1. D (Original Dataset).
2. α: A user specified threshold.
Output:
3. D′ (Modified/Released Dataset).
1. D′ = NULL. F = NULL.
2. for (i: 1 to N) { // calculate fuzzy values using membership function
   2.1. for each (Ti ∈ D) {
   2.2. F = (X - J) / I // Membership Function
   } }
3. for each (Ti ∈ F) {
   3.1. if (Ti > α) HS ← Ti
   3.2. if (Ti == α) MS ← Ti
   3.3. D′ ← D′ ∪ HS
   3.4. D′ ← D′ ∪ MS
   }
4. End
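As a rough illustration of the steps in Algorithm 2, the following Python sketch fuzzifies a list of item frequencies with Equation 3 as printed and labels each value against the threshold α. Several details are assumptions made for the sketch: the lower and upper limits are taken as the minimum and maximum of the known frequencies, a "low" label is assigned to values below α (the text names three classes but the algorithm lists rules for only two), the equality test uses a floating-point tolerance, and missing frequencies map to NaN. The authors' MATLAB implementation may differ on each of these points.

```python
import math


def fuzzify(x, lower, upper):
    """Equation 3 as printed: Y = (X - J) / I, with J the lower and I the
    upper limit. Missing or undefined inputs give NaN (assumption)."""
    if x is None or upper == 0:
        return math.nan
    return (x - lower) / upper


def classify(y, alpha):
    """Label a fuzzy value against the user threshold alpha.
    'low' and 'unknown' are labels assumed for this sketch."""
    if math.isnan(y):
        return "unknown"
    if math.isclose(y, alpha):
        return "medium"
    return "high" if y > alpha else "low"


def sanitize(frequencies, alpha):
    """Return (fuzzy value, sensitivity label) pairs that would replace
    the original frequencies in the released dataset."""
    known = [f for f in frequencies if f is not None]
    if not known:
        return [(math.nan, "unknown") for _ in frequencies]
    lower, upper = min(known), max(known)
    fuzzy = [fuzzify(f, lower, upper) for f in frequencies]
    return [(y, classify(y, alpha)) for y in fuzzy]


# Illustrative frequencies for items a, b, e, c, d; alpha = 0.5 is arbitrary.
released = sanitize([9, 7, 5, 2, 2], alpha=0.5)
for value, label in released:
    print(f"{value:.3f} {label}")
```

Using a tolerance for the medium class avoids relying on exact floating-point equality, which the `Ti == α` rule would otherwise require.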

5. Putting Model into Work

Table 4 shows part of a sequence database. The table contains the sequence id and the items bought in a particular sequence. The sequences represent the purchases that a customer made during visits to the market. In the FP tree the root node is given a null value. Each transaction is scanned and its nodes are added to the tree. The nodes in the tree represent items of sequential patterns. If an item appears more than once in the database, its frequency is updated and written alongside the node. In the given example the sensitive pattern is abe. To avoid large candidate generation we apply the anti-monotone constraint on the tree.

Table 4. Sequential patterns.
Seq.ID   Sequence
1        (a)(be)(c)(d)
2        (a)(b)(gde)
3        (b)(c)(de)
4        (a)(be)(c)(d)
5        (ce)(d)(f)(g)
6        (a)(be)(c)(d)
7        (a)(be)(c)(d)
8        (b)(c)(d)(e)
9        (a)(be)(c)(d)
10       (cd)(f)(g)(e)

Figure 3 represents the FP tree constructed from Table 4.

Figure 3. FP tree.

The tree shown in Figure 3 is generated by scanning the dataset and inserting nodes into the tree. The root of the tree is labelled null. The child nodes represent the items and their frequency of occurrence. The tree is constructed by reading the items from the transactions and inserting them into the tree. We apply monotone and anti-monotone constraints to the tree in Figure 3, following the definitions in (a) and (b), to identify sensitive items. Table 5 presents the dataset along with the frequency of items. The items are divided into three categories: high sensitive, medium sensitive and low sensitive. The occurrence frequencies follow the order of the items in the sequence, i.e., 9 represents the frequency of a, 7 the frequency of b, 5 the frequency of e, and 2 the frequencies of c and d. The frequencies are updated from the FP tree. The frequencies are further divided into three classes that represent the sensitivity level of items: 0-3 represents low sensitivity, 4-6 represents medium sensitivity, and 7 and above represents high sensitivity.

Table 5. Dataset with frequency of occurrence.
[Columns: Tr_ID, Tr_Data, Frequency of Occurrence. Tr_Data entries, in order: abecd, abgde, bcde, abecd, cedfg, abecd, abecd, bcde, abecd, cdfge, cdfge, bcde, cedfg, bcde, cedfg, bcadefg, bcde, abecd, cdefg, cedfg.]

We apply the membership function of Equation 3 to fuzzify the values. The dataset in Table 5 is given as input to the membership function, which returns fuzzy values for the frequencies of items. Table 6 presents the fuzzified values for Table 5. Each value in Table 6 is the fuzzy value of an item frequency in Table 5. The values 0 and NaN in the fuzzified dataset indicate that the frequency of an item was not available. At the end, we replace the original values with the fuzzified values and produce the released dataset with sensitive sequential patterns hidden.

Table 6. Fuzzified dataset.
[Unavailable frequencies appear as 0 or NaN.]

6. Results and Discussion

In this section, we present an analysis of the proposed scheme. We experimented with three datasets. The first dataset is randomly generated over the alphabet {a, b, c, d, e, f, g} with 1200 sequences. The second dataset is network traffic of TCP packets containing 7000 sequences. The third dataset is network traffic of UDP packets containing 8000 sequences. All experiments were performed on an Intel Core 2 Duo processor with 2 GB of memory. Development and experimentation have two stages. Identification of sensitive items is performed using C#.NET, while the modification of sensitive items is done using fuzzy logic implemented in MATLAB. All simulations and graphs were generated using MATLAB.

We compared our approach with the approaches presented in [2, 22], both of which are algorithms for hiding sensitive sequential patterns. We refer to them as the Abul and Rahbarinia approaches. The comparison criteria are the number of database scans and the number of modifications needed to hide sensitive sequential patterns. Figures 4, 5, 6 and 7 present the experimental results and the comparison with [2, 22]. Figure 4 presents the comparison with [2], while Figure 5 presents the comparison with [22]. Figures 6 and 7 present the comparison with [2, 22] with respect to the number of modifications made in both approaches.

Figure 4 shows the comparison for the TCP dataset and Figure 5 shows the comparison for the UDP dataset. In these figures, the X-axis represents the number of transactions and the Y-axis represents the sensitivity level after data modification in the original dataset. It can be seen from the figures that the proposed approach reduces the sensitivity level considerably compared with [2]. This shows that the proposed approach almost achieves maximum sanitization.

Figure 4. Results comparison (X-axis: transactions; Y-axis: degree of sensitivity).

Figure 5. Results comparison.

Figures 6 and 7 present the comparison of the proposed approach with [2, 22] with respect to the number of modifications for sanitizing sequential patterns. Figure 6 shows the comparison of the number of modifications with [22], while Figure 7 shows the comparison with [2]. The figures show the number of transactions on the X-axis and the number of modifications on the Y-axis. It can be clearly seen that the number of modifications required to sanitize sensitive sequential patterns in the proposed approach is lower than in the existing approaches. In this respect the proposed approach is better than the existing approaches.

Figure 6. Number of modifications (X-axis: transactions; Y-axis: number of modifications).

Figure 7. Number of modifications (X-axis: transactions; Y-axis: number of modifications).

7. Conclusions and Future Work

Privacy preservation of sequential patterns has still not been explored in depth. We reviewed different techniques proposed to address privacy preservation in sequential patterns. The work proposed in this paper is another step towards addressing this problem. We experimented with and evaluated the proposed approach on three datasets. We also compared the proposed approach with existing approaches and found that it performs considerably better than them. In future, we will explore this area further and try to find new techniques for sanitizing sensitive sequential patterns. We will also look at using evolutionary approaches for sanitization.

References

[1] Abbas A. and Liu J., Designing an Intelligent Recommender System using Partial Credit Model and Bayesian Rough Set, The International Arab Journal of Information Technology, vol. 9, no. 2, 2012.

[2] Abul O., Atzori M., Bonchi F., and Giannotti F., Hiding Sequences, in Proceedings of the 23rd International Conference on Data Engineering Workshop, Istanbul, Turkey, pp ,
[3] Abul O., Bonchi F., and Giannotti F., Hiding Sequential and Spatio-Temporal Patterns, IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 12, pp ,
[4] Agrawal R. and Srikant R., Mining Sequential Patterns, in Proceedings of the 11th International Conference on Data Engineering, Taipei, Taiwan, pp. 3-14,
[5] Chen M., Han J., and Yu P., Data Mining: An Overview from a Database Perspective, IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp ,
[6] Clifton C. and Marks D., Security and Privacy Implications of Data Mining, in Proceedings of the ACM SIGMOD Workshop on Data Mining and Knowledge Discovery, Montreal, Canada, pp. 15-19,
[7] El-Hajj M., Bifold Constraint-Based Mining By Simultaneous Monotone and Anti-Monotone Checking, in Proceedings of the 15th International Conference on Data Mining, Texas, USA, pp ,
[8] Evfimievski A., Srikant R., Agrawal R., and Gehrke J., Privacy Preserving Mining of Association Rules, in Proceedings of the 8th Conference on Knowledge Discovery and Data Mining, New York, USA, pp. 1-12,
[9] Gupta M. and Josh C., Privacy Preserving Fuzzy Association Rules Hiding in Quantitative Data, International Journal of Computer Theory and Engineering, vol. 1, no. 4, pp ,
[10] Han J. and Kamber M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA,
[11] Han J., Pei J., Yin Y., and Mao R., Mining Frequent Patterns without Candidate Generation, in Proceedings of the 2000 ACM SIGMOD Workshop on Data Mining and Knowledge Discovery, pp. 1-12,
[12] Jin H., Chen J., He H., and O'Keefe C., Privacy-Preserving Sequential Pattern Release, in Proceedings of The Pacific-Asia Conference on Knowledge Discovery and Data Mining, Nanjing, China, pp ,
[13] Kapoor V., Poncelet P., and Teisseire M., Privacy Preserving Sequential Pattern Mining in Distributed Databases, in Proceedings of the Conference on Information and Knowledge Management, Virginia, USA, pp ,
[14] Kim S., Park S., Won J., and Kim W., Privacy Preserving Data Mining of Sequential Patterns for Network Traffic Data, Information Sciences Journal, vol. 178, no. 3, pp ,
[15] Mhatra A., Verma M., and Toshniwal D., Privacy Preserving Sequential Pattern Mining in Progressive Databases using Noisy Data, in Proceedings of the 13th International Conference Information Visualisation, California, USA, pp ,
[16] Naeem M. and Asghar S., A Novel Architecture for Hiding Sensitive Association Rules, in Proceedings of the International Conference on Data Mining, Nevada, USA,
[17] Ouyang W. and Huang Q., Privacy Preserving Sequential Mining Based on Secure Two-Party Computation, in Proceedings of the 5th International Conference on Machine Learning and Cybernetics, Guangzhou, China, pp ,
[18] Ouyang W. and Huang W., Privacy Preserving Sequential Pattern Mining Based on Secure Multi-Layer Computation, in Proceedings of the International Conference on Information Acquisition, China, pp ,
[19] Ouyang W., Xin H., and Huang Q., Privacy Preserving Sequential Pattern Mining Based on Data Perturbation, in Proceedings of the 6th International Conference on Machine Learning and Cybernetics, Hong Kong, China, pp ,
[20] Ouyang W., Huang Q., and Xin H., A Randomization Approach to Mining Sequential Pattern with Privacy Preserving, in Proceedings of the International Symposium on Computational Intelligence and Design, Wuhan, China, pp ,
[21] Pensa R., Monreale A., Pinelli F., and Pedreschi D., Pattern-Preserving k-anonymization of Sequences and its Application to Mobility Data Mining, Workshop co-located with ESORICS, Malaga, Spain, pp. 1-17,
[22] Rahbarinia B., Pedram M., Arabnia H., and Alavi Z., A Multi-Objective Scheme to Hide Sequential Patterns, in Proceedings of the 2nd International Conference on Computer and Automation Engineering, Singapore, vol. 1, pp , 2010.

Faisal Shahzad is a Lecturer at Mohammad Ali Jinnah University, Pakistan, where he coordinates and manages the instructional labs. He is also a visiting faculty member at University Institute of Information Technology (UIIT), Rawalpindi. In 2007 he received his BC degree in computer science from Mohammad Ali Jinnah University, Islamabad. From 2007 to 2009 he worked as a software engineer in a software company in Islamabad. In 2012 he received his MS in computer science from Mohammad Ali Jinnah University, Islamabad. He completed his thesis under the supervision of Dr. Sohail Asghar.

Sohail Asghar is Director at University Institute of Information Technology (UIIT), PMAS-Arid Agriculture University, Pakistan. He is also Head of the Center of Research in Data Engineering (CORDE) Research Group. Prior to his current position he served as an Associate Professor of computer science, Department of Computer Sciences, Faculty of Engineering and Applied Sciences, Mohammad Ali Jinnah University, Islamabad, Pakistan. Previously, he worked as an Assistant Professor of computer sciences and head of R and D, Department of Computer Sciences, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Pakistan. Before that, he was a Research Associate and Assistant Lecturer in Clayton School of Information Technology, Faculty of Information Technology at Monash University, Australia. In 1994, he graduated with honors in computer science from the University of Wales, United Kingdom. From 1994 to 2002, he worked as a senior software engineer in a software company in Islamabad. He then obtained his PhD in Information Technology at Monash University, Melbourne, Australia in

Khalid Usmani is an assistant professor at University Institute of Information Technology (UIIT), PMAS-Arid Agriculture University, Pakistan. His research interests are computer networks, network security and information security. He is very active in research and supervises many MS (CS) students in their research. He received his PhD in wireless network security from Universiti Teknologi Malaysia.