Data Mining and the Importance of Privacy Preservation


A Fuzzy Based Scheme for Sanitizing Sensitive Sequential Patterns

Faisal Shahzad (1), Sohail Asghar (2) and Khalid Usmani (2)
(1) Faculty of Computing, Mohammad Ali Jinnah University, Pakistan
(2) Faculty of Computer Science, University Institute of Information Technology, Pakistan

The International Arab Journal of Information Technology, Vol. 12, No. 1, January 2015

Abstract: Rapid advances in technology have led to the generation and analysis of huge amounts of data in databases. Examples of such data are bank records, web logs, cell phone records and network traffic records. This has raised a new challenge: transforming this data into useful information. Data mining, whose aim is to extract knowledge from data, is a vital technique for this task. Sequential Pattern Mining (SPM) is an important area of data mining. Sequential data contains events and events contain items. Within an event, the order of items does not matter. Whenever we extract sequential information, there is a threat that we may reveal sensitive sequential patterns. Thus, a need arises to protect sensitive sequential patterns. To fulfil this need, Privacy Preservation Data Mining (PPDM) techniques are used. The aim of privacy preservation techniques is to extract information from data without revealing sensitive information. In this research we propose a technique that is based on the FP growth approach and applies anti-monotone and monotone constraints to identify sensitive sequential patterns. For data modification we apply the concept of fuzzy sets.

Keywords: Data mining, PPDM, SPM, FP growth, anti-monotone, monotone, fuzzy logic.

Received July 6, 2012; accepted December 23, 2013; published online April 17.

1. Introduction

Data mining is the process used to extract information from large databases. Chen et al. [5, 10] also refer to other terms used for data mining, e.g., knowledge extraction, data archaeology and pattern analysis. Some people view data mining as knowledge discovery from databases, while others consider it an essential step in the process of knowledge discovery. Data mining tasks can be divided into predictive and descriptive tasks. In predictive tasks we predict the class label of a new instance. Descriptive tasks specify the general characteristics of the instances. Data mining can be applied to any kind of data. Applications include medical data, spatial and multimedia databases, time stream data, temporal databases and sequence databases.

When we extract useful information from large repositories of data, there is an associated privacy threat. Therefore, the importance of privacy preservation becomes prominent. Clifton and Marks [6] have shown the importance of privacy preservation with a simple example of a supermarket. The supermarket has two milk suppliers, A and B. Suppose the supermarket releases its transaction database. There is a good chance that supplier B could learn the association rules of supplier A. Supplier B could then run a discount scheme based on A's association rules. Gradually the sales of A would decrease and those of B would increase. This scenario shows that sensitive information should not be leaked to the outside world.

Sequential Pattern Mining (SPM) deals with sequence data. The problem of mining sequential patterns was first introduced by Agrawal and Srikant [4]. An example of a sequential pattern is that a customer first buys a computer, then a printer and then a scanner. The items bought clearly have a sequence. A sequence is not a list of single items but an ordered list of item sets. Sequence data has events that took place at different times. These events are ordered, as shown in the sequence $S(e_1, e_2, e_3, \ldots, e_n)$.
Sequence S represents that $e_1$ took place before $e_2$, $e_2$ took place before $e_3$, and so on. An item set is a non-empty set of items. Sequential patterns are closely related to association rule mining. The difference between sequential patterns and association rules is that the former have order while the latter do not. In association rules, we find which products are bought together, while in SPM we find the subsequent purchases made after the purchase of a particular item. Sequential patterns have applications in Web Usage Mining (WUM), customer purchases, gene and DNA sequences, and earthquake data.

The work carried out in [4, 6, 8] also highlights the importance of privacy preservation. The authors state that, with the increase in network data, the privacy of data becomes an undeniable concern. In Agrawal and Srikant [4] the primary question was raised: since the primary task of data mining is the development of models about aggregated data, can we develop accurate models without access to precise information of individual records?

Privacy preservation of sequential patterns is as important as that of association rules. Sensitive sequential patterns should be hidden from the outside world so that businesses can grow and gain the trust of their customers. The area of privacy preservation in sequential patterns is largely unexplored. Therefore, we propose a scheme which is based on the FP growth approach [11]. We apply anti-monotone and monotone constraints for hiding sensitive sequential patterns. In this paper, we address the problems faced in [2], where a matching set technique is used. To overcome those problems, we propose the FP growth technique and apply anti-monotone and monotone constraints to identify sensitive sequential patterns. We propose a new technique based on fuzzy logic to sanitize sensitive sequential patterns.

The rest of the paper is organized as follows: Section 2 presents the related work. Section 3 describes the problem statement. Section 4 describes the proposed approach. Section 5 presents a running example of the proposed approach. Section 6 presents the validation of the proposed approach on different datasets. Section 7 presents conclusions and future work.

2. Related Work

There are many approaches for preserving the privacy of sequential patterns. Abul et al. [2] gave a formal definition of the sequence hiding problem. They provided two contributions: 1) They shifted attention from typical association rule mining to sequence data. 2) They proved the NP-hardness of hiding sequential patterns. The approach discussed in [1] can also be used for sanitization. Their approach for hiding sensitive sequential patterns is based on the matching set. From the matching set size they identify the position in a transaction for sanitization.
They first compute the matching set size for every transaction of the database, then sort the database in decreasing order of matching set size. The transactions which satisfy the threshold are considered for sanitization. They repeat this process until no matches are found, i.e., the matching set becomes empty. They also discussed possible extensions of their work, e.g., to frequent item sets and spatio-temporal sequential patterns. The main problem with the approach of [2] is the computation of the matching set size. This computation takes exponential time in the worst case. For large datasets the main problem encountered by this approach is therefore efficiency.

Abul et al. [3] extended their idea to sanitize spatio-temporal locations. They used the same concept of matching set to hide sensitive trajectories, which are also presented in [3]. The problem is formulated as a background network. In a background network, nodes represent the spatio-temporal locations and edges represent the paths. Essentially, they hide these paths or trajectories.

A multi-objective scheme for hiding sensitive sequential patterns is presented in [22]. The authors analyze the sequences by constructing a candidate tree. The candidate tree contains the sensitive items which are found in a transaction. The first level of the tree contains length-1 candidate solutions, and subsequent levels contain length-2, length-3, etc., candidate solutions. For generating candidate solutions, the authors suggest that most of the sensitive patterns should be hidden with few distortions and little effect on non-sensitive patterns. To achieve these goals, they provide a weighted summation called the objective function. For every node in the candidate tree, an objective function is calculated. The purpose of this function is to choose the best candidate solution for a transaction.
In the same way the authors calculate the best solution for every transaction, and at the end an overall best solution is determined. The overall best solution is then applied to the whole database to sanitize it. The authors state that the original database and the released database should be as similar as possible.

Mhatra et al. [15] presented an approach that inserts fake elements into transactions to hide sequential patterns. The technique used by the authors is somewhat similar to data perturbation. They applied this approach at the pre-processing level, i.e., before the data is available to the data miner.

Another set of approaches for hiding sequential patterns is based on secure two-party computation [17, 18]. In [17] the approach is applied to a two-party scenario, while in [18] it is applied to a multi-party scenario. The authors used a homomorphic key encryption technique to achieve privacy of sequential patterns. The problem addressed is collaborative sequential pattern mining between two parties. Both parties have vertically partitioned datasets $D_1$ and $D_2$. The approach first sorts the databases, finds sequential patterns using the Apriori algorithm and then applies homomorphic encryption. Homomorphic encryption generates a key pair for encrypting and decrypting the data. However, the authors have not given any example of a dataset on which they applied the approach, and no results have been discussed regarding its effectiveness.

To achieve privacy preservation of sequential patterns, some data modification techniques have been used. The major techniques include data swapping, data randomization and data perturbation. In [20] a data randomization technique is used for hiding sensitive patterns. Data randomization refers to adding some fake items to patterns in transactions. The authors used h as a factor to keep track of the number of items inserted in transactions.
They adapted the PrefixSpan algorithm and proposed the privacy preserving sequential mining (PPSpan) algorithm.

Another approach from the data modification category is used in [19]. The authors used a data perturbation technique for achieving privacy preservation of sequential patterns. Data perturbation is somewhat similar to the data randomization technique. In this technique the data is distorted, i.e., noisy items are added to sequential patterns. In randomization the order of items in a transaction is changed, while in data perturbation noisy items are added to transactions. Ouyang et al. [19] used a factor h to keep track of the noisy items. They insert noisy items according to the h factor in transactions to perturb the data and to retrieve the original patterns back. They adopted the PrefixSpan algorithm and proposed privacy preserving sequential patterns (PPSpan).

Jin et al. [12] proposed a technique based on k-anonymity and α-dissociation to hide sensitive sequential patterns. To find the sensitive sequential patterns they divide a sequential pattern into positive and negative items. The positive items in a sequential pattern are those present in a sequence, while the negative items do not occur in a sequence. In the algorithm, the authors first pick the sequential patterns in order of decreasing length. In those patterns they look for length-1 negative items, i.e., one negative item in a sequence. If one negative item is present in a sequential pattern and its support is not greater than k or α, then that pattern is a sensitive pattern and they hide it. The values of k and α can be set to any level. The results could have been more accurate if negative items of length two or more had been considered. Furthermore, they state that SPAM is used for generating frequent sequential patterns and also compare their results with SPAM, which is not a technique for hiding sensitive sequential patterns.
Pensa et al. [21] proposed a technique for hiding sensitive sequential patterns using k-anonymity. Their approach works in three steps. First, they construct the prefix tree of the sequences given in dataset D. The tree is a triplet (N, E, Root), where Root is the root node, N is the finite set of labeled nodes and E is the set of edges. Every node in the tree except the root has one parent in the path. In the second step all nodes whose support is less than k are pruned. The frequencies of all frequent nodes in the tree are updated by 1. The sequences which contain infrequent items are anonymized with their ancestors. The anonymized dataset is denoted $D'$. The dataset used for experiments is taken from the city of Milan, Italy. It represents moving objects: the authors took the sequences of paths visited by vehicles. One limitation of their approach is that they used PrefixSpan for constructing the tree. PrefixSpan scans the database multiple times, so it is a time consuming task.

The approach used by Kapoor et al. [13] is applied to distributed databases. The authors proposed the PRIPSEP (Privacy Preserving Sequential Pattern) algorithm, which is an extension of SPAM. The proposed technique is applied to distributed databases, i.e., databases from different parties. There are three sites, namely the Data Miner, Non Colluding and Processing sites. The Data Miner site acts as a collaborator between the original databases. The Non Colluding site collects noisy data from each database. The Processing site, which is used by the Non Colluding sites, performs the secure computation between the databases. The authors state that this approach is better than secure multiparty computation. In secure multiparty computation all the sites have to remain online until the process finishes, while in this approach there is no need for the sites to remain online.
However, they have not provided any comparisons with secure multiparty computation.

Kim et al. [14] presented a technique for privacy preservation of sequential patterns in network traffic data. The authors mine frequent sequential patterns while maintaining privacy preservation. For this purpose they use an N-repository server model that acts as a single mining server. Every site partitions the network traffic into N groups and encrypts the data of each group. This encrypted information is then sent to one of the N servers. The server determines frequent items by totaling the occurrences of each item received. To decrypt the frequent items discovered, they are sent to another server which has the corresponding decryption key. At the end all the servers perform the decryption process for the received items. One coordinating server totals the occurrences of frequent items and finds the original frequent items. Meta tables are also maintained at each site to quickly determine whether a frequent pattern has occurred or not.

3. Problem Statement

A sequence is an ordered list $S = \langle s_1, s_2, s_3, \ldots, s_n \rangle$, where each $s_i$ ($1 \le i \le n$) is an item set, called an element, and is denoted $(x_1, x_2, \ldots, x_m)$ such that each $x_k \in I$ ($1 \le k \le m$), where $I$ is a finite set of distinct items. A sequence $\alpha = \langle a_1, a_2, \ldots, a_n \rangle$ is called a subsequence of another sequence $\beta = \langle b_1, b_2, \ldots, b_m \rangle$ if there exist integers $1 \le j_1 < j_2 < \ldots < j_n \le m$ such that $a_1 \subseteq b_{j_1}, a_2 \subseteq b_{j_2}, \ldots, a_n \subseteq b_{j_n}$. A sequence database $D$ contains a set of sequences. Given a sequence database $D$ and constraints, the sequential pattern mining problem is to find the complete set of sequential patterns in the database.

The sequential patterns hiding problem is defined as follows: Let $Sp = \{S_1, S_2, \ldots, S_n\}$ be the set of sensitive sequential patterns that need to be sanitized in $D$. Let $\psi$ be the threshold. We need to transform $D$ into $D'$ such that:

1. $\forall Sp_i \in Sp$, $\mathrm{supp}_{D'}(Sp_i) \le \psi$.
2. $\sum_{Sp_i \notin Sp} |\mathrm{supp}_{D}(Sp_i) - \mathrm{supp}_{D'}(Sp_i)|$ is minimum.

In the above problem $D$ is the original database and $D'$ is the released database. The problem highlights two requirements for hiding sequential patterns. The first is to modify database $D$ in such a way that sensitive sequential patterns are hidden. The second is to reduce the effects of sanitization on all those sequential patterns that are not sensitive.

4. Proposed Methodology

The proposed approach is divided into two phases:

4.1. Identification of sensitive items.
4.2. Sanitization of sensitive items.

4.1. Identification of Sensitive Items

This phase consists of the following steps:

4.1.1. Generate FP Tree

In this step, we scan the dataset and generate the FP tree. We read the transactions one by one and place the items of the transactions as nodes of the FP tree. We increment the count of an item by one on every occurrence. Figure 1 summarizes the process for identification of sensitive items. First, we generate the FP tree for the dataset. Once the FP tree is generated we apply monotone and anti-monotone constraints to identify the sensitive items and populate the released database $D'$.

Algorithm 1.
Input: D, α
Output: D'
1. D' ← Null
2. Root ← Null
3. for each t ∈ D
   3.1. FPTree ← generatefptree()
4. for each Tr ∈ FPTree
   4.1. D' ← IdentifySensitiveItems()

Definition 1: FP Tree: Let $D = \{t_1, t_2, \ldots, t_n\}$ be the transactional database of items, where $t_i$ is the $i$-th transaction containing a set of items $I = \{a_1, a_2, \ldots, a_n\}$. Let $\xi$ be the threshold. A pattern $p$ is frequent if $p \ge \xi$ (that is, its support meets the threshold) and $p$ satisfies the monotone and anti-monotone constraints.

4.1.2. Anti-monotone and Monotone Constraints

Once we generate the FP tree, we apply anti-monotone and monotone constraints.

a. Anti-monotone Constraint

A constraint $C_a$ is anti-monotone if, whenever a pattern $S$ does not satisfy $C_a$, none of the super-patterns of $S$ satisfies $C_a$. Let $I = \{a_1, a_2, \ldots, a_n\}$ be the given item set and $P(I)$ be the power set of $I$. Let $A$ and $B$ be item sets of $I$ such that $A \supseteq B \wedge A \ge \xi \Rightarrow B \ge \xi$. Table 1 gives an example for the anti-monotone constraints:

1. Min(Profit) >= 50
2. Max(Profit) <= 30

Table 1. Anti-monotone constraint.
Item   Profit
A      40
B      0
C      -20
D      10
E      -30
F      30
G      20
H      -10

Let the transaction for Table 1 be (a, b, c, d, e). If we apply Min(Profit) >= 50 to the transaction, we see that item a (profit 40) does not satisfy this constraint. There is no need to check the rest of the transaction items, as the transaction as a whole cannot satisfy the constraint either. The same holds for Max(Profit) <= 30.

b. Monotone Constraint

A constraint $C$ is monotone if, whenever a pattern $S$ satisfies $C$, every super-pattern of $S$ also satisfies $C$. Let $I = \{a_1, a_2, \ldots, a_n\}$ be the given item set and $P(I)$ be the power set of $I$. Let $A$ and $B$ be item sets of $I$ such that $A \supseteq B \wedge B \ge \xi \Rightarrow A \ge \xi$. Table 2 gives an example for the monotone constraints:

1. Min(Profit) <= 15
2. Max(Profit) >= 30

Table 2. Monotone constraint.
Item   Profit
A      40
B      0
C      -20
D      10
E      -30
F      30
G      20
H      -10

Let the transaction for Table 2 be (a, b, c, d, e). If we apply Min(Profit) <= 15 to the transaction, we see that item a does not satisfy this constraint but item b does. Therefore, the whole transaction satisfies the constraint. The same holds for Max(Profit) >= 30.

Figure 1 represents the conceptual diagram for Phase 1.

Figure 1. Identification of sensitive items.

As can be seen from Figure 1, we first generate the FP tree from the dataset. We then apply anti-monotone and monotone constraints on the FP tree to identify sensitive sequential patterns.
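
The behaviour of the two constraint families can be illustrated with a short Python sketch that evaluates the four example constraints on the profit values of Tables 1 and 2. This is only a minimal illustration and not the authors' implementation (the paper reports using C#.net for this phase); the function names and the lowercase item keys are assumptions introduced for the example.

```python
from typing import Iterable

# Profit values copied from Tables 1 and 2 (item names lowercased, as in
# the example transaction).
PROFIT = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30, "f": 30, "g": 20, "h": -10}


def min_profit_at_least(items: Iterable[str], bound: int) -> bool:
    """Anti-monotone check: Min(Profit) >= bound.

    A single violating item rejects the pattern and every super-pattern,
    so the scan stops at the first violation.
    """
    for item in items:
        if PROFIT[item] < bound:
            return False
    return True


def max_profit_at_most(items: Iterable[str], bound: int) -> bool:
    """Anti-monotone check: Max(Profit) <= bound (same early exit)."""
    for item in items:
        if PROFIT[item] > bound:
            return False
    return True


def min_profit_at_most(items: Iterable[str], bound: int) -> bool:
    """Monotone check: Min(Profit) <= bound.

    A single satisfying item is enough, and adding more items can never
    undo it, so every super-pattern also satisfies the constraint.
    """
    return any(PROFIT[item] <= bound for item in items)


def max_profit_at_least(items: Iterable[str], bound: int) -> bool:
    """Monotone check: Max(Profit) >= bound."""
    return any(PROFIT[item] >= bound for item in items)


transaction = ["a", "b", "c", "d", "e"]
print(min_profit_at_least(transaction, 50))  # item a (40) already violates
print(max_profit_at_most(transaction, 30))   # item a (40) already violates
print(min_profit_at_most(transaction, 15))   # item b (0) satisfies
print(max_profit_at_least(transaction, 30))  # item a (40) satisfies
```

The early return in the anti-monotone checks mirrors the remark in the text that the rest of the transaction need not be examined once one item fails.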

4.2. Sanitization of Sensitive Items

4.2.1. Fuzzification of Data

Fuzzy sets were introduced by Lotfi Zadeh. They can be viewed as an extension of classical crisp sets. Crisp sets discern between members and non-members of a set by assigning 0 or 1 to each object of the universal set. Mathematically this can be represented as:

$\mu_A(x) = 1$   (1)
$\mu_B(x) = 0$   (2)

Fuzzy sets generalize this function by assigning values that fall in the range 0 to 1. $X$ is the crisp (rigid boundaries) universal set and the function $\mu_A$ is the membership function which defines set $A$. Formally, it can be represented as $\mu_A : X \rightarrow [0, 1]$. The membership function that we use in the proposed scheme is defined in Equation 3. This function is used to assign a membership degree to each of the elements in the crisp set $X$. While fuzzifying data, it should be kept in mind that the support of a fuzzy set $A$ is the crisp set containing all elements whose membership degree in $A$ is not 0. To fuzzify data, we used the following membership function:

$Y = (X - J) / I$   (3)

We apply Equation 3 to the dataset and find the fuzzy values of transactions. In this equation $Y$ represents the output variable, $X$ represents the dataset, $J$ represents the lower limit and $I$ represents the upper limit of the dataset. We divide the fuzzified dataset into three sets: high sensitive, medium sensitive and low sensitive. This reflects that some items in the dataset are more sensitive than others.

Lemma 1: Maximum sanitization can be achieved by fuzzifying the values of sensitive items. Consider a database $D$ that consists of the transactions $T_i$, i.e., $D = \{T_1, T_2, \ldots, T_n\} = \{T_i\}_{i=1}^{n}$, where $n$ is the total number of transactions. Let $T_i$ be a transaction expressed as $T_i = \{a_1, a_2, \ldots, a_m\}$ for any index $i$.
Consider that for any $T_i \in D$ there exists a set $A_i \subseteq T_i$ such that, if $A$ is monotone, then $P(A)$ also satisfies the monotone constraint. It follows that $A$ is sensitive, i.e., $A \in SI$; equivalently, $A \in SI$ if $A$ is monotone and $A \notin SI$ if $A$ is not monotone. Furthermore, consider a membership function $X_i = G(f_{T_i})$, where $f_{T_i}$ expresses the corresponding frequency of $a_i$. Replace $X_i$ with the corresponding sensitivity $SI_i$ of $A_i$.

Lemma 1 represents the technique mathematically. It first states the identification of sensitive items and then sanitizes them. At the end we replace the values of sensitive items with the sanitized values. Figure 2 represents the conceptual diagram of Phase 2.

Figure 2. Sanitization of sensitive items.

Table 3 shows the notations used in the proposed algorithm.

Table 3. Notations used in algorithm.
Notation   Meaning
D          Original dataset
D'         Modified dataset
N          Total number of transactions
F          Set of fuzzy values
α          User defined threshold
T          Set of transactions

Algorithm 2 shows the proposed algorithm for sanitization of sensitive sequential patterns. The algorithm takes sensitive sequential patterns as input. It reads the data and applies the membership function defined in Equation 3 to generate fuzzy values for the dataset. Once the fuzzy values are generated, we divide them into three classes, i.e., High Sensitive, Medium Sensitive and Low Sensitive. At the end of the algorithm, we replace the original values of the sensitive items with the fuzzy values and produce the released dataset. Fuzzy values are divided into classes according to the following rules:

Algorithm 2: Proposed algorithm for Sanitizing Sensitive Sequential Patterns.
Input:
1. D (Original Dataset).
2. α: A user specified threshold.
Output:
3. D' (Modified/Released Dataset).
1. D' = NULL. F = NULL.
2. for (i: 1 to N) {   // calculate fuzzy values using membership function
   2.1. for each (Ti ∈ D) {
   2.2.    F = (X - J) / I   // Membership Function (Equation 3)
        }
   }
3. for each (Ti ∈ F) {
   3.1. if (Ti > α) HS ← Ti
   3.2. if (Ti == α) MS ← Ti
   3.3. D' ← D' ∪ HS
   3.4. D' ← D' ∪ MS
   }
4. End
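
The two steps of Algorithm 2 can be sketched in Python as follows. This is a hedged illustration rather than the authors' code (the paper reports using MATLAB for the fuzzy logic stage). Several points are assumptions made for the example: J and I are taken as the smallest and largest values of the input, values below α are placed in a "low" class (Algorithm 2 only states the rules for HS and MS), the equality test uses a floating-point tolerance, and α = 1/3 is a hypothetical threshold. The sample input uses the item frequencies 9, 7, 5, 2, 2 quoted in the running example.

```python
import math
from typing import Dict, List, Sequence


def fuzzify(values: Sequence[float]) -> List[float]:
    """Apply Equation 3, Y = (X - J) / I, to every value.

    Assumption: J is the smallest and I the largest value in the data,
    following the text's "lower limit" and "upper limit".
    """
    lower, upper = min(values), max(values)
    if upper == 0:
        raise ValueError("upper limit I must be non-zero")
    return [(x - lower) / upper for x in values]


def classify(fuzzy_values: Sequence[float], alpha: float) -> Dict[str, List[float]]:
    """Split fuzzy values into sensitivity classes around the threshold alpha.

    Algorithm 2 gives: value > alpha -> high, value == alpha -> medium.
    Treating everything else as "low" is an assumption of this sketch.
    """
    classes: Dict[str, List[float]] = {"high": [], "medium": [], "low": []}
    for y in fuzzy_values:
        if math.isclose(y, alpha):  # tolerant stand-in for "== alpha"
            classes["medium"].append(y)
        elif y > alpha:
            classes["high"].append(y)
        else:
            classes["low"].append(y)
    return classes


# Frequencies of a, b, e, c, d as quoted in the running example.
frequencies = [9, 7, 5, 2, 2]
fuzzy = fuzzify(frequencies)
result = classify(fuzzy, alpha=1 / 3)  # alpha is a hypothetical threshold
print(fuzzy)
print(result)
```

With these inputs the two largest frequencies fall into the high class, the middle one lands exactly on the threshold, and the two smallest map to 0. Note that Equation 3 as stated divides by the upper limit itself and not by the range, so the largest value maps to (I - J) / I and not to 1.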

5. Putting Model into Work

Table 4 shows a part of the sequence database. The table contains the sequence id and the items bought in a particular sequence. The sequences represent the purchases that a customer made during visits to the market. In the FP tree the root node is given a null value. Each transaction is scanned and its nodes are added to the tree. The nodes in the tree represent items of sequential patterns. If an item appears more than once in the database, its frequency is updated and written alongside the node. In the given example the sensitive pattern is abe. To avoid large candidate generation we apply the anti-monotone constraint on the tree.

Table 4. Sequential patterns.
Seq. ID   Sequence
1         (a)(be)(c)(d)
2         (a)(b)(gde)
3         (b)(c)(de)
4         (a)(be)(c)(d)
5         (ce)(d)(f)(g)
6         (a)(be)(c)(d)
7         (a)(be)(c)(d)
8         (b)(c)(d)(e)
9         (a)(be)(c)(d)
10        (cd)(f)(g)(e)

Figure 3 represents the FP tree constructed from Table 4.

Figure 3. FP tree.

The tree shown in Figure 3 is generated by scanning the dataset and inserting nodes into the tree. The root of the tree is labelled as null. The children nodes represent the items and their frequency of occurrence. The tree is constructed by reading the items from the transactions and inserting them into the tree. We apply monotone and anti-monotone constraints on the tree in Figure 3, according to the constraint definitions in a and b, for identification of sensitive items.

Table 5 represents the dataset along with the frequency of items. The items are divided into three categories, i.e., high sensitive, medium sensitive and low sensitive. The occurrence frequencies are listed in the same order as the items in the sequence, i.e., 9 represents the frequency of a, 7 represents the frequency of b, 5 represents the frequency of e and 2 represents the frequencies of c and d. The frequencies are updated from the FP tree. The frequencies are further divided into three classes, which represent the sensitivity level of items. The ranges are as follows: 0-3 represents low sensitivity, 4-6 represents medium sensitivity and 7 and above represents high sensitivity.

Table 5. Dataset with frequency of occurrence.
Columns: Tr_ID, Tr_Data, Frequency of Occurrence.
Tr_Data, in row order starting from Tr_ID 1: abecd, abgde, bcde, abecd, cedfg, abecd, abecd, bcde, abecd, cdfge, cdfge, bcde, cedfg, bcde, cedfg, bcadefg, bcde, abecd, cdefg, cedfg.

We apply the membership function of Equation 3 to fuzzify the values. The dataset in Table 5 is given as input to the membership function, which returns the fuzzy values for the frequencies of items. Table 6 represents the fuzzified values for Table 5. Each value in Table 6 is the fuzzy value for the frequency of an item in Table 5. 0 and NaN in the fuzzified dataset indicate that the frequency of an item was not available. At the end, we replace the original values with the fuzzified values and produce the released dataset with sensitive sequential patterns hidden.

Table 6. Fuzzified dataset.

6. Results and Discussion

In this section, we present an analysis of the proposed scheme. We experimented with three datasets. The first dataset is randomly generated over the alphabet {a, b, c, d, e, f, g} with 1200 sequences. The second dataset is network traffic of TCP packets containing 7000 sequences. The third dataset is network traffic of UDP packets containing 8000 sequences. All experiments were performed on an Intel Core 2 Duo processor with 2 GB memory. There are two stages of development and experimentation. Identification of sensitive items is performed using C#.net, while the modification of sensitive items is done using fuzzy logic implemented in MATLAB. All simulations and graphs were generated using MATLAB.

We compared our approach with the approaches presented in [2, 22]. Both [2] and [22] are algorithms for hiding sensitive sequential patterns. We refer to them as the Abul and Rahbarinia approaches. The comparison criteria are multiple database scans and the number of modifications needed for hiding sensitive sequential patterns.

Figures 4, 5, 6 and 7 present experimental results and the comparison with [2, 22]. Figure 4 represents the comparison with [2], while Figure 5 represents the comparison with [22]. Figure 4 shows the comparison for the TCP dataset and Figure 5 shows the comparison for the UDP dataset. In Figures 4 and 5, the X-axis represents the number of transactions while the Y-axis represents the sensitivity level after data modification in the original dataset. It can be seen from the figures that the proposed approach has reduced the sensitivity level considerably as compared with [2]. This shows that the proposed approach has almost achieved maximum sanitization.

Figure 4. Results comparison (X-axis: transactions; Y-axis: degree of sensitivity).

Figure 5. Results comparison.

Figures 6 and 7 represent the comparison of the proposed approach with [2, 22] with respect to the number of modifications made for sanitizing sequential patterns. Figure 6 shows the comparison of the number of modifications with [22], while Figure 7 shows the comparison with [2]. The figures show the number of transactions on the X-axis and the number of modifications on the Y-axis. It can be clearly seen that the number of modifications required to sanitize sensitive sequential patterns in the proposed approach is less than in the existing approaches. On this criterion the proposed approach performs better than the existing approaches.

Figure 6. Number of modifications (X-axis: transactions; Y-axis: number of modifications).

Figure 7. Number of modifications (X-axis: transactions; Y-axis: number of modifications).

7. Conclusions and Future Work

Privacy preservation of sequential patterns has not yet been explored in depth. We looked at different techniques proposed to address the issue of privacy preservation in sequential patterns. The work proposed in this paper is another step towards addressing this problem. We experimented with and evaluated the proposed approach on three datasets. We also compared the proposed approach with existing approaches and found that it performs considerably better than them. In future, we will further explore this area and try to find new techniques for sanitizing sensitive sequential patterns. We will also look at using evolutionary approaches for sanitization purposes.

References

[1] Abbas A. and Liu J., Designing an Intelligent Recommender System using Partial Credit Model and Bayesian Rough Set, The International Arab Journal of Information Technology, vol. 9, no. 2, 2012.

[2] Abul O., Atzori M., Bonchi F., and Giannotti F., Hiding Sequences, in Proceedings of the 23rd International Conference on Data Engineering Workshop, Istanbul, Turkey, pp. ,
[3] Abul O., Bonchi F., and Giannotti F., Hiding Sequential and Spatio-Temporal Patterns, IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 12, pp. ,
[4] Agrawal R. and Srikant R., Mining Sequential Patterns, in Proceedings of the 11th International Conference on Data Engineering, Taipei, Taiwan, pp. 3-14,
[5] Chen M., Han J., and Yu P., Data Mining: An Overview from a Database Perspective, IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 6, pp. ,
[6] Clifton C. and Marks D., Security and Privacy Implications of Data Mining, in Proceedings of the ACM SIGMOD Workshop on Data Mining and Knowledge Discovery, Montreal, Canada, pp. 15-19,
[7] El-Hajj M., Bifold Constraint-Based Mining By Simultaneous Monotone and Anti-Monotone Checking, in Proceedings of the 15th International Conference on Data Mining, Texas, USA, pp. ,
[8] Evfimievski A., Srikant R., Agrawal R., and Gehrke J., Privacy Preserving Mining of Association Rules, in Proceedings of the 8th Conference on Knowledge Discovery and Data Mining, New York, USA, pp. 1-12,
[9] Gupta M. and Josh C., Privacy Preserving Fuzzy Association Rules Hiding in Quantitative Data, International Journal of Computer Theory and Engineering, vol. 1, no. 4, pp. ,
[10] Han J. and Kamber M., Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, USA,
[11] Han J., Pei J., Yin Y., and Mao R., Mining Frequent Patterns without Candidate Generation, in Proceedings of the 2000 ACM SIGMOD Workshop on Data Mining and Knowledge Discovery, pp. 1-12,
[12] Jin H., Chen J., He H., and O'Keefe C., Privacy-Preserving Sequential Pattern Release, in Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Nanjing, China, pp. ,
[13] Kapoor V., Poncelet P., and Teisseire M., Privacy Preserving Sequential Pattern Mining in Distributed Databases, in Proceedings of the Conference on Information and Knowledge Management, Virginia, USA, pp. ,
[14] Kim S., Park S., Won J., and Kim W., Privacy Preserving Data Mining of Sequential Patterns for Network Traffic Data, Information Sciences Journal, vol. 178, no. 3, pp. ,
[15] Mhatra A., Verma M., and Toshniwal D., Privacy Preserving Sequential Pattern Mining in Progressive Databases using Noisy Data, in Proceedings of the 13th International Conference Information Visualisation, California, USA, pp. ,
[16] Naeem M. and Asghar S., A Novel Architecture for Hiding Sensitive Association Rules, in Proceedings of the International Conference on Data Mining, Nevada, USA,
[17] Ouyang W. and Huang Q., Privacy Preserving Sequential Mining Based on Secure Two-Party Computation, in Proceedings of the 5th International Conference on Machine Learning and Cybernetics, Guangzhou, China, pp. ,
[18] Ouyang W. and Huang W., Privacy Preserving Sequential Pattern Mining Based on Secure Multi-Layer Computation, in Proceedings of the International Conference on Information Acquisition, China, pp. ,
[19] Ouyang W., Xin H., and Huang Q., Privacy Preserving Sequential Pattern Mining Based on Data Perturbation, in Proceedings of the 6th International Conference on Machine Learning and Cybernetics, Hong Kong, China, pp. ,
[20] Ouyang W., Huang Q., and Xin H., A Randomization Approach to Mining Sequential Pattern with Privacy Preserving, in Proceedings of the International Symposium on Computational Intelligence and Design, Wuhan, China, pp. ,
[21] Pensa R., Monreale A., Pinelli F., and Pedreschi D., Pattern-Preserving k-anonymization of Sequences and its Application to Mobility Data Mining, Workshop co-located with ESORICS, Malaga, Spain, pp. 1-17,
[22] Rahbarinia B., Pedram M., Arabnia H., and Alavi Z., A Multi-Objective Scheme to Hide Sequential Patterns, in Proceedings of the 2nd International Conference on Computer and Automation Engineering, Singapore, vol. 1, pp. , 2010.

Faisal Shahzad is a Lecturer at Mohammad Ali Jinnah University, Pakistan, where he coordinates and manages the instructional labs. He is also a visiting faculty member at the University Institute of Information Technology (UIIT), Rawalpindi. In 2007 he received his BC degree in computer science from Mohammad Ali Jinnah University, Islamabad. From 2007 to 2009 he worked as a software engineer in a software company in Islamabad. In 2012 he received his MS in computer science from Mohammad Ali Jinnah University, Islamabad. He completed his thesis under the supervision of Dr. Sohail Asghar.

Sohail Asghar is Director of the University Institute of Information Technology (UIIT), PMAS-Arid Agriculture University, Pakistan. He is also Head of the Center of Research in Data Engineering (CORDE) Research Group. Prior to his current position he served as an Associate Professor of computer science in the Department of Computer Sciences, Faculty of Engineering and Applied Sciences, Mohammad Ali Jinnah University, Islamabad, Pakistan. Before that, he worked as an Assistant Professor of computer sciences and head of R and D in the Department of Computer Sciences, Shaheed Zulfikar Ali Bhutto Institute of Science and Technology, Pakistan. Earlier, he was a Research Associate and Assistant Lecturer in the Clayton School of Information Technology, Faculty of Information Technology, at Monash University, Australia. In 1994, he graduated with honors in computer science from the University of Wales, United Kingdom. From 1994 to 2002, he worked as a senior software engineer in a software company in Islamabad. He then obtained his PhD in Information Technology at Monash University, Melbourne, Australia.

Khalid Usmani is an assistant professor at the University Institute of Information Technology (UIIT), PMAS-Arid Agriculture University, Pakistan. His research interests are computer networks, network security and information security. He is very active in research and supervises many MS (CS) students in their research. He received his PhD in wireless network security from Universiti Teknologi Malaysia.


More information

College information system research based on data mining

College information system research based on data mining 2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei

More information

Random forest algorithm in big data environment

Random forest algorithm in big data environment Random forest algorithm in big data environment Yingchun Liu * School of Economics and Management, Beihang University, Beijing 100191, China Received 1 September 2014, www.cmnt.lv Abstract Random forest

More information

Mining Mobile Group Patterns: A Trajectory-Based Approach

Mining Mobile Group Patterns: A Trajectory-Based Approach Mining Mobile Group Patterns: A Trajectory-Based Approach San-Yih Hwang, Ying-Han Liu, Jeng-Kuen Chiu, and Ee-Peng Lim Department of Information Management National Sun Yat-Sen University, Kaohsiung, Taiwan

More information

Building A Smart Academic Advising System Using Association Rule Mining

Building A Smart Academic Advising System Using Association Rule Mining Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 raedamin@just.edu.jo Qutaibah Althebyan +962796536277 qaalthebyan@just.edu.jo Baraq Ghalib & Mohammed

More information

Application Tool for Experiments on SQL Server 2005 Transactions

Application Tool for Experiments on SQL Server 2005 Transactions Proceedings of the 5th WSEAS Int. Conf. on DATA NETWORKS, COMMUNICATIONS & COMPUTERS, Bucharest, Romania, October 16-17, 2006 30 Application Tool for Experiments on SQL Server 2005 Transactions ŞERBAN

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

A Survey of Quantification of Privacy Preserving Data Mining Algorithms

A Survey of Quantification of Privacy Preserving Data Mining Algorithms A Survey of Quantification of Privacy Preserving Data Mining Algorithms Elisa Bertino, Dan Lin, and Wei Jiang Abstract The aim of privacy preserving data mining (PPDM) algorithms is to extract relevant

More information

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET)

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976 6367(Print), ISSN 0976 6367(Print) ISSN 0976 6375(Online)

More information

Discovering Sequential Rental Patterns by Fleet Tracking

Discovering Sequential Rental Patterns by Fleet Tracking Discovering Sequential Rental Patterns by Fleet Tracking Xinxin Jiang (B), Xueping Peng, and Guodong Long Quantum Computation and Intelligent Systems, University of Technology Sydney, Ultimo, Australia

More information

Privacy-Preserving Mining of Association Rules On Cloud by improving Rob Frugal Algorithm

Privacy-Preserving Mining of Association Rules On Cloud by improving Rob Frugal Algorithm Privacy-Preserving Mining of Association Rules On Cloud by improving Rob Frugal Algorithm Mr.Vishal R. Redekar Department of Computer Engineering Smt.Kashibai Navale College of Engineering Pune-411041,

More information

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH

EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One

More information

An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns

An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns An Efficient GA-Based Algorithm for Mining Negative Sequential Patterns Zhigang Zheng 1, Yanchang Zhao 1,2,ZiyeZuo 1, and Longbing Cao 1 1 Data Sciences & Knowledge Discovery Research Lab Centre for Quantum

More information

Load Balancing on a Grid Using Data Characteristics

Load Balancing on a Grid Using Data Characteristics Load Balancing on a Grid Using Data Characteristics Jonathan White and Dale R. Thompson Computer Science and Computer Engineering Department University of Arkansas Fayetteville, AR 72701, USA {jlw09, drt}@uark.edu

More information

A Visualization System and Monitoring Tool to Measure Concurrency in MPICH Programs

A Visualization System and Monitoring Tool to Measure Concurrency in MPICH Programs A Visualization System and Monitoring Tool to Measure Concurrency in MPICH Programs Michael Scherger Department of Computer Science Texas Christian University Email: m.scherger@tcu.edu Zakir Hussain Syed

More information

Multi-level Metadata Management Scheme for Cloud Storage System

Multi-level Metadata Management Scheme for Cloud Storage System , pp.231-240 http://dx.doi.org/10.14257/ijmue.2014.9.1.22 Multi-level Metadata Management Scheme for Cloud Storage System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3, Chuck Yoo 2 and Young Woong Ko 1

More information

A Survey on Association Rule Mining in Market Basket Analysis

A Survey on Association Rule Mining in Market Basket Analysis International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 409-414 International Research Publications House http://www. irphouse.com /ijict.htm A Survey

More information

Application of Data Mining Techniques in Intrusion Detection

Application of Data Mining Techniques in Intrusion Detection Application of Data Mining Techniques in Intrusion Detection LI Min An Yang Institute of Technology leiminxuan@sohu.com Abstract: The article introduced the importance of intrusion detection, as well as

More information

A Layered Signcryption Model for Secure Cloud System Communication

A Layered Signcryption Model for Secure Cloud System Communication Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.1086

More information

An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset

An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset P.Abinaya 1, Dr. (Mrs) D.Suganyadevi 2 M.Phil. Scholar 1, Department of Computer Science,STC,Pollachi

More information

A Statistical Text Mining Method for Patent Analysis

A Statistical Text Mining Method for Patent Analysis A Statistical Text Mining Method for Patent Analysis Department of Statistics Cheongju University, shjun@cju.ac.kr Abstract Most text data from diverse document databases are unsuitable for analytical

More information