Exploratory Data Analysis Leading towards the Most Interesting Binary Association Rules

Size: px
Start display at page:

Download "Exploratory Data Analysis Leading towards the Most Interesting Binary Association Rules"

Transcription

1 Exploratory Data Analysis Leading towards the Most Interesting Binary Association Rules Alfonso Iodice D Enza 1, Francesco Palumbo 2, and Michael Greenacre 3 1 Department of Mathematics and Statistics - Università di Napoli Federico II Complesso Universitario di Monte S. Angelo, Via Cinthia I Napoli, Italy ( iodicede@unina.it) 2 Department of Economics and Finance - Università di Macerata Via Crescimbeni, 20 I Macerata, Italy ( francesco.palumbo@unimc.it) 3 Department of Economics and Business - Universitat Pompeu Fabra Ramon Trias Fargas, E Barcelona, Spain ( michael.greenacre@upf.edu) Abstract. Association Rules (AR) represent one of the most powerful and largely used approach to detect the presence of regularities and paths in large databases. Rules express the relations (in terms of co-occurence) between pairs of items and are defined in two parts: support and confidence. Most techniques for finding AR scan the whole data set, evaluate all possible rules and retain only rules that have support and confidence greater than thresholds, which should be fixed in order to avoid both that only trivial rules are retained and also that interesting rules are not discarded. This paper proposes a two steps interactive, graphical approach that uses factorial planes in the identification of potentially interesting items. Keywords: Association Rules, Classification, Binary variables. 1 Introduction Association rules (AR) [Agrawal et al., 1993] represent a suitable data mining tool to identify frequently occurring patterns of information in large data bases. The classic field of application of AR mining is market basket analysis (MBA); in this context, data are stored as transactions: each transaction is a binary sequence that records the presence/absence of a set of p features. The basic data structure consists of an n p Boolean matrix; the terms of the table are 1 and 0, which correspond to the states presence and absence, respectively. MBA is one of the first and better known application fields of AR mining: however, the binary structure of the data makes AR applicable in Acknownledgements: The paper was financially supported by PRIN2003 grant: Multivariate Statistical and Visualization Methods to Analyze, to Summarize, and to Evaluate Performance Indicators research unit of University of Macerata, responsible F. Palumbo.

2 Exploratory Data Analysis Leading towards many different contexts. AR are largely used in text mining, image analysis and microarray data analysis. An AR in its simplest form involves a pair of items playing different roles: the antecedent part of the rule (body), and the consequent part of the rule (head). The relation characterizing the considered items is usually expressed in two different measures: support and confidence. The support is the intensity of the association between the considered items; the confidence measures the strength of the logical dependence expressed by the rule. An example illustrates an AR in its simplest form: let A and B be a pair of items, beer and chips for example, the following notation represents the achieved AR: A = B = {support = 20%, confidence = 80%}. The rule shows that 20 percent of customers buy both beer and chips, and if a customer buys beer, in 80 percent of cases buys chips too. Simple rules does not represent the whole output of a mining process, since complex AR are extracted too. In its most general definition a complex AR is a rule in which there are one or more antecedent items and one or more consequents. Complex AR represent a more powerful tool, but they are even harder to handle. The large size of the starting data matrix implies very high computational efforts in mining rules, and it can lead to a massive quantity of rules. Many proposed algorithms lead to mine rules in more and more efficient ways, but the identification and selection of interesting rules remains a still opened problem. It requires the definition of consistent criteria to avoid the risk that the truly interesting information is hidden by the presence of trivial and redundant rules. The great part of the contributions in the AR literature is aimed to the implementation of algorithms for the generation of AR in reduced computational costs and output amount, as well as for the generalization of AR to categorical and numerical data. The reference point among these algorithms is the Apriori, introduced by [Agrawal and Srikant, 1994]. This algorithm consists of two phases: the frequent itemset mining phase and the properly defined association rule mining phase. An itemset is frequent if the items involved in it co-occur with a frequency greater than a user-defined threshold, in other words the itemset support is greater than the assigned minimal support threshold. In the second phase, there is the generation of all the possible rules deriving from the itemsets previously selected: the output rules provided by the procedure are those characterized by confidence exceeding a minimal confidence threshold. Based on the same idea of the Apriori, the AprioriTid [Agrawal and Srikant, 1994] introduces an encoding of the founded frequent itemsets in order to reduce the computational effort. The AprioriHybrid is a combination of both Apriori and AprioriTid, in the earlier and the latter iterations respectively [Agrawal and Srikant, 1994].

3 258 Iodice D Enza et al. Other interesting algorithms are the DHP (Direct Hashing and Pruning) [Park et al., 1995], the Partition algorithm [Savasere et al., 1995], the DIC (Dynamic Itemset Counting ) algorithm [Brin et al., 1997] and the FP-growth (Frequent Pattern growth) algorithm [Han et al., 2000] for which we refer the reader to the bibliography. The procedures above are applicable to Boolean data. To generalize the algorithms to numerical and categorical AR, a previous recoding of the data is required: in this sense the contribution of [Srikant and Agrawal, 1996] and [Miller and Yang, 1997]. A last class of proposals is aimed to solve a common problem of AR mining, the huge number of rules generated, selecting and identifying the interesting generated AR. The contributions belonging to this class, like [Zaki, 2000] and [Liu et al., 1999], are characterized by a same target as well as approaches of different nature. Almost all AR mining algorithms use thresholds whose settings are related to a trade-off: tight thresholds can cause loss of interesting information; otherwise, loose thresholds cause an excessive number of uninteresting rules to be selected. In addition, notice that the reduction of output rules is anyway linked to the generation of all potential rules. The present paper proposes an exploratory strategy to identify a priori items that are potentially interesting as antecedent or consequent parts of rules. The method does not produce AR but provides the user with information about the most probably interesting items. One of the above mentioned algorithms must be used, focusing the attention only to those items that the procedure indicated as interesting, to obtain the AR set. Reducing the number of considered items, the user can define looser thresholds, avoiding as well the risk of a huge amount of rules. In addition, information about the items help the user to pay greater attention to the rules containing the previously identified items. The proposed strategy exploits the graphical and analytical capabilities of multidimensional data analysis (MDA), in particular the paper will focus on the computational aspects when n and p are large. The procedure deals with the following data structures: let Z be a (n p) presence/absence matrix characterized by n binary sequences considered with respect to p Boolean variables; in our application the n rows refer to baskets of items purchased and the p columns refer to the items (with coding 1 = buy and 0 = not buy). Let S = n 1 Z T Z be a symmetric contingency table, whose general extra-diagonal term s jj = s j j represents the support with respect to the items j and j, in our case, the relative frequency of purchasing pairs of items. The asymmetric square matrix C is defined as C = Z T ZD 1, where D is a diagonal matrix having general term d jj = s jj (j = 1,...,p). Matrix C has the general diagonal term c jj = 1 while for j j the term c jj corresponds to the confidence of the rule {A j = A j }. In Section 2 we present the steps of the strategy and the related tools: clustering phase (subsection 2.1), items selection according to the supports

4 Exploratory Data Analysis Leading towards (subsection 2.2), identification of rules bodies and heads (subsection 2.3). In section 3 we present an example on the BMS-Webview-2 dataset. 2 Multidimensional data analysis (MDA) approach The proposed strategy consists of two main phases and aims at generating a reduced number of AR; in particular the phases are: i) partitioning of the considered binary sequences (transactions) in homogeneous classes; ii) selecting interesting items and visually representing the interesting rules using MDA techniques. In the case of huge data sets the applicability of the whole procedure strongly depends on the the first phase of classification: this step is very expensive in terms of time elapsed and memory required. Thus it is necessary to choose an algorithm increasing speed and efficiency of the whole procedure. Once the group are determined, the attention is focused on the supports and confidences matrices of order p p related to each group. In addiction, our proposal is based on a suitable factorial analysis on these matrices that requires short computing time and low memory usage: the most time consuming phase consists in the singular value decomposition of symmetric matrices. Moreover, partitioning the data in groups permits to perform the analysis on parallel computing architectures. The aim is to select the most occurring pairs of items in each group and to assign the role of antecedent or consequent to the set of selected items. 2.1 Clustering transactions The general aim of clustering techniques is to partition the statistical units of a given data set in disjoint classes such that similar units are grouped together. Dealing with large, high dimensional and sparse data sets, classic clustering techniques like K-means algorithm [Hartigan, 1975] and agglomerative algorithm require very high computational costs and do not guarantee reliable solutions. Thus, in the literature, there are many contributions proposing algorithms optimized for massive amounts of binary data: the ROCK algorithm proposed by [Guha et al., 2000], that is based on links and represents an agglomerative hierarchical clustering; QROCK that is a speeded up version of ROCK, while a density based algorithm considering links is the SNN (shared nearest neighbors). Two of the non-hierarchical clustering algorithms for binary data are LWC (light weight clustering) and incremental K-means proposed by [Gaber et al., 2004] and [Ordonez, 2003], respectively: the main aim of both these procedures is clustering of data streams, which are flows of binary sequences. In this paper, however, the procedure is applied on a finite set of binary sequences.

5 260 Iodice D Enza et al. The first step of the proposed strategy is then implemented exploiting the features of one of the above procedures, incremental K-means that is a nonhierarchical algorithm and it is characterized by doing a single iteration to get the partition of the rows of the binary matrix Z in K disjoint classes. The logical distance between the binary rows of Z is measured trough the Jaccard coefficient. The incremental K-means takes as input the number of clusters that is hence user-defined. The output provided by the procedure consists of: the partition matrices Z k (n k p), where k = 1,...,K; the centroid matrix C that is (K p), with cluster centroids on rows; a K-elements vector w of cluster weights such that w k = n k n ; a (K p) matrix R of squared distances. The initialization phase of Incremental K-means presents a difference with Standard K-means: instead of using k random sequences as centers, this algorithm exploits global statistics (mean and variance) of the input indicator matrix to obtain the starting centers. Furthermore, the procedure does not update the centroid matrix C and the cluster weights vector w at every binary sequence but every (n/l) times, where n and L are the number of considered sequences and an initialization parameter, respectively. The reader is referred to [Ordonez, 2003] for further details about the procedure. 2.2 Selection of interesting items The previous step defined a partition of Z in Z k, with k = 1,...,K; for each of the Z k matrices, supports (S k ) and confidences (C k ) are defined. AR mining is hence referred to each group of homogeneous sequences. In particular, through the analysis of each S k, the most occurring pairs of items within the k-th group are selected; while through the analysis of each C k, the procedure assigns the role of antecedent or consequent to each one of the selected items. The selected pairs of items resulting by the analysis of S k represent evident relations characterizing groups of considered binary sequences: these hidden patterns can be missed using general support thresholds. The pairs of items characterized by a degree of co-occurrence that is high in one or more of the K groups and low with respect to the whole data are then considered non-trivial. The criteria used to define what is high or low are based on the ratio between the most occurring supports inside the groups and the total supports. Proper statistical tests can be adopted to exploit the task; however, this paper does not focus on this aspect. 2.3 Items roles in the rules The confidence table C is square and asymmetric, a characteristic that has to be taken into account in analyzing the matrix. The features of C can be extended to each C k = Z T k Z kd 1 k. In the context of multidimensional data analysis, different proposals in the literature extend well-known methods like correspondence analysis (CA) [Greenacre, 2000] and multidimensional scaling (MDS) to square asymmetric tables [Bove, 1989]. A common aspect

6 Exploratory Data Analysis Leading towards of these methods is in the decomposition of the asymmetric table into two components: symmetric and skew-symmetric. Applying this ( decomposition to a table C, it results: C = C s + C sk, where C s = ) 1 2 C + C T ( represents the symmetric component of C, and C sk = ) 1 2 C C T represents the skew-symmetric component of C. The separate analyses of the two components lead to obtain a representation of the symmetric and skew-symmetric characteristics of table C: the methodology applied on C s and C sk, and the corresponding representation display, characterize the different proposals treating square asymmetric tables. Taking into account the general pair of items j and j, identified by the analysis of S, the role of j and j depends on the values c (j,j )sk and c (j,j )s. Remark that the diagonal elements of C are constant values equal to 1. These trivial values are completely irrelevant to the analysis and their presence introduces noise in the representations. In order to cut off noise from C according to Greenacre [Greenacre, 1984], these values can be ignored and replaced by an iterative alternating procedure. The matrix C is decomposed in symmetric and skew-symmetric components and then treated to replace the diagonal elements, by the following procedure: i) decomposition C in C s and C sk ; ii) correspondence analysis of C s and C sk ; iii) reconstruction of the main diagonal of C s through the general reconstruction formula: np ij = nr i r j (1 + F f=1 λ 0.5 f ϕ if ϕ jf ), (1) with i, j = 1,...,p and f = 1,...,F. F is the number of considered factors, p is the number of considered items, ϕ if is the principal coordinate of the i-th item on the factor f; r i and r j represent the row margins of C; iv) comparison of the reconstructed diagonal with the previous main diagonal for C s : if there is no difference then the whole matrix C s is reconstructed using formula (1); else C s is updated with the obtained diagonal and repeat the previous steps; v) rebuild the confidence C =C s + C sk ; Each of the previously selected items is considered as an antecedent or consequent part of interesting rule depending on its deviation from symmetry: items having a positive deviation are considered interesting heads, the remaining items are then the bodies. 3 Example In this section the procedure is applied to the BMS-Webview-2 data set: this data set was used in KDD-cup 2000 competition and refers to the transactions

7 262 Iodice D Enza et al. associated to an e-commerce company. The whole data set is available at KDD-cup 2000 home page (url: This data set was already used in many applications of data mining procedures proposals and it represents a qualifying benchmark. Being the presented approach complementary to the computer science based proposals, a direct result comparison would not make any sense. In the BMS-Webview-2 data set each statistical unit represents the clickstreams of a single session of a visitor in the e-commerce web-site; each item corresponds to a single product. The raw data set is characterized by web click-streams and 3340 product-pages (items). After a pre-processing phase the incremental K-means algorithm is applied with different numbers of classes. As shown in figure 1, the best partition is obtained for K = 10, quality level number of groups Fig.1. Classification quality versus number of classes since the quality level of the classification becomes stable. The quality level measure for the classification is proportional to the reciprocal of the mean distance between each unit and its related cluster centroid. Following the procedure, once the partition of Z in K groups is determined, the items that mostly characterize each group are then selected. Interesting items selection and the consequent determination of the selected items roles can be completely automated: however expert users can interact and iteratively set up different parameters. The lack of space does not allow us to represent the graphical displays associated for all ten classes. We just shall give an interpretation key of the whole procedure output. The procedure output is mainly graphical and consists of various representations of the reconstructed confidence matrix. The paper proposes two of them. The first one (see figure 2) represents the items in the principal factorial space of C k and C sk according to Greenacre (2000). The other one, which is more simple, represents the values of the reconstructed C. In the symmetric display, the closeness of two points/items indicates a high degree

8 Exploratory Data Analysis Leading towards symmetric display skew symmetric display Fig. 2. a) representation of the symmetric component of confidence matrix; b)representation of the skew-symmetric component of the confidences. of co-occurrence that is a support-like information; in the skew symmetric display, points far from the center of the map are characterized by a high deviation from symmetry. For each pair of items, the amount of the deviation from symmetry is proportional to the area of the triangle formed by the considered pair of points and the axis origin. The sign of the deviation from symmetry depends on oriented triangles: positive deviations correspond to clockwise oriented triangles. The latter display could be quite difficult for a non-expert user to interpret. The left part of figure 2 shows the item to be positioned far from the others, that means a different degree of occurrence. The skewsymmetric display confirm the different behavior of On the basis of such considerations, a possible rule could be in this case , and it is highlighted in both the sides of figure 2. Fig.3. Confidences representation

9 264 Iodice D Enza et al. The second approach is much more understandable, it displays, by a barchart, the n (n 1) quantities q j,j = 1 c (j,j ) c (with j, j = 1,...,n and (j,j) j > j ). The consideration of confidence ratios q j,j lead to identify the pairs of items j and j with differing c (j,j ) and c (j,j). Bars are sorted in decreasing order, according to the confidence ratios. A bar represents an interesting rule j j if it has an high value, being c (j,j ) < c (j,j), and the ordered pair (j, j ) has a positive deviation from symmetry. Figure 3 confirms the importance of the rule that is associated to the first bar in the bar-graph. Furthermore, the different behavior of the item is evident: the first-ranked bars are all associated to rules having as an antecedent part. 4 Conclusion and perspectives Since the AR were introduced for the analysis of large (and huge) data bases, most of the computational aspects, in term of speed and memory, have been successfully solved. However, it is still an open problem how to interpret the massive output. J. Edler and D. Pregibon in 1996 [Elder and Pregibon, 1996] foresaw that the KDD would have been a field for important challenges for the statistical community. They wrote: The statistician s tendency to avoid complete automation out of respect for the challenges of the data, and the historical emphasis on models with interpretable structure, has led that community to focus on problems with a more manageable number of variables (a dozen, say) and cases (several hundred typically) than may be encountered in KDD problems, which can be orders of magnitude larger at the outset. With increasingly huge and amorphous databases, it is clear that methods for automatically hunting down possible patterns worthy of fuller, interactive attention, are required. This paper, eight years later, goes in the direction indicated by Edler and Pregibon. Nowadays, the more and more increased power of modern computer makes easier to achieve this task. Future enhancements of the proposed approach are into different directions. From a computational point of view, the aim is to improve the classification step in order to obtain higher quality solution in less time. Another important aspect is the generalization of the selection criteria from pairs of items to pairs of itemsets, or rather to generalize the procedure to complex rules. Furthermore, according to the exploratory nature of the procedure, it is necessary to improve the visualization tools and introduce interactive capabilities. References [Agrawal and Srikant, 1994]R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. of the 20th Very Large Data Bases Conference,

10 Exploratory Data Analysis Leading towards pages , Santiago, Chile, [Agrawal et al., 1993]R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In ACM SIGMOD int. conf. on Management of data, volume 22, pages , N.Y., ACM Press. [Bove, 1989]G. Bove. Nuovi metodi di rappresentazione di dati di prossimità. Tesi di dottorato, Univ. di Roma La Sapienza, Roma, [Brin et al., 1997]S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemsets and implication rules in market basket data. In ACM SIGMOD int. conf. on Management of data, pages , Tucson, Arizona, USA, May [Elder and Pregibon, 1996]J. F. IV Elder and D. Pregibon. A statistical perspective on knowledge discovery in databases. In U. M. Fayyad et al., editors, Advances in knowledge discovery and data mining, pages , Am. Ass. for AI, [Gaber et al., 2004]M. M. Gaber, S. Krishnaswamy, and A. Zaslavsky. Cost-efficient mining techniques for data streams. In Proc. of the second workshop on Australasian information security, Data Mining and Web Intelligence, and Software Internationalisation, pages , Dunedin, New Zealand, [Greenacre, 1984]M. Greenacre. Theory and Applications of Correspondence Analysis. Academic Press, London, [Greenacre, 2000]M. Greenacre. Correspondence analysis of square asymmetric matrices. Applied Statistics, 49(3): , [Guha et al., 2000]S. Guha, R. Rastogi, and K. Shim. Rock: A robust clustering algorithm for categorical attributes. In Proc. of the 15 th Int. Conf. on Data Engineering. IEEE Computer Society, [Han et al., 2000]J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In ACM SIGMOD int. conf. on Management of data, pages 1 12, Dallas, Texas, May ACM SIGMOD. [Hartigan, 1975]J. A. Hartigan. Clustering Algorithms. John Wiley & Sons, [Liu et al., 1999]H. Liu, W. Hsu, and Y. Ma. Pruning and summarizing the discovered associations. In Proc. of KDD, pages , San Diego, CA, [Miller and Yang, 1997]R. J. Miller and Y. Yang. Association rules over interval data. In ACM SIGMOD int. conf. on Management of data, pages , Tucson, AZ, USA, [Ordonez, 2003]C. Ordonez. Clustering binary data streams with k-means. In Proc. of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, pages 12 19, San Diego, CA, ACM Press. [Park et al., 1995]J. Park, M. Chen, and P. Yu. An effective hash based algorithm for mining association rules. In M. J. Carey and D. A. Schneider, eds, ACM SIGMOD Int. Conf. on Management of Data, pp , San Jose, [Savasere et al., 1995]A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large data bases. In 21 st Conference on Very Large Data Bases, pages , Zurich, Switzerland, September [Srikant and Agrawal, 1996]R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. In ACM SIGMOD int. conf. on Management of data, pages 1 12, Montreal, Quebec, Canada, June [Zaki, 2000]M. J. Zaki. Generating non-redundant association rules. In Knowledge Discovery and Data Mining, pages 34 43, 2000.

Mining an Online Auctions Data Warehouse

Mining an Online Auctions Data Warehouse Proceedings of MASPLAS'02 The Mid-Atlantic Student Workshop on Programming Languages and Systems Pace University, April 19, 2002 Mining an Online Auctions Data Warehouse David Ulmer Under the guidance

More information

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*

More information

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive

More information

Selection of Optimal Discount of Retail Assortments with Data Mining Approach

Selection of Optimal Discount of Retail Assortments with Data Mining Approach Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.

More information

DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE

DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India sk.obaidullah@gmail.com

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan

More information

Robust Outlier Detection Technique in Data Mining: A Univariate Approach

Robust Outlier Detection Technique in Data Mining: A Univariate Approach Robust Outlier Detection Technique in Data Mining: A Univariate Approach Singh Vijendra and Pathak Shivani Faculty of Engineering and Technology Mody Institute of Technology and Science Lakshmangarh, Sikar,

More information

A Survey on Association Rule Mining in Market Basket Analysis

A Survey on Association Rule Mining in Market Basket Analysis International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 409-414 International Research Publications House http://www. irphouse.com /ijict.htm A Survey

More information

PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE

PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE International Journal of Computer Science and Applications, Vol. 5, No. 4, pp 57-69, 2008 Technomathematics Research Foundation PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand

More information

EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS

EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS Susan P. Imberman Ph.D. College of Staten Island, City University of New York Imberman@postbox.csi.cuny.edu Abstract

More information

The Data Mining Process

The Data Mining Process Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data

More information

Mining Association Rules: A Database Perspective

Mining Association Rules: A Database Perspective IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.12, December 2008 69 Mining Association Rules: A Database Perspective Dr. Abdallah Alashqur Faculty of Information Technology

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE www.arpapress.com/volumes/vol13issue3/ijrras_13_3_18.pdf NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE Hebah H. O. Nasereddin Middle East University, P.O. Box: 144378, Code 11814, Amman-Jordan

More information

A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains

A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains Dr. Kanak Saxena Professor & Head, Computer Application SATI, Vidisha, kanak.saxena@gmail.com D.S. Rajpoot Registrar,

More information

A Two-Step Method for Clustering Mixed Categroical and Numeric Data

A Two-Step Method for Clustering Mixed Categroical and Numeric Data Tamkang Journal of Science and Engineering, Vol. 13, No. 1, pp. 11 19 (2010) 11 A Two-Step Method for Clustering Mixed Categroical and Numeric Data Ming-Yi Shih*, Jar-Wen Jheng and Lien-Fu Lai Department

More information

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report. Document Clustering. Meryem Uzun-Per Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

More information

Association rules for improving website effectiveness: case analysis

Association rules for improving website effectiveness: case analysis Association rules for improving website effectiveness: case analysis Maja Dimitrijević, The Higher Technical School of Professional Studies, Novi Sad, Serbia, dimitrijevic@vtsns.edu.rs Tanja Krunić, The

More information

An application for clickstream analysis

An application for clickstream analysis An application for clickstream analysis C. E. Dinucă Abstract In the Internet age there are stored enormous amounts of data daily. Nowadays, using data mining techniques to extract knowledge from web log

More information

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data

More information

Data Mining Analytics for Business Intelligence and Decision Support

Data Mining Analytics for Business Intelligence and Decision Support Data Mining Analytics for Business Intelligence and Decision Support Chid Apte, T.J. Watson Research Center, IBM Research Division Knowledge Discovery and Data Mining (KDD) techniques are used for analyzing

More information

Market Basket Analysis for a Supermarket based on Frequent Itemset Mining

Market Basket Analysis for a Supermarket based on Frequent Itemset Mining www.ijcsi.org 257 Market Basket Analysis for a Supermarket based on Frequent Itemset Mining Loraine Charlet Annie M.C. 1 and Ashok Kumar D 2 1 Department of Computer Science, Government Arts College Tchy,

More information

New Matrix Approach to Improve Apriori Algorithm

New Matrix Approach to Improve Apriori Algorithm New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, rehab.alwan@majancolleg.edu.om Associate

More information

Web Usage Association Rule Mining System

Web Usage Association Rule Mining System Interdisciplinary Journal of Information, Knowledge, and Management Volume 6, 2011 Web Usage Association Rule Mining System Maja Dimitrijević The Advanced School of Technology, Novi Sad, Serbia dimitrijevic@vtsns.edu.rs

More information

SPMF: a Java Open-Source Pattern Mining Library

SPMF: a Java Open-Source Pattern Mining Library Journal of Machine Learning Research 1 (2014) 1-5 Submitted 4/12; Published 10/14 SPMF: a Java Open-Source Pattern Mining Library Philippe Fournier-Viger philippe.fournier-viger@umoncton.ca Department

More information

How To Use Neural Networks In Data Mining

How To Use Neural Networks In Data Mining International Journal of Electronics and Computer Science Engineering 1449 Available Online at www.ijecse.org ISSN- 2277-1956 Neural Networks in Data Mining Priyanka Gaur Department of Information and

More information

Combined use of association rules mining and clustering methods to find relevant links between binary rare attributes in a large data set

Combined use of association rules mining and clustering methods to find relevant links between binary rare attributes in a large data set Computational Statistics & Data Analysis ( ) www.elsevier.com/locate/csda Combined use of association rules mining and clustering methods to find relevant links between binary rare attributes in a large

More information

Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis

Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis , 23-25 October, 2013, San Francisco, USA Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis John David Elijah Sandig, Ruby Mae Somoba, Ma. Beth Concepcion and Bobby D. Gerardo,

More information

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

More information

A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment

A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment Edmond H. Wu,MichaelK.Ng, Andy M. Yip,andTonyF.Chan Department of Mathematics, The University of Hong Kong Pokfulam Road,

More information

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR

More information

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining

Extend Table Lens for High-Dimensional Data Visualization and Classification Mining Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du fdu@cs.ubc.ca University of British Columbia

More information

Data Mining to Recognize Fail Parts in Manufacturing Process

Data Mining to Recognize Fail Parts in Manufacturing Process 122 ECTI TRANSACTIONS ON ELECTRICAL ENG., ELECTRONICS, AND COMMUNICATIONS VOL.7, NO.2 August 2009 Data Mining to Recognize Fail Parts in Manufacturing Process Wanida Kanarkard 1, Danaipong Chetchotsak

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Standardization and Its Effects on K-Means Clustering Algorithm

Standardization and Its Effects on K-Means Clustering Algorithm Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

More information

The Scientific Data Mining Process

The Scientific Data Mining Process Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

MASTER'S THESIS. Mining Changes in Customer Purchasing Behavior

MASTER'S THESIS. Mining Changes in Customer Purchasing Behavior MASTER'S THESIS 2009:097 Mining Changes in Customer Purchasing Behavior - a Data Mining Approach Samira Madani Luleå University of Technology Master Thesis, Continuation Courses Marketing and e-commerce

More information

Using Data Mining Methods to Predict Personally Identifiable Information in Emails

Using Data Mining Methods to Predict Personally Identifiable Information in Emails Using Data Mining Methods to Predict Personally Identifiable Information in Emails Liqiang Geng 1, Larry Korba 1, Xin Wang, Yunli Wang 1, Hongyu Liu 1, Yonghua You 1 1 Institute of Information Technology,

More information

Mining Multi Level Association Rules Using Fuzzy Logic

Mining Multi Level Association Rules Using Fuzzy Logic Mining Multi Level Association Rules Using Fuzzy Logic Usha Rani 1, R Vijaya Praash 2, Dr. A. Govardhan 3 1 Research Scholar, JNTU, Hyderabad 2 Dept. Of Computer Science & Engineering, SR Engineering College,

More information

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

More information

Measures of association applied to operational risks

Measures of association applied to operational risks 9 Measures of association applied to operational risks Ron S. Kenett and Silvia Salini 9. Introduction Association rules are one of the most popular unsupervised data mining methods (Agrawal et al., 993;

More information

DWMiner : A tool for mining frequent item sets efficiently in data warehouses

DWMiner : A tool for mining frequent item sets efficiently in data warehouses DWMiner : A tool for mining frequent item sets efficiently in data warehouses Bruno Kinder Almentero, Alexandre Gonçalves Evsukoff and Marta Mattoso COPPE/Federal University of Rio de Janeiro, P.O.Box

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

Fuzzy Association Rules

Fuzzy Association Rules Vienna University of Economics and Business Administration Fuzzy Association Rules An Implementation in R Master Thesis Vienna University of Economics and Business Administration Author Bakk. Lukas Helm

More information

Mining On-line Newspaper Web Access Logs

Mining On-line Newspaper Web Access Logs Mining On-line Newspaper Web Access Logs Paulo Batista, Mário J. Silva Departamento de Informática Faculdade de Ciências Universidade de Lisboa Campo Grande 1700 Lisboa, Portugal {pb, mjs} @di.fc.ul.pt

More information

Web Mining Patterns Discovery and Analysis Using Custom-Built Apriori Algorithm

Web Mining Patterns Discovery and Analysis Using Custom-Built Apriori Algorithm International Journal of Engineering Inventions e-issn: 2278-7461, p-issn: 2319-6491 Volume 2, Issue 5 (March 2013) PP: 16-21 Web Mining Patterns Discovery and Analysis Using Custom-Built Apriori Algorithm

More information

Clustering on Large Numeric Data Sets Using Hierarchical Approach Birch

Clustering on Large Numeric Data Sets Using Hierarchical Approach Birch Global Journal of Computer Science and Technology Software & Data Engineering Volume 12 Issue 12 Version 1.0 Year 2012 Type: Double Blind Peer Reviewed International Research Journal Publisher: Global

More information

D. Bhanu #, Dr. S. Pavai Madeshwari *

D. Bhanu #, Dr. S. Pavai Madeshwari * Retail Market analysis in targeting sales based on Consumer Behaviour using Fuzzy Clustering - A Rule Based Model D. Bhanu #, Dr. S. Pavai Madeshwari * # Department of Computer Applications, Sri Krishna

More information

Mining Social-Network Graphs

Mining Social-Network Graphs 342 Chapter 10 Mining Social-Network Graphs There is much information to be gained by analyzing the large-scale data that is derived from social networks. The best-known example of a social network is

More information

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data

Knowledge Discovery and Data Mining. Structured vs. Non-Structured Data Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.

More information

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets

Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Methodology for Emulating Self Organizing Maps for Visualization of Large Datasets Macario O. Cordel II and Arnulfo P. Azcarraga College of Computer Studies *Corresponding Author: macario.cordel@dlsu.edu.ph

More information

Clustering Data Streams

Clustering Data Streams Clustering Data Streams Mohamed Elasmar Prashant Thiruvengadachari Javier Salinas Martin gtg091e@mail.gatech.edu tprashant@gmail.com javisal1@gatech.edu Introduction: Data mining is the science of extracting

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Future Trend Prediction of Indian IT Stock Market using Association Rule Mining of Transaction data

Future Trend Prediction of Indian IT Stock Market using Association Rule Mining of Transaction data Volume 39 No10, February 2012 Future Trend Prediction of Indian IT Stock Market using Association Rule Mining of Transaction data Rajesh V Argiddi Assit Prof Department Of Computer Science and Engineering,

More information

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM

FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT MINING SYSTEM International Journal of Innovative Computing, Information and Control ICIC International c 0 ISSN 34-48 Volume 8, Number 8, August 0 pp. 4 FUZZY CLUSTERING ANALYSIS OF DATA MINING: APPLICATION TO AN ACCIDENT

More information

Building A Smart Academic Advising System Using Association Rule Mining

Building A Smart Academic Advising System Using Association Rule Mining Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 raedamin@just.edu.jo Qutaibah Althebyan +962796536277 qaalthebyan@just.edu.jo Baraq Ghalib & Mohammed

More information

Identifying erroneous data using outlier detection techniques

Identifying erroneous data using outlier detection techniques Identifying erroneous data using outlier detection techniques Wei Zhuang 1, Yunqing Zhang 2 and J. Fred Grassle 2 1 Department of Computer Science, Rutgers, the State University of New Jersey, Piscataway,

More information

Scoring the Data Using Association Rules

Scoring the Data Using Association Rules Scoring the Data Using Association Rules Bing Liu, Yiming Ma, and Ching Kian Wong School of Computing National University of Singapore 3 Science Drive 2, Singapore 117543 {liub, maym, wongck}@comp.nus.edu.sg

More information

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS Biswajit Biswal Oracle Corporation biswajit.biswal@oracle.com ABSTRACT With the World Wide Web (www) s ubiquity increase and the rapid development

More information

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 9. Introduction to Data Mining Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004

More information

AN APPROACH TO ANTICIPATE MISSING ITEMS IN SHOPPING CARTS

AN APPROACH TO ANTICIPATE MISSING ITEMS IN SHOPPING CARTS AN APPROACH TO ANTICIPATE MISSING ITEMS IN SHOPPING CARTS Maddela Pradeep 1, V. Nagi Reddy 2 1 M.Tech Scholar(CSE), 2 Assistant Professor, Nalanda Institute Of Technology(NIT), Siddharth Nagar, Guntur,

More information

Association Rule Mining: A Survey

Association Rule Mining: A Survey Association Rule Mining: A Survey Qiankun Zhao Nanyang Technological University, Singapore and Sourav S. Bhowmick Nanyang Technological University, Singapore 1. DATA MINING OVERVIEW Data mining [Chen et

More information

Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

More information

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing

Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,

More information

Mining changes in customer behavior in retail marketing

Mining changes in customer behavior in retail marketing Expert Systems with Applications 28 (2005) 773 781 www.elsevier.com/locate/eswa Mining changes in customer behavior in retail marketing Mu-Chen Chen a, *, Ai-Lun Chiu b, Hsu-Hwa Chang c a Department of

More information

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING

EFFICIENT DATA PRE-PROCESSING FOR DATA MINING EFFICIENT DATA PRE-PROCESSING FOR DATA MINING USING NEURAL NETWORKS JothiKumar.R 1, Sivabalan.R.V 2 1 Research scholar, Noorul Islam University, Nagercoil, India Assistant Professor, Adhiparasakthi College

More information

How To Write An Association Rules Mining For Business Intelligence

How To Write An Association Rules Mining For Business Intelligence International Journal of Scientific and Research Publications, Volume 4, Issue 5, May 2014 1 Association Rules Mining for Business Intelligence Rashmi Jha NIELIT Center, Under Ministry of IT, New Delhi,

More information

International Journal of Advance Research in Computer Science and Management Studies

International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Discovering Interesting Association Rules in the Web Log Usage Data

Discovering Interesting Association Rules in the Web Log Usage Data Interdisciplinary Journal of Information, Knowledge, and Management Volume 5, 2010 Discovering Interesting Association Rules in the Web Log Usage Data Maja Dimitrijević The Advanced Technical School, Novi

More information

Data Mining Algorithms And Medical Sciences

Data Mining Algorithms And Medical Sciences Data Mining Algorithms And Medical Sciences Irshad Ullah Irshadullah79@gmail.com Abstract Extensive amounts of data stored in medical databases require the development of dedicated tools for accessing

More information

The Role of Visualization in Effective Data Cleaning

The Role of Visualization in Effective Data Cleaning The Role of Visualization in Effective Data Cleaning Yu Qian Dept. of Computer Science The University of Texas at Dallas Richardson, TX 75083-0688, USA qianyu@student.utdallas.edu Kang Zhang Dept. of Computer

More information

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, rajalakshmi@cit.edu.in

More information

How To Solve The Kd Cup 2010 Challenge

How To Solve The Kd Cup 2010 Challenge A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

More information

Introduction. A. Bellaachia Page: 1

Introduction. A. Bellaachia Page: 1 Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Mobile Phone APP Software Browsing Behavior using Clustering Analysis Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis

More information

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and Clustering Techniques and STATISTICA Case Study: Defining Clusters of Shopping Center Patrons STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table

More information

Computer Science in Education

Computer Science in Education www.ijcsi.org 290 Computer Science in Education Irshad Ullah Institute, Computer Science, GHSS Ouch Khyber Pakhtunkhwa, Chinarkot, ISO 2-alpha PK, Pakistan Abstract Computer science or computing science

More information

Part 2: Community Detection

Part 2: Community Detection Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -

More information

Data, Measurements, Features

Data, Measurements, Features Data, Measurements, Features Middle East Technical University Dep. of Computer Engineering 2009 compiled by V. Atalay What do you think of when someone says Data? We might abstract the idea that data are

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Fuzzy Logic -based Pre-processing for Fuzzy Association Rule Mining

Fuzzy Logic -based Pre-processing for Fuzzy Association Rule Mining Fuzzy Logic -based Pre-processing for Fuzzy Association Rule Mining by Ashish Mangalampalli, Vikram Pudi Report No: IIIT/TR/2008/127 Centre for Data Engineering International Institute of Information Technology

More information

A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING

A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING M.Gnanavel 1 & Dr.E.R.Naganathan 2 1. Research Scholar, SCSVMV University, Kanchipuram,Tamil Nadu,India. 2. Professor

More information

Chapter 20: Data Analysis

Chapter 20: Data Analysis Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

More information

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM. DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,

More information

A Survey on Outlier Detection Techniques for Credit Card Fraud Detection

A Survey on Outlier Detection Techniques for Credit Card Fraud Detection IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. VI (Mar-Apr. 2014), PP 44-48 A Survey on Outlier Detection Techniques for Credit Card Fraud

More information

Visualization of textual data: unfolding the Kohonen maps.

Visualization of textual data: unfolding the Kohonen maps. Visualization of textual data: unfolding the Kohonen maps. CNRS - GET - ENST 46 rue Barrault, 75013, Paris, France (e-mail: ludovic.lebart@enst.fr) Ludovic Lebart Abstract. The Kohonen self organizing

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Principles of Data Mining by Hand&Mannila&Smyth

Principles of Data Mining by Hand&Mannila&Smyth Principles of Data Mining by Hand&Mannila&Smyth Slides for Textbook Ari Visa,, Institute of Signal Processing Tampere University of Technology October 4, 2010 Data Mining: Concepts and Techniques 1 Differences

More information

An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset

An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset An Efficient Frequent Item Mining using Various Hybrid Data Mining Techniques in Super Market Dataset P.Abinaya 1, Dr. (Mrs) D.Suganyadevi 2 M.Phil. Scholar 1, Department of Computer Science,STC,Pollachi

More information

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

More information

Stabilization by Conceptual Duplication in Adaptive Resonance Theory

Stabilization by Conceptual Duplication in Adaptive Resonance Theory Stabilization by Conceptual Duplication in Adaptive Resonance Theory Louis Massey Royal Military College of Canada Department of Mathematics and Computer Science PO Box 17000 Station Forces Kingston, Ontario,

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

Mining and Tracking Evolving Web User Trends from Large Web Server Logs

Mining and Tracking Evolving Web User Trends from Large Web Server Logs Mining and Tracking Evolving Web User Trends from Large Web Server Logs Basheer Hawwash and Olfa Nasraoui* Knowledge Discovery and Web Mining Laboratory, Department of Computer Engineering and Computer

More information

Data Mining 5. Cluster Analysis

Data Mining 5. Cluster Analysis Data Mining 5. Cluster Analysis 5.2 Fall 2009 Instructor: Dr. Masoud Yaghini Outline Data Structures Interval-Valued (Numeric) Variables Binary Variables Categorical Variables Ordinal Variables Variables

More information