Mining an Online Auctions Data Warehouse
|
|
|
- Julian Lee
- 9 years ago
- Views:
Transcription
1 Proceedings of MASPLAS'02 The Mid-Atlantic Student Workshop on Programming Languages and Systems Pace University, April 19, 2002 Mining an Online Auctions Data Warehouse David Ulmer Under the guidance of Assistant Professor Sung-Hyuk Cha Pace University Abstract This report evaluates a marketing opportunity for an auction business in the online auction industry. The proposal is specific to the sale of art, antiques and collectibles and represents sales conducted as English auctions (as opposed to other forms of auction or retail). The analysis will consist of a Market Basket Analysis (MBA) for evaluating cross-promotional sales and marketing opportunities in the business. The taxonomy of the product hierarchy contains collecting categories and the analysis will determine if customers purchase across collecting categories. The input to the MBA is a sales dataset that has been denormalized to contain repeating groups of collecting category sales data. The MBA utilizes the Apriori algorithm, which is a association rule discovery procedure. It is a tabulation algorithm that determines the frequent itemsets and generates the association rules by finding relationships between the items in the dataset. The report concludes with the association rules determined as part of the analysis. 1. Introduction Many businesses attempt to identify customer purchasing patterns to understand if these patterns offer additional business opportunities. In the auction industry, the customers tend to be classified as collectors (i.e., a person who makes a collection) who have a loyalty to a particular product category (e.g., coin or stamp collectors). The main goal and concept description of this report is: To mine the database of buying activity to determine if collectors have an affinity to buy across product categories. The answer to this question could enable important marketing decisions in advertising, targeted or cross-promoted online sales. This proposal utilizes a Market Basket Analysis [5] to determine if customers purchase across collecting categories. This same analysis is well known by the example of customers who purchase diapers also buy beer on Thursdays. MBA is an association rule mining technique that is used to find items that are most frequently purchased together. It uses the Apriori algorithm [1] and generates the appropriate association rules by pruning the itemset space by filtering on the (3) metrics of support, confidence and improvement (support and confidence are sometimes called coverage and accuracy). 8.1
2 Agrawal et al. introduced the Apriori algorithm for the purpose of efficient association rule mining [1,3]. The concept of association rules was introduced in 1993 [2]. The Apriori algorithm is an efficient association discovery algorithm that filters item sets by incorporating item constraints (support). Agrawal et al. actually introduced several algorithms in addition to Apriori (e.g., AprioriTid, AprioriHybrid) that were oriented to mining association rules for large databases. They also demonstrated performance improvements over the then accepted AIS and SETM algorithms. Although efficient, recent progress has been made with other algorithms that use the power of parallel processors to split the problem into multiple parts and other algorithmic improvements. An example of one of these improved algorithms is Dynamic Itemset Counting [4]. 2. Input and Output There are more attributes in the sales dataset than actually required for this analysis. This is to enable the additional statistical analysis beyond the Market Basket Analysis (see research extensions). The data file is in a relational data format and the metadata for the attributes is as follows: Normalized Attribute Description Format BuyerID Unique identifier for the Numeric integer individual buyer Date Date of sale MM/DD/YYYY Category Product category of item Text nominal Price Hammer price in USD Numeric - integer Since the results of the analysis may be used for targeted marketing campaigns, the actual buyer is required so grouped data would not suffice. The analysis will be conducted over the past year (2001) since the earlier data is not thought to be representative of the problem space. The date attribute will be used to filter the data, but also could be used to do seasonal pattern analysis. The category attribute is nominal data representing the category code. The category code is also a codified value. The price attribute could be nominal for this analysis, but it is represented as an integer value to enable further statistical analysis (beyond the scope of this report). The source of the data is a corporate sales data warehouse. To simplify and prepare the data for analysis, the relational data needs to be denormalized since the purchases could occur over various dates (multi-category collectors may not make purchases on the same date because of scarcity of items to buy). The denormalized file will contain the BuyerID followed by a repeating group of (13) categories (i.e., all of the categories the buyer has ever purchased within). The denormalized file structure is: Denormalized Attribute BuyerID Category1. Category13 Description Unique identifier for the individual buyer Product category of 1st item purchased. Product category of the 13th item purchased 8.2
3 In this case, the data was transformed to its denormalized state through a series of queries and operations in Microsoft Access and Excel. As an alternative to the denormalized data structure, the data could also be coded in the ARFF format in order to utilize the Apriori algorithm/code that is available from the Weka web site. Knowledge Representation Several different types of output are appropriate for a Market Basket Analysis. For example, a tabular representation is a two-dimensional co-occurrence matrix. The rows and columns would represent the frequent item set combinations (i.e., categories purchased together most often) and the data would be the frequency count of the combination. The co-occurrence matrix can be used to calculate additional two-dimensional tables representing the metrics of support, confidence and improvement. The co-occurrence matrix can be mined by running the Apriori algorithm across the itemset space, which will determine if any strong relationships exist between the items in the dataset. This process can transform the matrix into association rules for easier interpretation. 3. Algorithms Evaluation Methods Association rule mining is a procedure that looks for relationships between items in a dataset. The relationships can be determined through statistical methods such as correlation analysis or tabulation procedures such as Market Basket Analysis. MBA is useful for determining which items tend to be purchased together. In this case it will be used to identify multi-category collectors. MBA is typically expressed as association rules in the form If Condition then Result (e.g., If Item Y is purchased then Item Z is purchased ). The MBA process consists of: A. Choosing the right items (using a taxonomy or other filtering methods) B. Running a cross-tabulation to create the co-occurrence matrix C. Run Apriori and determine the quality of the association rules by calculating the support, confidence and improvement The Apriori algorithm is an iterative process, which successively processes each k-sized itemset (i.e., an itemset that contains k items) until no larger itemsets can be generated. The specific steps of the algorithm are: 1. Set K=1 2. Calculate all k-sized itemsets 3. Calculate the support for all of the candidate itemsets filter the itemsets based on a minimum support metric 4. Join all k-sized itemsets to generate candidate itemsets for size K+1 8.3
4 5. Set K = K Repeat steps 3 through 5 until no larger itemsets can be formed 7. Generate the final set of itemsets by creating the union of all k-sized itemsets The Apriori algorithm utilizes the three metrics to find strong rules by pruning the itemset space. The good or useful rules (rules with a quality course of action) are the ones we want to find, and ignore the trivial (obviously known rules) and inexplicable rules (rules that seem to have no explanation and no course of action) that Apriori will also find. Support is the percentage of records containing the item combination compared to the total number of records [7]. Confidence is the ratio of the number of transactions with all the items in the rule to the number of transactions with just the items in the condition. An example of confidence is: If A & B => C has an 80% confidence means, if when a collector buys A and B, in 80% of the cases, the collector also bought C. Improvement measures how much better a rule is at predicting a result than just assuming the result. When it is > 1, the rule is better at predicting the result than random chance. D. Generate association rules a) Finding the most frequent combinations of item sets b) Define condition and result (for the conditional association rules) * * * * The Market Basket Analysis was done using a data mining software product from Megaputer called PolyAnalyst [6]. PolyAnalyst (version 4.5) is a powerful tool for discovering patterns and knowledge hidden in your data. This proposal used the MBA features of PolyAnalyst to determine the association rules for this dataset. Microsoft Excel was also used to calculate the co-occurrence, support and confidence matrices. The results were: A. Choosing the right items (using a taxonomy or other filtering methods) 2001 sales data across (13) collecting categories B. Running a cross-tabulation to create the co-occurrence matrix Category CAT01 CAT02 CAT03 CAT04 CAT05 CAT06 CAT07 CAT08 CAT09 CAT10 CAT11 CAT12 CAT13 CAT CAT CAT CAT CAT CAT CAT CAT CAT
5 CAT CAT CAT CAT C. Run Apriori and determine the quality of the association rules by calculating the support, confidence and improvement PolyAnalyst allows you to input and control each of the three parameters (support, confidence and improvement). Setting the minimum support allows you to prune the product groups (in our case categories) that Apriori finds. For instance, if the minimum support is set high, then only category groups found in large numbers of transactions together will be returned (which is more likely to be groups of 2 categories). Setting it lower makes it more likely that category groups with 3 or more categories will be returned. Confidence and improvement are used by PolyAnalyst to prune the resultant association rules. For this analysis, category groups of 2 categories are more important since they represent clear opportunities to cross-promote marketing for those collecting categories (typically, product groups of 3 or more are used for organizing a store or shelf). In addition to calculating support, confidence and improvement for each category group, PolyAnalyst also calculates the p-value (probability). The lower the value, the less likely the group appeared by random chance. The co-occurrence count is also displayed on the same line displayed as the label called support. The power of PolyAnalyst is the ability to quickly run a different analysis with different support, confidence and improvement. If you run an analysis and it returns to many or to few groups, you can reset the minimum support and rerun the analysis. As a result, changing the minimum support will result in different groups being returned, which are all valid, but valid for different purposes. For example, as mentioned, a lower minimum support will tend to return groups with more products within them. This is particularly useful for product positioning. The following matrices were calculated in Microsoft Excel to crosscheck the results of PolyAnalyst (in step D.) Support Category CAT01 CAT02 CAT03 CAT04 CAT05 CAT06 CAT07 CAT08 CAT09 CAT10 CAT11 CAT12 CAT13 CAT01 4.5% 0.7% 0.5% 0.5% 0.4% 0.7% 0.4% 1.0% 0.3% 0.5% 0.3% 0.2% 0.2% CAT02 0.7% 7.7% 0.6% 0.7% 1.0% 1.8% 0.9% 1.5% 0.4% 0.8% 0.8% 0.3% 0.1% CAT03 0.5% 0.6% 11.1% 1.6% 0.9% 1.5% 0.7% 2.0% 0.8% 1.3% 0.8% 0.4% 0.1% CAT04 0.5% 0.7% 1.6% 19.5% 1.1% 1.8% 1.2% 1.9% 0.9% 1.1% 1.0% 0.5% 0.1% CAT05 0.4% 1.0% 0.9% 1.1% 9.4% 3.0% 0.9% 1.7% 0.3% 0.9% 1.5% 0.5% 0.1% CAT06 0.7% 1.8% 1.5% 1.8% 3.0% 16.9% 1.5% 4.0% 0.7% 1.6% 2.0% 0.8% 0.1% CAT07 0.4% 0.9% 0.7% 1.2% 0.9% 1.5% 10.2% 1.4% 0.5% 0.9% 1.1% 0.8% 0.1% CAT08 1.0% 1.5% 2.0% 1.9% 1.7% 4.0% 1.4% 24.1% 1.3% 3.7% 1.3% 0.9% 0.2% CAT09 0.3% 0.4% 0.8% 0.9% 0.3% 0.7% 0.5% 1.3% 7.4% 0.9% 0.3% 0.2% 0.0% CAT10 0.5% 0.8% 1.3% 1.1% 0.9% 1.6% 0.9% 3.7% 0.9% 15.6% 0.6% 0.5% 0.1% CAT11 0.3% 0.8% 0.8% 1.0% 1.5% 2.0% 1.1% 1.3% 0.3% 0.6% 5.8% 0.6% 0.1% 8.5
6 CAT12 0.2% 0.3% 0.4% 0.5% 0.5% 0.8% 0.8% 0.9% 0.2% 0.5% 0.6% 4.9% 0.0% CAT13 0.2% 0.1% 0.1% 0.1% 0.1% 0.1% 0.1% 0.2% 0.0% 0.1% 0.1% 0.0% 1.1% Confidence Category CAT01 CAT02 CAT03 CAT04 CAT05 CAT06 CAT07 CAT08 CAT09 CAT10 CAT11 CAT12 CAT13 CAT % 15.8% 11.6% 11.6% 8.4% 15.0% 7.8% 23.2% 7.4% 10.5% 6.5% 4.0% 3.4% CAT02 9.2% 100.0% 8.4% 9.5% 13.3% 22.7% 11.4% 19.0% 4.8% 10.0% 9.7% 4.3% 1.6% CAT03 4.7% 5.8% 100.0% 14.5% 8.1% 13.9% 6.7% 18.1% 7.0% 11.7% 7.0% 3.3% 1.2% CAT04 2.7% 3.8% 8.3% 100.0% 5.5% 9.0% 6.1% 9.7% 4.8% 5.8% 5.1% 2.6% 0.6% CAT05 4.0% 10.9% 9.6% 11.4% 100.0% 31.9% 10.0% 18.4% 3.4% 9.7% 15.4% 5.0% 1.2% CAT06 4.0% 10.4% 9.1% 10.4% 17.8% 100.0% 9.0% 23.9% 4.3% 9.7% 11.9% 4.6% 0.7% CAT07 3.4% 8.7% 7.3% 11.6% 9.2% 15.0% 100.0% 13.9% 4.6% 9.0% 10.5% 8.3% 1.1% CAT08 4.3% 6.1% 8.3% 7.8% 7.2% 16.7% 5.9% 100.0% 5.6% 15.4% 5.5% 3.6% 0.7% CAT09 4.5% 5.0% 10.5% 12.6% 4.3% 9.7% 6.3% 18.1% 100.0% 12.6% 4.1% 3.2% 0.5% CAT10 3.0% 4.9% 8.3% 7.2% 5.8% 10.5% 5.9% 23.8% 6.0% 100.0% 4.1% 3.0% 0.5% CAT11 5.1% 13.0% 13.4% 17.0% 25.1% 34.8% 18.5% 22.8% 5.2% 11.1% 100.0% 10.3% 1.3% CAT12 3.7% 6.8% 7.6% 10.6% 9.8% 16.0% 17.4% 18.0% 4.9% 9.8% 12.3% 100.0% 1.0% CAT % 10.7% 11.6% 10.7% 9.9% 10.7% 9.9% 15.7% 3.3% 6.6% 6.6% 4.1% 100.0% D. Generate association rules Output Three different analyses were run in PolyAnalyst to determine which produced the most appropriate association rules for this particular business opportunity. By evaluating the calculated support matrix, it is obvious that in general the category relationships have a low support. The three runs set support at.5, 1 and 2. No other runs produced any additional association rules. The results were: Run 1 -- Exploration parameters: Argument Value Numeric values has been binarized Minimum support, % 0.5 Minimum improvement 1 Minimum confidence, % 1 3 group(s) of associated products were found. group #1 contains 3 products p_value: (log = ); support 100 CAT11 CAT05 CAT06 8.6
7 group #2 contains 2 products p_value: (log = ); support 75 CAT02 CAT01 group #3 contains 2 products p_value: (log = ); support 89 CAT12 CAT07 PRODUCT ASSOCIATION RULES Support Confidence Improvement Filter: > 0.50 > 1.00 > 1.00 CAT05, CAT06 -> CAT % 31.55% CAT11, CAT06 -> CAT % 47.17% CAT11, CAT05 -> CAT % 65.36% CAT01 -> CAT % 15.82% CAT02 -> CAT % 9.21% CAT07 -> CAT % 8.29% CAT12 -> CAT % 17.42% Run 2 -- Exploration parameters: Argument Value Numeric values has been binarized Minimum support, % 1 Minimum improvement 1 Minimum confidence, % 1 2 group(s) of associated products were found. group #1 contains 2 products p_value: (log = ); support 153 CAT11 CAT05 group #2 contains 2 products p_value: (log = ); support 185 CAT06 CAT02 8.7
8 PRODUCT ASSOCIATION RULES Support Confidence Improvement Filter: > 1.00 > 1.00 > 1.00 CAT05 -> CAT % 15.41% CAT11 -> CAT % 25.08% CAT02 -> CAT % 22.73% CAT06 -> CAT % 10.39% Run 3 -- Exploration parameters: Argument Value Numeric values has been binarized Minimum support, % 2 Minimum improvement 1 Minimum confidence, % 1 1 group(s) of associated products were found. group #1 contains 2 products p_value: (log = ); support 317 CAT06 CAT05 PRODUCT ASSOCIATION RULES Support Confidence Improvement Filter: > 2.00 > 1.00 > 1.00 CAT05 -> CAT % 31.92% CAT06 -> CAT % 17.81% Interpretation of Association Rules The (3) runs of the MBA essentially produced six category groups (along with their related directional groups). One group contains 3 categories, which for this analysis is less interesting (because it is hard to cross-promote on the fact of a customer purchasing within 2 categories). The relationships aren t particularly strong by evidence of the low support and moderately low confidence results. This indicates that there are minimal cross-promotion opportunities within this business. These results do support the anecdotal business opinion that customers of these types of items have a collectors mentality, which enforces a strong loyalty to a particular collecting category. To the extent that there are opportunities, the association rules can be interpreted as: 8.8
9 If a client collects (purchases) CAT01, then they also collect CAT02. The confidence is 15.82% (meaning the chance of purchasing the CAT02 is 15.88%). Also, it is (improvement) times more likely than random chance that this association rule is valid (i.e., statistically sound). The additional rule is the same (lower confidence at 9.21%), but opposite in the direction of the relationship. If a client collects CAT07, then they also collect CAT12. The rule of opposite direction is valid as well. If a client collects CAT05, then they also collect CAT11. The rule of opposite direction is valid as well. If a client collects CAT02, then they also collect CAT06. The rule of opposite direction is valid as well. If a client collects CAT05, then they also collect CAT06. The rule of opposite direction is valid as well. 4. Research Extensions Beyond the Market Basket Analysis, the dataset has additional attributes, which could be used for additional analysis. For example, since the dataset has a time dimension, seasonal pattern analysis could be done with additional statistical techniques. Also, given that the price is numeric, additional descriptive statistics could be calculated including an analysis of variance (ANOVA). Correlation coefficients could also be calculated between the combinations to determine statistical correlation between the purchasing patterns. References [1] Rakesh Agrawal, and Ramakrishnan Srikant Fast algorithms for mining association rules in large databases. In Proceedings of the 20 th International Conference on Very Large Data Bases, vol. 2, pp , Santiago, Chile, September [2] Rakesh Agrawal, Tomasz Imielinski, and Arun Swami, Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Vol. 22, pp , June [3] Rakesh Agrawal, Heikki Mannila, Ramakrishnan Srikant, Hannu Toivonen, and Inkeri Verkamo. Fast discovery of association rules. In Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, Chapter 12, AAAI Press, [4] Sergey Brin, Rajeev Motwani, Jeffrey Ullman, and Shalom Tsur. Dynamic itemset counting and implication rules for market basket data. In Proceedings of the ACM SIGMOD International Conference on Management of Data, New York, May [2002, April] 8.9
10 [5] Grant Bugher, Market Basket Analysis of the Sales Data for a Client of Cambridge Technology Partners, [2003, March] [6] Megaputer Intelligence, Data Mining with PolyAnalyst, [2002, March] [7] Ian Witten, and Eibe Frank Data Mining Practical Machine Learning Tools and Techniques with Java Implementations, San Francisco, CA: Morgan Kaufmann. 8.10
EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS
EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS Susan P. Imberman Ph.D. College of Staten Island, City University of New York [email protected] Abstract
Data Mining Algorithms And Medical Sciences
Data Mining Algorithms And Medical Sciences Irshad Ullah [email protected] Abstract Extensive amounts of data stored in medical databases require the development of dedicated tools for accessing
Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis
, 23-25 October, 2013, San Francisco, USA Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis John David Elijah Sandig, Ruby Mae Somoba, Ma. Beth Concepcion and Bobby D. Gerardo,
Static Data Mining Algorithm with Progressive Approach for Mining Knowledge
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive
MBA 8473 - Data Mining & Knowledge Discovery
MBA 8473 - Data Mining & Knowledge Discovery MBA 8473 1 Learning Objectives 55. Explain what is data mining? 56. Explain two basic types of applications of data mining. 55.1. Compare and contrast various
Selection of Optimal Discount of Retail Assortments with Data Mining Approach
Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.
International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET
DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand
Data Mining Solutions for the Business Environment
Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania [email protected] Over
New Matrix Approach to Improve Apriori Algorithm
New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, [email protected] Associate
131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE
www.arpapress.com/volumes/vol13issue3/ijrras_13_3_18.pdf NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE Hebah H. O. Nasereddin Middle East University, P.O. Box: 144378, Code 11814, Amman-Jordan
Association Rule Mining: Exercises and Answers
Association Rule Mining: Exercises and Answers Contains both theoretical and practical exercises to be done using Weka. The exercises are part of the DBTech Virtual Workshop on KDD and BI. Exercise 1.
PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE
International Journal of Computer Science and Applications, Vol. 5, No. 4, pp 57-69, 2008 Technomathematics Research Foundation PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Clustering Connectionist and Statistical Language Processing
Clustering Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes Clustering p.1/21 Overview clustering vs. classification supervised
Experiments in Web Page Classification for Semantic Web
Experiments in Web Page Classification for Semantic Web Asad Satti, Nick Cercone, Vlado Kešelj Faculty of Computer Science, Dalhousie University E-mail: {rashid,nick,vlado}@cs.dal.ca Abstract We address
A Survey on Association Rule Mining in Market Basket Analysis
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 409-414 International Research Publications House http://www. irphouse.com /ijict.htm A Survey
FREQUENT PATTERN MINING FOR EFFICIENT LIBRARY MANAGEMENT
FREQUENT PATTERN MINING FOR EFFICIENT LIBRARY MANAGEMENT ANURADHA.T Assoc.prof, [email protected] SRI SAI KRISHNA.A [email protected] SATYATEJ.K [email protected] NAGA ANIL KUMAR.G
Using Association Rules for Product Assortment Decisions in Automated Convenience Stores
Using Association Rules for Product Assortment Decisions in Automated Convenience Stores Tom Brijs, Gilbert Swinnen, Koen Vanhoof and Geert Wets Limburg University Centre Department of Applied Economics
Using Data Mining Methods to Predict Personally Identifiable Information in Emails
Using Data Mining Methods to Predict Personally Identifiable Information in Emails Liqiang Geng 1, Larry Korba 1, Xin Wang, Yunli Wang 1, Hongyu Liu 1, Yonghua You 1 1 Institute of Information Technology,
Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms
Explanation-Oriented Association Mining Using a Combination of Unsupervised and Supervised Learning Algorithms Y.Y. Yao, Y. Zhao, R.B. Maguire Department of Computer Science, University of Regina Regina,
Data Mining Techniques
15.564 Information Technology I Business Intelligence Outline Operational vs. Decision Support Systems What is Data Mining? Overview of Data Mining Techniques Overview of Data Mining Process Data Warehouses
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India [email protected]
Web Usage Association Rule Mining System
Interdisciplinary Journal of Information, Knowledge, and Management Volume 6, 2011 Web Usage Association Rule Mining System Maja Dimitrijević The Advanced School of Technology, Novi Sad, Serbia [email protected]
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, [email protected]
Data Mining to Recognize Fail Parts in Manufacturing Process
122 ECTI TRANSACTIONS ON ELECTRICAL ENG., ELECTRONICS, AND COMMUNICATIONS VOL.7, NO.2 August 2009 Data Mining to Recognize Fail Parts in Manufacturing Process Wanida Kanarkard 1, Danaipong Chetchotsak
College information system research based on data mining
2009 International Conference on Machine Learning and Computing IPCSIT vol.3 (2011) (2011) IACSIT Press, Singapore College information system research based on data mining An-yi Lan 1, Jie Li 2 1 Hebei
Mining various patterns in sequential data in an SQL-like manner *
Mining various patterns in sequential data in an SQL-like manner * Marek Wojciechowski Poznan University of Technology, Institute of Computing Science, ul. Piotrowo 3a, 60-965 Poznan, Poland [email protected]
E-Banking Integrated Data Utilization Platform WINBANK Case Study
E-Banking Integrated Data Utilization Platform WINBANK Case Study Vasilis Aggelis Senior Business Analyst, PIRAEUSBANK SA, [email protected] Abstract we all are living in information society. Companies
Mining Association Rules: A Database Perspective
IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.12, December 2008 69 Mining Association Rules: A Database Perspective Dr. Abdallah Alashqur Faculty of Information Technology
Data quality in Accounting Information Systems
Data quality in Accounting Information Systems Comparing Several Data Mining Techniques Erjon Zoto Department of Statistics and Applied Informatics Faculty of Economy, University of Tirana Tirana, Albania
Building A Smart Academic Advising System Using Association Rule Mining
Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 [email protected] Qutaibah Althebyan +962796536277 [email protected] Baraq Ghalib & Mohammed
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington
Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm
Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Purpose: key concepts in mining frequent itemsets understand the Apriori algorithm run Apriori in Weka GUI and in programatic way 1 Theoretical
A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment
A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment Edmond H. Wu,MichaelK.Ng, Andy M. Yip,andTonyF.Chan Department of Mathematics, The University of Hong Kong Pokfulam Road,
Project Report. 1. Application Scenario
Project Report In this report, we briefly introduce the application scenario of association rule mining, give details of apriori algorithm implementation and comment on the mined rules. Also some instructions
Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI
Data Mining Knowledge Discovery, Data Warehousing and Machine Learning Final remarks Lecturer: JERZY STEFANOWSKI Email: [email protected] Data Mining a step in A KDD Process Data mining:
What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO
What is Data Mining? Data Mining (Knowledge discovery in database) Data Mining: "The non trivial extraction of implicit, previously unknown, and potentially useful information from data" William J Frawley,
KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: [email protected]
KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: [email protected] Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:
COURSE RECOMMENDER SYSTEM IN E-LEARNING
International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand
Mine Your Business A Novel Application of Association Rules for Insurance Claims Analytics
Mine Your Business A Novel Application of Association Rules for Insurance Claims Analytics Lucas Lau and Arun Tripathi, Ph.D. Abstract: This paper describes how a data mining technique known as Association
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan
Data Mining with Weka
Data Mining with Weka Class 1 Lesson 1 Introduction Ian H. Witten Department of Computer Science University of Waikato New Zealand weka.waikato.ac.nz Data Mining with Weka a practical course on how to
Association Rule Mining: A Survey
Association Rule Mining: A Survey Qiankun Zhao Nanyang Technological University, Singapore and Sourav S. Bhowmick Nanyang Technological University, Singapore 1. DATA MINING OVERVIEW Data mining [Chen et
ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL
International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR
Business Lead Generation for Online Real Estate Services: A Case Study
Business Lead Generation for Online Real Estate Services: A Case Study Md. Abdur Rahman, Xinghui Zhao, Maria Gabriella Mosquera, Qigang Gao and Vlado Keselj Faculty Of Computer Science Dalhousie University
A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING
A COGNITIVE APPROACH IN PATTERN ANALYSIS TOOLS AND TECHNIQUES USING WEB USAGE MINING M.Gnanavel 1 & Dr.E.R.Naganathan 2 1. Research Scholar, SCSVMV University, Kanchipuram,Tamil Nadu,India. 2. Professor
LiDDM: A Data Mining System for Linked Data
LiDDM: A Data Mining System for Linked Data Venkata Narasimha Pavan Kappara Indian Institute of Information Technology Allahabad Allahabad, India [email protected] Ryutaro Ichise National Institute of
Mining changes in customer behavior in retail marketing
Expert Systems with Applications 28 (2005) 773 781 www.elsevier.com/locate/eswa Mining changes in customer behavior in retail marketing Mu-Chen Chen a, *, Ai-Lun Chiu b, Hsu-Hwa Chang c a Department of
Pentaho Data Mining Last Modified on January 22, 2007
Pentaho Data Mining Copyright 2007 Pentaho Corporation. Redistribution permitted. All trademarks are the property of their respective owners. For the latest information, please visit our web site at www.pentaho.org
A Data Mining Framework for Optimal Product Selection in Retail Supermarket Data: The Generalized PROFSET Model
arxiv:cs.db/0112013 v1 11 Dec 2001 A Data Mining Framework for Optimal Product Selection in Retail Supermarket Data: The Generalized PROFSET Model Tom Brijs Bart Goethals Gilbert Swinnen Koen Vanhoof Geert
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM
TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam
Dynamic Data in terms of Data Mining Streams
International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining
SPMF: a Java Open-Source Pattern Mining Library
Journal of Machine Learning Research 1 (2014) 1-5 Submitted 4/12; Published 10/14 SPMF: a Java Open-Source Pattern Mining Library Philippe Fournier-Viger [email protected] Department
Implementing Improved Algorithm Over APRIORI Data Mining Association Rule Algorithm
Implementing Improved Algorithm Over APRIORI Data Mining Association Rule Algorithm 1 Sanjeev Rao, 2 Priyanka Gupta 1,2 Dept. of CSE, RIMT-MAEC, Mandi Gobindgarh, Punjab, india Abstract In this paper we
How To Learn Data Analytics
COURSE DESCRIPTION Spring 2014 COURSE NAME COURSE CODE DESCRIPTION Data Analytics: Introduction, Methods and Practical Approaches INF2190H The influx of data that is created, gathered, stored and accessed
2.1. Data Mining for Biomedical and DNA data analysis
Applications of Data Mining Simmi Bagga Assistant Professor Sant Hira Dass Kanya Maha Vidyalaya, Kala Sanghian, Distt Kpt, India (Email: [email protected]) Dr. G.N. Singh Department of Physics and
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
Facilitating Business Process Discovery using Email Analysis
Facilitating Business Process Discovery using Email Analysis Matin Mavaddat [email protected] Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process
Chapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
Data Mining Apriori Algorithm
10 Data Mining Apriori Algorithm Apriori principle Frequent itemsets generation Association rules generation Section 6 of course book TNM033: Introduction to Data Mining 1 Association Rule Mining (ARM)
Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set
Overview Evaluation Connectionist and Statistical Language Processing Frank Keller [email protected] Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification
Association Rules Mining: A Recent Overview
GESTS International Transactions on Computer Science and Engineering, Vol.32 (1), 2006, pp. 71-82 Association Rules Mining: A Recent Overview Sotiris Kotsiantis, Dimitris Kanellopoulos Educational Software
Data Mining for Data Cloud and Compute Cloud
Data Mining for Data Cloud and Compute Cloud Prof. Uzma Ali 1, Prof. Punam Khandar 2 Assistant Professor, Dept. Of Computer Application, SRCOEM, Nagpur, India 1 Assistant Professor, Dept. Of Computer Application,
Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining -
Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining - Hidenao Abe, Miho Ohsaki, Hideto Yokoi, and Takahira Yamaguchi Department of Medical Informatics,
ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION
ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION K.Vinodkumar 1, Kathiresan.V 2, Divya.K 3 1 MPhil scholar, RVS College of Arts and Science, Coimbatore, India. 2 HOD, Dr.SNS
Scoring the Data Using Association Rules
Scoring the Data Using Association Rules Bing Liu, Yiming Ma, and Ching Kian Wong School of Computing National University of Singapore 3 Science Drive 2, Singapore 117543 {liub, maym, wongck}@comp.nus.edu.sg
Clinic + - A Clinical Decision Support System Using Association Rule Mining
Clinic + - A Clinical Decision Support System Using Association Rule Mining Sangeetha Santhosh, Mercelin Francis M.Tech Student, Dept. of CSE., Marian Engineering College, Kerala University, Trivandrum,
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter
VCU-TSA at Semeval-2016 Task 4: Sentiment Analysis in Twitter Gerard Briones and Kasun Amarasinghe and Bridget T. McInnes, PhD. Department of Computer Science Virginia Commonwealth University Richmond,
Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm
R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*
Customer Analytics. Turn Big Data into Big Value
Turn Big Data into Big Value All Your Data Integrated in Just One Place BIRT Analytics lets you capture the value of Big Data that speeds right by most enterprises. It analyzes massive volumes of data
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati
Implementation of Data Mining Techniques to Perform Market Analysis
Implementation of Data Mining Techniques to Perform Market Analysis B.Sabitha 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, P.Balasubramanian 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
DATA MINING TECHNIQUES: A SOURCE FOR CONSUMER BEHAVIOR ANALYSIS
DATA MINING TECHNIQUES: A SOURCE FOR CONSUMER BEHAVIOR ANALYSIS Abhijit Raorane 1 & R.V.Kulkarni 2 1 Department of computer science, Vivekanand College, Tarabai park Kolhapur [email protected] 2 Head
How To Write An Association Rules Mining For Business Intelligence
International Journal of Scientific and Research Publications, Volume 4, Issue 5, May 2014 1 Association Rules Mining for Business Intelligence Rashmi Jha NIELIT Center, Under Ministry of IT, New Delhi,
An Introduction to WEKA. As presented by PACE
An Introduction to WEKA As presented by PACE Download and Install WEKA Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html 2 Content Intro and background Exploring WEKA Data Preparation Creating Models/
Data Mining Applications in Fund Raising
Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,
DATA MINING TECHNIQUES: A SOURCE FOR CONSUMER BEHAVIOR ANALYSIS
DATA MINING TECHNIQUES: A SOURCE FOR CONSUMER BEHAVIOR ANALYSIS Abhijit Raorane 1 & R.V.Kulkarni 2 1 Department of computer science, Vivekanand College, Tarabai park Kolhapur [email protected] 2 Head
Association Rule Mining using Apriori Algorithm for Distributed System: a Survey
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 2, Ver. VIII (Mar-Apr. 2014), PP 112-118 Association Rule Mining using Apriori Algorithm for Distributed
Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results
, pp.33-40 http://dx.doi.org/10.14257/ijgdc.2014.7.4.04 Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results Muzammil Khan, Fida Hussain and Imran Khan Department
Data Mining Applications in Manufacturing
Data Mining Applications in Manufacturing Dr Jenny Harding Senior Lecturer Wolfson School of Mechanical & Manufacturing Engineering, Loughborough University Identification of Knowledge - Context Intelligent
Introduction Predictive Analytics Tools: Weka
Introduction Predictive Analytics Tools: Weka Predictive Analytics Center of Excellence San Diego Supercomputer Center University of California, San Diego Tools Landscape Considerations Scale User Interface
