Vinification Mining A Case Study on Wine Production


Jorge RIBEIRO (1), José NEVES (2), Juan SANCHEZ (3), Paulo NOVAIS (2), José MACHADO (2)

(1) Viana do Castelo Polytechnic Institute, School of Technology and Management, Viana do Castelo, Portugal, jribeiro@estg.ipvc.pt
(2) University of Minho, DI-CCTC, Department of Computer Science, Portugal, {jneves,pjon,jmac}@di.uminho.pt
(3) Viana do Castelo Polytechnic Institute, Agrarian School, Portugal, xavier@esa.ipvc.pt

Abstract

Throughout history, wine has played a relevant role in almost every civilization. Demarcated regions that benefit from a Denomination of Origin must ensure that every step of wine production is subject to strict control, from the vineyard to the consumer. Vinification is one of the stages of wine production that can influence the quality of the wine. Quality is traditionally assessed by wine tasters, who analyze organoleptic parameters such as colour, foam, flavour and savour; this assessment is very important for wine production and for its successful marketing. Data Mining techniques are highly relevant in this field, both for revealing the importance of the numerous chemical parameters involved in wine production and for defining classification models that predict the organoleptic parameters from the chemical parameters of winemaking. Decision Trees and Linear Regression were used as Data Mining techniques to achieve the objectives of classification and regression. The experiments were conducted using the new Microsoft SQL Server 2008 Business Intelligence Development Studio and an open-source Data Mining tool (WEKA). Very good results were achieved, with performances between 85% and 98% for all models.

Key words: Data Mining, Knowledge Discovery in Databases, Decision Trees, Linear Regression, Wine Vinification Process.

1. Introduction

In the context of wine production, the vinification process corresponds to the analysis of the wine's quality over time. During this process (Fig. 1), several chemical parameters, such as pH, Anthocyanins and Chemistry Age [1, 2], are analyzed and recorded. With these data it is possible to examine relationships between the attributes, extract knowledge, and create classification models, first to adjust some parameters so as to improve the quality of the wine and, second, to analyze the chemical attributes that influence the best time to consume the wine. To complement these results, and in order to relate them to the chemical quality of the samples, wine tasters analyze some organoleptic (subjective) attributes: savour, colour, flavour and foam.

In the case of green red wines, winemaking begins with the grapes (in this case study, of the Vinhão variety). The grapes are sampled and transported to an experimental winery. After the grape sampling, fermentation is carried out with the different types of maceration [1], followed by racking, pressing and cold stabilization (Figure 1). The wine is then fined (glue) and stabilized, and finally bottled. To examine the organoleptic (subjective) quality of the vinification, the vinification samples are evaluated by a panel of wine tasters, which reviews the same sample 8 times.

The wines were produced following three different processes: pellicular fermentative maceration (a traditional method), rotary cube fermentation, and carbonic maceration, as shown in Figure 1 and Table 7 [1]. Total wine phenolics were determined by colorimetry with phosphotungstic-phosphomolybdic acid [3] at 750 nm. The results were expressed in units of the Folin-Ciocalteu Index (FCI).

Fig. 1 - Technological process used for the three types of maceration in wine production. All three routes start from Vinhão grapes, followed by grape sampling and transport to the experimental winery:
- Pellicular Fermentative Maceration (C): destemming/crushing, maceration, racking after a week, pressing, cold stabilization/rackings, stabilization and bottling.
- Rotary Cube Maceration (RF): destemming/crushing, maceration, racking after a week, pressing, cold stabilization/rackings, stabilization and bottling.
- Carbonic Maceration (CM): whole grapes into a CO2-saturated tank, maceration, racking after two weeks, destemming/crushing, end of alcoholic fermentation, cold stabilization/rackings, stabilization and bottling.

In recent years, the application of Data Mining techniques [7] has become a powerful and easy-to-use way of analyzing relationships between the attributes of a data set. The high volume of data that organizations accumulate over time creates a new challenge: extracting knowledge from the stored information. Through the Knowledge Discovery in Databases (KDD) process [4], organizations can exploit their stored data, discovering relationships or affinities within it and understanding the behaviour of the various agents that interact with the organization, such as customers, suppliers and sellers. This process comprises several tasks: selection, pre-processing, transformation, data mining and interpretation.
The Data Mining task is centred on the application of algorithms, including artificial neural networks, decision trees, association rules and genetic algorithms, that extract patterns from the previously treated data and are applied according to the KDD objectives (classification, regression, clustering, forecasting and optimization). In this work we address the classification and regression objectives, using Decision Trees (DT) [5] and Linear Regression (LR) [6] as Data Mining techniques.

This study focuses on the creation of classification models for the subjective attributes (savour, colour, flavour and foam) [2] from the chemical parameters obtained during the process. To this end, we used DT and LR to obtain mathematical functions that relate the chemical attributes to a given subjective parameter, so that its value can be computed from them. We use a data set from the green red wine vinification process of an agricultural cooperative in the North of Portugal. The tools used were an open-source tool (WEKA) [7] and a proprietary tool (Microsoft Business Intelligence 2008) [8].

With this work we intend to demonstrate the potential of Data Mining techniques for extracting knowledge from databases, in particular for creating classification models of subjective attributes in the wine vinification process. With these techniques, the managers of the agricultural cooperative could predict the values of the subjective parameters as the values of some chemical parameters vary. In this way, they gain a tool that assists them in analyzing which chemical parameters characterize the best wine, in order to improve the wine's quality and to determine the best conditions for consuming it.

2. Materials and Methods

2.1 Wine vinification Data

This work uses part of the data collected over four years during the wine production phase at a Wine Estate in the Minho Region (North of Portugal) that produces and markets green red wine. Three kinds of maceration were used during vinification [1] (Figure 1): vinification by pellicular fermentative maceration (C), vinification by carbonic maceration (CM) and vinification by rotary cube (CR). For each maceration type, five clarification (glue) types were used [1]: polyvinylpolypyrrolidone, albumin, gelatin, casein, plus the witness sample, without any glue. These appear in Table 1 as p, a, g, c and t respectively.

| Type | Attribute | Min | Max | A | B | C | D | E |
|---|---|---|---|---|---|---|---|---|
| | Sample Fermentation time in months (SFTM) | Domain values: {6, 8, 12, 14, 24, 30, 36} | | | | | | |
| | Clarification type | Domain values: {t, p, a, g, c} | | | | | | |
| | Vinification Type (vt) | Domain values: {C, MC, CR} | | | | | | |
| Chemical | pH | | | < 3,45 | 3,45 - 3,56 | 3,56 - 3,63 | 3,63 - 3,7 | > 3,7 |
| Chemical | Absorbency - A | | | < 0,42 | 0,42 - 0,5 | 0,5 - 0,6 | 0,6 - 0,7 | > 0,7 |
| Chemical | Absorbency - A | | | < 0,6 | 0,6 - 0,75 | 0,75 - 0,97 | 0,97 - 1,27 | > 1,27 |
| Chemical | Absorbency - A | | | < 0,16 | 0,16 - 0,19 | 0,19 - 0,23 | 0,23 - 0,28 | > 0,28 |
| Chemical | Chemistry Age (CA) | | | < 0,32 | 0,32 - 0,4 | 0,4 - 0,48 | 0,48 - 0,56 | > 0,56 |
| Chemical | Folin-Ciocalteu Index (FCI) | | | | | | | |
| Chemical | Anthocyanins - Ant (mg/l) | | | | | | | |
| Subjective | Savour | 1,9 | 8 | 5,4 (*) | 5,4 (**) | | | |
| Subjective | Colour | 2,8 | 8,7 | 6,2 (*) | 6,2 (**) | | | |
| Subjective | Foam | 2 | 8,4 | 5,6 (*) | 5,6 (**) | | | |
| Subjective | Aroma | 1,5 | 8,1 | 4,7 (*) | 4,7 (**) | | | |

Tab. 1: The main wine vinification indicators.
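The class boundaries in Table 1 amount to a simple binning of each continuous attribute. The sketch below illustrates this for Chemistry Age, using the cut points 0,32 / 0,40 / 0,48 / 0,56 from the table. The function names are hypothetical, and the handling of values that fall exactly on a boundary (assigned here to the upper class, and to "A" for the two-class subjective split) is an assumption, since the table does not state it.

```python
from bisect import bisect_right

# Cut points for Chemistry Age (CA) as read from Table 1.
CA_CUTS = [0.32, 0.40, 0.48, 0.56]
CLASSES = "ABCDE"


def to_class(value, cuts):
    """Map a continuous value to a class letter A-E.

    Assumption: a value equal to a cut point falls in the upper class.
    """
    return CLASSES[bisect_right(cuts, value)]


def subjective_class(score, threshold):
    """Two-class split for a taster score.

    Assumption: 'A' (medium) up to and including the threshold,
    'B' (good) strictly above it.
    """
    return "B" if score > threshold else "A"


if __name__ == "__main__":
    print(to_class(0.50, CA_CUTS))     # CA between 0.48 and 0.56
    print(subjective_class(6.0, 5.4))  # savour score above the 5.4 threshold
```

With these assumptions a Chemistry Age of 0.50 lands in class "D", which agrees with the interval "between 0.48 and 0.56" quoted for class "D" in the rules discussed later.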

The data set has two types of attributes: attributes with chemical characteristics and subjective attributes (colour, foam, flavour and savour). Table 1 presents the attributes of the data set, with maximum and minimum values for the continuous attributes and their correspondence to classes (A, B, C, D and E). The division of the continuous values into five classes was decided by the production managers in the context of green red wines. The subjective parameters (savour, colour, flavour and foam) are divided into two classes, "medium" for class "A" and "good" for class "B".

As mentioned, the first objective is to create classification models for the various subjective attributes of the samples. The aim is to analyze how the chemical parameters vary and to predict the subjective parameters in terms of two classes: "A" (Medium, marked (*) in Table 1) and "B" (Good, marked (**) in Table 1).

Fig. 2: Histograms for the attributes of the wine vinification data set.

Several physico-chemical attributes are associated with the production of wine [2, 9]. Among them, the attributes that the production managers considered most relevant to the analysis of wine production are the pH, the absorbency at 420 nm (contribution to the colour yellow), the absorbency at 520 nm (contribution to the colour red), the absorbency at 620 nm (contribution to the colour blue), the Anthocyanins [9, 10], the Chemistry Age (CA) [9] and the Folin-Ciocalteu Index [9]. The sample fermentation time (SFTM) is the time, in months, at which the sample was collected. This indicator takes values from 6 to 36 months, corresponding to the period of the vinification process. One particularity of the chemical colour parameter is that it is determined by the sum of the absorbances at three wavelengths (420, 520 and 620 nm).
For this reason the chemical colour parameter was removed from the data set. Before attempting the DM modelling, the data were pre-processed. The original data set contained attributes with missing values. Since it was not possible to obtain the correct values, the incomplete records were discarded [11], leaving a total of 362 examples. The main features of the vinification data set are described in Table 1. The frequency distributions (histograms) of these variables are plotted in Figure 2.
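The two pre-processing steps just described (discarding incomplete records and dropping the chemical colour attribute, which is the sum of the three absorbances) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the field names (`a420`, `a520`, `a620`, `colour_intensity`) and the sample values are hypothetical.

```python
import math


def is_sum_of_absorbances(record, tol=1e-6):
    """Check that the chemical colour equals A420 + A520 + A620."""
    total = record["a420"] + record["a520"] + record["a620"]
    return math.isclose(total, record["colour_intensity"], abs_tol=tol)


def preprocess(records, drop=("colour_intensity",)):
    """Discard records with any missing value, then remove redundant attributes."""
    complete = [
        r for r in records
        if all(v is not None and v != "" for v in r.values())
    ]
    return [{k: v for k, v in r.items() if k not in drop} for r in complete]


if __name__ == "__main__":
    # Hypothetical samples; the second one has missing measurements.
    samples = [
        {"sftm": 6, "a420": 0.45, "a520": 0.80, "a620": 0.20, "colour_intensity": 1.45},
        {"sftm": 12, "a420": None, "a520": 0.90, "a620": 0.22, "colour_intensity": None},
    ]
    print(is_sum_of_absorbances(samples[0]))
    print(preprocess(samples))
```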

According to the managers of this wine estate, assessing the quality of the vinification was defined as a typical classification and regression problem.

2.2 Decision Trees and Linear Regression

The Decision Tree (DT) [12] is one of the most popular and efficient classification algorithms in Data Mining. It represents a set of classification rules that follow a hierarchy of classes or values, expresses simple conditional logic, and is graphically similar to a tree (Figure 3). An instance is classified by moving from the root node to a terminal node (leaf), which provides the classification: each node of the tree specifies a test on one attribute of the instance, and each branch descending from the node corresponds to one of the possible values of that attribute. An instance is therefore classified by first testing the attribute specified by the root node and then following the branch that corresponds to the value of that attribute in the instance.

Fig. 3: Decision Tree example for the attribute Colour.

The most popular decision tree algorithms for classification are ID3, C4.5 and C5.0, proposed by Ross Quinlan [5]. The CART classification algorithm proposed by Breiman [6] is also widely adopted. In this study we use the C4.5 implementation in the WEKA tool, while Microsoft Decision Trees is a hybrid of these algorithms (C4.5 and CART). C4.5 is a decision tree algorithm based on the concept of information gain. The information gain is the decrease in entropy caused by dividing a given data set according to an attribute. The attribute with the highest gain is chosen to divide the data set, and applying this procedure recursively to the remaining relevant attributes structures the data set with respect to them. In this study, J48 [7], a Java re-implementation of the C4.5 algorithm [13] that is part of the WEKA machine learning package [7], was used to induce the decision trees with the open-source tool. The other tool was the proprietary Microsoft Business Intelligence Studio 2008 [8].

The objective of Linear Regression [6] is to find a basis for predicting one variable, i.e. a function that describes the behaviour of the variables (Figure 4). The Microsoft Linear Regression algorithm uses an interestingness score to rank and sort the attributes in columns that contain continuous, non-binary numeric data. The same interestingness score is used to assess all input columns, to ensure consistency.

Fig. 4: Linear Regression between the attribute Flavour and Anthocyanin.

Regression typically requires both the dependent and the independent variables to be continuous and numeric. In this study, we applied linear regression to obtain lines for predicting the subjective variables of the data set. For this reason the non-numeric attributes, Clarification Type and Vinification Type, were removed from the data set.

3. Results

For the wine vinification analysis it was decided to base the experiments on the classification of the subjective attributes of the data set. As mentioned, we use Decision Trees and Linear Regression. These two approaches are compared using predictive accuracy as the criterion.

Fig. 5: Attribute dependency for the Flavour attribute.

The classification models for wine vinification analysis were developed using the C4.5 algorithm [12]. To ensure the statistical significance of the results, 10 runs were applied in all tests, with the accuracy estimates obtained using the Holdout method [13]. Training was carried out with both balanced and non-balanced training sets. In each simulation, the available data are randomly divided into mutually exclusive partitions: the training set, with 2/3 of the available data, used during the modelling phase; and the test set, with the remaining 1/3 of the examples, used after training to compute the accuracy values. A common tool for classification analysis is the confusion matrix [14]. This matrix has size N x N, where N denotes the number of possible classes, and is built by matching the predictions given by the Data Mining model against the actual desired results. In the experiments presented here, J48 [7] with default parameter values was used for inducing classification trees. Model training and validation were based on 10-fold cross-validation, and the evaluation counted the number of correctly classified instances.

3.1 Experimental Results

Table 2 presents the confusion matrix of the DT for each tool, where the values denote the average of 10 runs. Both approaches have a predictive accuracy of about 90%. The experimental results show that, with either tool, there is no improvement when using balanced training sets. The results also reveal that Model 1 (Microsoft Decision Tree) is more accurate than Model 2 (WEKA Decision Tree).

Model 1 - Microsoft Decision Tree (classification matrix over classes A and B; the individual matrix cells are not legible in this transcription):

| Attribute | Predict probability | Score |
|---|---|---|
| Colour | …,36% | 0,85 |
| Foam | …,48% | 0,95 |
| Flavour | 91,18% | 0,89 |
| Savour | …,31% | 0,96 |

Tab. 2: Confusion Matrix of the obtained models.
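The evaluation protocol described above (a random 2/3 - 1/3 holdout split, a confusion matrix over the two classes, and the resulting accuracy) can be sketched as follows. This is an illustrative sketch with hypothetical function names, not the procedure implemented by either tool; averaging over 10 runs would simply repeat it with 10 different seeds.

```python
import random
from collections import Counter


def holdout_split(rows, train_fraction=2 / 3, seed=0):
    """Randomly split rows into mutually exclusive training and test sets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(actual, predicted, labels=("A", "B")):
    """N x N matrix: rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of instances on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    train, test = holdout_split(range(362))  # 362 examples, as in the data set
    print(len(train), len(test))
    cm = confusion_matrix(["A", "A", "B", "B"], ["A", "B", "B", "B"])
    print(cm, accuracy(cm))
```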
Model 2 - WEKA Decision Tree, correctly classified instances: Colour 83,3%, Foam 87,2%, Flavour 92,2%, Savour 87,5%.

| Attribute | Model 1 - Microsoft Decision Tree | Model 2 - WEKA Decision Tree |
|---|---|---|
| Colour | Vinification Type and SFTM | SFTM |
| Foam | Clarification Type and SFTM | SFTM and Clarification Type |
| Flavour | Vinification Type, CA | SFTM |
| Savour | SFTM, Clarification Type and Vinification Type | SFTM and Clarification Type |

Tab. 3: Relevant attributes for the Microsoft BI model and the WEKA model.

Table 3 presents the most relevant attributes for the classification models obtained by applying the DT. Both tools select the attribute "SFTM" as the most relevant for classifying the various subjective attributes. The second most important attribute is the clarification type. For the classification of the attribute "Flavour", the most relevant attribute is the vinification type for WEKA, and the vinification type and Chemistry Age for the Microsoft tool. Figure 5 presents the parameter dependency for the flavour attribute. Although the two tools reach practically the same accuracy (91.18% and 92%), they select different attributes, which is explained by one tool analysing the correlation between the attributes in more detail than the other.
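The attribute rankings above follow from the splitting criterion of the tree learner. As a rough illustration of the information gain described in Section 2.2 (the decrease in entropy obtained by dividing the data set on one attribute), here is a minimal sketch. The rows are invented toy data, not samples from the study, and C4.5 itself normalises this quantity into a gain ratio, which the sketch does not do.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


if __name__ == "__main__":
    # Toy rows (hypothetical): colour class depends only on SFTM here.
    rows = [
        {"sftm": 6, "ct": "g", "colour": "B"},
        {"sftm": 6, "ct": "p", "colour": "B"},
        {"sftm": 36, "ct": "g", "colour": "A"},
        {"sftm": 36, "ct": "p", "colour": "A"},
    ]
    for attr in ("sftm", "ct"):
        print(attr, information_gain(rows, attr, "colour"))
```

In this toy data the split on `sftm` separates the classes completely while the split on `ct` tells nothing, so a gain-based learner would choose `sftm` for the root node.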

General rules of the type IF ... THEN can be deduced from decision trees by following the path from the root node to a leaf node of the tree. From the tree in Figure 3 it can be derived that if Chemistry Age is equal to "D" (between 0.48 and 0.56) and the vinification type is not equal to "CR", then the Flavour attribute is "B" ("good") with a probability of 91.18%.

| Attribute | Model 1 - Microsoft Decision Tree | Model 2 - WEKA Decision Tree |
|---|---|---|
| Colour | IF SFTM = 36 AND ca = B THEN COLOUR = B (99,36%) | IF SFTM = "20" AND vt = "cr" AND ant = e THEN B (75%) |
| Foam | IF SFTM = 26 AND a420 = B THEN FOAM = A (99,81%) | IF SFTM = "20" AND ct = "g" AND ct = "cr" THEN A (85%) |
| Flavour | IF ca = A AND ant = A THEN FLAVOUR = A (99,35%) | IF SFTM = "26" AND ant = "b" THEN A (70%) |
| Savour | IF SFTM {8, 30, 36} AND vt = 'C' AND CA = 'D' THEN SAVOUR = B (98,98%) | IF SFTM = "14" AND ct = "g" AND vt = "mc" THEN B (65%) |

Tab. 4: Rules derived by the Data Mining tools applying the Decision Trees technique.

The rules presented in Table 4 correspond to the top of the tree path. For the "colour" attribute, one example of a rule that can be extracted is: IF SFTM is different from 30 and 36 and the vinification is of type 'C', THEN the colour will be "good" (class "B") with a predictive probability of 94%. For Model 2, if the SFTM is equal to 26 and the Anthocyanins are equal to "B" (between 160 and 230), then the Flavour attribute is "A" (Medium) with a probability of 70%.
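Reading rules off a tree in this way is a plain root-to-leaf traversal. The sketch below shows the idea on a small hand-built tree stored as nested dictionaries. Only the first path mirrors the WEKA Flavour rule from Table 4; the "other" branches, the dictionary layout and the function names are hypothetical and serve only to make the example self-contained.

```python
def extract_rules(node, conditions=()):
    """Yield (conditions, label) pairs by walking from the root to each leaf."""
    if "label" in node:  # leaf node: emit the accumulated path
        yield list(conditions), node["label"]
        return
    for value, child in node["branches"].items():
        test = (node["attribute"], value)
        yield from extract_rules(child, conditions + (test,))


def format_rule(conditions, label):
    """Render one path as an IF ... THEN rule string."""
    tests = " AND ".join(f"{attr}={value}" for attr, value in conditions)
    return f"IF {tests} THEN {label}"


# Hypothetical toy tree for the Flavour attribute.
TOY_TREE = {
    "attribute": "SFTM",
    "branches": {
        "26": {
            "attribute": "ant",
            "branches": {"b": {"label": "A"}, "other": {"label": "B"}},
        },
        "other": {"label": "B"},
    },
}

if __name__ == "__main__":
    for conds, label in extract_rules(TOY_TREE):
        print(format_rule(conds, label))
```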
| Attribute | Model 5 - Microsoft Linear Regression | Score | Model 2 - WEKA (coefficients not legible in this transcription) | CC (*) |
|---|---|---|---|---|
| Colour | Colour = 6,241 - 0,060*(SFTM - 22,087) + 0,002*(AntmgL - 351,970) | 1,38 | Colour = … * SFTM … * pH … * A… … * A… … * FCI … * Ant(mg/L) | 0,71% |
| Flavour | Flavour = 4,358 + 1,556*(CA - 0,455) - 0,141*(SFTM - 22,429) | …,91 | Flavour = … * SFTM … * pH … * A… … * CA | 0,88% |
| Foam | Foam = 5,430 + 1,228*(pH - 3,571) + 1,583*(CA - 0,458) - 0,088*(SFTM - 22,270) | …,41 | Foam = … * SFTM … * pH … * A… … * CA … * Ant(mg/L) | 0,76% |
| Savour | Savour = 5,133 - 0,123*(SFTM - 22,143) | 1,49 | Savour = … * SFTM … * pH … * A… … * A… | 0,79% |

(*) Correctly Classified Instances

Tab. 6: Linear Regression results.

As mentioned, the objective of the regression is to find a function (Figure 4) that approximates the behaviour of the variables. The linear regression obtained with Microsoft Linear Regression for the attribute flavour is presented in Figure 4, and the equations in Table 6.

Discussion

As the tables show, the Microsoft Decision Tree performed better than the open-source tool. Accuracies between 85% and 98% were achieved with the Microsoft tool, and between 83% and 93% with the open-source tool. For both tools, the attributes with the most influence on predicting the subjective attributes from the chemical parameters of the vinification process were the SFTM and the clarification type. This shows the importance of the sampling time and of the clarifying agent used. As the SFTM value increases, the quality of the sample decreases in the various subjective attributes, indicating that this type of wine should be consumed between 8 and 12 months after the vinification process. Given these results, the production managers can apply such tools to other data sets with more chemical parameters of wine production, gaining additional decision support.

4. Conclusion

This paper presented a study on predicting the organoleptic attributes (colour, foam, flavour and savour) in the wine vinification process, using Decision Trees and Linear Regression models as Data Mining techniques. The experiments were conducted using the new Microsoft Business Intelligence Studio 2008 and the open-source WEKA tool. Accuracies between 85% and 98% were obtained, indicating that Data Mining models can be used to predict subjective attributes in the wine vinification process from chemical parameters. It was possible to create classification models for the various subjective attributes and to identify the relevance of the other attributes. Although the data set contains few attributes, quite good results were attained. In the future it would also be interesting to consider a new set of chemical attributes of wine production. With this work we have presented the advantages of using Data Mining tools to support decision-making, in particular in the winemaking field.

Literature

[1] Castillo-Sanchez, J.X., Arantes, J. et Maia, M.O. Étude de l'Évolution des Composés Phénoliques des Vins du Nord du Portugal Issues des Différentes Processus de Vinification. In: Polyphenols Communications 96, Vol. I, 18th International Conference on Polyphenols, July 15-18, Bordeaux, pp. 55-56.
[2] Castillo-Sanchez, J.X., Mejuto, J.C., Garrido, J. and Garcia-Falcón, S. Influence of winemaking protocol and fining agents on the evolution of the anthocyanin content, color and general organoleptic quality of Vinhão wines. Food Chemistry, 97 (1).
[3] OIV. Office International de la Vigne et du Vin. Recueil des Méthodes Internationales d'Analyse des Vins et des Moûts. Paris.
[4] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P. Advances in Knowledge Discovery and Data Mining. The MIT Press, Massachusetts, USA.
[5] Quinlan, J.R. Induction of decision trees. Machine Learning.
[6] Breiman, L., Friedman, J., Olshen, R., Stone, C. Classification and Regression Trees. Wadsworth, Pacific Grove.
[7] Witten, I.H., Frank, E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers, San Francisco, p. 369.
[8] Larson, B. Delivering Business Intelligence with Microsoft SQL Server 2008. McGraw-Hill Osborne Media, 2nd edition.
[9] Somers, T.C. and Evans, M.E. Spectral Evaluation of Young Red Wines: Anthocyanin Equilibria, Total Phenolics, Free and Molecular SO2, "Chemical Age". J. Sci. Food Agric., 28.
[10] Papadopoulou, C., Kalliopi, S., Ioannis, R. Potential Antimicrobial Activity of Red and White Wine Phenolic Extracts against Strains of Staphylococcus aureus, Escherichia coli and Candida albicans. Food Technol. Biotechnol. 43 (1), pp. 41-46.
[11] Pyle, D. Data Preparation for Data Mining. Morgan Kaufmann Publishers.
[12] Quinlan, J.R. Bagging, Boosting and C4.5. Proceedings of the Fourteenth National Conference on Artificial Intelligence.
[13] Souza, J., Matwin, S., Japkowicz, N. Evaluating Data Mining Models: A Pattern Language. Proceedings of the 9th Conference on Pattern Language of Programs, Illinois, USA.
[14] Kohavi, R., Provost, F. Glossary of Terms. Machine Learning, 30 (2/3).

Appendix A

| Vinification | Alcohol (vol.%) | Sugar (g/L) | Volatile acidity (g/L) | Total acidity (g/L) | Total sulphur dioxide (mg/L) | Free sulphur dioxide (mg/L) | pH |
|---|---|---|---|---|---|---|---|
| C | 10,5 +/- 0,05 | 1,70 +/- 0,015 | 0,31 +/- 0,012 | 9,97 +/- 0,34 | 111,02 +/- 3,6 | 30,21 +/- 0,95 | 3,29 +/- 0,1 |
| CM | 10,7 +/- 0,05 | 1,77 +/- 0,02 | 0,55 +/- 0,022 | 6,69 +/- 0,24 | 99,0 +/- 2,8 | 25,36 +/- 0,75 | 3,49 +/- 0,12 |
| RF | 10,1 +/- 0,06 | 1,80 +/- 0,025 | 0,49 +/- 0,019 | 10,45 +/- 0,64 | 109,59 +/- 2,6 | 26,12 +/- 0,75 | 3,38 +/- 0,12 |

Tab. 7: Chemical parameters of the green red wines (three vinifications; average of the three samples).


BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION

HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION HYBRID PROBABILITY BASED ENSEMBLES FOR BANKRUPTCY PREDICTION Chihli Hung 1, Jing Hong Chen 2, Stefan Wermter 3, 1,2 Department of Management Information Systems, Chung Yuan Christian University, Taiwan

More information

Automatic Resolver Group Assignment of IT Service Desk Outsourcing

Automatic Resolver Group Assignment of IT Service Desk Outsourcing Automatic Resolver Group Assignment of IT Service Desk Outsourcing in Banking Business Padej Phomasakha Na Sakolnakorn*, Phayung Meesad ** and Gareth Clayton*** Abstract This paper proposes a framework

More information

Course Syllabus For Operations Management. Management Information Systems

Course Syllabus For Operations Management. Management Information Systems For Operations Management and Management Information Systems Department School Year First Year First Year First Year Second year Second year Second year Third year Third year Third year Third year Third

More information

Data Mining Solutions for the Business Environment

Data Mining Solutions for the Business Environment Database Systems Journal vol. IV, no. 4/2013 21 Data Mining Solutions for the Business Environment Ruxandra PETRE University of Economic Studies, Bucharest, Romania ruxandra_stefania.petre@yahoo.com Over

More information

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@ mines.edu Home Page: http://kais.mines.edu/~xwu/

More information

Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report

Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report G. Banos 1, P.A. Mitkas 2, Z. Abas 3, A.L. Symeonidis 2, G. Milis 2 and U. Emanuelson 4 1 Faculty

More information

Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods

Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing

More information

PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES

PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES The International Arab Conference on Information Technology (ACIT 2013) PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES 1 QASEM A. AL-RADAIDEH, 2 ADEL ABU ASSAF 3 EMAN ALNAGI 1 Department of Computer

More information

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing

Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University

More information

Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms

Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms Yin Zhao School of Mathematical Sciences Universiti Sains Malaysia (USM) Penang, Malaysia Yahya

More information

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION

ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical

More information

D A T A M I N I N G C L A S S I F I C A T I O N

D A T A M I N I N G C L A S S I F I C A T I O N D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.

More information

not possible or was possible at a high cost for collecting the data.

not possible or was possible at a high cost for collecting the data. Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day

More information

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1 Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints

More information

Data Mining Applications in Fund Raising

Data Mining Applications in Fund Raising Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL CLUSTERING

GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL CLUSTERING Geoinformatics 2004 Proc. 12th Int. Conf. on Geoinformatics Geospatial Information Research: Bridging the Pacific and Atlantic University of Gävle, Sweden, 7-9 June 2004 GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL

More information

Credit Card Fraud Detection Using Meta-Learning: Issues 1 and Initial Results

Credit Card Fraud Detection Using Meta-Learning: Issues 1 and Initial Results From: AAAI Technical Report WS-97-07. Compilation copyright 1997, AAAI (www.aaai.org). All rights reserved. Credit Card Fraud Detection Using Meta-Learning: Issues 1 and Initial Results Salvatore 2 J.

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Open Source Software: How Can Design Metrics Facilitate Architecture Recovery?

Open Source Software: How Can Design Metrics Facilitate Architecture Recovery? Open Source Software: How Can Design Metrics Facilitate Architecture Recovery? Eleni Constantinou 1, George Kakarontzas 2, and Ioannis Stamelos 1 1 Computer Science Department Aristotle University of Thessaloniki

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Predicting Students Final GPA Using Decision Trees: A Case Study

Predicting Students Final GPA Using Decision Trees: A Case Study Predicting Students Final GPA Using Decision Trees: A Case Study Mashael A. Al-Barrak and Muna Al-Razgan Abstract Educational data mining is the process of applying data mining tools and techniques to

More information

Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

More information

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent

More information

Cost Drivers of a Parametric Cost Estimation Model for Data Mining Projects (DMCOMO)

Cost Drivers of a Parametric Cost Estimation Model for Data Mining Projects (DMCOMO) Cost Drivers of a Parametric Cost Estimation Model for Mining Projects (DMCOMO) Oscar Marbán, Antonio de Amescua, Juan J. Cuadrado, Luis García Universidad Carlos III de Madrid (UC3M) Abstract Mining is

More information

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE

A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data

More information

Advanced analytics at your hands

Advanced analytics at your hands 2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data

Extension of Decision Tree Algorithm for Stream Data Mining Using Real Data Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream

More information

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.

Keywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques. International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant

More information

Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set

Overview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification

More information

Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining -

Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining - Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining - Hidenao Abe, Miho Ohsaki, Hideto Yokoi, and Takahira Yamaguchi Department of Medical Informatics,

More information

Volume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies

Volume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies Volume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Spam

More information

How To Solve The Kd Cup 2010 Challenge

How To Solve The Kd Cup 2010 Challenge A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn

More information

Chapter 12 Discovering New Knowledge Data Mining

Chapter 12 Discovering New Knowledge Data Mining Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to

More information

The Prophecy-Prototype of Prediction modeling tool

The Prophecy-Prototype of Prediction modeling tool The Prophecy-Prototype of Prediction modeling tool Ms. Ashwini Dalvi 1, Ms. Dhvni K.Shah 2, Ms. Rujul B.Desai 3, Ms. Shraddha M.Vora 4, Mr. Vaibhav G.Tailor 5 Department of Information Technology, Mumbai

More information

Impact of Boolean factorization as preprocessing methods for classification of Boolean data

Impact of Boolean factorization as preprocessing methods for classification of Boolean data Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,

More information

D-optimal plans in observational studies

D-optimal plans in observational studies D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

More information

Data Mining with SQL Server Data Tools

Data Mining with SQL Server Data Tools Data Mining with SQL Server Data Tools Data mining tasks include classification (directed/supervised) models as well as (undirected/unsupervised) models of association analysis and clustering. 1 Data Mining

More information

On the effect of data set size on bias and variance in classification learning

On the effect of data set size on bias and variance in classification learning On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent

More information

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.

More information

Prediction of Stock Performance Using Analytical Techniques

Prediction of Stock Performance Using Analytical Techniques 136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University

More information

Maschinelles Lernen mit MATLAB

Maschinelles Lernen mit MATLAB Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical

More information

Data Mining and Visualization

Data Mining and Visualization Data Mining and Visualization Jeremy Walton NAG Ltd, Oxford Overview Data mining components Functionality Example application Quality control Visualization Use of 3D Example application Market research

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Performance Analysis of Decision Trees

Performance Analysis of Decision Trees Performance Analysis of Decision Trees Manpreet Singh Department of Information Technology, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India Sonam Sharma CBS Group of Institutions, New Delhi,India

More information

Rule based Classification of BSE Stock Data with Data Mining

Rule based Classification of BSE Stock Data with Data Mining International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification

More information

Assessing Data Mining: The State of the Practice

Assessing Data Mining: The State of the Practice Assessing Data Mining: The State of the Practice 2003 Herbert A. Edelstein Two Crows Corporation 10500 Falls Road Potomac, Maryland 20854 www.twocrows.com (301) 983-3555 Objectives Separate myth from reality

More information

First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms

First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms Azwa Abdul Aziz, Nor Hafieza IsmailandFadhilah Ahmad Faculty Informatics & Computing

More information

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

In this presentation, you will be introduced to data mining and the relationship with meaningful use. In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine

More information

Classification On The Clouds Using MapReduce

Classification On The Clouds Using MapReduce Classification On The Clouds Using MapReduce Simão Martins Instituto Superior Técnico Lisbon, Portugal simao.martins@tecnico.ulisboa.pt Cláudia Antunes Instituto Superior Técnico Lisbon, Portugal claudia.antunes@tecnico.ulisboa.pt

More information

INVESTIGATIONS INTO EFFECTIVENESS OF GAUSSIAN AND NEAREST MEAN CLASSIFIERS FOR SPAM DETECTION

INVESTIGATIONS INTO EFFECTIVENESS OF GAUSSIAN AND NEAREST MEAN CLASSIFIERS FOR SPAM DETECTION INVESTIGATIONS INTO EFFECTIVENESS OF AND CLASSIFIERS FOR SPAM DETECTION Upasna Attri C.S.E. Department, DAV Institute of Engineering and Technology, Jalandhar (India) upasnaa.8@gmail.com Harpreet Kaur

More information

Predicting Critical Problems from Execution Logs of a Large-Scale Software System

Predicting Critical Problems from Execution Logs of a Large-Scale Software System Predicting Critical Problems from Execution Logs of a Large-Scale Software System Árpád Beszédes, Lajos Jenő Fülöp and Tibor Gyimóthy Department of Software Engineering, University of Szeged Árpád tér

More information

Comparison of K-means and Backpropagation Data Mining Algorithms

Comparison of K-means and Backpropagation Data Mining Algorithms Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and

More information

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015 An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content

More information

Classification of Learners Using Linear Regression

Classification of Learners Using Linear Regression Proceedings of the Federated Conference on Computer Science and Information Systems pp. 717 721 ISBN 978-83-60810-22-4 Classification of Learners Using Linear Regression Marian Cristian Mihăescu Software

More information

Introduction to Data Mining Techniques

Introduction to Data Mining Techniques Introduction to Data Mining Techniques Dr. Rajni Jain 1 Introduction The last decade has experienced a revolution in information availability and exchange via the internet. In the same spirit, more and

More information

Data Mining - Evaluation of Classifiers

Data Mining - Evaluation of Classifiers Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010

More information

Introducing diversity among the models of multi-label classification ensemble

Introducing diversity among the models of multi-label classification ensemble Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and

More information

WEKA Explorer User Guide for Version 3-4-3

WEKA Explorer User Guide for Version 3-4-3 WEKA Explorer User Guide for Version 3-4-3 Richard Kirkby Eibe Frank November 9, 2004 c 2002, 2004 University of Waikato Contents 1 Launching WEKA 2 2 The WEKA Explorer 2 Section Tabs................................

More information

An Overview and Evaluation of Decision Tree Methodology

An Overview and Evaluation of Decision Tree Methodology An Overview and Evaluation of Decision Tree Methodology ASA Quality and Productivity Conference Terri Moore Motorola Austin, TX terri.moore@motorola.com Carole Jesse Cargill, Inc. Wayzata, MN carole_jesse@cargill.com

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection

More information

International Journal of Advance Research in Computer Science and Management Studies

International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Data Mining and Soft Computing. Francisco Herrera

Data Mining and Soft Computing. Francisco Herrera Francisco Herrera Research Group on Soft Computing and Information Intelligent Systems (SCI 2 S) Dept. of Computer Science and A.I. University of Granada, Spain Email: herrera@decsai.ugr.es http://sci2s.ugr.es

More information

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery

A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington

More information

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis

More information

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News

Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati

More information

Three Perspectives of Data Mining

Three Perspectives of Data Mining Three Perspectives of Data Mining Zhi-Hua Zhou * National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China Abstract This paper reviews three recent books on data mining

More information

Proposal of Credit Card Fraudulent Use Detection by Online-type Decision Tree Construction and Verification of Generality

Proposal of Credit Card Fraudulent Use Detection by Online-type Decision Tree Construction and Verification of Generality Proposal of Credit Card Fraudulent Use Detection by Online-type Decision Tree Construction and Verification of Generality Tatsuya Minegishi 1, Ayahiko Niimi 2 Graduate chool of ystems Information cience,

More information