Vinification Mining A Case Study on Wine Production
Jorge RIBEIRO 1, José NEVES 2, Juan SANCHEZ 3, Paulo NOVAIS 2, José MACHADO 2

1 Viana do Castelo Polytechnic Institute, School of Technology and Management, Viana do Castelo, Portugal, jribeiro@estg.ipvc.pt
2 University of Minho, DI-CCTC, Department of Computer Science, Portugal, {jneves,pjon,jmac}@di.uminho.pt
3 Viana do Castelo Polytechnic Institute, Agrarian School, Portugal, xavier@esa.ipvc.pt

Abstract

Throughout history, wine has played a relevant role in almost every civilization. Demarcated regions that benefit from a Denomination of Origin have to ensure that every process in wine production is subject to strict control at every phase, from the vineyard to the customer. Vinification is one of the stages of wine production that can influence the quality of the wine. This quality is traditionally assessed by wine tasters, who analyze organoleptic parameters such as colour, foam, flavour and savour, which are very important for wine production and for its successful marketing. Data Mining techniques are highly relevant in this field: they reveal the importance of the numerous chemical parameters involved in wine production, and they make it possible to define classification models that determine the organoleptic parameters from the chemical parameters of the winemaking process. Decision Trees and Linear Regression were used as Data Mining techniques for the classification and regression objectives. The experiments were conducted using Microsoft's SQL Server 2008 Business Intelligence Development Studio and an open-source Data Mining tool (WEKA). Very good results were achieved, with performances between 85% and 98% for all models.

Key words

Data Mining; Knowledge Discovery in Databases; Decision Trees; Linear Regression; Wine Vinification Process.

1. Introduction

In the context of wine production, the vinification process corresponds to the analysis over time of the wine's quality. During this process (Fig. 1), several chemical parameters, such as pH, anthocyanins and Chemistry Age [1, 2], are analyzed and recorded. With these data it is possible to examine relationships between the attributes, which allows knowledge to be extracted and classification models to be created, first to adjust some parameters in order to improve the quality of the wine and second to analyze the chemical attributes that influence the best time to consume the wine. To complement these results and to analyze the chemical quality of the samples, wine tasters evaluate organoleptic (subjective) attributes such as savour, colour, flavour and foam.

In the case of green red wines, winemaking begins with the wine grapes (in this case study, the Vinhão variety). Next, the grapes are transported to an experimental winery and sampled. After grape sampling comes the fermentation process with the different types of maceration [1], followed by racking, pressing and cold stabilization (Figure 1). The use of the glue (clarification), stabilization and bottling follow. To examine the organoleptic/subjective quality of the
vinification, the samples are evaluated by a panel of wine tasters, and each sample is reviewed 8 times. The wines were produced following three different processes: pellicular fermentative maceration (a traditional method), rotary cube fermentation and carbonic maceration, as presented in Figure 1 and in Table 7 [1]. The total wine phenolics were determined by colorimetry with phosphotungstic-phosphomolybdic acid [3] at 750 nm. The results were expressed in units of the Folin-Ciocalteu Index (FCI).

[Figure 1: flow chart. Vinhão grapes, then grape sampling, then transport to the experimental winery, followed by three branches:
Pellicular Fermentative Maceration (C): destemming/crushing, maceration, racking after a week, pressing, cold stabilization/rackings, stabilization and bottling.
Rotary Cube Maceration (RF): destemming/crushing, maceration, racking after a week, pressing, cold stabilization/rackings, stabilization and bottling.
Carbonic Maceration (CM): whole grapes into a CO2-saturated tank, racking after two weeks, destemming/crushing, end of alcoholic fermentation, cold stabilization/rackings, stabilization and bottling.]

Fig. 1 - Technological process used for the three types of maceration in wine production.

In recent years, the application of Data Mining techniques [7] has become a powerful and easy-to-use way of analyzing relationships between the attributes of data sets. The high volume of data stored by organizations over time poses a new challenge: extracting knowledge from the stored information. Through the Knowledge Discovery in Databases (KDD) process [4], organizations can exploit the stored data, discover relationships or affinities within them and understand the behaviour of the various agents that take part in the organization, such as customers, suppliers and sellers. Several tasks (selection, pre-processing, transformation, data mining and interpretation) are associated with this process.
The Data Mining task is centred on the application of algorithms, including artificial neural networks, decision trees, association rules and genetic algorithms, that extract patterns from the previously treated data and are applied according to the KDD objectives (classification, regression, clustering, forecasting and optimization). In this work we address the classification and regression objectives, using Decision Trees (DT) [5] and Linear Regression (LR) [6] as Data Mining techniques. This study focuses on creating classification models of the subjective attributes (savour, colour, flavour and foam) [2] from the chemical parameters obtained during the process. To achieve this objective we used DT and LR to represent mathematical functions that relate the chemical attributes, so that the value of a given subjective parameter can be obtained from them. We use a data set of the green red wine vinification process from an agricultural cooperative in the North of Portugal. The tools used were an open-source tool (WEKA) [7] and a proprietary tool (Microsoft Business Intelligence 2008) [8]. With this work we intend to demonstrate the potential of Data Mining techniques in
the extraction of knowledge from databases, in particular for creating classification models of subjective attributes in the wine vinification process. With these techniques the managers of the agricultural cooperative could predict the values of the subjective parameters as the values of some chemical parameters vary. In this way, they have a tool that assists them in analysing the chemical parameters of the best wines, in order to improve the wine's quality and determine the best conditions for consuming the wine.

2. Materials and Methods

2.1 Wine vinification Data

This work uses part of the data collected over four years during the wine production phase at a wine estate in the Minho region (North of Portugal) that produces and markets green red wine. Three kinds of maceration were used during vinification [1] (Figure 1): pellicular fermentative maceration (C), carbonic maceration (CM) and rotary cube (CR). For each maceration type, five clarification (glue) types were used [1]: polyvinylpolypyrrolidone, albumin, gelatin, casein and the witness, without any glue. They are denoted in Table 1 by p, a, g, c and t, respectively.

Attributes with a set of domain values:
Sample Fermentation (time in months) - SFTM | {6, 8, 12, 14, 24, 30, 36}
Clarification type | {t, p, a, g, c}
Vinification Type (vt) | {C, MC, CR}

Chemical attributes (classes):
Name | A | B | C | D | E
pH | < 3,45 | 3,45 - 3,56 | 3,56 - 3,63 | 3,63 - 3,7 | > 3,7
Absorbency - A | < 0,42 | 0,42 - 0,5 | 0,5 - 0,6 | 0,6 - 0,7 | > 0,7
Absorbency - A | < 0,6 | 0,6 - 0,75 | 0,75 - 0,97 | 0,97 - 1,27 | > 1,27
Absorbency - A | < 0,16 | 0,16 - 0,19 | 0,19 - 0,23 | 0,23 - 0,28 | > 0,28
Chemistry Age - CA | < 0,32 | 0,32 - 0,40 | 0,40 - 0,48 | 0,48 - 0,56 | > 0,56
Folin-Ciocalteu Index (FCI) | | | | |
Anthocyanins - Ant (mg/l) | | | | |

Subjective attributes:
Name | Min | Max | A | B
Savour | 1,9 | 8 | < 5,4 (*) | > 5,4 (**)
Colour | 2,8 | 8,7 | < 6,2 (*) | > 6,2 (**)
Foam | 2 | 8,4 | < 5,6 (*) | > 5,6 (**)
Aroma | 1,5 | 8,1 | < 4,7 (*) | > 4,7 (**)

Tab. 1: The main wine vinification indicators.
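The class boundaries in Table 1 amount to a lookup from a continuous value to a class label. The following Python sketch illustrates that mapping for the Chemistry Age attribute. It is not the authors' implementation: the function name `to_class` is hypothetical, and the treatment of a value that lies exactly on a boundary (assigned here to the upper class) is an assumption, since the table does not state it.

```python
from bisect import bisect_right

# Class boundaries for Chemistry Age (CA), read from Table 1.
CA_BOUNDS = [0.32, 0.40, 0.48, 0.56]
CLASSES = "ABCDE"


def to_class(value, bounds):
    """Map a continuous value to one of the classes A-E.

    Assumption: a value equal to a boundary falls into the upper class.
    """
    return CLASSES[bisect_right(bounds, value)]


if __name__ == "__main__":
    for ca in (0.30, 0.35, 0.50, 0.60):
        print(ca, to_class(ca, CA_BOUNDS))
```

The same function could be reused for the other chemical attributes by passing their four boundaries from Table 1.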
The data set has two types of attributes: chemical attributes and subjective attributes (colour, foam, flavour and savour). Table 1 presents the attributes of the data set, with the minimum and maximum values of the continuous attributes and their correspondence to classes (A, B, C, D and E). The division of the continuous values into five classes was decided by the production managers in the context of green red wines. The subjective parameters (savour, colour, flavour and foam) are divided into two classes: "medium" (class "A") and "good" (class "B"). As mentioned, the first objective is to create classification models for the subjective attributes of the samples. The aim is to analyze how the chemical parameters vary and to predict the subjective parameters in terms of the two classes: "A" (Medium, marked (*) in Table 1) and "B" (Good, marked (**) in Table 1).

Fig. 2: Histograms for the attributes of the wine vinification data set.

Several physico-chemical attributes are associated with the production of wine [2, 9]. Among them, the attributes that the production managers considered most relevant to the analysis of wine production are the pH, the absorbency at 420 nm (contribution to the colour yellow), the absorbency at 520 nm (contribution to the colour red), the absorbency at 620 nm (contribution to the colour blue), the anthocyanins [9, 10], the Chemistry Age (CA) [9] and the Folin-Ciocalteu Index [9]. The sample fermentation time (SFTM) is the time, in months, at which the sample was collected. It ranges from 6 to 36 months, the period of the vinification process. One particularity of the chemical colour parameter is that it is the sum of the absorbances at three wavelengths (420, 520 and 620 nm).
For this reason the chemical colour parameter was removed from the data set. Before attempting the DM modelling, the data were pre-processed. The original data set contained attributes with missing values. Since it was not possible to obtain the correct values, the incomplete records were discarded [11], leaving a total of 362 examples. The main features of the vinification data set are described in Table 1. The frequency distributions (histograms) of these variables are plotted in Figure 2.
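The two pre-processing steps described above (removing the derived colour parameter and discarding incomplete records) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names `a420`, `a520`, `a620` and `colour_intensity` are hypothetical, and a missing value is assumed to be represented by `None`.

```python
def colour_intensity(sample):
    """Chemical colour: sum of the absorbances at 420, 520 and 620 nm."""
    return sample["a420"] + sample["a520"] + sample["a620"]


def preprocess(records, derived=("colour_intensity",)):
    """Drop derived attributes, then discard records with missing values.

    Assumption: a missing value is stored as None.
    """
    cleaned = []
    for record in records:
        # Remove attributes that are fully determined by other attributes.
        kept = {k: v for k, v in record.items() if k not in derived}
        # Discard the whole record if any remaining value is missing.
        if any(v is None for v in kept.values()):
            continue
        cleaned.append(kept)
    return cleaned


if __name__ == "__main__":
    samples = [
        {"ph": 3.5, "a420": 0.5, "a520": 0.8, "a620": 0.2, "colour_intensity": 1.5},
        {"ph": None, "a420": 0.4, "a520": 0.7, "a620": 0.2, "colour_intensity": 1.3},
    ]
    print(preprocess(samples))
```

Dropping the derived attribute before checking for missing values means a record is not discarded merely because the redundant colour value was absent.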
According to the managers of this wine estate, the classification of the quality of the vinification was defined as a typical classification and regression problem.

2.2 Decision Trees and Linear Regression

The Decision Tree (DT) [12] is one of the most popular and efficient Data Mining classification algorithms. It represents a set of rules that follow a hierarchy of classes or values, expresses a simple conditional logic and is graphically similar to a tree (Figure 3). A DT classifies an instance by routing it from the root node to a terminal node (leaf), which provides the classification: each node of the tree specifies a test on an attribute of the instance, and each branch descending from that node corresponds to one of the possible values of the attribute. An instance is classified by first testing the attribute specified by the root node and then following the branch that corresponds to the value of that attribute in the instance.

Fig. 3: Decision Tree example for the attribute Colour

The most popular decision tree algorithms for classification are ID3, C4.5 and C5.0, proposed by Ross Quinlan [5]. The CART classification algorithm proposed by Breiman [6] is also widely adopted. In this study we use the C4.5 implementation in the WEKA tool and Microsoft Decision Trees, which is a hybrid of these algorithms (C4.5 and CART). C4.5 is a decision tree algorithm based on the concept of information gain. The information gain is the decrease in entropy caused by dividing a given data set according to an attribute. The attribute with the highest gain is chosen to divide the data set, and recursive
application of this procedure to the different relevant attributes structures the data set with respect to those attributes. In this study J48 [7], a Java re-implementation of the C4.5 algorithm [13] that is part of the WEKA machine learning package [7], was used to induce the decision trees with the open-source tool. The other tool was the proprietary Microsoft Business Intelligence Studio 2008 [8]. The objective of Linear Regression [6] is to find a basis for predicting one variable, i.e. a function that describes the behaviour of the variables (Figure 4). Linear Regression uses an interestingness score to rank and sort the attributes in columns that contain continuous non-binary numeric data. The interestingness score is used to assess all input columns, to ensure consistency.

Fig. 4: Linear Regression between the attributes Flavour and Anthocyanin.

Regression typically requires that both the dependent and the independent variables are continuous and numeric. In this study we applied linear regression to obtain lines for predicting the subjective variables of the data set. For this reason the non-numeric attributes, Clarification Type and Vinification Type, were removed from the data set.

3. Results

For the wine vinification analysis, it was decided to base the experiments on the classification of the subjective attributes of the data set. As mentioned, we use Decision Trees and Linear Regression. The two approaches are compared using predictive accuracy as the criterion.

Fig. 5: Attribute dependency for the Flavour attribute.

The classification models for wine vinification analysis were developed using the C4.5 algorithm [12]. To ensure the statistical significance of the results, 10 runs were applied
in all tests, with the accuracy estimates obtained using the holdout method [13]. The training strategy was split into balanced and non-balanced training sets. In each simulation, the available data are randomly divided into two mutually exclusive partitions: the training set, with 2/3 of the available data, used during the modelling phase; and the test set, with the remaining 1/3 of the examples, used after training to compute the accuracy values. A common tool for classification analysis is the confusion matrix [14]. This is a matrix of size N x N, where N denotes the number of possible classes. It is created by matching the values predicted by the Data Mining model against the actual desired values. In the experiments presented, J48 [7] with default parameter values was used to induce the classification trees. Model training and validation were based on 10-fold cross-validation, evaluated by the number of correctly classified instances.

3.1 Experimental Results

Table 2 presents the confusion matrix of the DT models for each tool, where the values denote the average of 10 runs. Both approaches have a predictive accuracy of about 90%. The experimental results show that, with either tool, using balanced training sets brings no improvement. The results also show that Model 1 (Microsoft Decision Tree) is more accurate than Model 2 (WEKA Decision Tree).

Attribute | Model 1 - Microsoft Decision Tree: Predict Probability | Model 1: Score | Model 2 - WEKA Decision Tree: Correctly Classified Instances
Colour | ,36% | 0,85 | 83,3%
Foam | ,48% | 0,95 | 87,2%
Flavour | 91,18% | 0,89 | 92,2%
Savour | ,31% | 0,96 | 87,5%

Tab. 2: Confusion matrix of the obtained models.

Attribute | Model 1 - Microsoft Decision Tree | Model 2 - WEKA Decision Tree
Colour | Vinification Type and SFTM | SFTM
Foam | Clarification Type and SFTM | SFTM and Clarification Type
Flavour | Vinification Type, CA | SFTM
Savour | SFTM, Clarification Type and Vinification Type | SFTM and Clarification Type

Tab. 3: Relevant attributes for the Microsoft BI model and the WEKA model.

Table 3 presents the most relevant attributes for the classification models obtained by applying the DT. A particularity is that both tools select the attribute "SFTM" as the most relevant for classifying the subjective attributes. The second most important attribute is the clarification type. For the classification of the attribute "Flavour", the most relevant attributes are the vinification type for the WEKA tool, and the vinification type and Chemistry Age for the Microsoft tool. Figure 5 presents the attribute dependency for the flavour attribute. Although the tools obtain practically the same accuracy (91.18% and 92%), the difference in the attributes selected is explained by one tool analysing the correlation between the attributes in more detail than the other.
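The evaluation procedure described in this section (a random 2/3 - 1/3 holdout split, a confusion matrix over the classes "A" and "B", and the share of correctly classified instances) can be sketched as follows. The function names and the fixed random seed are illustrative assumptions, not part of the tools used in the study.

```python
import random
from collections import Counter


def holdout_split(examples, train_fraction=2 / 3, seed=0):
    """Randomly split examples into mutually exclusive training and test sets."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(actual, predicted, labels=("A", "B")):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Share of correctly classified instances (the matrix diagonal)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


if __name__ == "__main__":
    train, test = holdout_split(range(362))
    print(len(train), len(test))
    m = confusion_matrix(["A", "A", "B", "B"], ["A", "B", "B", "B"])
    print(m, accuracy(m))
```

Repeating the split with ten different seeds and averaging the resulting accuracies would correspond to the 10 runs mentioned above.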
General rules of the type IF-THEN can be deduced from decision trees by following the path from the root node to a leaf node of the tree. From the tree in Figure 3 it can be derived that if the Chemistry Age is "D" (between 0.48 and 0.56) and the vinification type is not "CR", then the Flavour attribute is "B" ("good") with a probability of 91.18%.

Model 1 - Microsoft Decision Tree
Colour | IF SFTM = 36 AND ca = B THEN COLOUR = B (99,36%)
Foam | IF SFTM = 26 AND a420 = B THEN FOAM = A (99,81%)
Flavour | IF ca = A AND ant = A THEN FLAVOUR = A (99,35%)
Savour | IF SFTM {8, 30, 36} and vt = 'C' and CA = 'D' THEN SAVOUR = B (98,98%)

Model 2 - WEKA Decision Tree
Colour | IF SFTM="20" AND vt="cr" AND ant=e THEN B (75%)
Foam | IF SFTM="20" AND ct="g" and ct="cr" THEN A (85%)
Flavour | IF SFTM="26" AND ant="b" THEN A (70%)
Savour | IF SFTM="14" AND ct="g" AND vt="mc" THEN B (65%)

Tab. 4: Rules derived by the Data Mining tools applying the Decision Trees technique.

The rules presented in Table 4 correspond to the top of the tree path. For the "colour" attribute, one example of a rule that can be extracted is: IF SFTM is different from 30 and 36 and the vinification type is 'C', THEN the colour will be "good" (class "B") with a predictive probability of 94%. For Model 2, if SFTM is equal to 26 and the anthocyanins are "B" (between 160 and 230), then the Flavour attribute is "A" (Medium) with a probability of 70%.
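An IF-THEN rule of this kind translates directly into a conditional. The sketch below encodes the Flavour rule quoted above (Chemistry Age in class "D" and vinification type different from "CR"); the function name and the dictionary keys `ca` and `vt` are assumptions made for illustration only.

```python
def flavour_rule(sample):
    """Apply one rule read from the decision tree.

    Returns (predicted class, probability) when the rule fires,
    or None when the sample does not satisfy the conditions.
    """
    if sample["ca"] == "D" and sample["vt"] != "CR":
        return ("B", 0.9118)  # "good", with the probability quoted in the text
    return None


if __name__ == "__main__":
    print(flavour_rule({"ca": "D", "vt": "C"}))
    print(flavour_rule({"ca": "D", "vt": "CR"}))
```

A full tree corresponds to a set of such rules, one per path from the root to a leaf, so that exactly one rule fires for any sample.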
Model 5 - Microsoft Linear Regression (model; score)
Colour = 6,241 - 0,060*(SFTM - 22,087) + 0,002*(AntmgL - 351,970); score 1,38
Flavour = 4,358 + 1,556*(CA - 0,455) - 0,141*(SFTM - 22,429); score ,91
Foam = 5,430 + 1,228*(pH - 3,571) + 1,583*(CA - 0,458) - 0,088*(SFTM - 22,270); score ,41
Savour = 5,133 - 0,123*(SFTM - 22,143); score 1,49

Model 2 - WEKA (model; CC (*))
Colour = * SFTM * ph * A * A * FCI * Ant(mg/L); 0.71%
Flavour = * SFTM * ph * A * CA; 0.88%
Foam = * SFTM * ph * A * CA * Ant(mg/L); 0,76%
Savour = * SFTM * ph * A * A; 0,79%

(*) Correctly Classified Instances

Tab. 6: Linear Regression results

As mentioned, the objective of the regression is to find a function (Figure 4) that approximates the behaviour of the variables. The linear regression obtained with Microsoft Linear Regression for the attribute flavour is presented in Figure 4 and the equations in Table 6.

Discussion

As the tables show, the Microsoft Decision Tree performed better than the open-source tool. Accuracies between 85% and 98% were achieved with the Microsoft tool and between 83% and 93% with the open-source tool. The most influential attributes for both tools were SFTM and the clarification type, which influence the prediction of the subjective attributes from the chemical parameters of the wine vinification process. This shows the importance of the sampling time and of the clarifying agent used. As the SFTM value increases, the quality
of the sample in the various subjective attributes decreases, indicating that this type of wine should be consumed between 8 and 12 months after the vinification process. Given these results, the production managers can apply such tools to other data sets with more chemical parameters of wine production, gaining additional decision support.

4. Conclusion

This paper presented a study on the prediction of organoleptic attributes (colour, foam, flavour and savour) in the wine vinification process using Decision Trees and Linear Regression models as Data Mining techniques. The experiments were conducted using Microsoft Business Intelligence Studio 2008 and the open-source WEKA tool. Accuracies between 85% and 98% were obtained, indicating that Data Mining models can be used to predict subjective attributes in the wine vinification process from chemical parameters. It was possible to create classification models for the subjective attributes and to identify the relevance of the other attributes. Although the data set contains few attributes, quite good results were attained. In the future it would be interesting to consider a new set of chemical attributes in wine production. This work shows the advantages of using Data Mining tools to support decision-making, in particular in the winemaking field.

Literature

[1] Castillo-Sanchez, J.X., Arantes, J. et Maia, M.O. Étude de l'Évolution des Composés Phénoliques des Vins du Nord du Portugal Issues des Différentes Processus de Vinification. In: Polyphenols Communications 96, Vol. I, 18th International Conference on Polyphenols, July 15-18, Bordeaux, pp. 55-56.
[2] Castillo-Sanchez, J.X., Mejuto, J.C., Garrido, J. and Garcia-Falcón, S. Influence of winemaking protocol and fining agents on the evolution of the anthocyanin content, color and general organoleptic quality of Vinhão wines. Food Chemistry, 97, 1.
[3] OIV. Office Internationale de la Vigne et du Vin. Recueil des Méthodes Internationales d'Analyse des Vins et des Moûts. Paris.
[4] Fayyad, U.M., Piatetsky-Shapiro, G., Smyth, P. Advances in Knowledge Discovery and Data Mining. The MIT Press, Massachusetts, USA.
[5] Quinlan, J.R. Induction of decision trees. Machine Learning.
[6] Breiman, L., Friedman, J., Olshen, R.A., Stone, C.J. Classification and Regression Trees. Wadsworth, Pacific Grove.
[7] Witten, I.H., Frank, E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann Publishers, San Francisco, p. 369.
[8] Larson, B. Delivering Business Intelligence with Microsoft SQL Server 2008. McGraw-Hill Osborne Media, 2nd edition.
[9] Somers, T.C. and Evans, M.E. Spectral Evaluation of Young Red Wines: Anthocyanin Equilibria, Total Phenolics, Free and Molecular SO2, "Chemical Age". J. Sci. Food Agric., 28.
[10] Papadopoulou, C., Kalliopi, S., Ioannis, R. Potential Antimicrobial Activity of Red and White Wine Phenolic Extracts against Strains of Staphylococcus aureus, Escherichia coli and Candida albicans. Food Technol. Biotechnol. 43 (1), pp. 41-46.
[11] Pyle, D. Data Preparation for Data Mining. Morgan Kaufmann Publishers.
[12] Quinlan, J.R. Bagging, Boosting and C4.5. Proceedings of the Fourteenth National Conference on Artificial Intelligence.
[13] Souza, J., Matwin, S., Japkowicz, N. Evaluating Data Mining Models: A Pattern Language. Proceedings of the 9th Conference on Pattern Languages of Programs, Illinois, USA.
[14] Kohavi, R., Provost, F. Glossary of Terms. Machine Learning, 30 (2/3).

Appendix A

Parameter | C | CM | RF
Alcohol (vol.%) | 10,5 +/- 0,05 | 10,7 +/- 0,05 | 10,1 +/- 0,06
Sugar (g l-1) | 1,70 +/- 0,015 | 1,77 +/- 0,02 | 1,80 +/- 0,025
Volatile acidity (g l-1) | 0,31 +/- 0,012 | 0,55 +/- 0,022 | 0,49 +/- 0,019
Total acidity (g l-1) | 9,97 +/- 0,34 | 6,69 +/- 0,24 | 10,45 +/- 0,64
Total sulphur dioxide (mg l-1) | 111,02 +/- 3,6 | 99,0 +/- 2,8 | 109,59 +/- 2,6
Free sulphur dioxide (mg l-1) | 30,21 +/- 0,95 | 25,36 +/- 0,75 | 26,12 +/- 0,75
pH | 3,29 +/- 0,1 | 3,49 +/- 0,12 | 3,38 +/- 0,12

Tab. 7: Chemical parameters of the green red wines (three vinifications; average of the three samples).
Welcome Xindong Wu Data Mining: Updates in Technologies Dept of Math and Computer Science Colorado School of Mines Golden, Colorado 80401, USA Email: xwu@ mines.edu Home Page: http://kais.mines.edu/~xwu/
More informationQuality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report
Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report G. Banos 1, P.A. Mitkas 2, Z. Abas 3, A.L. Symeonidis 2, G. Milis 2 and U. Emanuelson 4 1 Faculty
More informationMining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods
Mining Direct Marketing Data by Ensembles of Weak Learners and Rough Set Methods Jerzy B laszczyński 1, Krzysztof Dembczyński 1, Wojciech Kot lowski 1, and Mariusz Paw lowski 2 1 Institute of Computing
More informationPREDICTING STOCK PRICES USING DATA MINING TECHNIQUES
The International Arab Conference on Information Technology (ACIT 2013) PREDICTING STOCK PRICES USING DATA MINING TECHNIQUES 1 QASEM A. AL-RADAIDEH, 2 ADEL ABU ASSAF 3 EMAN ALNAGI 1 Department of Computer
More informationData Mining Framework for Direct Marketing: A Case Study of Bank Marketing
www.ijcsi.org 198 Data Mining Framework for Direct Marketing: A Case Study of Bank Marketing Lilian Sing oei 1 and Jiayang Wang 2 1 School of Information Science and Engineering, Central South University
More informationFine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms
Fine Particulate Matter Concentration Level Prediction by using Tree-based Ensemble Classification Algorithms Yin Zhao School of Mathematical Sciences Universiti Sains Malaysia (USM) Penang, Malaysia Yahya
More informationON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION
ISSN 9 X INFORMATION TECHNOLOGY AND CONTROL, 00, Vol., No.A ON INTEGRATING UNSUPERVISED AND SUPERVISED CLASSIFICATION FOR CREDIT RISK EVALUATION Danuta Zakrzewska Institute of Computer Science, Technical
More informationD A T A M I N I N G C L A S S I F I C A T I O N
D A T A M I N I N G C L A S S I F I C A T I O N FABRICIO VOZNIKA LEO NARDO VIA NA INTRODUCTION Nowadays there is huge amount of data being collected and stored in databases everywhere across the globe.
More informationnot possible or was possible at a high cost for collecting the data.
Data Mining and Knowledge Discovery Generating knowledge from data Knowledge Discovery Data Mining White Paper Organizations collect a vast amount of data in the process of carrying out their day-to-day
More informationData Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1
Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints
More informationData Mining Applications in Fund Raising
Data Mining Applications in Fund Raising Nafisseh Heiat Data mining tools make it possible to apply mathematical models to the historical data to manipulate and discover new information. In this study,
More informationPredicting the Risk of Heart Attacks using Neural Network and Decision Tree
Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
More informationGEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL CLUSTERING
Geoinformatics 2004 Proc. 12th Int. Conf. on Geoinformatics Geospatial Information Research: Bridging the Pacific and Atlantic University of Gävle, Sweden, 7-9 June 2004 GEO-VISUALIZATION SUPPORT FOR MULTIDIMENSIONAL
More informationCredit Card Fraud Detection Using Meta-Learning: Issues 1 and Initial Results
From: AAAI Technical Report WS-97-07. Compilation copyright 1997, AAAI (www.aaai.org). All rights reserved. Credit Card Fraud Detection Using Meta-Learning: Issues 1 and Initial Results Salvatore 2 J.
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationOpen Source Software: How Can Design Metrics Facilitate Architecture Recovery?
Open Source Software: How Can Design Metrics Facilitate Architecture Recovery? Eleni Constantinou 1, George Kakarontzas 2, and Ioannis Stamelos 1 1 Computer Science Department Aristotle University of Thessaloniki
More informationAn Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
More informationPredicting Students Final GPA Using Decision Trees: A Case Study
Predicting Students Final GPA Using Decision Trees: A Case Study Mashael A. Al-Barrak and Muna Al-Razgan Abstract Educational data mining is the process of applying data mining tools and techniques to
More informationComparison of Data Mining Techniques used for Financial Data Analysis
Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract
More informationData Quality Mining: Employing Classifiers for Assuring consistent Datasets
Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, fabian.gruening@informatik.uni-oldenburg.de Abstract: Independent
More informationCost Drivers of a Parametric Cost Estimation Model for Data Mining Projects (DMCOMO)
Cost Drivers of a Parametric Cost Estimation Model for Mining Projects (DMCOMO) Oscar Marbán, Antonio de Amescua, Juan J. Cuadrado, Luis García Universidad Carlos III de Madrid (UC3M) Abstract Mining is
More informationA NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE
A NEW DECISION TREE METHOD FOR DATA MINING IN MEDICINE Kasra Madadipouya 1 1 Department of Computing and Science, Asia Pacific University of Technology & Innovation ABSTRACT Today, enormous amount of data
More informationAdvanced analytics at your hands
2.3 Advanced analytics at your hands Neural Designer is the most powerful predictive analytics software. It uses innovative neural networks techniques to provide data scientists with results in a way previously
More informationSocial Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
More informationExtension of Decision Tree Algorithm for Stream Data Mining Using Real Data
Fifth International Workshop on Computational Intelligence & Applications IEEE SMC Hiroshima Chapter, Hiroshima University, Japan, November 10, 11 & 12, 2009 Extension of Decision Tree Algorithm for Stream
More informationKeywords Data mining, Classification Algorithm, Decision tree, J48, Random forest, Random tree, LMT, WEKA 3.7. Fig.1. Data mining techniques.
International Journal of Emerging Research in Management &Technology Research Article October 2015 Comparative Study of Various Decision Tree Classification Algorithm Using WEKA Purva Sewaiwar, Kamal Kant
More informationOverview. Evaluation Connectionist and Statistical Language Processing. Test and Validation Set. Training and Test Set
Overview Evaluation Connectionist and Statistical Language Processing Frank Keller keller@coli.uni-sb.de Computerlinguistik Universität des Saarlandes training set, validation set, test set holdout, stratification
More informationEvaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining -
Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining - Hidenao Abe, Miho Ohsaki, Hideto Yokoi, and Takahira Yamaguchi Department of Medical Informatics,
More informationVolume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies
Volume 4, Issue 1, January 2016 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com Spam
More informationHow To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China catch0327@yahoo.com yanxing@gdut.edu.cn
More informationChapter 12 Discovering New Knowledge Data Mining
Chapter 12 Discovering New Knowledge Data Mining Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall Additional material 2007 Dekai Wu Chapter Objectives Introduce the student to
More informationThe Prophecy-Prototype of Prediction modeling tool
The Prophecy-Prototype of Prediction modeling tool Ms. Ashwini Dalvi 1, Ms. Dhvni K.Shah 2, Ms. Rujul B.Desai 3, Ms. Shraddha M.Vora 4, Mr. Vaibhav G.Tailor 5 Department of Information Technology, Mumbai
More informationImpact of Boolean factorization as preprocessing methods for classification of Boolean data
Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,
More informationD-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationData Mining with SQL Server Data Tools
Data Mining with SQL Server Data Tools Data mining tasks include classification (directed/supervised) models as well as (undirected/unsupervised) models of association analysis and clustering. 1 Data Mining
More informationOn the effect of data set size on bias and variance in classification learning
On the effect of data set size on bias and variance in classification learning Abstract Damien Brain Geoffrey I Webb School of Computing and Mathematics Deakin University Geelong Vic 3217 With the advent
More informationIndex Contents Page No. Introduction . Data Mining & Knowledge Discovery
Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.
More informationPrediction of Stock Performance Using Analytical Techniques
136 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 5, NO. 2, MAY 2013 Prediction of Stock Performance Using Analytical Techniques Carol Hargreaves Institute of Systems Science National University
More informationMaschinelles Lernen mit MATLAB
Maschinelles Lernen mit MATLAB Jérémy Huard Applikationsingenieur The MathWorks GmbH 2015 The MathWorks, Inc. 1 Machine Learning is Everywhere Image Recognition Speech Recognition Stock Prediction Medical
More informationData Mining and Visualization
Data Mining and Visualization Jeremy Walton NAG Ltd, Oxford Overview Data mining components Functionality Example application Quality control Visualization Use of 3D Example application Market research
More informationKnowledge Discovery from patents using KMX Text Analytics
Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers
More information131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10
1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom
More informationPerformance Analysis of Decision Trees
Performance Analysis of Decision Trees Manpreet Singh Department of Information Technology, Guru Nanak Dev Engineering College, Ludhiana, Punjab, India Sonam Sharma CBS Group of Institutions, New Delhi,India
More informationRule based Classification of BSE Stock Data with Data Mining
International Journal of Information Sciences and Application. ISSN 0974-2255 Volume 4, Number 1 (2012), pp. 1-9 International Research Publication House http://www.irphouse.com Rule based Classification
More informationAssessing Data Mining: The State of the Practice
Assessing Data Mining: The State of the Practice 2003 Herbert A. Edelstein Two Crows Corporation 10500 Falls Road Potomac, Maryland 20854 www.twocrows.com (301) 983-3555 Objectives Separate myth from reality
More informationFirst Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms
First Semester Computer Science Students Academic Performances Analysis by Using Data Mining Classification Algorithms Azwa Abdul Aziz, Nor Hafieza IsmailandFadhilah Ahmad Faculty Informatics & Computing
More informationIn this presentation, you will be introduced to data mining and the relationship with meaningful use.
In this presentation, you will be introduced to data mining and the relationship with meaningful use. Data mining refers to the art and science of intelligent data analysis. It is the application of machine
More informationClassification On The Clouds Using MapReduce
Classification On The Clouds Using MapReduce Simão Martins Instituto Superior Técnico Lisbon, Portugal simao.martins@tecnico.ulisboa.pt Cláudia Antunes Instituto Superior Técnico Lisbon, Portugal claudia.antunes@tecnico.ulisboa.pt
More informationINVESTIGATIONS INTO EFFECTIVENESS OF GAUSSIAN AND NEAREST MEAN CLASSIFIERS FOR SPAM DETECTION
INVESTIGATIONS INTO EFFECTIVENESS OF AND CLASSIFIERS FOR SPAM DETECTION Upasna Attri C.S.E. Department, DAV Institute of Engineering and Technology, Jalandhar (India) upasnaa.8@gmail.com Harpreet Kaur
More informationPredicting Critical Problems from Execution Logs of a Large-Scale Software System
Predicting Critical Problems from Execution Logs of a Large-Scale Software System Árpád Beszédes, Lajos Jenő Fülöp and Tibor Gyimóthy Department of Software Engineering, University of Szeged Árpád tér
More informationComparison of K-means and Backpropagation Data Mining Algorithms
Comparison of K-means and Backpropagation Data Mining Algorithms Nitu Mathuriya, Dr. Ashish Bansal Abstract Data mining has got more and more mature as a field of basic research in computer science and
More informationAn Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015
An Introduction to Data Mining for Wind Power Management Spring 2015 Big Data World Every minute: Google receives over 4 million search queries Facebook users share almost 2.5 million pieces of content
More informationClassification of Learners Using Linear Regression
Proceedings of the Federated Conference on Computer Science and Information Systems pp. 717 721 ISBN 978-83-60810-22-4 Classification of Learners Using Linear Regression Marian Cristian Mihăescu Software
More informationIntroduction to Data Mining Techniques
Introduction to Data Mining Techniques Dr. Rajni Jain 1 Introduction The last decade has experienced a revolution in information availability and exchange via the internet. In the same spirit, more and
More informationData Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
More informationIntroducing diversity among the models of multi-label classification ensemble
Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and
More informationWEKA Explorer User Guide for Version 3-4-3
WEKA Explorer User Guide for Version 3-4-3 Richard Kirkby Eibe Frank November 9, 2004 c 2002, 2004 University of Waikato Contents 1 Launching WEKA 2 2 The WEKA Explorer 2 Section Tabs................................
More informationAn Overview and Evaluation of Decision Tree Methodology
An Overview and Evaluation of Decision Tree Methodology ASA Quality and Productivity Conference Terri Moore Motorola Austin, TX terri.moore@motorola.com Carole Jesse Cargill, Inc. Wayzata, MN carole_jesse@cargill.com
More informationDynamic Data in terms of Data Mining Streams
International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining
More informationUniversité de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr
Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr WEKA Gallirallus Zeland) australis : Endemic bird (New Characteristics Waikato university Weka is a collection
More informationInternational Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationData Mining and Soft Computing. Francisco Herrera
Francisco Herrera Research Group on Soft Computing and Information Intelligent Systems (SCI 2 S) Dept. of Computer Science and A.I. University of Granada, Spain Email: herrera@decsai.ugr.es http://sci2s.ugr.es
More informationA Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery
A Serial Partitioning Approach to Scaling Graph-Based Knowledge Discovery Runu Rathi, Diane J. Cook, Lawrence B. Holder Department of Computer Science and Engineering The University of Texas at Arlington
More informationHorizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661, ISBN: 2278-8727 Volume 6, Issue 5 (Nov. - Dec. 2012), PP 36-41 Horizontal Aggregations in SQL to Prepare Data Sets for Data Mining Analysis
More informationAnalysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News
Analysis of WEKA Data Mining Algorithm REPTree, Simple Cart and RandomTree for Classification of Indian News Sushilkumar Kalmegh Associate Professor, Department of Computer Science, Sant Gadge Baba Amravati
More informationThree Perspectives of Data Mining
Three Perspectives of Data Mining Zhi-Hua Zhou * National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China Abstract This paper reviews three recent books on data mining
More informationProposal of Credit Card Fraudulent Use Detection by Online-type Decision Tree Construction and Verification of Generality
Proposal of Credit Card Fraudulent Use Detection by Online-type Decision Tree Construction and Verification of Generality Tatsuya Minegishi 1, Ayahiko Niimi 2 Graduate chool of ystems Information cience,
More information