Using Data Mining Techniques to Predict Product Quality from Physicochemical Data



A. Nachev 1, M. Hogan 1
1 Business Information Systems, Cairnes Business School, NUI, Galway, Ireland

Abstract - Product quality certification is sometimes expensive and time consuming, particularly if it requires assessment made by human experts. This study explores experimentally how data mining techniques can facilitate that process. We use a dataset of physicochemical characteristics of red and white wine samples, available from laboratory tests, in order to build models that predict wine quality. Four data mining techniques are used: multilayer perceptrons, cascade-correlation neural networks, general regression neural networks, and support vector machines. We study how hyper-parameters of the models influence their predictive abilities and how reduction of dimensionality affects their performance. We also compare the models by the metrics prediction accuracy, mean absolute deviation, and area over the regression error characteristics curve.

Keywords: data mining, neural networks, cascade-correlation neural networks, general regression neural networks, support vector machines.

1 Introduction

Today, with the improvement of technologies, industries become more efficient and the production processes become quicker. In many cases, however, human expertise is still essential for the product quality assurance process. With the increase of demand for goods, quality certification becomes an expensive step in the production process. The aim of this study is to explore the potential of four predictive techniques: neural networks (NN), a.k.a. multilayer perceptrons (MLP); cascade-correlation neural networks (CCNN); general regression neural networks (GRNN); and support vector machines (SVM), to facilitate the quality certification of a product, based on available product characteristics. This would allow automating the process and minimizing the usage of human expertise. Here we focus on wine quality prediction using data from both physicochemical laboratory tests and sensory tests. Wine is usually characterised by density, alcohol, or various acids, which can be obtained by lab tests, while sensory tests are done by human experts.
Wine classification is not an easy task, as the relationships between the physicochemical analysis and the sensory test analysis are complex and not well understood [12]. Predicting wine quality by data mining techniques is still in an early stage, but there are some promising results in the domain. Sun et al. [16] used NNs fed with 15 input variables to predict six geographic wine origins. The data included 170 samples. Vlassides et al. [19] used NNs to classify three sensory attributes (e.g. sweetness) of California wine, based on grape maturity levels and chemical analysis. Moreno et al. [13] used probabilistic neural networks (PNN) to discriminate 54 wine samples into two red wine classes. Yu et al. [20] used spectral measurements from 147 bottles of rice wine to predict 3 categories of wine. Fei et al. [8] utilized least squares support vector machines on physicochemical data of red wine samples. These chemometrics were obtained through the use of visible and near infrared (Vis/NIR) transmittance spectroscopy. Beltran et al. [5] utilize SVM in addition to, and in comparison with, radial basis function neural networks (RBFNN) and linear discriminant analysis (LDA), in the classification of Chilean wine. The analyses are carried out on data derived from wine aroma chromatograms of three different Chilean wine varieties. Bapna and Gangopadhyay [4] and Cortez et al. [6] compared several data mining techniques for classification of wine. In this paper, we estimate the performance of NN, CCNN, GRNN, and SVM in predicting red and white wine quality based on 11 physicochemical characteristics, and explore how model hyper-parameters influence their ability to discriminate between quality classes.

The paper is organized as follows: Section 2 provides an overview of the multilayer perceptrons, cascade-correlation neural networks, general regression neural networks, and support vector machines used to build the predictive models; Section 3 discusses the dataset used in the study, its features, preprocessing steps, and feature selection; Section 4 presents and discusses the experimental results; and Section 5 gives the conclusions.
2 Data Mining Models

We adopt four predictive techniques: the most common NN type - MLP, cascade-correlation neural networks, general regression neural networks, and support vector machines. This section briefly outlines each of those.

2.1 Multilayer Perceptrons

An MLP is a feedforward NN model that maps sets of input data onto a set of appropriate outputs, either values or class labels. It uses three layers of neurons, called nodes (see Figure 1), with nonlinear activation functions that can distinguish data that are not linearly separable, i.e. not separable by a hyperplane. Nodes of two adjacent layers are fully connected by weighted links represented by the matrices IW and LW, and bias vectors b. The two activation functions are both sigmoids:

f_H(x) = (e^{2x} - 1) / (e^{2x} + 1),   f_L(x) = 1 / (1 + e^{-βx}),   (1)

where f_H is a hyperbolic tangent which ranges from -1 to 1; f_L is a log-sigmoid function, equivalent in shape, but ranging from 0 to 1. Here x is the weighted sum of the inputs. Finding an optimal size of the hidden layer is a general problem with all MLP. We used the heuristic

h = s / (α(i + o)),   (2)

where h is the size of the hidden layer; s is the number of training samples; i and o are the sizes of the input and output layers, respectively; α ∈ [5, 10] is a scaling factor, smaller for noisy data and larger for relatively less noisy data. The network was trained by the Levenberg-Marquardt (LM) backpropagation (BP) algorithm [9]. LM is a second-order nonlinear optimization technique that uses an approximation to the Hessian matrix. It was chosen from among the various BP training algorithms as it trains a moderate size NN 10 to 100 times faster than the usual gradient descent backpropagation method and produces better results.

Figure 1. Architecture of an MLP neural network with one hidden layer.

2.2 Cascade-Correlation Neural Networks

CCNN are self-growing neural networks similar to MLP, but they do not have a fixed size or topology [2]. The CCNN have three layers: input, hidden, and output, similarly to MLP. The output layer consists of a single node if the network is used for regression problems, or contains several nodes for classification problems, one per class label. In contrast to MLP, the CCNN start training without a hidden layer - input nodes are fully connected to the output nodes with adjustable weights. During the training, the network adds new hidden nodes. It creates a multi-layer structure called a cascade, because the output from all input and hidden nodes already existing in the network feeds the new nodes. In the beginning, every input is connected to every output neuron by a connection with an adjustable weight. The network adds new hidden nodes one by one (Figure 2) until the residual error gets acceptably small or the user interrupts this process.

Figure 2. Cascade architecture after adding two hidden nodes (adapted from [2]). The vertical lines sum all incoming activation. Boxed connections are frozen; x connections are trained repeatedly.

To install a new hidden neuron, instead of a single candidate, the system uses a pool of trainable candidate nodes (usually four to eight), each with a different set of randomly selected initial weights. All candidates receive the same input signals and see the same residual error for each training pattern, but because they are not installed yet and do not interact with one another or affect the active neural network during training, all of these candidate units are trained in parallel; when no further progress is being made in training, the network installs the candidate whose score is the best (minimises the residual error). The use of this pool of candidates is beneficial in two ways: it greatly reduces the chance that a useless unit will be permanently installed, and it speeds up the training because many parts of weight-space can be explored simultaneously. While the candidate weights are being trained, none of the weights in the active network are changed. Once a new hidden node has been added to the network, its input-side weights (boxed connections in Figure 2) are frozen; the output-side connections (x connections) continue to be adjustable. The learning algorithm modifies the weights, attempting to minimize the residual error of the network. Each new neuron becomes a permanent feature-detector in the network, available for producing outputs or for creating other, more complex feature detectors. Among the advantages of CCNN are a self-organizing architecture, quick learning, applicability to large datasets, good results with little or no adjustment of parameters, and less chance of getting trapped in local minima, compared to the MLP. They have, however, a significant potential for overfitting the training data, which results in very good accuracy on the training dataset but not always good accuracy on new data unseen during the training.
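The hidden-layer sizing heuristic of Section 2.1 is simple enough to state as code. A minimal sketch (the function name, the rounding, and the default α are our choices, not the paper's):

```python
def hidden_layer_size(n_samples, n_inputs, n_outputs, alpha=5):
    """Heuristic h = s / (alpha * (i + o)) from Section 2.1.

    alpha in [5, 10] is a scaling factor chosen by the modeller;
    at least one hidden node is always returned.
    """
    return max(1, round(n_samples / (alpha * (n_inputs + n_outputs))))

# Example: the red wine set (1599 samples, 11 inputs, 1 output)
print(hidden_layer_size(1599, 11, 1, alpha=5))   # -> 27
```

Larger α values within the stated range give proportionally smaller hidden layers; for the same data, α=10 yields 13 nodes.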
2.3 General Regression Neural Networks

The GRNN are a kind of radial basis function (RBF) NN proposed by Specht [15]. They are a powerful regression tool which features a simple structure and implementation and fast training. A GRNN consists of four layers: input, hidden, summation, and output (Figure 3). The function of the input layer is to pass the input values x to the hidden layer.
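The GRNN forward pass described in this section (and made precise in Eq. (3) below) reduces to a kernel-weighted average of the training targets. A minimal sketch, assuming a Gaussian kernel and plain Python lists; the function name is ours:

```python
import math

def grnn_predict(x, X_train, Y_train, sigma):
    """GRNN output: kernel-weighted average of the training targets (Eq. 3)."""
    num = den = 0.0
    for Xi, Yi in zip(X_train, Y_train):
        d2 = sum((a - b) ** 2 for a, b in zip(x, Xi))  # squared distance D_i^2
        w = math.exp(-d2 / (2.0 * sigma ** 2))         # hidden-layer kernel value
        num += Yi * w                                  # summation node A
        den += w                                       # summation node B
    return num / den                                   # output node computes A/B
```

With a small width σ the prediction collapses to the target of the nearest training pattern; with a large σ it approaches the global mean of the training targets.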

Figure 3. GRNN architecture: feed-forward NN with input, hidden, summation, and output layers.

The hidden layer consists of all training patterns X_i. When an unknown pattern X is presented to the network, the squared distance D_i^2 = (X - X_i)^T (X - X_i) between X and each X_i is calculated and passed to the kernel function. The summation layer has two nodes (units), A and B, where A computes the summation function, which is the numerator of (3), and B computes the denominator:

Y(X) = Σ_i Y_i exp(-D_i^2 / (2σ^2)) / Σ_i exp(-D_i^2 / (2σ^2)),   (3)

where σ is the width of the kernel. The output node computes A/B, which is Y.

2.4 Support Vector Machines

SVM, originally introduced by Vapnik in the 1990s [17], provide a new approach to the problem of pattern recognition with clear connections to the underlying statistical learning theory. They differ radically from comparable approaches such as NN because SVM training always finds a global minimum, in contrast to NN [18]. SVMs are supervised learning methods used for classification and regression. Training data is a set of points of the form

D = {(x_i, c_i) | x_i ∈ R^p, c_i ∈ {-1, 1}},   (4)

where c_i is either 1 or -1, indicating the class to which the point x_i belongs. Each data point x_i is a p-dimensional real vector. During training, a linear SVM constructs a (p-1)-dimensional hyperplane that separates the points into two classes (Figure 4). Any hyperplane can be represented by w·x - b = 0, where w is a normal vector and · denotes the dot product. Among all possible hyperplanes that might classify the data, SVM selects the one with maximal distance (margin) to the nearest data points (support vectors).

Figure 4. Maximum margin hyperplane for an SVM trained with samples from two classes. Samples on the margin are support vectors.

When the classes are not linearly separable (there is no hyperplane that can split the two classes), a variant of SVM, called soft-margin SVM, chooses a hyperplane that splits the points as cleanly as possible, while still maximizing the distance to the nearest cleanly split examples. The method introduces slack variables ξ_i, which measure the degree of misclassification of the datum x_i. Soft-margin SVM penalizes misclassification errors and employs a parameter (the soft-margin constant C) to control the cost of misclassification. Training a linear SVM classifier solves the constrained optimization problem

min_{w,b,ξ}  (1/2)||w||^2 + C Σ_i ξ_i   (5)
s.t.  c_i(w·x_i + b) ≥ 1 - ξ_i,  ξ_i ≥ 0.

In dual form the optimization problem can be represented by

min_α  (1/2) Σ_i Σ_j α_i α_j c_i c_j (x_i·x_j) - Σ_i α_i   (6)
s.t.  0 ≤ α_i ≤ C,  Σ_i α_i c_i = 0.

The resulting decision function f(x) = w·x + b has weight vector w = Σ_k α_k c_k x_k. Data points x_i for which α_i > 0 are called support vectors, since they uniquely define the maximum margin hyperplane. Maximizing the margin allows one to minimize bounds on the generalization error. If every dot product is replaced by a non-linear kernel function, it transforms the feature space into a higher-dimensional one; thus, although the classifier is a hyperplane in the high-dimensional feature space, it may be non-linear in the original input space. The resulting classifier fits the maximum-margin hyperplane in the transformed feature space. Some common kernels include:

Polynomial kernel: K(x_i, x_j) = (γ x_i^T x_j + r)^d
RBF kernel: K(x_i, x_j) = exp(-γ ||x_i - x_j||^2)
Sigmoid kernel: K(x_i, x_j) = tanh(γ x_i^T x_j + r)

A non-linear SVM is largely characterized by the choice of its kernel, and SVMs thus link the problems they are designed for with a large body of existing work on kernel-based methods. Once the kernel is fixed, SVM classifiers have few user-chosen parameters. The best choice of kernel for a given problem is still a research issue. Because the size of the margin does not depend on the data dimension, SVM are robust with respect to data with high input dimension. However, SVM are sensitive to the presence of outliers, due to the regularization term for penalizing misclassification (which depends on the choice of C). The SVM algorithm requires O(n^2) storage and O(n^3) time to learn.

3 Dataset and Preprocessing

The data used in this study represent a wine sample collection of Vinho Verde wines, white and red (CVRVV, 2008) [7], which consists of two distinct sets made up of 4898 white and 1599 red samples. Each instance consists of 12 physicochemical variables: fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol, and a quality rating. The quality rating is based on a sensory taste test carried out by at least three sommeliers and scaled in 11 quality classes, from 0 - very bad to 10 - very excellent. A summary of the datasets is presented in Table 1.

Table 1: The physicochemical data statistics per wine type

                       Red wine              White wine
Attribute              Min    Max    Mean    Min    Max    Mean
Fixed acidity          4.6    15.9   8.3     3.8    14.2   6.9
Volatile acidity       0.1    1.6    0.5     0.1    1.1    0.3
Citric acid            0.0    1.0    0.3     0.0    1.7    0.3
Residual sugar         0.9    15.5   2.5     0.6    65.8   6.4
Chlorides              0.01   0.61   0.08    0.01   0.35   0.05
Free sulfur dioxide    1      72     14      2      289    35
Total sulfur dioxide   6      289    46      9      440    138
Density                0.990  1.004  0.996   0.987  1.039  0.994
pH                     2.7    4.0    3.3     2.7    3.8    3.1
Sulphates              0.3    2.0    0.7     0.2    1.1    0.5
Alcohol                8.4    14.9   10.4    8.0    14.2   10.4

Using the data in their original format for building models is inappropriate due to some deficiencies. A specific problem is the large amplitude of the variable values, due to the different nature and different units of measurement of those values, e.g. sulfur dioxide (1-72) vs. sulphates (0.3-2). Such an inconsistency could affect the predictive abilities of the models by making some variables more influential than others. Moreover, some models require inputs within the unit hypercube, i.e. between 0 and 1. A natural approach to meeting that requirement could be a linear transformation that divides all input values by the dataset maximum; however, most of the input values would then fall very close to zero, and the model would perform poorly. A better approach is to process each data variable (data column) separately. We did so by using the transformation

x_new = (x_old - min) / (max - min),   (7)

which scales the variables down into the unit hypercube. Another problem with utilizing the original data without preprocessing is that using all features of a dataset does not always lead to the best or even satisfactory results. This is due to the fact that too much information used for both training and testing can lead to overfitting or overtraining. We explored how the presence or absence of variables presented to the model for training and testing affects the performance. Variable selection, or reduction of dimensionality, is a technique commonly used in machine learning for building robust learning models. Removing the most irrelevant and redundant features from the data usually helps to alleviate the effect of the curse of dimensionality and to enhance the generalization capability of the model, as well as to speed up the learning process and to improve the model interpretability. Variable selection also helps to acquire a better understanding of the data and how they are related to each other. Dimensionality reduction is considered an application-specific problem which is not backed by a universal theory. The exhaustive search approach that considers all possible subsets is the best strategy, applicable for datasets with small cardinality, but impractical for a large number of features, as in our case. There are two distinct groupings of variable selection algorithms, specifically wrapper methods and filter methods.

The wrapper methods employ the feature subset selection algorithm in unison with an induction algorithm. The selection algorithm proceeds to unearth a favorable subset of data whilst using this induction algorithm to evaluate the proposed subsets. The filter methods use a preprocessing step and autonomously select variables independently of the induction algorithm. There are a number of algorithms that fall under the umbrella of the filter approach, such as the relief algorithm, which assigns a weighting of relevance to each feature, that is, the relevance of the selected variable to the target output; and the decision tree algorithm, which is used to select feature subsets for the nearest neighbor algorithm [11]. Rueda et al. [14] highlight a particular strength possessed by wrapper algorithms. The authors state that if variables are highly correlated with the response, a filter algorithm would typically include them, even if they diminished the overall algorithm performance. With the wrapper approach, the induction algorithm may discover these diminishing effects and exclude them [3].
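The column-wise transformation of Eq. (7) can be sketched in a few lines (a minimal version; the guard for constant columns is our addition):

```python
def min_max_scale(column):
    """Eq. (7): x_new = (x_old - min) / (max - min), applied to one data column."""
    lo, hi = min(column), max(column)
    if hi == lo:                  # constant column: map every value to 0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Example: red-wine alcohol values spanning the 8.4..14.9 range map onto [0, 1]
print(min_max_scale([8.4, 10.4, 14.9]))
```

Scaling each column by its own range, rather than by a global dataset maximum, is exactly what keeps low-magnitude variables such as chlorides from being drowned out by high-magnitude ones such as total sulfur dioxide.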

4 Empirical Results

Using the data described above, we built and tested a number of predictive models based on the four techniques - SVM, CCNN, GRNN, and NN. In order to estimate the models' performance, we used the following metrics:

- Prediction accuracy ACC_t at certain error tolerance values t = 0.25, 0.5, 1, 1.5, and 2.

- Mean Absolute Deviation (MAD), which is a robust performance measure of the model variability [1]:

  MAD = (1/N) Σ_{i=1}^{N} |y_i - ŷ_i|,   (8)

  where y_i and ŷ_i are the class label and the predicted value, respectively.

- Area over the regression error characteristic curve. The regression error characteristic (REC) curve plots the error tolerances along the horizontal axis versus the prediction accuracy on the vertical axis. The area over the REC curve (AOC) is a scalar value that estimates the overall model performance regardless of the error tolerance values applied to each model instance. The lower the AOC is, the better the model performs.

We tested the models with a number of hyper-parameters in order to find their optimal values and ensure maximal performance. To avoid bias in training and testing, we applied 5-fold cross-validation: the dataset was divided into five subsets of 20% each, and the overall performance estimation metrics were calculated using each of those 20% for testing after training the model on the remaining 80% of the data. In order to estimate the influence of the reduction of data dimensionality on the model performance, we applied the wrapper and filter attribute evaluator methods outlined above. These methods combined differing search techniques, which resulted in combinations of proposed variable subsets. Models were tested with different combinations of variables and the results were compared by the aforementioned metrics. The results obtained were different for the red wine and the white wine datasets. In the red wine case, the best results were obtained by the chi-squared attribute evaluation technique [10], which calculates the chi-squared worth of each attribute with respect to the class.
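The first two metrics above are straightforward to compute; a minimal sketch (the function names are ours):

```python
def mad(y_true, y_pred):
    """Mean absolute deviation between class labels and predictions, Eq. (8)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def acc_at_tolerance(y_true, y_pred, t):
    """Prediction accuracy ACC_t: fraction of predictions within tolerance t."""
    hits = sum(1 for y, p in zip(y_true, y_pred) if abs(y - p) <= t)
    return hits / len(y_true)
```

Evaluating acc_at_tolerance over a grid of t values yields exactly the points of a REC curve for one model.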
Results are summarized in Table 2. The experiments showed that a chi-squared worth cut-off point between 169.86 and 145.40 performs best, which resulted in four red wine attributes used in training and testing the models, namely alcohol, volatile acidity, sulphates, and citric acid. We found that the optimal SVM parameters, producing a minimal mean squared error (MSE) of the model, are: c=1.398; ε=0.746; kernel=polynomial; d=1; γ=0.572; and r=0.530. Results obtained from the error tolerance study of the models are compared by REC curves in Figure 5. The model with the least area over the curve (AOC) is the most accurate, with the point closest to the intersection of 100% accuracy and zero threshold indicating the best threshold level of the model. Figure 5 is quantitatively summarised in Table 3.

Table 2: Chi-squared attribute evaluation for red wine.

Attribute              Chi-squared worth   Percentage importance
Alcohol                497.7464            29.61
Volatile acidity       354.4793            21.09
Sulphates              252.0535            15.00
Citric acid            169.8607            10.11
Total sulfur dioxide   145.3958            8.65
Density                130.73              7.78
Chlorides              82.6207             4.92
Fixed acidity          48.0288             2.86
pH                     0                   0.00
Residual sugar         0                   0.00
Free sulfur dioxide    0                   0.00

Figure 5. REC curves of the red wine test set. SVM - thick solid line; CCNN - thin solid line; GRNN - dash-dot line; NN - dotted line.

Table 3: Performance of the red wine quality prediction models. Estimation metrics include: accuracy at certain error tolerances (ACC_t), mean absolute deviation (MAD), and area above the REC curve (AOC).

Model   ACC_0.25   ACC_0.5   ACC_1   ACC_1.5   ACC_2   MAD     AOC
ANN     0.261      0.531     0.850   0.958     0.981   0.592   0.662
CCNN    0.279      0.568     0.884   0.966     0.983   0.548   0.630
GRNN    0.270      0.549     0.867   0.962     0.979   0.577   0.644
SVM     0.381      0.601     0.888   0.969     0.994   0.496   0.506
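Given sampled (tolerance, accuracy) points of a REC curve, such as one row of Table 3, the AOC can be approximated with a trapezoidal rule over 1 - accuracy. A rough sketch; the linear interpolation between the sampled tolerances is our assumption, so the values it produces need not match the table exactly:

```python
def aoc(tolerances, accuracies):
    """Area over the REC curve: trapezoidal integral of (1 - accuracy)
    along the error-tolerance axis."""
    area = 0.0
    points = list(zip(tolerances, accuracies))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        area += (t1 - t0) * ((1.0 - a0) + (1.0 - a1)) / 2.0
    return area
```

A perfectly accurate model (accuracy 1 at every tolerance) gives AOC 0, which is why lower AOC values in Tables 3 and 5 indicate better models.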

It should be noted that according to the metrics MAD and AOC, SVM outperform all other models. They show a clear advantage in the low error tolerance ranges, where direct hits in prediction are important, or one-away hits, where an error tolerance of less than 0.5 is acceptable. When the error tolerance increases and the requirement for correct classifications relaxes, the CCNN networks become equally as good as SVM. Last in performance is the classic feed-forward NN, and second last is GRNN, which is between NN and CCNN.

Similarly, we explored dimensionality reduction in the white wine case. Results showed that the best technique for ranking attributes is symmetrical uncertainty ranking, which is one of the most effective entropy-based feature selection approaches. Experimentally, we found that alcohol content in white wine bears the most importance (26.47%); density ranks second in importance (19.19%), with chlorides following next (14.35%). Total sulfur dioxide, citric acid, free sulfur dioxide, and volatile acidity complete the model, all registering close importance percentages between 9.9% and 10.4%. Results are summarized in Table 4.

Table 4: Symmetrical uncertainty attribute evaluation for white wine.

Attribute              Symmetrical uncertainty   Percentage importance
Alcohol                0.08998                   26.46
Density                0.06524                   19.18
Chlorides              0.04878                   14.34
Total sulfur dioxide   0.03513                   10.33
Citric acid            0.03468                   10.20
Free sulfur dioxide    0.03376                   9.92
Volatile acidity       0.03241                   9.53

Figure 6. REC curves of the white wine test set. SVM - thick solid line; CCNN - thin solid line; GRNN - dash-dot line; NN - dotted line.

Findings also show that with white wine, SVM performs best with c=2.438; ε=0.684; kernel=polynomial; d=1; γ=1.266; and r=1.522. Figure 6 graphically compares the models' performance in terms of REC, and Table 5 summarizes the estimation metrics. Similarly to the red wine case, the white wine results show that SVM outperforms the three neural networks, with even higher accuracy at the low error tolerance values, but it also outperforms the other models at higher error tolerance values (between 0.5 and 1.5).
The three neural network models show similar performance, with a little advantage of CCNN over GRNN and the classic NN. At a relatively large error tolerance (above 1), CCNN and GRNN perform similarly and slightly better than NN.

Table 5: Performance of the white wine quality prediction models. Estimation metrics include: accuracy at certain error tolerances (ACC_t), mean absolute deviation (MAD), and area above the REC curve (AOC).

Model   ACC_0.25   ACC_0.5   ACC_1   ACC_1.5   ACC_2   MAD     AOC
ANN     0.261      0.531     0.850   0.968     0.988   0.594   0.658
CCNN    0.339      0.576     0.868   0.969     0.988   0.514   0.581
GRNN    0.290      0.549     0.867   0.962     0.988   0.589   0.630
SVM     0.486      0.661     0.902   0.971     0.988   0.477   0.566

Finally, it can be summarized that SVM could be a better alternative to prediction models based on neural networks for application areas like the one explored here. At the same time, certain neural network types, such as CCNN and GRNN, can be considered good candidates for predictive models, both outperforming the classic neural network.

5 Conclusions

Recently, the wine industry has been expanding its marketplace, which encourages the adoption of advanced technologies in the production process. Quality certification is an important step in that process. Traditionally, it is based on sensory tests carried out by human experts. This, however, is not as efficient as needed, because the procedure is time consuming and expensive. Data mining may help the quality certification by processing physicochemical laboratory test data and building models that predict product quality classes. Various modeling techniques can be applied to solve the task, and each of them shows specific performance characteristics. The goal of this study is to explore how the model hyper-parameters of the classic backpropagation neural network, cascade-correlation neural network, general regression neural network, and support vector machine affect their predictive abilities in solving that task. We used an existing data set of 1599 red wine samples and 4898 white wine samples, each of which consists of 11 physicochemical characteristics.
In order to quantify the model performance, we used metrics such as prediction accuracy, mean absolute deviation, and area over the regression error characteristics curve. Our findings show that the support vector machine with a polynomial kernel outperforms the three neural network models in all the metrics. The SVM advantage can clearly be seen at small values of error tolerance, that is, where the predicted quality is required to be very close to the real one. On the other hand, the CCNN and GRNN show similar performance, with a little advantage of the CCNN over GRNN. Last in the ranking is the classic NN, which, despite its popularity as a classification and regression tool, is not the best choice in this application domain. We also tested how various techniques for reduction of dimensionality influence the models' performance. Empirically, we found that the best variable set selection techniques are chi-squared attribute evaluation and symmetrical uncertainty ranking, for the red and the white wine respectively.

6 References

[1] Bi, J. and Bennett, K. P. Regression error characteristic curves. In Proceedings of the 20th International Conference on Machine Learning, 2003.
[2] Fahlman, S. and Lebiere, C. The Cascade-Correlation Learning Architecture. In D. S. Touretzky (ed.), Advances in Neural Information Processing Systems 2, Morgan Kaufmann, 1990.
[3] Guetlein, M., Frank, E., Hall, M. and Karwath, A. Large Scale Attribute Selection Using Wrappers. In Proc. IEEE Symposium on CIDM, pp. 332-339, 2009.
[4] Bapna, S. and Gangopadhyay, A. A Wavelet-Based Approach to Preserve Privacy for Classification Mining. Decision Sciences, 37, 623-642, 2006.
[5] Beltran, N. H., Duarte-Mermoud, M. A., Soto Vincencio, V. A., Salah, S. A. and Bustos, M. A. Chilean Wine Classification Using Volatile Organic Compounds Data Obtained With a Fast GC Analyzer. IEEE Transactions on Instrumentation and Measurement, 57, 2421-2436, 2008.
[6] Cortez, P., Cerdeira, A., Almeida, F., Matos, T. and Reis, J. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47, 547-553, 2009.
[7] CVRVV. Portuguese Wine - Vinho Verde. Comissão de Viticultura da Região dos Vinhos Verdes (CVRVV), http://www.vinhoverde.pt, July 2008.
[8] Fei, L., Li, W. and Yong, H. Application of least squares support vector machines for discrimination of red wine using visible and near infrared spectroscopy. In Intelligent System and Knowledge Engineering, ISKE '08, 2008.
[9] Hagan, M. T. and Menhaj, M. B. Training feedforward networks with the Marquardt algorithm. IEEE Transactions on Neural Networks, 5, 989-993, 1994.
[10] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. and Witten, I. H. The WEKA Data Mining Software: An Update. SIGKDD Explorations, 11, 2009.
[11] Kohavi, R. and John, G. H. Wrappers for feature subset selection. Artificial Intelligence, 97, 273-324, 1997.
[12] Legin, A., Rudnitskaya, A., Luvova, L., Vlasov, Y., Di Natale, C. and D'Amico, A. Evaluation of Italian wine by the electronic tongue: recognition, quantitative analysis, and correlation with human sensory perception. Analytica Chimica Acta, 33-34, 2003.
[13] Moreno, I., Gonzalez-Weller, D., Gutierrez, V., Marino, M., Camean, A., Gonzalez, A. and Hardisson, A. Differentiation of two Canary DO red wines according to their metal content from inductively coupled plasma optical emission spectrometry and graphite furnace atomic absorption spectrometry by using Probabilistic Neural Networks. Talanta, 72, 263-268, 2007.
[14] Rueda, I. E. A., Arciniegas, F. A. and Embrechts, M. J. SVM sensitivity analysis: an application to currency crises aftermaths. IEEE Transactions on Systems, Man and Cybernetics, 34, 387-398, 2004.
[15] Specht, D. Enhancements to probabilistic neural networks. In Proceedings of the International Joint Conference on Neural Networks, vol. 1, pp. 761-768, 1991.
[16] Sun, L., Danzer, K. and Thiel, G. Classification of wine samples by means of artificial neural networks and discrimination analytical methods. Fresenius Journal of Analytical Chemistry, 359, 143-149, 1997.
[17] Vapnik, V. The Nature of Statistical Learning Theory. Springer, New York, 1995.
[18] Vapnik, V. and Kotz, S. Estimation of Dependences Based on Empirical Data. Springer, New York, 2006.
[19] Vlassides, S., Ferrier, J. and Block, D. Using Historical Data for Bioprocess Optimization: Modeling Wine Characteristics Using Artificial Neural Networks and Archived Process Information. Biotechnology and Bioengineering, 73(1), 2001.
[20] Yu, H., Lin, H., Xu, H., Ying, Y., Li, B. and Pan, X. Prediction of Enological Parameters and Discrimination of Rice Wine Age Using Least-Squares Support Vector Machines and Near Infrared Spectroscopy. Journal of Agricultural and Food Chemistry, 56, 307-313, 2008.