CLASSIFYING FEATURE DESCRIPTION FOR SOFTWARE DEFECT PREDICTION



Proceedings of the 2011 International Conference on Wavelet Analysis and Pattern Recognition, Guilin, 10-13 July 2011

LING-FENG ZHANG, ZHAO-WEI SHANG
College of Computer Science, Chongqing University, Chongqing 400030, China
E-mail: zhanglingfeng@cqu.edu.cn, szw@cqu.edu.cn

Abstract:
To overcome the limitations of the numeric feature description of software modules in software defect prediction, we propose a novel module description technology that employs classifying features, rather than numeric features, to describe software modules. First, we construct an independent classifier on each software metric. The classification results on each feature are then used to represent every module. We apply two different feature classifier algorithms (based on the mean criterion and the minimum error rate criterion, respectively) to obtain the classifying feature description of software modules. With the proposed description technology, the discrimination of each metric is distinctly enlarged. Classifying feature description is also simpler than numeric description, which accelerates prediction model learning and reduces the storage space of massive data sets. Experimental results on four NASA data sets (CM1, KC1, KC2 and PC1) demonstrate the effectiveness of classifying feature description and show that our algorithms can significantly improve the performance of software defect prediction.

Keywords:
Classifying feature description; binary classification; software defect prediction

1. Introduction

As software systems grow in size and complexity, it becomes increasingly difficult to maintain the reliability of software products. Software defects are usually the major factor influencing software reliability. The majority of a system's faults, over 80%, exist in about 20% of its modules, which is known as the 80:20 rule [1]. The ability to estimate which modules are faulty is therefore extremely important for minimizing cost and improving the effectiveness of the software testing process. Early prediction of the fault-proneness of modules also allows software developers to allocate their limited resources to the defect-prone modules, so that highly reliable software can be produced on time and within budget [2].

Software fault-proneness is estimated from software metrics, which provide quantitative descriptions of software modules. A number of studies provide empirical evidence that correlations exist between some software metrics and fault-proneness [3]. Using these metric features, software defect prediction is usually viewed as a binary classification task that classifies software modules into fault-prone (fp) and non-fault-prone (nfp). Many machine learning and statistical techniques have been applied to construct prediction models based on measurements of static code attributes [4][5], for example Discriminant Analysis, Logistic Regression, Regression Trees, Nearest Neighbor (NN), Random Forests, Bayes classifiers, Artificial Neural Networks and Support Vector Machines (SVM).

The common point of previous studies is that each module is represented by its numeric software metrics directly; the attribute type of each metric is real, categorical or integral. We call this numeric feature description. It is also the common description technology used in pattern recognition and machine learning. Nevertheless, due to the complex relationships between software metrics and fault-proneness, this kind of representation limits the classification effectiveness of each software metric. Indeed, if we treat each software metric individually, its values lack discrimination. Take McCabe's EV(g), a frequently used software metric, as an example: Figure 1 shows the distribution of its values in the CM1 data set.

[Figure 1. Distribution of McCabe's EV(g) metric in the CM1 data set. The numeric values of the metric are divided into 10 intervals ranging from 1 to 30; the vertical axis shows, for each class, the proportion of values that fall within each interval.]
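The per-interval proportions behind Figure 1 can be recomputed for any metric column. Below is a minimal NumPy sketch, not the authors' code; the array names `metric` and `labels` are hypothetical stand-ins for one loaded metric column and the 0/1 fault labels.

import numpy as np

def class_proportions_per_bin(metric, labels, n_bins=10):
    """For each class, the proportion of that class's modules whose metric
    value falls in each of n_bins equal-width intervals (Figure 1's y-axis)."""
    edges = np.linspace(metric.min(), metric.max(), n_bins + 1)
    proportions = {}
    for cls in (0, 1):                      # 0 = non-fault-prone, 1 = fault-prone
        vals = metric[labels == cls]
        counts, _ = np.histogram(vals, bins=edges)
        proportions[cls] = counts / max(len(vals), 1)
    return edges, proportions

When the two classes' proportions track each other closely across all intervals, as the paper reports for EV(g) on CM1, the metric on its own carries little class information.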

From Figure 1 we can see that the values of the metric overlap in every interval, and the distributions of the two classes are roughly the same. That is, under the numeric description this metric contains little classification information. The same phenomenon occurs in other software metrics.

To increase the classification effectiveness of each feature, we propose a novel feature description technology named classifying feature description. First, we construct an independent classifier on each software metric. The classification results on each feature, called classifying features, are then used to represent the software modules. In this paper, we obtain the classifying feature description of software modules with two different feature classifier algorithms, based on the mean criterion and the minimum error rate criterion, respectively. Using classifying features rather than numeric features brings the following advantages:

(1) Classifiers on each feature expand the classification effectiveness of each software metric and may yield higher classification accuracy. Classifying feature description does not change the feature dimension of a module, so standard machine learning prediction techniques, originally designed for numeric features, remain applicable in the classifying feature space.

(2) Classifying features are simpler, which accelerates model learning on software defect data sets. Software defect prediction is a standard binary classification problem, so the result of each feature classifier is likewise a binary value, which makes further operations more convenient.

(3) Compared with the traditional numeric feature description, classifying feature description occupies less storage space. The binary classifying feature description of a software module can be viewed as a sparse representation of the numeric features, which makes the storage of massive data sets feasible.

The remainder of this paper is organized as follows. Section 2 introduces the learning model based on numeric software metrics for software defect prediction. The proposed classifying feature description is described in Section 3, and the feature classifier algorithms are discussed in Section 4. Section 5 presents the experiments, in which the performance of classifying feature description is tested in detail. Finally, conclusions are given in Section 6.

2. Software defect prediction learning model

A number of studies provide empirical evidence that correlations exist between some software metrics and fault-proneness, so the software defect prediction model can be described mathematically as a binary classification task. Let the training set be $S^{tr} = \{X, Y\} = \{(x_i, y_i)\}_{i=1}^{N}$, with $x_i \in \mathbb{R}^{d}$ and $y_i \in \{0, 1\}$. Each instance $x_i$ is represented by $\{x_{i1}, x_{i2}, \ldots, x_{id}\}$, where $x_{ij}$ is the representation of $x_i$ on feature $j$, so $x_i$ can be viewed as a point in the $d$-dimensional feature space. $y_i$ denotes the class label associated with $x_i$: fault-prone or non-fault-prone. The attribute type of each feature in $F = \{x_{\cdot 1}, x_{\cdot 2}, \ldots, x_{\cdot d}\}$ can be real, categorical or integral.

Traditionally, prediction models are constructed on this feature space directly. For a test module $x = (x_1, x_2, \ldots, x_d)$, we obtain the prediction result $y$ (fault-prone or non-fault-prone) from the model determined by the training set. Figure 2(a) shows the traditional learning process.

The success of prediction model learning relies mainly on the representation technology with which the modules are described, as well as on the prediction model operating on the training set. Various aspects of prediction models have been studied with machine learning strategies; however, the equally important representation technology is mostly ignored by the existing literature. In fact, a suitable software module description is the basis of a successful prediction model. To increase the classification effectiveness of each metric, we propose a novel module representation technology called classifying features, which are obtained from feature classifiers constructed on each software metric.
The implementation process of the software defect prediction model based on classifying features is shown in Figure 2(b).

[Figure 2. Software defect prediction learning model: (a) the traditional model, trained directly on the numeric training set $S^{tr} = \{X, Y\}$; (b) the proposed model, in which the feature classifiers $h(x)$ first map the training set into the classifying feature set $S_C^{tr} = \{X_C, Y\}$ and map each test module $x$ to $x^C$ before prediction.]
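As a concrete illustration of the process in Figure 2(b), the following Python sketch runs the pipeline end to end. It is not the authors' implementation: the thresholds use the mean criterion of Section 4.1, the data are random stand-ins for the NASA metrics, and the downstream learner is scikit-learn's BernoulliNB, chosen here only because it is a standard classifier suited to binary inputs; any of the models listed in the introduction could take its place.

import numpy as np
from sklearn.naive_bayes import BernoulliNB   # any standard classifier would do

def fit_thresholds(X, y):
    """One threshold per metric: midpoint of the two class means (Section 4.1)."""
    return (X[y == 1].mean(axis=0) + X[y == 0].mean(axis=0)) / 2.0

def to_classifying_features(X, t):
    """Map numeric metrics to the binary classifying feature space."""
    return (X > t).astype(int)

# Hypothetical stand-in data: 21 numeric metrics per module, 0/1 fault labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 21)), rng.integers(0, 2, size=200)
X_test = rng.random((50, 21))

t = fit_thresholds(X_train, y_train)   # thresholds come from the training set only
model = BernoulliNB().fit(to_classifying_features(X_train, t), y_train)
y_pred = model.predict(to_classifying_features(X_test, t))   # 1 = fault-prone

Note that both the training set and the test modules pass through the same learned thresholds, exactly as the figure indicates.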

Using the feature classifiers, both the training module set and the test modules are described by classifying features before a prediction model is built.

3. Classifying feature description

Let $S_C^{tr} = \{X_C, Y\} = \{(x_i^C, y_i)\}_{i=1}^{N}$ denote the classifying feature training set, with $x_i^C \in \{0,1\}^d$ and $y_i \in \{0,1\}$. Each instance $x_i^C = [x_{i1}^C, x_{i2}^C, \ldots, x_{id}^C]$ is a point in the $d$-dimensional binary classifying feature space, and $y_i$ denotes the class label associated with $x_i^C$, corresponding to the label of $x_i$. Every classifying feature in $CF = \{x_{\cdot 1}^C, x_{\cdot 2}^C, \ldots, x_{\cdot d}^C\}$ takes a value of 0 or 1.

Let $h(x) = \{h_1(x), h_2(x), \ldots, h_d(x)\}$ denote the feature classifiers, one per feature, whose classification results are fitted to the class labels of the original training set:

$$h(x)\colon\; x \rightarrow \{0,1\}^d \qquad (1)$$

In other words, $h(x)$ realizes the mapping $\Phi$ from the numeric feature space $\mathbb{R}^d$ to the classifying feature space:

$$\Phi\colon\; x \mapsto h(x) \qquad (2)$$

According to this definition, every instance of the training set receives a simpler description in which all classifying features are 0 or 1. Figure 3 shows the implementation process of classifying feature description. The key idea is that a classifier on each feature helps enlarge the discrimination of that metric between the two classes.

[Figure 3. Classifying feature description.]

Figure 4 shows an ideal classification example under the numeric feature description and the classifying feature description. The two classes are represented by stars and diamonds, and each module has two metric features, x and y, normalized between 0 and 1. Figure 4(a) shows the distribution of the numeric features in the 2D feature space, where the two classes are linearly separable. Now we construct an independent classifier on each feature; a simple threshold classifier is chosen in this example, with 0.5 as the threshold of both features. If the value of a feature is larger than the threshold, the classification result is 1; if it is smaller, the result is 0. We then obtain the classifying feature description of each instance from the binary classification results. Figure 4(b) shows the distribution of the classifying features of the two classes.

[Figure 4. Ideal classification problem with (a) numeric features and (b) classifying features.]

In this example, a per-feature threshold classifier yields the binary classifying feature representation of each metric. Under the classifying feature description, the modules of the two classes are represented as (0,1) and (1,0). From Figure 4 we can see that the classifying feature representation significantly improves the average distance and the margin between the two classes. The simple 0/1 description also reduces the storage space of the data and eases further operations.
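The ideal example of Figure 4 is easy to verify numerically. The coordinates below are made up, since the figure's exact points are not given; applying the per-feature threshold classifier with threshold 0.5 collapses every module onto (0,1) or (1,0).

import numpy as np

# Two hypothetical, linearly separable classes in the unit square:
# "stars" at low x / high y, "diamonds" at high x / low y.
stars    = np.array([[0.2, 0.8], [0.3, 0.9], [0.1, 0.7]])
diamonds = np.array([[0.8, 0.2], [0.7, 0.3], [0.9, 0.1]])

threshold = 0.5                             # same threshold for both features
h = lambda X: (X > threshold).astype(int)   # per-feature threshold classifier

print(h(stars))      # each row is (0, 1)
print(h(diamonds))   # each row is (1, 0)

In the classifying feature space the two classes sit at opposite corners of the unit square, which is the enlarged margin the paper refers to.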

4. Feature classifier algorithms

In a binary classification task, the classification result of each feature classifier is 0 or 1, so all classifying features are represented by 0 and 1. The aim of the feature classifiers is to expand the difference between the two classes using a simple classification rule. In this paper, each feature classifier is defined by a measure on the value of $x_j$, which can be as simple as a threshold classifier:

$$h_j(x) = \begin{cases} 1 & \text{if } x_j > threshold_j \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$

Let $T = \{t_1, t_2, \ldots, t_d\}$ denote the set of thresholds of the $d$ features, determined from the training set. The classification results of the feature classifiers serve as the novel features of each module, so all classifying features are represented by 0 and 1. The classifying feature of each training module is defined as:

$$x_{ij}^C = \begin{cases} 1 & \text{if } x_{ij} > t_j \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

With the thresholds obtained from the training set, we can likewise derive the classifying feature description of a numeric test sample $x = (x_1, x_2, \ldots, x_d)$:

$$x_j^C = \begin{cases} 1 & \text{if } x_j > t_j \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

Learning a good threshold plays a crucial role in each feature classifier. In this paper, the optimal threshold of each attribute is determined by two different technologies: the mean criterion and the minimum error rate criterion.

4.1. Mean criterion

This criterion assumes that the values of each feature in the two classes obey uniform distributions, so the class means can represent the values of each feature, and we choose the midpoint between the two class means of each feature as its classification threshold. Let $X^{+} = \{x_i^{+},\, i = 1, \ldots, n^{+}\}$ and $X^{-} = \{x_i^{-},\, i = 1, \ldots, n^{-}\}$ denote the fault-prone and non-fault-prone subsets, respectively. The means of feature $j$ in the two classes are calculated by:

$$m_j^{+} = \frac{1}{n^{+}} \sum_{i=1}^{n^{+}} x_{ij}^{+}, \qquad m_j^{-} = \frac{1}{n^{-}} \sum_{i=1}^{n^{-}} x_{ij}^{-} \qquad (6)$$

The threshold of the mean criterion is defined as:

$$t_j = \frac{m_j^{+} + m_j^{-}}{2} \qquad (7)$$

The pseudo-code of the feature classifier algorithm based on the mean criterion is listed in Figure 5.

Figure 5. Pseudo-code of the feature classifier algorithm based on the mean criterion.

  Feature Classifier Algorithm 1
  Input:  X    /* training set */
          Y    /* class labels */
  Output: T    /* thresholds */
          X_C  /* classifying feature training set */
  BEGIN
  1. for j = 1 to d do
  2.   calculate m_j^+ and m_j^-
  3.   t_j = (m_j^+ + m_j^-) / 2
  4.   x_ij^C = 1 if x_ij > t_j, else 0
  5. end for
  END

4.2. Minimum error rate criterion

This criterion is based on a complete search, which ensures that the minimum number of training instances is misclassified. The threshold $t_j$ of feature $j$ is chosen among all feasible interval values. First, the values of $F_j$ are sorted from small to large, giving $F_j^{*} = \{x_{1j}^{*}, x_{2j}^{*}, \ldots, x_{nj}^{*}\}$. Each interval value of the sorted feature is then tried as the threshold and its training error rate is calculated; the threshold that leads to the minimum error rate is chosen:

$$t_j = \underset{threshold}{\arg\min}\; error(threshold) \qquad (8)$$

The pseudo-code of the feature classifier algorithm based on the minimum error rate criterion is listed in Figure 6.

Figure 6. Pseudo-code of the feature classifier algorithm based on the minimum error rate criterion.

  Feature Classifier Algorithm 2
  Input:  X    /* training set */
          Y    /* class labels */
  Output: T    /* thresholds */
          X_C  /* classifying feature training set */
  BEGIN
  1.  for j = 1 to d do
  2.    F_j* = {x_1j*, x_2j*, ..., x_nj*} = sort(F_j, ascending)
  3.    for i = 1 to n do
  4.      if i = 1
  5.        threshold_i = x_1j* - eps
  6.      else if i = n
  7.        threshold_i = x_nj* + eps
  8.      else
  9.        threshold_i = 0.5 * (x_ij* + x_(i+1)j*)
  10.     end if
  11.     Error1_i = ( #{x_ij^+ < threshold_i} + #{x_ij^- > threshold_i} ) / n
  12.     Error2_i = ( #{x_ij^+ > threshold_i} + #{x_ij^- < threshold_i} ) / n
  13.     Error_i = min(Error1_i, Error2_i)
  14.   end for
  15.   t_j = threshold_{argmin_i Error_i}
  16.   x_ij^C = 1 if x_ij > t_j, else 0
  17. end for
  END
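Both listings translate almost line for line into NumPy. The sketch below is an interpretation rather than the authors' code: eps is an arbitrary small offset for the two boundary candidates, candidate thresholds are taken at the interval midpoints of each sorted feature as in Figure 6, and both polarities (Error1/Error2, steps 11-13) are evaluated even though the final description of eq. (4) always assigns 1 to values above the threshold.

import numpy as np

def mean_criterion_thresholds(X, y):
    """Figure 5: threshold each feature at the midpoint of the two class means."""
    m_pos = X[y == 1].mean(axis=0)    # fault-prone class means, one per feature
    m_neg = X[y == 0].mean(axis=0)    # non-fault-prone class means
    return (m_pos + m_neg) / 2.0      # eq. (7)

def min_error_thresholds(X, y, eps=1e-9):
    """Figure 6: exhaustive search over candidate thresholds of each feature,
    keeping the one with the lowest training error under the better polarity."""
    n, d = X.shape
    t = np.empty(d)
    for j in range(d):
        xs = np.sort(X[:, j])
        # candidates: below the minimum, each interval midpoint, above the maximum
        cands = np.concatenate(([xs[0] - eps],
                                (xs[:-1] + xs[1:]) / 2.0,
                                [xs[-1] + eps]))
        best, best_err = cands[0], np.inf
        for c in cands:
            pred = (X[:, j] > c).astype(int)
            err = min(np.mean(pred != y),   # Error1: class 1 above the threshold
                      np.mean(pred == y))   # Error2: class 1 below the threshold
            if err < best_err:
                best, best_err = c, err
        t[j] = best
    return t

As written the search costs O(n^2) per feature; a single sorted scan would reduce it to O(n log n), but the quadratic form mirrors the pseudo-code most directly.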

With this, we have constructed classifiers on the $d$ features and obtained the classifying feature description of the software modules by the two feature classifier algorithms. Figure 7 shows the distribution of the classifying feature of McCabe's EV(g) in the CM1 data set.

[Figure 7. Distribution of the classifying feature of McCabe's EV(g): (a) based on the mean criterion; (b) based on the minimum error rate criterion.]

Compared with the distribution of the numeric feature in Figure 1, the classifying features of the metric are far more separable between the two classes. In particular, under the description based on the minimum error rate criterion, if we treated this metric as the single feature of the modules, only a few modules would be misclassified. The key idea of classifying feature description is to improve the classification performance of the features through the classifiers constructed on them.

5. Experiments

In this section, we evaluate the effectiveness of the proposed classifying feature (CF) description and of the two feature classifier algorithms, denoted CF1 (based on the mean criterion) and CF2 (based on the minimum error rate criterion). The experiments are conducted on 4 benchmark data sets from NASA (KC2, KC1, CM1 and PC1), which are publicly accessible from the NASA IV&V Facility Metrics Data Program. Each data set contains twenty-one metrics as features and an associated dependent Boolean variable: fault-prone or non-fault-prone.

The performance of software defect prediction is typically evaluated using a confusion matrix, shown in Table 1, from which the commonly used performance measures accuracy, precision, recall and F-measure are computed, as recalled below.

Table 1. Prediction results (confusion matrix)

                            Predicted non-fault-prone   Predicted fault-prone
  Actual non-fault-prone    TN (True Negative)          FP (False Positive)
  Actual fault-prone        FN (False Negative)         TP (True Positive)
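The four measures follow from Table 1 in the standard way; the paper uses them without spelling them out, so their definitions are recalled here:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP},$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \mathrm{F\text{-}measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$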

The prediction results of three algorithms (NN, Bayes and SVM) on the four data sets are shown in Tables 2 to 5 (%).

Table 2. Prediction results on the CM1 data set

             Accuracy   Precision   Recall    F-measure
  NN         62.2530    63.7050     62.667    62.868
  NN+CF1     69.805     73.7753     65.9792   68.4305
  NN+CF2     71.2872    70.766      77.458    73.453
  Bayes      65.63      76.3280     48.9375   56.0973
  Bayes+CF1  67.2842    75.2083     54.7083   62.228
  Bayes+CF2  69.9048    69.4953     76.0208   71.8738
  SVM        57.4464    50.0080     88.667    63.3539
  SVM+CF1    70.7560    70.6232     74.6875   71.8267
  SVM+CF2    70.5476    70.4383     75.3958   72.068

Table 3. Prediction results on the KC1 data set

             Accuracy   Precision   Recall    F-measure
  NN         71.7895    73.879      70.3292   71.945
  NN+CF1     73.457     77.7793     67.8858   72.3474
  NN+CF2     75.9856    75.0230     80.6224   77.4708
  Bayes      65.6968    82.397      43.095    56.950
  Bayes+CF1  70.0498    78.9030     57.7469   66.547
  Bayes+CF2  73.806     73.8372     76.7078   74.966
  SVM        69.8030    63.8792     98.430    77.256
  SVM+CF1    74.0659    75.4972     74.3673   74.7933
  SVM+CF2    74.3654    72.27       82.4023   76.797

Table 4. Prediction results on the KC2 data set

             Accuracy   Precision   Recall    F-measure
  NN         69.420     70.955      67.7644   68.97
  NN+CF1     78.0460    82.29       72.9567   76.9966
  NN+CF2     76.4909    76.6820     77.9327   76.9494
  Bayes      69.6679    87.0080     47.6442   60.7960
  Bayes+CF1  75.8396    85.2068     63.983    72.6060
  Bayes+CF2  77.8562    78.6577     78.490    78.509
  SVM        63.687     58.5960     95.600    72.5795
  SVM+CF1    77.5370    78.4068     77.572    77.7266
  SVM+CF2    77.7838    76.8082     81.5385   78.833

Table 5. Prediction results on the PC1 data set

             Accuracy   Precision   Recall    F-measure
  NN         62.24      63.4266     62.2556   62.3820
  NN+CF1     67.76      72.3669     59.299    64.0422
  NN+CF2     73.965     74.8950     74.044    73.9790
  Bayes      65.0083    81.9888     39.6992   51.8326
  Bayes+CF1  65.4569    74.550      50.2726   59.222
  Bayes+CF2  67.699     68.938      67.4530   67.4687
  SVM        58.6220    87.008      34.6992   36.2836
  SVM+CF1    65.7736    67.903      63.4868   64.8424
  SVM+CF2    69.303     68.743      73.9944   70.7430

From Tables 2 to 5, on all four data sets the classifying feature description achieves the highest results in both accuracy and F-measure under all three classifiers. In precision, except with Bayes, the classifying feature description performs better than the numeric one. In recall, the classifying feature achieves higher prediction results with NN and Bayes (with SVM, higher on PC1 only). Overall, the two feature classifier algorithms improve most of the measurements compared with the numeric feature description. To confirm this, we compute the average prediction results over the four data sets, shown in Figure 8.

[Figure 8. Average results over the four data sets: (a) accuracy; (b) precision; (c) recall; (d) F-measure.]

From Figure 8, it can be observed that the classifying feature description outperforms the numeric feature description in accuracy under all three classifiers, by margins ranging from 3.6% (Bayes, CF1) to 10.4% (SVM, CF2). The F-measure, the harmonic mean of precision and recall, is also consistently higher for the feature classifier algorithms than for the numeric baselines, with gains ranging from 4.0% (NN, CF1) to 16.88% (Bayes, CF2).

6. Conclusions

This paper proposed a novel feature description method, called classifying features, for software modules in software defect prediction. The main advantage of this description over traditional numeric metrics is that the classification effectiveness of each metric is clearly improved. For future work, we will investigate the applicability of the classifying feature description to other domains and generalize it to multi-class classification problems.

Acknowledgements

This work is supported by Projects No. CDJXS08226 and No. CDJRC080009 of the Fundamental Research Funds for the Central Universities, and by Project No. CSC200BB227 of the Natural Science Foundation of Chongqing.

References

[1] Gondra, I., "Applying machine learning to software fault-proneness prediction", The Journal of Systems and Software, Vol. 81, pp. 186-195, 2008.
[2] Zheng, J., "Cost-sensitive boosting neural networks for software defect prediction", Expert Systems with Applications, Vol. 37, pp. 4537-4543, 2010.
[3] Gill, G. and Kemerer, C., "Cyclomatic complexity density and software maintenance productivity", IEEE Transactions on Software Engineering, Vol. 17, No. 12, pp. 1284-1288, 1991.
[4] Guo, L., Ma, Y., Cukic, B. and Singh, H., "Robust prediction of fault-proneness by random forests", Proceedings of the 15th International Symposium on Software Reliability Engineering (ISSRE'04), pp. 417-428, 2004.
[5] Guang-e, L. and Wen-yong, W., "Research on an educational software defect prediction model based on SVM", Entertainment for Education. Digital Techniques and Systems, LNCS 6249, pp. 215-222, 2010.