Chapter 6. Classification and Prediction


1 Chapter 6. Classification and Prediction
- What is classification? What is prediction?
- Issues regarding classification and prediction
- Classification by decision tree induction
- Classification by back propagation
- Lazy learners (or learning from your neighbors)
- Frequent-pattern-based classification
- Other classification methods
- Prediction
- Accuracy and error measures

2 Supervised vs. Unsupervised Learning
- Supervised learning (classification)
  - Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  - New data is classified based on the training set
- Unsupervised learning (clustering)
  - The class labels of the training data are unknown
  - Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data

3 Classification vs. Prediction
- Classification
  - predicts categorical class labels (discrete or nominal)
  - classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
- Prediction
  - models continuous-valued functions, i.e., predicts unknown or missing values
- Typical applications
  - Credit/loan approval
  - Medical diagnosis: if a tumor is cancerous or benign
  - Fraud detection: if a transaction is fraudulent
  - Web page categorization: which category it is

4 Classification: A Two-Step Process
- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: for classifying future or unknown objects
  - Estimate the accuracy of the model
    - The known label of each test sample is compared with the classified result from the model
    - Accuracy rate is the percentage of test set samples that are correctly classified by the model
    - The test set is independent of the training set, otherwise over-fitting will occur
  - If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known

5 Process (1): Model Construction
The training data is fed to a classification algorithm, which produces the classifier (model):

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Classifier (model): IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
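To make the model-usage step concrete, here is a minimal Python sketch (the function name and the hard-coded rule are my own rendering of the classifier above), applied to the unseen tuple (Jeff, Professor, 4) from the next slide:

```python
def predict_tenured(rank, years):
    """Classifier learned from the training data:
    IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"

# Unseen data from the usage step: (Jeff, Professor, 4)
print(predict_tenured("Professor", 4))   # -> yes
```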

6 Process (2): Using the Model in Prediction
The classifier is first evaluated on testing data and then applied to unseen data, e.g., (Jeff, Professor, 4) -> Tenured?

Testing data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

7 Issues: Data Preparation
- Data cleaning
  - Preprocess data in order to reduce noise and handle missing values
- Relevance analysis (feature selection)
  - Remove the irrelevant or redundant attributes
- Data transformation
  - Generalize and/or normalize data

8 Issues: Evaluating Classification Methods
- Accuracy
  - classifier accuracy: predicting the class label
  - predictor accuracy: guessing the value of predicted attributes
- Speed
  - time to construct the model (training time)
  - time to use the model (classification/prediction time)
- Robustness: handling noise and missing values
- Scalability: efficiency in disk-resident databases
- Interpretability: understanding and insight provided by the model
- Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules

9 Decision Tree Induction: Training Dataset
This follows an example of Quinlan's ID3 (Playing Tennis).

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31..40  high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31..40  low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31..40  medium  no       excellent      yes
31..40  high    yes      fair           yes
>40     medium  no       excellent      no

10 Output: A Decision Tree for buys_computer
age?
- <=30: student?
  - no: no
  - yes: yes
- 31..40: yes
- >40: credit_rating?
  - excellent: no
  - fair: yes

11 Algorithm for Decision Tree Induction
- Basic algorithm (a greedy algorithm)
  - Tree is constructed in a top-down, recursive, divide-and-conquer manner
  - At the start, all the training examples are at the root
  - Attributes are categorical (if continuous-valued, they are discretized in advance)
  - Examples are partitioned recursively based on selected attributes
  - Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Conditions for stopping partitioning
  - All samples for a given node belong to the same class
  - There are no remaining attributes for further partitioning: majority voting is employed for classifying the leaf
  - There are no samples left

12 Attribute Selection Measure: Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain
- Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}| / |D|
- Expected information (entropy) needed to classify a tuple in D:
  Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)
- Information needed (after using A to split D into v partitions) to classify D:
  Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)
- Information gained by branching on attribute A:
  Gain(A) = Info(D) - Info_A(D)

13 Attribute Selection: Information Gain
- Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples)
  Info(D) = I(9,5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940
- Partitioning on age (counts from the training table on slide 9):

  age     p_i  n_i  I(p_i, n_i)
  <=30    2    3    0.971
  31..40  4    0    0
  >40     3    2    0.971

  Info_age(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694
  Here \frac{5}{14} I(2,3) means that age <=30 has 5 out of 14 samples, with 2 yes'es and 3 no's.
  Hence Gain(age) = Info(D) - Info_age(D) = 0.246
- Similarly, Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048
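A short Python sketch that reproduces these gains from the training table on slide 9; the helper names (`info`, `gain`) and the tuple encoding are my own choices, not part of the slide:

```python
from math import log2

# buys_computer training data from slide 9: (age, income, student, credit_rating, class)
data = [
    ("<=30", "high", "no", "fair", "no"),          ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),       (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),          (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),  ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),         (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"), ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),      (">40", "medium", "no", "excellent", "no"),
]

def info(labels):
    """Expected information (entropy) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def gain(data, attr_index):
    """Information gain of splitting on the attribute at attr_index."""
    labels = [row[-1] for row in data]
    values = set(row[attr_index] for row in data)
    info_a = sum(
        (len(part) / len(data)) * info([row[-1] for row in part])
        for part in ([row for row in data if row[attr_index] == v] for v in values)
    )
    return info(labels) - info_a

for i, name in enumerate(["age", "income", "student", "credit_rating"]):
    print(name, round(gain(data, i), 3))   # age 0.246, income 0.029, student 0.151, credit_rating 0.048
```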

14 Computing Information Gain for Continuous-Valued Attributes
- Let attribute A be a continuous-valued attribute
- Must determine the best split point for A
  - Sort the values of A in increasing order
  - Typically, the midpoint between each pair of adjacent values is considered as a possible split point
    - (a_i + a_{i+1})/2 is the midpoint between the values of a_i and a_{i+1}
  - The point with the minimum expected information requirement for A is selected as the split-point for A
- Split: D1 is the set of tuples in D satisfying A <= split-point, and D2 is the set of tuples in D satisfying A > split-point
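A sketch of this split-point search; the function names and the toy age/label values are mine, chosen only to illustrate the midpoint enumeration:

```python
from math import log2

def info(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def best_split_point(values, labels):
    """Pick the midpoint between adjacent sorted values that minimizes
    the expected information Info_A(D) of the resulting binary split."""
    pairs = sorted(zip(values, labels))
    best = None
    for i in range(len(pairs) - 1):
        split = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [lab for v, lab in pairs if v <= split]
        right = [lab for v, lab in pairs if v > split]
        info_a = (len(left) * info(left) + len(right) * info(right)) / len(pairs)
        if best is None or info_a < best[1]:
            best = (split, info_a)
    return best

ages = [23, 29, 35, 38, 41, 45, 52]
buys = ["no", "no", "yes", "yes", "yes", "no", "no"]
print(best_split_point(ages, buys))   # best split point and its expected information
```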

15 Gain Ratio for Attribute Selection (C4.5)
- The information gain measure is biased towards attributes with a large number of values
- C4.5 (a successor of ID3) uses gain ratio to overcome the problem (a normalization of information gain):
  SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right)
  GainRatio(A) = Gain(A) / SplitInfo(A)
- Ex. Income splits the 14 tuples into partitions of sizes 4, 6, and 4:
  SplitInfo_income(D) = -\frac{4}{14}\log_2\frac{4}{14} - \frac{6}{14}\log_2\frac{6}{14} - \frac{4}{14}\log_2\frac{4}{14} = 1.557
  gain_ratio(income) = 0.029/1.557 = 0.019
- The attribute with the maximum gain ratio is selected as the splitting attribute
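A quick check of the gain-ratio normalization (the function name is mine), using the income partition sizes 4, 6, and 4 from the example and the Gain(income) value from the previous slide:

```python
from math import log2

def split_info(partition_sizes):
    """SplitInfo_A(D) for a split producing partitions of the given sizes."""
    total = sum(partition_sizes)
    return -sum((s / total) * log2(s / total) for s in partition_sizes if s > 0)

si = split_info([4, 6, 4])              # income splits D into low/medium/high of sizes 4, 6, 4
gain_income = 0.029                     # Gain(income) from the previous slide
print(round(si, 3), round(gain_income / si, 3))   # SplitInfo and GainRatio(income)
```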

16 Gini Index (CART, IBM IntelligentMiner)
- If a data set D contains examples from n classes, the gini index, gini(D), is defined as
  gini(D) = 1 - \sum_{j=1}^{n} p_j^2
  where p_j is the relative frequency of class j in D
- If a data set D is split on A into two subsets D1 and D2, the gini index gini_A(D) is defined as
  gini_A(D) = \frac{|D_1|}{|D|} gini(D_1) + \frac{|D_2|}{|D|} gini(D_2)
- Reduction in impurity:
  \Delta gini(A) = gini(D) - gini_A(D)
- The attribute that provides the smallest gini_A(D) (or the largest reduction in impurity) is chosen to split the node (need to enumerate all the possible splitting points for each attribute)

17 Gini Index (CART, IBM IntelligentMiner)
- Ex. D has 9 tuples in buys_computer = "yes" and 5 in "no":
  gini(D) = 1 - \left(\frac{9}{14}\right)^2 - \left(\frac{5}{14}\right)^2 = 0.459
- Suppose the attribute income partitions D into 10 tuples in D1: {low, medium} and 4 in D2: {high}:
  gini_{income \in \{low,medium\}}(D) = \frac{10}{14} Gini(D_1) + \frac{4}{14} Gini(D_2)
  but gini_{\{medium,high\}} is 0.30 and thus the best since it is the lowest
- All attributes are assumed continuous-valued
- May need other tools, e.g., clustering, to get the possible split values
- Can be modified for categorical attributes
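A minimal sketch of the Gini computations (the helper names are mine); the class counts in the split example are read off the training table on slide 9:

```python
def gini(labels):
    """Gini index of a list of class labels: 1 minus the sum of squared class frequencies."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_split(part1, part2):
    """gini_A(D) for a binary split of D into two lists of class labels."""
    n = len(part1) + len(part2)
    return len(part1) / n * gini(part1) + len(part2) / n * gini(part2)

# D has 9 tuples with buys_computer = yes and 5 with no
print(round(gini(["yes"] * 9 + ["no"] * 5), 3))                      # 0.459

# income in {low, medium}: 7 yes / 3 no; income = high: 2 yes / 2 no (from slide 9)
print(round(gini_split(["yes"] * 7 + ["no"] * 3, ["yes"] * 2 + ["no"] * 2), 3))
```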

18 Comparing Attribute Selection Measures
The three measures, in general, return good results, but:
- Information gain: biased towards multivalued attributes
- Gain ratio: tends to prefer unbalanced splits in which one partition is much smaller than the others
- Gini index:
  - biased towards multivalued attributes
  - has difficulty when the number of classes is large
  - tends to favor tests that result in equal-sized partitions and purity in both partitions

19 Overfitting and Tree Pruning
- Overfitting: an induced tree may overfit the training data
  - Too many branches, some of which may reflect anomalies due to noise or outliers
  - Poor accuracy for unseen samples
- Two approaches to avoid overfitting
  - Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
    - Difficult to choose an appropriate threshold
  - Postpruning: remove branches from a "fully grown" tree to get a sequence of progressively pruned trees
    - Use a set of data different from the training data to decide which is the "best pruned tree"

20 Classification in Large Databases
- Classification: a classical problem extensively studied by statisticians and machine learning researchers
- Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
- Why decision tree induction in data mining?
  - relatively faster learning speed (than other classification methods)
  - convertible to simple and easy-to-understand classification rules
  - can use SQL queries for accessing databases
  - comparable classification accuracy with other methods

21 Classification by Backpropagation
- Backpropagation: a neural network learning algorithm
- Started by psychologists and neurobiologists to develop and test computational analogues of neurons
- A neural network: a set of connected input/output units where each connection has a weight associated with it
- During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples
- Also referred to as connectionist learning due to the connections between units

22 Neural Network as a Classifier
- Weakness
  - Long training time
  - Requires a number of parameters typically best determined empirically, e.g., the network topology or "structure"
  - Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of "hidden units" in the network
- Strength
  - High tolerance to noisy data
  - Ability to classify untrained patterns
  - Well-suited for continuous-valued inputs and outputs
  - Successful on a wide array of real-world data
  - Algorithms are inherently parallel
  - Techniques have recently been developed for the extraction of rules from trained neural networks

23 A Neuron (= a perceptron)
- The n-dimensional input vector x is mapped into variable y by means of the scalar product and a nonlinear function mapping
- For example, with input vector x, weight vector w, bias \mu_k, and sign activation function:
  y = sign\left(\sum_{i=0}^{n} w_i x_i - \mu_k\right)
(The slide shows the neuron diagram: inputs x_0 ... x_n with weights w_0 ... w_n feeding a weighted sum, an activation function f, and output y.)
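A minimal sketch of this single neuron; the input values, weights, and bias below are arbitrary illustrative numbers:

```python
def perceptron_output(x, w, mu_k):
    """Single neuron: weighted sum of inputs minus bias, passed through a sign activation."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - mu_k
    return 1 if s >= 0 else -1

x = [0.5, -1.0, 2.0]       # input vector
w = [0.4, 0.3, 0.9]        # weight vector
print(perceptron_output(x, w, mu_k=1.0))   # -> 1, since 0.2 - 0.3 + 1.8 - 1.0 >= 0
```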

24 A Multi-Layer Feed-Forward Neural Network
- An input vector X is fed to the input layer, passed through the hidden layer(s) via weights w_{ij}, and produces the output vector at the output layer
- A typical weight-update step has the form
  w_j^{(k+1)} = w_j^{(k)} + \lambda (y - \hat{y}^{(k)}) x_j
(The slide shows the layered network diagram: input vector X, input layer, hidden layer, output layer, output vector.)

25 How a Multi-Layer Neural Network Works
- The inputs to the network correspond to the attributes measured for each training tuple
- Inputs are fed simultaneously into the units making up the input layer
- They are then weighted and fed simultaneously to a hidden layer
- The number of hidden layers is arbitrary, although usually only one is used
- The weighted outputs of the last hidden layer are input to the units making up the output layer, which emits the network's prediction
- The network is feed-forward in that none of the weights cycles back to an input unit or to an output unit of a previous layer
- From a statistical point of view, networks perform nonlinear regression: given enough hidden units and enough training samples, they can closely approximate any function

26 Defining a Network Topology
- First decide the network topology: the number of units in the input layer, the number of hidden layers (if > 1), the number of units in each hidden layer, and the number of units in the output layer
- Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0]
- One input unit per domain value, each initialized to 0
- Output: if used for classification with more than two classes, one output unit per class is used
- Once a network has been trained and its accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights

27 Backpropagation
- Iteratively process a set of training tuples and compare the network's prediction with the actual known target value
- For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value
- Modifications are made in the "backwards" direction: from the output layer, through each hidden layer, down to the first hidden layer, hence "backpropagation"
- Steps
  - Initialize weights (to small random numbers) and biases in the network
  - Propagate the inputs forward (by applying the activation function)
  - Backpropagate the error (by updating weights and biases)
  - Terminating condition (when the error is very small, etc.)
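The sketch below shows these steps for one training tuple on a tiny 2-input, 2-hidden, 1-output sigmoid network; the network size, learning rate, initialization, and training tuples are illustrative choices of mine, not fixed by the slide:

```python
import math, random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def train_step(x, target, w_h, b_h, w_o, b_o, rate=0.5):
    """One backpropagation step: forward pass, error terms, then weight/bias updates
    made backwards from the output layer. w_h[j] holds the input weights of hidden
    unit j; w_o holds the hidden-to-output weights."""
    # propagate the inputs forward
    h = [sigmoid(sum(wji * xi for wji, xi in zip(w_h[j], x)) + b_h[j]) for j in range(2)]
    o = sigmoid(sum(wj * hj for wj, hj in zip(w_o, h)) + b_o)
    # error terms: output layer first, then hidden layer
    delta_o = o * (1 - o) * (target - o)
    delta_h = [h[j] * (1 - h[j]) * w_o[j] * delta_o for j in range(2)]
    # update weights and biases
    w_o = [w_o[j] + rate * delta_o * h[j] for j in range(2)]
    b_o = b_o + rate * delta_o
    w_h = [[w_h[j][i] + rate * delta_h[j] * x[i] for i in range(2)] for j in range(2)]
    b_h = [b_h[j] + rate * delta_h[j] for j in range(2)]
    return w_h, b_h, w_o, b_o, o

random.seed(0)
w_h = [[random.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)]   # small random weights
b_h = [0.0, 0.0]
w_o = [random.uniform(-0.5, 0.5) for _ in range(2)]
b_o = 0.0
for _ in range(5000):                       # repeatedly train on two labeled tuples
    for x, t in [([0.0, 1.0], 1.0), ([1.0, 1.0], 0.0)]:
        w_h, b_h, w_o, b_o, out = train_step(x, t, w_h, b_h, w_o, b_o)
print(round(out, 2))                        # prediction for ([1.0, 1.0], target 0.0) is driven toward 0
```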

28 Backpropagation and Interpretability
- Efficiency of backpropagation: each epoch (one iteration through the training set) takes O(|D| * w), with |D| tuples and w weights, but the number of epochs can be exponential in n, the number of inputs, in the worst case
- Rule extraction from networks: network pruning
  - Simplify the network structure by removing weighted links that have the least effect on the trained network
  - Then perform link, unit, or activation value clustering
  - The sets of input and activation values are studied to derive rules describing the relationship between the input and hidden unit layers
- Sensitivity analysis: assess the impact that a given input variable has on a network output; the knowledge gained from this analysis can be represented in rules

29 Lazy vs. Eager Learning
- Lazy vs. eager learning
  - Lazy learning (e.g., instance-based learning): simply stores the training data (or does only minor processing) and waits until it is given a test tuple
  - Eager learning (the methods discussed above): given a training set, constructs a classification model before receiving new (e.g., test) data to classify
- Lazy: less time in training but more time in predicting
- Accuracy
  - A lazy method effectively uses a richer hypothesis space since it uses many local linear functions to form its implicit global approximation to the target function
  - Eager: must commit to a single hypothesis that covers the entire instance space

30 Lazy Learner: Instance-Based Methods
- Instance-based learning: store training examples and delay the processing ("lazy evaluation") until a new instance must be classified
- Typical approaches
  - k-nearest neighbor approach: instances represented as points in a Euclidean space
  - Locally weighted regression: constructs a local approximation
  - Case-based reasoning: uses symbolic representations and knowledge-based inference

31 The k-Nearest Neighbor Algorithm
- All instances correspond to points in the n-dimensional space
- The nearest neighbors are defined in terms of Euclidean distance, dist(X_1, X_2)
- The target function could be discrete- or real-valued
- For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to x_q
- Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples
(The slide shows a sketch of positive and negative training points surrounding a query point x_q.)

32 Discussion on the k-NN Algorithm
- k-NN for real-valued prediction for a given unknown tuple
  - Returns the mean values of the k nearest neighbors
- Distance-weighted nearest neighbor algorithm
  - Weight the contribution of each of the k neighbors according to their distance to the query x_q
  - Give greater weight to closer neighbors, e.g., w \equiv \frac{1}{d(x_q, x_i)^2}
- Robust to noisy data by averaging the k nearest neighbors
- Curse of dimensionality: the distance between neighbors could be dominated by irrelevant attributes
  - To overcome it, stretch axes or eliminate the least relevant attributes
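A sketch of distance-weighted k-NN for real-valued prediction; the toy training tuples and the small constant added to avoid division by zero when a neighbor coincides with the query are my own choices:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train, query, k=3):
    """Distance-weighted k-NN prediction: each of the k nearest neighbors
    contributes with weight 1 / d(x_q, x_i)^2."""
    neighbors = sorted(train, key=lambda xy: euclidean(xy[0], query))[:k]
    weights = [1.0 / (euclidean(x, query) ** 2 + 1e-9) for x, _ in neighbors]
    return sum(w * y for w, (_, y) in zip(weights, neighbors)) / sum(weights)

# toy training tuples: (attribute vector, numeric target)
train = [([1.0, 1.0], 10.0), ([2.0, 1.5], 12.0), ([8.0, 8.0], 40.0), ([9.0, 7.5], 42.0)]
print(round(knn_predict(train, [1.5, 1.2], k=3), 1))   # dominated by the two nearby points
```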

33 Genetic Algorithms (GA)
- Genetic algorithm: based on an analogy to biological evolution
- An initial population is created consisting of randomly generated rules
  - Each rule is represented by a string of bits
  - E.g., "IF A1 AND NOT A2 THEN C2" can be encoded as 100
  - If an attribute has k > 2 values, k bits can be used
- Based on the notion of survival of the fittest, a new population is formed to consist of the fittest rules and their offspring
  - The fitness of a rule is represented by its classification accuracy on a set of training examples
  - Offspring are generated by crossover and mutation (see the sketch below)
- The process continues until a population P "evolves", i.e., each rule in P satisfies a prespecified fitness threshold
- Slow but easily parallelizable
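A minimal sketch of the two genetic operators on bit-string rules (the helper names are mine); fitness evaluation against training examples is omitted:

```python
import random

def crossover(rule1, rule2):
    """Single-point crossover: swap the tails of two parent bit strings."""
    point = random.randint(1, len(rule1) - 1)
    return rule1[:point] + rule2[point:], rule2[:point] + rule1[point:]

def mutate(rule, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return "".join(("1" if b == "0" else "0") if random.random() < rate else b for b in rule)

random.seed(1)
parent1, parent2 = "100", "011"            # encoded rules, as in the slide's example
print(crossover(parent1, parent2), mutate(parent1))
```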

34 What Is Prediction?
- (Numerical) prediction is similar to classification
  - construct a model
  - use the model to predict a continuous or ordered value for a given input
- Prediction is different from classification
  - Classification refers to predicting a categorical class label
  - Prediction models continuous-valued functions
- Major method for prediction: regression
  - model the relationship between one or more independent or predictor variables and a dependent or response variable
- Regression analysis
  - Linear and multiple regression
  - Non-linear regression
  - Other regression methods: generalized linear model, Poisson regression, log-linear models, regression trees

35 Linear Regression
- Linear regression: involves a response variable y and a single predictor variable x
  y = w_0 + w_1 x
  where w_0 (y-intercept) and w_1 (slope) are the regression coefficients
- Method of least squares: estimates the best-fitting straight line
  w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}
- Multiple linear regression: involves more than one predictor variable
  - Training data is of the form (X_1, y_1), (X_2, y_2), ..., (X_{|D|}, y_{|D|})
  - Ex. for 2-D data, we may have: y = w_0 + w_1 x_1 + w_2 x_2
  - Solvable by an extension of the least squares method or using SAS, S-Plus
  - Many nonlinear functions can be transformed into the above
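The least-squares formulas above translate directly into code; the sample data below (years of experience vs. salary in $1000s) is my own illustrative choice, not taken from the slide:

```python
def least_squares(xs, ys):
    """Method of least squares for simple linear regression y = w0 + w1 * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    w0 = y_bar - w1 * x_bar
    return w0, w1

xs = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]    # years of experience
ys = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]   # salary in $1000s
w0, w1 = least_squares(xs, ys)
print(round(w0, 1), round(w1, 1))          # fitted intercept and slope (about 23.2 and 3.5)
```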

36 Nonlinear Regression
- Some nonlinear models can be modeled by a polynomial function
- A polynomial regression model can be transformed into a linear regression model. For example,
  y = w_0 + w_1 x + w_2 x^2 + w_3 x^3
  is convertible to linear form with new variables x_2 = x^2, x_3 = x^3:
  y = w_0 + w_1 x + w_2 x_2 + w_3 x_3
- Other functions, such as the power function, can also be transformed to a linear model
- Some models are intractably nonlinear (e.g., a sum of exponential terms)
  - it is still possible to obtain least-square estimates through extensive calculation on more complex formulae
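A sketch of this transformation trick using NumPy: once x^2 and x^3 are added as extra columns, the cubic model becomes ordinary linear least squares. The synthetic data and its coefficients are mine, chosen so the fit is easy to verify:

```python
import numpy as np

# synthetic data generated from a known cubic polynomial
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * x**3

# design matrix with columns [1, x, x^2, x^3]; solving it is plain linear least squares
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(w, 2))                      # recovers approximately [1.0, 2.0, -0.5, 0.1]
```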

37 Other Regression-Based Models
- Generalized linear model: the foundation on which linear regression can be applied to modeling categorical response variables
  - The variance of y is a function of the mean value of y, not a constant
  - Logistic regression: models the probability of some event occurring as a linear function of a set of predictor variables
  - Poisson regression: models data that exhibit a Poisson distribution
- Log-linear models (for categorical data)
  - Approximate discrete multidimensional probability distributions
  - Also useful for data compression and smoothing
- Regression trees and model trees
  - Trees to predict continuous values rather than class labels

38 Prediction: Numerical Data (figure only)

39 Prediction: Categorical Data (figure only)

40 Classifier Accuracy Measures

Real class \ Predicted class   C1               ~C1
C1                             True positive    False negative
~C1                            False positive   True negative

(The slide also shows an example confusion matrix for buy_computer = yes/no with per-class and overall recognition rates.)

- Accuracy of a classifier M, acc(M): the percentage of test set tuples that are correctly classified by the model M
  - Error rate (misclassification rate) of M = 1 - acc(M)
  - Given m classes, CM_{i,j}, an entry in a confusion matrix, indicates the number of tuples in class i that are labeled by the classifier as class j
- Alternative accuracy measures (e.g., for cancer diagnosis)
  sensitivity = t-pos/pos            /* true positive recognition rate */
  specificity = t-neg/neg            /* true negative recognition rate */
  precision = t-pos/(t-pos + f-pos)
  accuracy = sensitivity * pos/(pos + neg) + specificity * neg/(pos + neg)
- This model can also be used for cost-benefit analysis
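These measures can be computed directly from the four confusion-matrix counts; the counts in the example call below are hypothetical:

```python
def accuracy_measures(t_pos, f_neg, f_pos, t_neg):
    """Sensitivity, specificity, precision, and overall accuracy from confusion-matrix counts."""
    pos, neg = t_pos + f_neg, t_neg + f_pos
    sensitivity = t_pos / pos
    specificity = t_neg / neg
    precision = t_pos / (t_pos + f_pos)
    accuracy = sensitivity * pos / (pos + neg) + specificity * neg / (pos + neg)
    return sensitivity, specificity, precision, accuracy

# hypothetical counts: 90 true positives, 10 false negatives, 30 false positives, 870 true negatives
print([round(v, 3) for v in accuracy_measures(90, 10, 30, 870)])
```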

41 Predictor Error Measures
- Measure predictor accuracy: how far off the predicted value is from the actual known value
- Loss function: measures the error between y_i and the predicted value y_i'
  - Absolute error: |y_i - y_i'|
  - Squared error: (y_i - y_i')^2
- Test error (generalization error): the average loss over the test set
  - Mean absolute error: \frac{1}{d}\sum_{i=1}^{d} |y_i - y_i'|
  - Mean squared error: \frac{1}{d}\sum_{i=1}^{d} (y_i - y_i')^2
  - Relative absolute error: \frac{\sum_{i=1}^{d} |y_i - y_i'|}{\sum_{i=1}^{d} |y_i - \bar{y}|}
  - Relative squared error: \frac{\sum_{i=1}^{d} (y_i - y_i')^2}{\sum_{i=1}^{d} (y_i - \bar{y})^2}
- The mean squared error exaggerates the presence of outliers
- Popularly used: the (square) root mean squared error and, similarly, the root relative squared error
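A small helper (names mine) that computes the error measures above for a pair of actual/predicted value lists; the sample values are illustrative:

```python
from math import sqrt

def error_measures(actual, predicted):
    """Mean absolute, mean squared, relative absolute, relative squared, and root mean squared errors."""
    d = len(actual)
    y_bar = sum(actual) / d
    abs_err = sum(abs(y - yp) for y, yp in zip(actual, predicted))
    sq_err = sum((y - yp) ** 2 for y, yp in zip(actual, predicted))
    return {
        "MAE": abs_err / d,
        "MSE": sq_err / d,
        "RAE": abs_err / sum(abs(y - y_bar) for y in actual),
        "RSE": sq_err / sum((y - y_bar) ** 2 for y in actual),
        "RMSE": sqrt(sq_err / d),
    }

actual = [10.0, 12.0, 15.0, 20.0]
predicted = [11.0, 11.5, 16.0, 18.0]
print({k: round(v, 3) for k, v in error_measures(actual, predicted).items()})
```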

42 Evaluating the Accuracy of a Classifier or Predictor (I)
- Holdout method
  - The given data is randomly partitioned into two independent sets
    - Training set (e.g., 2/3) for model construction
    - Test set (e.g., 1/3) for accuracy estimation
  - Random sampling: a variation of holdout
    - Repeat holdout k times; accuracy = average of the accuracies obtained
- Cross-validation (k-fold, where k = 10 is most popular)
  - Randomly partition the data into k mutually exclusive subsets, each of approximately equal size
  - At the i-th iteration, use D_i as the test set and the others as the training set
  - Leave-one-out: k folds where k = the number of tuples, for small-sized data
  - Stratified cross-validation: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data
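A plain-Python sketch of k-fold cross-validation; the train/predict callables and the majority-class toy model are placeholders of mine, not part of the slide:

```python
import random

def k_fold_accuracy(data, train_fn, predict_fn, k=10, seed=0):
    """k-fold cross-validation: partition the data into k folds, train on k-1 folds,
    test on the held-out fold, and average the k accuracies."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = train_fn(train)
        correct = sum(predict_fn(model, x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

# toy usage: labeled tuples (features, label) and a majority-class "model"
data = [((i,), "yes" if i % 3 else "no") for i in range(60)]
train_fn = lambda rows: max(set(y for _, y in rows), key=[y for _, y in rows].count)
predict_fn = lambda model, x: model
print(round(k_fold_accuracy(data, train_fn, predict_fn, k=10), 2))
```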

43 Model Selection: ROC Curves
- ROC (Receiver Operating Characteristic) curves: for visual comparison of classification models
- Originated from signal detection theory
- Shows the trade-off between the true positive rate and the false positive rate
- The area under the ROC curve is a measure of the accuracy of the model
- Rank the test tuples in decreasing order: the one that is most likely to belong to the positive class appears at the top of the list
- The closer the curve is to the diagonal line (i.e., the closer the area is to 0.5), the less accurate the model
- The vertical axis represents the true positive rate; the horizontal axis represents the false positive rate; the plot also shows a diagonal line
- A model with perfect accuracy will have an area of 1.0
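A sketch that builds the ROC points by ranking test tuples by decreasing classifier score, as described above, and computes the area under the curve with the trapezoidal rule; the scores and labels are illustrative:

```python
def roc_points(scores, labels):
    """ROC curve points (FPR, TPR) obtained by ranking test tuples by decreasing score
    and sweeping the decision threshold through the ranked list."""
    ranked = sorted(zip(scores, labels), reverse=True)
    pos = sum(1 for _, y in ranked if y == 1)
    neg = len(ranked) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in ranked:
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]   # classifier scores for the positive class
labels = [1, 1, 0, 1, 1, 0, 0, 0]                    # actual classes
print(round(auc(roc_points(scores, labels)), 2))     # about 0.88 here (1.0 = perfect, 0.5 = random)
```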
