Weighted Pseudo-Metric Discriminatory Power Improvement Using a Bayesian Logistic Regression Model Based on a Variational Method


R. Ksantini^1, D. Ziou^1, B. Colin^2, and F. Dubeau^2
^1 Département d'Informatique, Université de Sherbrooke, Sherbrooke (Qc), Canada, J1K 2R1
^2 Département de Mathématiques, Université de Sherbrooke, Sherbrooke (Qc), Canada, J1K 2R1
{riadh.ksantini, djemel.ziou}@usherbrooke.ca, {bernard.colin, francois.dubeau}@usherbrooke.ca
Corresponding author. Tel.: x2853; fax:
Technical Report No

Abstract

Distance measures such as the nearest neighbor rule distance and the Euclidean distance have been the most widely used to measure similarity between feature vectors in content-based image retrieval (CBIR) systems. However, these similarity measures make no assumption about the probability distributions or the local relevances of the feature vectors, so irrelevant features can hurt retrieval performance. Probabilistic approaches have proven to be an effective solution to this CBIR problem. In this paper, we use a Bayesian logistic regression model based on a variational method to compute the weights of a pseudo-metric, in order to improve its discriminatory capacity and thereby increase image retrieval accuracy. This pseudo-metric makes use of the compressed and quantized versions of the Daubechies-8 wavelet decomposed feature vectors, and its weights were previously adjusted by classical logistic regression. The evaluation and comparison of the Bayesian logistic regression model and the classical logistic regression model are performed both independently of and within the image retrieval context. Experimental results show that the Bayesian logistic regression model is a significantly better tool than the classical logistic regression model for computing the pseudo-metric weights and for improving retrieval and classification performance.

Keywords: Image Retrieval, Logistic Regression, Variational Method, Weighted Pseudo-Metric.

Contents

List of Tables 3
List of Figures 3
1 Introduction 4
2 The pseudo-metric and the fast feature vector querying algorithm
  2.1 The pseudo-metric
  2.2 The fast feature vector querying algorithm
3 Pseudo-metric weight adjustment
  3.1 The classical logistic regression model
  3.2 The Bayesian logistic regression model
  3.3 Training
4 Color image retrieval method
  4.1 Color image database preprocessing
  4.2 The querying algorithm
5 Experimental results
  5.1 Decontextualized classification performance evaluation of both models
  5.2 Contextualized evaluation and comparison of both models
    5.2.1 Used feature vectors
    5.2.2 The choices of q_0 and q_1
    5.2.3 Querying evaluation and comparison
6 Conclusion 27
Bibliography 29

List of Tables

5.1 Comparison between the CLR and the BLR (synthetic data 1)
5.2 Comparison between the CLR and the BLR (synthetic data 2)
5.3 Comparison between the CLR and the BLR (extracted data sets from the Iris data)

List of Figures

5.1 Synthetic data 1
5.2 Synthetic data 2
5.3 Extracted data sets from the Iris data
5.4 Evaluation (ZuBuD database)
5.5 Evaluation (WANG database)
5.6 Comparison (ZuBuD database) (m = 30)
5.7 Comparison (WANG database) (m = 30)

1 Introduction

The rapid expansion of the Internet and the wide use of digital data in many real world applications in fields such as medicine, weather prediction, communications, commerce and academia have increased the need for both efficient image database creation and efficient retrieval procedures. For this reason, the content-based image retrieval (CBIR) approach was proposed [25], [3]. In this approach, the first step is to compute for each database image a feature vector capturing certain visual features of the image such as color, texture and shape. This feature vector is stored in a featurebase; then, given a query image chosen by a user, its feature vector is computed, compared to the featurebase feature vectors by a similarity measure, and finally the database images most similar to the query image are returned to the user. Distance measures like the nearest neighbor rule distance and the Euclidean distance have been the most widely used for feature vector comparison in CBIR systems. However, these similarity measures are based only on the distances between feature vectors in the feature space. Therefore, because of the lack of information about the relative relevances of the featurebase feature vectors and because of the noise in these vectors, distance measures can fail and irrelevant features can hurt retrieval performance. Probabilistic approaches are a promising solution to this CBIR problem [15]; compared to standard CBIR methods based on distance measures, they can lead to a significant gain in retrieval accuracy. In fact, these approaches are capable of generating probabilistic similarity measures and highly customized metrics for computing image similarity, based on the consideration and distinction of the relative feature vector relevances. In the following we review some of the CBIR systems based on these probabilistic approaches. J. Peng et al. [11] used a binary classification to classify the database color image feature vectors as relevant or irrelevant. The classified feature vectors and the query image feature vector constitute training data, from which relevance weights of the different features are computed. The components of the weight vector represent the local relevance of each feature and are adaptive to the location of the query image feature vector in the feature space. After the feature relevance has been determined, a weighted similarity metric is selected using reinforcement learning based on classical logistic regression. Three different metrics were considered: the weighted Euclidean metric, the weighted city-block metric and the weighted dominance metric. G. Caenen and E. J. Pauwels [8] used the classical quadratic logistic regression model in order to classify database image feature vectors as relevant or irrelevant. Based on this classification, a total relevance probability is assigned to each image in the database. This total relevance probability is a linear combination, with weights used to fine-tune the influence of each individual feature, of the natural logarithms of the logistic relevance probabilities of the feature vector components. Database images are ranked according to their total relevance probabilities. S. Aksoy et al. [21] used weighted L1 and L2 distances in order to measure the similarity degree between two images, where the weights are the ratios of the standard deviations of the feature values computed both over the whole database and over the images selected as relevant.
Each weight vector component represents the local relevance of a feature, and assigns more importance to that feature if it is relevant. S. Aksoy and R. M. Haralick [20] investigated the effectiveness of five different normalization methods in combination with two different likelihood-based similarity measures that compute the likelihood of two images being similar or dissimilar, one being the query image and the other being an image in the database. First, two classes are defined, the relevance class and the irrelevance class, and then the likelihood values are derived from the Bayesian classifier. Two different methods were used to estimate the conditional probabilities used in the classifier. The first method used a multivariate normal assumption and the second one used independently fitted distributions for each feature. The similarity degree between a query image and a database image is measured by the likelihood ratio.

In this paper, we investigate the effectiveness of a Bayesian logistic regression model based on a variational method [24], in order to adjust the weights of the pseudo-metric used in [18], and thus to improve its discriminatory capacity and to increase image retrieval accuracy. This pseudo-metric makes use of the compressed and quantized versions of the Daubechies-8 wavelet decomposed feature vectors, and its weights were previously adjusted by classical logistic regression. The basic principle of using the Bayesian logistic regression model and the classical logistic regression one is to allow a good separation between a relevance class and an irrelevance class extracted from the featurebase representing the color image database, and then to compute the pseudo-metric weights, which represent the local relevances of the scaling factor differences and of the resolution level coefficient mismatches of the Daubechies-8 wavelet decomposed, compressed and quantized feature vectors. We will show that, thanks to the variational method and to the incorporation of prior information, the Bayesian logistic regression model is a significantly better tool than the classical logistic regression model for computing the pseudo-metric weights and for improving the classification performance and the querying results. The classification performance evaluation and comparison of both models are performed independently of the image retrieval context, using a cumulative distance error and a classification accuracy used in [14], and the evaluation of the retrieval method using each model separately is performed using precision and scope curves as defined in [12]. In the next section, we briefly redefine the pseudo-metric and explain the fast feature vector querying algorithm. In section 3, we briefly describe the pseudo-metric weight adjustment using the classical logistic regression model, while showing its limitations and why the Bayesian logistic regression model based on a variational method is more appropriate for the pseudo-metric weight computation. Then, we detail the Bayesian logistic regression model based on a variational method and we present the weight computation algorithm. Finally, we describe the data training performed for both models, in order to validate the pseudo-metric weight computation. The color image retrieval method is briefly presented in section 4. Finally, in section 5, the decontextualized classification performance evaluation and comparison of both models are performed using a cumulative distance error and a classification accuracy [14]. Then, the feature vectors that we use to represent the database color images are summarized.
Moreover, we perform experiments to validate the Bayesian logistic regression model, and we use the precision and scope [12] in order to show the advantage of the Bayesian logistic regression model over the classical logistic regression one in terms of querying results.

2 The pseudo-metric and the fast feature vector querying algorithm

2.1 The pseudo-metric

Let us consider Q and T as the query and the target feature vectors, respectively, having 2^J components each. The pseudo-metric is given by the following expression

\|Q, T\| = \widetilde{w}_0 \, |Q[0] - T[0]| \; - \sum_{i:\, \widetilde{Q}^c_q[i] \neq 0} w_{bin(i)} \, \big( \widetilde{Q}^c_q[i] = \widetilde{T}^c_q[i] \big), \qquad (2.1)

where

\big( \widetilde{Q}^c_q[i] = \widetilde{T}^c_q[i] \big) =
\begin{cases}
1 & \text{if } \widetilde{Q}^c_q[i] = \widetilde{T}^c_q[i], \\
0 & \text{otherwise,}
\end{cases} \qquad (2.2)

Q[0] and T[0] are the scaling factors of Q and T, \widetilde{Q}^c_q[i] and \widetilde{T}^c_q[i] represent the i-th coefficients of their Daubechies-8 wavelet decomposed, compressed to m coefficients and quantized versions, \widetilde{w}_0 and the w_{bin(i)}'s are the weights to compute, and the bucketing function bin(\cdot) groups these weights according to the J resolution levels, such that

bin(i) = \lfloor \log_2(i) \rfloor \quad \text{with } i = 1, \ldots, 2^J - 1. \qquad (2.3)

2.2 The fast feature vector querying algorithm

In order to optimize the pseudo-metric computation process, we introduce two search arrays denoted by \Theta^+ and \Theta^- as defined in [18]. After these arrays have been created and the weights \widetilde{w}_0 and \{w_j\}_{j=0}^{J-1} computed, the retrieval procedure for a query feature vector Q in the featurebase of feature vectors T_k (k = 1, \ldots, |DB|), where |DB| denotes the featurebase size, is defined as follows.

Procedure Retrieval(Q: array [1..2^J] of reals, m: integer, \Theta^-, \Theta^+)
  \widetilde{Q} <- FastDaubechiesWaveletsDecomposition(Q)
  Initialize Score[k] = 0, for each k in {1, ..., |DB|}
  For each k in {1, ..., |DB|} do
    Score[position of T_k in the DB] = \widetilde{w}_0 |\widetilde{Q}[0] - \widetilde{T}_k[0]|
  End for
  \widetilde{Q}^c <- Compress(\widetilde{Q}, m)
  \widetilde{Q}^c_q <- Quantify(\widetilde{Q}^c)
  For each i such that \widetilde{Q}^c_q[i] != 0 do
    If \widetilde{Q}^c_q[i] > 0 then
      List <- \Theta^+[i]
    Else
      List <- \Theta^-[i]
    End if
    For each element l of List do
      Score[position of l in the DB] = Score[position of l in the DB] - w_{bin(i)}
    End for
  End for
  Return Score
End procedure

This procedure returns an array Score such that Score[k] = \|Q, T_k\| for each k in {1, ..., |DB|}. The elements of the array Score, which are the similarity degrees between the query Q and the featurebase feature vectors T_k (k in {1, ..., |DB|}), can be negative or positive. The most negative similarity degree corresponds to the closest target to the query Q.
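To make the scoring logic concrete, the following Python sketch computes the same scores with a dense double loop instead of the search arrays of [18]. It is an illustration only, not the authors' implementation: it assumes each vector is already Daubechies-8 decomposed, compressed and quantized to length 2^J, with index 0 holding the scaling factor and discarded coefficients set to 0.

import numpy as np

def retrieval_scores(Q, targets, w_tilde0, w):
    """Score every target against the query with the weighted pseudo-metric.

    Q, targets[k] : decomposed, compressed and quantized vectors of length 2**J.
    w_tilde0      : weight of the scaling-factor difference.
    w             : array of J resolution-level weights w[bin(i)].
    """
    scores = np.array([w_tilde0 * abs(Q[0] - T[0]) for T in targets], dtype=float)
    for i in np.flatnonzero(Q[1:]) + 1:          # retained query coefficients
        b = int(np.floor(np.log2(i)))            # resolution level bin(i)
        for k, T in enumerate(targets):
            if T[i] == Q[i]:                     # quantized coefficients match
                scores[k] -= w[b]
    return scores                                # most negative = best match

In the actual procedure the inner loop over all targets is avoided: for each index i, the search arrays \Theta^+[i] and \Theta^-[i] directly list the targets whose i-th quantized coefficient is positive or negative, so only those targets are updated.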

3 Pseudo-metric weight adjustment

In this section, we briefly describe the pseudo-metric weight adjustment using the classical logistic regression model, while showing its limitations and why the variational method based Bayesian logistic regression model is more appropriate for the pseudo-metric weight computation. Then, we detail the Bayesian logistic regression model based on a variational method and we present the weight computation algorithm. Finally, we describe the data training performed for both models.

3.1 The classical logistic regression model

The weights \widetilde{w}_0 and \{w_k\}_{k=0}^{J-1} are adjusted in such a way that the pseudo-metric is efficient enough to match similar feature vectors as well as to discriminate dissimilar ones. The weight \widetilde{w}_0 represents the relative relevance of the scaling factor differences of the featurebase feature vectors, and a weight w_k represents the relative relevance of the mismatches of the k-th resolution level coefficients of these vectors.

We define two classes, the relevance class denoted by \Omega_0 and the irrelevance class denoted by \Omega_1, in order to classify the feature vector pairs as similar or dissimilar. In the classical logistic regression model, each feature vector pair is represented by an explanatory vector and a binary target variable. Specifically, for the i-th feature vector pair we have

S_i = \begin{cases} 1 & \text{in case of dissimilarity,} \\ 0 & \text{in case of similarity,} \end{cases}

X_i = (\widetilde{X}_{0,i}, X_{0,i}, \ldots, X_{J-1,i}, 1) \in \mathbb{R}^{J+1} \times \{1\} : \text{explanatory vector},

where \widetilde{X}_{0,i} is the absolute value of the difference between the scaling factors of the Daubechies-8 wavelet decomposed, compressed and quantized versions of the two feature vectors, \{X_{k,i}\}_{k=0}^{J-1} are the numbers of mismatches between their J resolution level coefficients, and S_i is the binary target variable, equal to 0 or 1 depending on whether or not the two feature vectors are intended to be similar. We suppose that we have n_0 pairs of similar feature vectors and n_1 pairs of dissimilar ones. Thus, the class \Omega_0 contains n_0 explanatory vectors and their associated binary target variables \{X_i, S_i = 0\}_{i=1}^{n_0}, representing the pairs of similar feature vectors, and the class \Omega_1 contains n_1 explanatory vectors and their associated binary target variables \{X_j, S_j = 1\}_{j=1}^{n_1}, representing the pairs of dissimilar feature vectors. Given the explanatory vector X_i, we model the binary target variable S_i by an irrelevance probability defined as follows

p_i = P(S_i = 1 \mid X_i) = F\Big( \widetilde{w}_0 \widetilde{X}_{0,i} + \sum_{k=0}^{J-1} w_k X_{k,i} + v \Big), \qquad (3.1)

where F(x) is given by

F(x) = \frac{e^x}{1 + e^x}, \qquad (3.2)

and

\bar{p}_i = 1 - p_i = P(S_i = 0 \mid X_i) = F\Big( -\widetilde{w}_0 \widetilde{X}_{0,i} - \sum_{k=0}^{J-1} w_k X_{k,i} - v \Big), \qquad (3.3)

where v is an unknown intercept. It is treated as a weight when we compute the pseudo-metric weights, but it is not used when the pseudo-metric compares feature vectors, since it is a constant for all query-target pairs and F(-x) is a decreasing function. The weights and the intercept are then determined by maximum likelihood estimation, i.e. they maximize the probability of the observed configuration. More precisely, if we take the relevance and irrelevance class explanatory vectors with their associated binary target variable values and use equation (3.1) to compute the probabilities p_i, then the weights and the intercept are chosen to maximize the following conditional log-likelihood

\log\big( L(\widetilde{w}_0, w_0, \ldots, w_{J-1}, v) \big) = \sum_{d=1}^{n} \big( (1 - S_d)\log(1 - p_d) + S_d \log(p_d) \big) \qquad (3.4)
= \sum_{i=1}^{n_0} \log(1 - p_i) + \sum_{j=1}^{n_1} \log(p_j), \qquad (3.5)

where n = n_0 + n_1.
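As an illustration, the conditional log-likelihood (3.4)-(3.5) can be maximized numerically with a generic gradient-based optimizer. The short Python sketch below does this with scipy; it is a plain maximum-likelihood fit, not the Fisher scoring implementation used later in the paper, and the variable names (explanatory matrix X with the constant 1 already appended as the last column, targets S) are illustrative.

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # F(x) = e^x / (1 + e^x)

def fit_clr_weights(X, S):
    """Maximum-likelihood fit of the CLR weights and intercept.

    X : (n, J+2) array of explanatory vectors built from Omega_0 and Omega_1.
    S : (n,) array of binary targets (0 = similar pair, 1 = dissimilar pair).
    Returns the vector (w_tilde0, w_0, ..., w_{J-1}, v).
    """
    def neg_log_likelihood(theta):
        p = expit(X @ theta)                  # irrelevance probabilities p_i
        eps = 1e-12                           # guard against log(0)
        return -np.sum(S * np.log(p + eps) + (1 - S) * np.log(1 - p + eps))

    theta0 = np.zeros(X.shape[1])             # start from all-zero weights
    return minimize(neg_log_likelihood, theta0, method="BFGS").x

As discussed next, this maximization can converge to a local maximum or diverge when the classes are separable, which is precisely the motivation for the Bayesian treatment.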

Standard optimization algorithms such as the Fisher scoring and Newton-Raphson algorithms [5], [4] can be invoked to maximize the likelihood with respect to the weights and the intercept. However, according to [19], [7] and [4], in several cases, especially because of the exponential in the likelihood function, or because the likelihood contains several local maxima, or because of the existence of many zero explanatory vectors, these standard optimization algorithms converge to local maxima instead of global ones and may even have difficulty converging. Thus, maximum likelihood can fail, and estimates of the parameters of interest (weights and intercept) may not be optimal, may not exist, or may lie on the boundary of the parameter space. These problems can be solved by using a Bayesian logistic regression model instead of the classical one. In the Bayesian logistic regression framework, there are three main components: a chosen prior distribution over the parameters of interest, the likelihood function and the posterior distribution. These three components are formally combined by Bayes' rule [1]. The posterior distribution contains all the available knowledge about the parameters of interest in the model. Therefore, in the Bayesian logistic regression model, the estimates of the parameters of interest, which are considered as random variables, are the components of the posterior distribution mean or mode [7]. Thus, the problems of nonexistence and overflow of the maximum likelihood estimates can be solved by smoothing them through a suitable prior distribution over the parameters of interest [7], and the problem of the non-optimality of these estimates can be solved by choosing a posterior distribution having a unique mode. In the literature, many priors with different distributional forms have been chosen for different applications based on Bayesian logistic regression. Examples include the Dirichlet prior, Jeffreys prior, the Uniform prior and the Normal prior. A Dirichlet prior was chosen for the log-linear analysis of sparse frequency tables in [7]. In that application the likelihood function is a multinomial density, which is conjugate to the Dirichlet prior, and therefore the posterior distribution has an analytically tractable Dirichlet form. This has the effect of smoothing the estimates toward a specific model [5]. Jeffreys prior is based on a structural rule and has a good theoretical justification [7], [13]. However, in larger problems, when the number of parameters of interest is large, it is difficult to apply because of its computational intensity. Using Uniform priors leads back to the classical logistic regression model [13], [7], and in this case we again face the problems of nonexistence and overflow of the maximum likelihood estimates. The Normal prior has become popular in logit modelling [7], [16], [19], [9], [17]. It has the advantage of low computational intensity and of smoothing the estimates toward a fixed mean and away from unreasonable extremes. However, when the likelihood function is not conjugate to the Gaussian prior, the posterior distribution has no tractable form, and its mean and mode computations become inaccurate or expensive in terms of computational intensity.
For example, F. Galindo-Garre et al. [7] used a modified Newton-Raphson algorithm to compute the posterior mode and then Markov Chain Monte Carlo sampling to compute the posterior mean, which has a high computational complexity, while R. Weiss et al. [19] used the maximum likelihood estimate and an asymptotic covariance matrix to approximate the posterior mean, which is an inaccurate approximation. According to [24], it is possible to use accurate variational transformations and Jensen's inequality in order to approximate the likelihood function by a simpler, tractable exponential form. In this case, thanks to conjugacy, combining a Gaussian prior distribution over the parameters of interest with the likelihood approximation yields a closed Gaussian form approximation to the posterior distribution. This Gaussian form is called the variational posterior approximation and is both a proper distribution and an adjustable lower bound on the posterior distribution. The mean of the variational posterior approximation is easily calculated by an iterative algorithm based on Bayesian learning, while the EM algorithm is used to maximize the variational posterior approximation with respect to the variational parameters, in order to obtain an optimal approximation. The iterative algorithm has low computational complexity and converges rapidly. Also, in this Bayesian logistic regression model, which is detailed in the next subsection, the explanatory vectors of the relevance and irrelevance classes are not observed directly but are instead distributed according to specific distributions, and this prior knowledge is incorporated in order to optimize the computation of the mean of the variational posterior approximation.

3.2 The Bayesian logistic regression model

Let us denote by X_0 = (\widetilde{X}_{0,0}, X_{0,0}, \ldots, X_{J-1,0}, 1) and X_1 = (\widetilde{X}_{0,1}, X_{0,1}, \ldots, X_{J-1,1}, 1) the random vectors whose realizations are the explanatory vectors \{X_i\}_{i=1}^{n_0} of the relevance class \Omega_0 and the explanatory vectors \{X_j\}_{j=1}^{n_1} of the irrelevance class \Omega_1, respectively. We suppose that X_0 \sim q_0(X_0) and X_1 \sim q_1(X_1), where q_0 and q_1 are two chosen distributions. To X_0 we associate a binary random variable S_0 whose realizations are the target variables \{S_i = 0\}_{i=1}^{n_0}, and to X_1 we associate a binary random variable S_1 whose realizations are the target variables \{S_j = 1\}_{j=1}^{n_1}. We set S_0 equal to 0 for similarity and S_1 equal to 1 for dissimilarity. The parameters of interest (weights and intercept) are considered as random variables and are denoted by the random vector W = (\widetilde{w}_0, w_0, \ldots, w_{J-1}, v). We assume that W \sim \pi(W), where \pi is a Gaussian prior with prior mean \mu and prior covariance matrix \Sigma. Using Bayes' rule, the posterior distribution over W is given by

P(W \mid S_0 = 0, S_1 = 1) = \frac{P(S_0 = 0, S_1 = 1 \mid W)\, \pi(W)}{P(S_0 = 0, S_1 = 1)}, \qquad (3.6)

where

P(S_0 = 0, S_1 = 1 \mid W) = \prod_{i=0}^{1} P(S_i = i \mid W)
= \sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \frac{1}{(\pi(W))^2} \prod_{i=0}^{1} P(S_i = i, X_i = x_i, W)
= \sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \prod_{i=0}^{1} \left[ \frac{P(S_i = i, X_i = x_i, W)}{P(W, X_i = x_i)\, \pi(W)} \right] P(W, X_i = x_i).

Since in the Bayesian approach we generally suppose that the space of unknown parameters is independent of the space of observations, we assume that W and X_i are independent for each i \in \{0, 1\}, and then the joint probability P(W, X_i = x_i) = \pi(W)\, q_i(X_i = x_i) for each i \in \{0, 1\}. So we obtain

P(S_0 = 0, S_1 = 1 \mid W) = \sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \prod_{i=0}^{1} \left[ \frac{P(S_i = i, X_i = x_i, W)}{(\pi(W))^2} \right] \frac{\pi(W)\, q_i(X_i = x_i)}{q_i(X_i = x_i)}
= \sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \prod_{i=0}^{1} P(S_i = i \mid X_i = x_i, W)\, q_i(X_i = x_i),

where P(S_i = i \mid X_i = x_i, W) = F((2i - 1) W^t x_i) for each i \in \{0, 1\}. Therefore

P(W \mid S_0 = 0, S_1 = 1) = \frac{\left[ \sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \prod_{i=0}^{1} P(S_i = i \mid X_i = x_i, W)\, q_i(X_i = x_i) \right] \pi(W)}{P(S_0 = 0, S_1 = 1)}, \qquad (3.7)

where

P(S_0 = 0, S_1 = 1) = \int \left[ \sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \prod_{i=0}^{1} P(S_i = i \mid X_i = x_i, W)\, q_i(X_i = x_i) \right] \pi(W)\, dW. \qquad (3.8)

The computation of the posterior distribution P(W \mid S_0 = 0, S_1 = 1) is intractable. For this reason, we approximate it by a variational posterior approximation of Gaussian form, whose mean and covariance matrix can be computed. To obtain this variational posterior approximation, we perform two successive approximations to the numerator term \left[ \sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \prod_{i=0}^{1} P(S_i = i \mid X_i = x_i, W)\, q_i(X_i = x_i) \right] of formula (3.7), in order to bound it by an exponential form which is conjugate to the Gaussian prior \pi(W).

First approximation: This approximation is based on a variational transformation of the sigmoid function F(x) of the logistic regression. Let us consider its logarithm

\log(F(x)) = -\log(1 + e^{-x}) = \frac{x}{2} - \log(e^{x/2} + e^{-x/2}). \qquad (3.9)

The function g(x) = -\log(e^{x/2} + e^{-x/2}) is a convex function of the variable x^2, as its second derivative with respect to x^2 is strictly positive. The tangent line to the convex function g(x) in the variable x^2 is therefore a global lower bound for it. Consequently, we can bound the function g(x) globally with a first order Taylor expansion in the variable x^2 in the neighborhood of a variational parameter \epsilon > 0. Explicitly,

g(x) \geq g(\epsilon) + (x^2 - \epsilon^2)\, \frac{\partial g(x)}{\partial x^2}(\epsilon^2) \qquad (3.10)
= -\frac{\epsilon}{2} + \log(F(\epsilon)) - \frac{1}{4\epsilon}(x^2 - \epsilon^2)\tanh\!\left(\frac{\epsilon}{2}\right), \qquad (3.11)

where \tanh(\epsilon/2) = \frac{e^{\epsilon/2} - e^{-\epsilon/2}}{e^{\epsilon/2} + e^{-\epsilon/2}}. So we obtain

\log(F(x)) \geq \log(F(\epsilon)) + \frac{x - \epsilon}{2} - \frac{1}{4\epsilon}(x^2 - \epsilon^2)\tanh\!\left(\frac{\epsilon}{2}\right). \qquad (3.12)
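The inequality (3.12) is the variational lower bound of [24] on the logistic sigmoid. As a quick numerical sanity check (not part of the original development), the following Python snippet evaluates both sides for a few values of x and \epsilon and verifies that the bound always holds and is tight at x = \pm\epsilon:

import numpy as np

def log_sigmoid(x):
    # log F(x) = -log(1 + exp(-x)), computed stably
    return -np.logaddexp(0.0, -x)

def variational_lower_bound(x, eps):
    # Right-hand side of (3.12): log F(eps) + (x - eps)/2 - phi(eps) * (x^2 - eps^2)
    phi = np.tanh(eps / 2.0) / (4.0 * eps)
    return log_sigmoid(eps) + (x - eps) / 2.0 - phi * (x**2 - eps**2)

for eps in (0.5, 1.0, 3.0):
    for x in (-4.0, -1.0, 0.0, eps, 2.0, 5.0):
        lhs, rhs = log_sigmoid(x), variational_lower_bound(x, eps)
        assert lhs >= rhs - 1e-12          # the bound never exceeds log F(x)
        if abs(x) == eps:
            assert abs(lhs - rhs) < 1e-12  # ... and is exact at x = +/- eps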

Substituting x by H_i = (2i - 1) W^t x_i and \epsilon by \epsilon_i, and exponentiating, yields the desired variational approximation to the sigmoid function

P(S_i = i \mid X_i = x_i, W) = F(H_i) \geq F(\epsilon_i)\, e^{\frac{H_i - \epsilon_i}{2} - \varphi(\epsilon_i)(H_i^2 - \epsilon_i^2)} = \bar{P}(S_i = i \mid X_i = x_i, W, \epsilon_i), \qquad (3.13)

where \varphi(\epsilon_i) = \frac{\tanh(\epsilon_i / 2)}{4 \epsilon_i}. So the numerator of the posterior distribution in formula (3.7) can be bounded as follows

\left[ \sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \prod_{i=0}^{1} P(S_i = i \mid X_i = x_i, W)\, q_i(X_i = x_i) \right] \pi(W) \geq \left[ \sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \prod_{i=0}^{1} \bar{P}(S_i = i \mid X_i = x_i, W, \epsilon_i)\, q_i(X_i = x_i) \right] \pi(W). \qquad (3.14)

Second approximation: In order to approximate the term \sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \prod_{i=0}^{1} \bar{P}(S_i = i \mid X_i = x_i, W, \epsilon_i)\, q_i(X_i = x_i) by an exponential form, the first approximation is insufficient. For this reason, we perform a second approximation, based on Jensen's inequality, which uses the convexity of the function e^x. Using Jensen's inequality, we obtain

\sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \prod_{i=0}^{1} \bar{P}(S_i = i \mid X_i = x_i, W, \epsilon_i)\, q_i(X_i = x_i)
= \left[ \prod_{i=0}^{1} F(\epsilon_i) \right] \sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \prod_{i=0}^{1} e^{\left[ \frac{H_i - \epsilon_i}{2} - \varphi(\epsilon_i)(H_i^2 - \epsilon_i^2) \right]} q_i(X_i = x_i)
\geq \left[ \prod_{i=0}^{1} F(\epsilon_i) \right] \prod_{i=0}^{1} e^{\left[ \frac{E_{q_i}[H_i] - \epsilon_i}{2} - \varphi(\epsilon_i)\left( E_{q_i}[H_i^2] - \epsilon_i^2 \right) \right]}
= \bar{P}\big( S_0 = 0, S_1 = 1 \mid W, \{\epsilon_i\}_{i=0}^{1}, \{q_i\}_{i=0}^{1} \big),

where E_{q_0} and E_{q_1} are the expectations with respect to the distributions q_0 and q_1, respectively. Finally, thanks to the two approximations above, the numerator of the posterior distribution in formula (3.7) can be bounded as follows

\left[ \sum_{x_0 \in \Omega_0,\, x_1 \in \Omega_1} \prod_{i=0}^{1} P(S_i = i \mid X_i = x_i, W)\, q_i(X_i = x_i) \right] \pi(W) \geq \bar{P}\big( S_0 = 0, S_1 = 1 \mid W, \{\epsilon_i\}, \{q_i\} \big)\, \pi(W).

Thus, the variational posterior approximation is given by

\hat{P}\big( W \mid S_0 = 0, S_1 = 1, \{\epsilon_i\}, \{q_i\} \big) = \frac{\bar{P}\big( S_0 = 0, S_1 = 1 \mid W, \{\epsilon_i\}, \{q_i\} \big)\, \pi(W)}{P(S_0 = 0, S_1 = 1)}.

Since P(S_0 = 0, S_1 = 1) is a constant which does not affect the form of the variational posterior approximation, we can ignore it and obtain

\hat{P}\big( W \mid S_0 = 0, S_1 = 1, \{\epsilon_i\}, \{q_i\} \big) \propto \bar{P}\big( S_0 = 0, S_1 = 1 \mid W, \{\epsilon_i\}, \{q_i\} \big)\, \pi(W).

Finally, the posterior distribution is approximated as follows

P(W \mid S_0 = 0, S_1 = 1) \approx \hat{P}\big( W \mid S_0 = 0, S_1 = 1, \{\epsilon_i\}, \{q_i\} \big) \qquad (3.15)
\propto \bar{P}\big( S_0 = 0, S_1 = 1 \mid W, \{\epsilon_i\}, \{q_i\} \big)\, \pi(W). \qquad (3.16)

Since \pi(W) is a Gaussian, which is conjugate to \bar{P}\big( S_0 = 0, S_1 = 1 \mid W, \{\epsilon_i\}, \{q_i\} \big), which has an exponential form, the variational posterior approximation is a Gaussian with posterior mean \mu_{post} and posterior covariance matrix \Sigma_{post}. Substituting \pi(W) and \bar{P}\big( S_0 = 0, S_1 = 1 \mid W, \{\epsilon_i\}, \{q_i\} \big) by their forms in formula (3.16), we obtain

e^{-\frac{1}{2}(W - \mu_{post})^t \Sigma_{post}^{-1} (W - \mu_{post})} \propto \bar{P}\big( S_0 = 0, S_1 = 1 \mid W, \{\epsilon_i\}, \{q_i\} \big)\, e^{-\frac{1}{2}(W - \mu)^t \Sigma^{-1} (W - \mu)}.

Thus, omitting the algebra, \Sigma_{post} and \mu_{post} are given by the following Bayesian update equations

(\Sigma_{post})^{-1} = \Sigma^{-1} + 2 \sum_{i=0}^{1} \varphi(\epsilon_i)\, E_{q_i}\big[ x_i (x_i)^t \big], \qquad (3.17)

\mu_{post} = \Sigma_{post} \left[ \Sigma^{-1} \mu + \sum_{i=0}^{1} \Big( i - \frac{1}{2} \Big) E_{q_i}[x_i] \right]. \qquad (3.18)

According to equation (3.17), \Sigma_{post} depends on the variational parameters \{\epsilon_i\}_{i=0}^{1}. For this reason, we have to specify them.

We have to find the values of \{\epsilon_i\} that yield a tight lower bound in equation (3.15), and thus an optimal approximation to the posterior distribution. This can be done by an EM algorithm, which is derived in Appendix A. The variational parameters are given by

\epsilon_i^2 = E_{\hat{P}\left( W \mid S_0 = 0, S_1 = 1, (\{\epsilon_i\})^{old}, \{q_i\} \right)}\Big[ E_{q_i}\big[ (W^t x_i)^2 \big] \Big] \qquad (3.19)
= E_{q_i}\big[ (x_i)^t \Sigma_{post}\, x_i \big] + (\mu_{post})^t\, E_{q_i}\big[ x_i (x_i)^t \big]\, \mu_{post}, \qquad i \in \{0, 1\},

where \hat{P}\big( W \mid S_0 = 0, S_1 = 1, (\{\epsilon_i\})^{old}, \{q_i\} \big) is the variational posterior approximation based on the previous values of \{\epsilon_i\}. The weight and intercept computation algorithm has two phases. The first phase is the initialization; the second phase is iterative and computes \Sigma_{post} and \mu_{post} through the Bayesian update equations (3.17) and (3.18), respectively, while using equation (3.19) to find the variational parameters at each iteration. The components of \mu_{post} are the desired estimates of the pseudo-metric weights \widetilde{w}_0 and \{w_k\}_{k=0}^{J-1} and of the intercept v.

Initialization:
1. Compute the parameters of the distributions q_0 and q_1 which model the relevance class \Omega_0 and irrelevance class \Omega_1 explanatory vectors, respectively.
2. Initialize the covariance matrix \Sigma_{old} to the identity matrix and the mean \mu_{old} to a vector with components equal to
3. Initialize the variational parameters as follows:
   For each i \in \{0, 1\} do
     \epsilon_i^2 <- E_{q_i}[(x_i)^t \Sigma_{old}\, x_i] + (\mu_{old})^t E_{q_i}[x_i (x_i)^t]\, \mu_{old}
   End for

Computation of \Sigma_{post} and \mu_{post}:
1. Do
     (\Sigma_{post}^{new})^{-1} <- (\Sigma_{old})^{-1} + 2 \sum_{i=0}^{1} \varphi(\epsilon_i)\, E_{q_i}[x_i (x_i)^t]
     \mu_{post}^{new} <- \Sigma_{post}^{new} \Big[ (\Sigma_{old})^{-1} \mu_{old} + \sum_{i=0}^{1} \big( i - \tfrac{1}{2} \big) E_{q_i}[x_i] \Big]
     For each i \in \{0, 1\} do
       \epsilon_i^2 <- E_{q_i}[(x_i)^t \Sigma_{post}^{new}\, x_i] + (\mu_{post}^{new})^t E_{q_i}[x_i (x_i)^t]\, \mu_{post}^{new}
     End for
   While ( \|\Sigma_{old} - \Sigma_{post}^{new}\| > threshold or \|\mu_{old} - \mu_{post}^{new}\| > threshold )
   Return \mu_{post}^{new}
2. Assign the \mu_{post} component values to the pseudo-metric weights \widetilde{w}_0 and \{w_k\}_{k=0}^{J-1} and to the intercept v.
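A compact Python sketch of this iterative scheme is given below. It is a simplified reading of the update equations (3.17)-(3.19) under assumptions that go beyond what is stated above: the expectations E_{q_i}[x_i] and E_{q_i}[x_i x_i^t] are passed in precomputed, and the posterior obtained at each iteration is fed back as the prior of the next one until the mean stabilizes.

import numpy as np

def variational_blr(mu0, Sigma0, Ex, Exx, n_iter=100, tol=1e-6):
    """Sketch of the variational updates (3.17)-(3.19).

    mu0, Sigma0 : prior mean (d,) and covariance (d, d) of W.
    Ex          : [E_{q_0}[x_0], E_{q_1}[x_1]], each of shape (d,).
    Exx         : [E_{q_0}[x_0 x_0^t], E_{q_1}[x_1 x_1^t]], each of shape (d, d).
    """
    def phi(e):                                    # phi(eps) = tanh(eps/2) / (4 eps)
        return np.tanh(e / 2.0) / (4.0 * e)

    mu, Sigma = mu0.astype(float), Sigma0.astype(float)
    # Initialize the variational parameters from the prior, as in the algorithm above.
    eps = [np.sqrt(np.trace(Exx[i] @ Sigma) + mu @ Exx[i] @ mu) for i in range(2)]
    for _ in range(n_iter):
        prec = np.linalg.inv(Sigma)
        # (3.17): posterior precision (the factor 2 comes from the quadratic term of the bound).
        prec_post = prec + 2.0 * sum(phi(eps[i]) * Exx[i] for i in range(2))
        Sigma_post = np.linalg.inv(prec_post)
        # (3.18): posterior mean; (i - 1/2) is -1/2 for the relevance class, +1/2 otherwise.
        mu_post = Sigma_post @ (prec @ mu + sum((i - 0.5) * Ex[i] for i in range(2)))
        # (3.19): re-estimate the variational parameters from the new posterior.
        eps = [np.sqrt(np.trace(Exx[i] @ Sigma_post) + mu_post @ Exx[i] @ mu_post)
               for i in range(2)]
        converged = np.abs(mu_post - mu).max() < tol
        mu, Sigma = mu_post, Sigma_post            # posterior re-used as next prior (assumption)
        if converged:
            break
    return mu, Sigma                               # mu = estimates of the weights and intercept

The returned mean plays the role of (\widetilde{w}_0, w_0, \ldots, w_{J-1}, v) in the pseudo-metric.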

3.3 Training

Let us consider a color image database which consists of several color image sets, such that each set contains color images which are perceptually close to each other in terms of object shapes and colors. In order to compute the pseudo-metric weights and the intercept by the classical logistic regression model, we have to create the relevance class \Omega_0 and the irrelevance class \Omega_1. To create \Omega_0, we draw all possible pairs of feature vectors representing color images belonging to the same database color image sets, and for each pair we compute an explanatory vector and associate with it a binary target variable equal to 0. For example, for the i-th drawn pair we compute an explanatory vector X_i and associate with it S_i = 0. Similarly, to create \Omega_1, we draw all possible pairs of feature vectors representing color images belonging to different database color image sets, and for each pair we compute an explanatory vector and associate with it a binary target variable equal to 1. For example, for the j-th drawn pair we compute an explanatory vector X_j and associate with it S_j = 1. For the Bayesian logistic regression model, we create \Omega_0 and \Omega_1 in the same way, but instead of associating a binary target variable value to each explanatory vector of \Omega_0 and \Omega_1, we associate a binary target variable S_0 equal to 0 with all \Omega_0 explanatory vectors and a binary target variable S_1 equal to 1 with all \Omega_1 explanatory vectors.
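The construction of the two classes can be summarized by a short sketch. The Python snippet below is illustrative only: it assumes each featurebase entry is already the Daubechies-8 decomposed, compressed and quantized vector of length 2^J (index 0 being the scaling factor), that each image carries a set label, and that a mismatch is counted whenever the two quantized coefficients of a pair differ.

import numpy as np
from itertools import combinations

def explanatory_vector(u, v, J):
    """Build (|scaling difference|, mismatches per level 0..J-1, 1) for a pair."""
    x = np.zeros(J + 2)
    x[0] = abs(u[0] - v[0])                      # scaling factor difference
    for i in range(1, 2**J):
        level = int(np.floor(np.log2(i)))        # resolution level bin(i)
        x[1 + level] += (u[i] != v[i])           # count coefficient mismatches
    x[-1] = 1.0                                  # constant for the intercept
    return x

def build_classes(featurebase, labels, J):
    """Omega_0: pairs from the same image set; Omega_1: pairs from different sets."""
    omega0, omega1 = [], []
    for (a, la), (b, lb) in combinations(zip(featurebase, labels), 2):
        (omega0 if la == lb else omega1).append(explanatory_vector(a, b, J))
    return np.array(omega0), np.array(omega1)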

4 Color image retrieval method

The querying method is in two phases. The first phase is a preprocessing phase done once for the entire database containing |DB| color images. The second phase is the querying phase.

4.1 Color image database preprocessing

We detail the preprocessing phase, done once for all the database color images before the querying, in a general case by the following steps.

1. Choose N feature vectors for comparison.
2. Compute the N feature vectors T_{li} (l \in \{1, \ldots, N\}) for the i-th color image of the database, for each i \in \{1, \ldots, |DB|\}.
3. The feature vectors representing the database color images are Daubechies-8 wavelet decomposed, compressed to m coefficients each and quantized.
4. Organize the decomposed, compressed and quantized feature vectors into search arrays \Theta_+^l and \Theta_-^l (l = 1, \ldots, N).
5. Adjust the pseudo-metric weights \widetilde{w}_0^l and \{w_k^l\}_{k=0}^{J-1} for each featurebase \{T_{li}\}_{i=1}^{|DB|} representing the database color images, where l \in \{1, \ldots, N\}.

4.2 The querying algorithm

We detail the querying algorithm in a general case by the following steps.

1. Given a query color image, we denote the feature vectors representing the query image by Q_l (l = 1, \ldots, N).
2. The feature vectors representing the query image are Daubechies-8 wavelet decomposed, compressed to m coefficients each and quantized.
3. The similarity degrees between Q_l (l = 1, \ldots, N) and the database color image feature vectors T_{li} (l = 1, \ldots, N; i = 1, \ldots, |DB|) are represented by the arrays Score_l (l = 1, \ldots, N), such that Score_l[i] = \|Q_l, T_{li}\| for each i \in \{1, \ldots, |DB|\}. These arrays are returned by the procedure Retrieval(Q_l, m, \Theta_+^l, \Theta_-^l) (l = 1, \ldots, N), respectively.
4. The similarity degrees between the query color image and the database color images are represented by a resulting array TotalScore, such that TotalScore[i] = \sum_{l=1}^{N} \gamma_l\, Score_l[i] for each i \in \{1, \ldots, |DB|\}, where \{\gamma_l\}_{l=1}^{N} are weight factors used to fine-tune the influence of each individual feature (a sketch of this combination step is given after the list).
5. Organize the database color images in order of increasing resulting similarity degrees of the array TotalScore. The most negative resulting similarity degrees correspond to the closest target images to the query image. Finally, return to the user the closest target color images to the query color image, whose number, denoted by RI, is chosen by the user.
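The following short Python sketch illustrates steps 3-5 of the querying algorithm. It assumes a per-feature scoring function is available (for instance a wrapper around the Retrieval procedure or the earlier retrieval_scores sketch); the weight factors gamma and the number RI of returned images are supplied by the user.

import numpy as np

def query_database(query_features, per_feature_scores, gamma, RI):
    """Combine per-feature similarity degrees and return the RI best images.

    query_features     : list of N query feature vectors Q_l (already decomposed).
    per_feature_scores : callable(l, Q_l) -> Score_l, an array of length |DB|.
    gamma              : sequence of N weight factors gamma_l.
    RI                 : number of images to return, chosen by the user.
    """
    scores = [per_feature_scores(l, Q) for l, Q in enumerate(query_features)]
    total_score = sum(g * s for g, s in zip(gamma, scores))   # TotalScore[i]
    ranking = np.argsort(total_score)        # most negative scores come first
    return ranking[:RI]                      # indices of the closest images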

5 Experimental results

In this section, we present the decontextualized classification performance evaluation and comparison of the Bayesian logistic regression model and the classical logistic regression one. Then, we perform an evaluation and comparison of both models in the image retrieval context.

5.1 Decontextualized classification performance evaluation of both models

The aim of this section is to evaluate and compare the classification performances of the Bayesian logistic regression (BLR) model and the classical logistic regression (CLR) one. To achieve this, we use a classification accuracy used in [14] and a cumulative distance error. Given a K-dimensional success class and failure class having N_p^s points \{x_i = (x_1^i, \ldots, x_K^i)\}_{i=1}^{N_p^s} and N_p^f points \{y_j = (y_1^j, \ldots, y_K^j)\}_{j=1}^{N_p^f}, respectively, in the CLR model we associate to each point x_i of the success class a binary target variable S_i = 1, and to each point y_j of the failure class a binary target variable S_j = 0, while in the BLR model we associate to all of the success class points a binary target variable S_1 = 1 and to all of the failure class points a binary target variable S_0 = 0, and we model the failure class and the success class by the distributions q_0 and q_1, respectively. The success classification accuracy (SCA) is defined as the percentage of the points correctly classified in the success class. To verify whether a point x_i is correctly classified or not, we compute its success probability P_i = F\big( \sum_{k=1}^{K} w_k x_k^i + v \big), where \{w_k\}_{k=1}^{K} and v are respectively the weights and the intercept to adjust. Precisely, if P_i > 0.5, x_i is considered correctly classified, and otherwise it is considered incorrectly classified. The failure classification accuracy (FCA) is defined like the success one, but it is computed over the failure class, and the verification of whether a point y_j is correctly classified or not is performed by computing its failure probability \bar{P}_j = F\big( -\sum_{k=1}^{K} w_k y_k^j - v \big). Precisely, if \bar{P}_j > 0.5, y_j is considered correctly classified, and otherwise it is considered incorrectly classified. The success and failure cumulative distance errors are denoted by D_E and \bar{D}_E, respectively, and are defined as follows

D_E = \frac{\sum_{i=1}^{N_p^s} (P_i - 1)^2}{N_p^s}, \qquad \bar{D}_E = \frac{\sum_{j=1}^{N_p^f} (\bar{P}_j - 1)^2}{N_p^f}.

Thus, for both classes, we compute the failure and success classification accuracies and the failure and success cumulative distance errors using weights and intercept adjusted by the CLR model, and then we compute them again using weights and intercept adjusted by the BLR model. The model whose weights and intercept lead to greater failure and success classification accuracies and to lower failure and success cumulative distance errors is considered better in terms of classification performance. The validation of both models and the computation of the classification accuracies and the cumulative distance errors are performed on synthetic data and on two data sets extracted from the well-known Iris data [6]. The synthetic data consists of two groups of artificial data sets. The first group, labelled synthetic data 1 and shown in Figure 5.1, is a collection of three two-dimensional data sets. Each set has two clusters with a total of 1000 points generated from two Gaussians, 500 points per Gaussian. In the first set the two clusters are well separated, in the second set they are slightly overlapped, and in the third set they are overlapped. The overlap between two clusters is controlled by moving one cluster towards the other after translating its mean. The second group, labelled synthetic data 2 and shown in Figure 5.2, is a collection of three three-dimensional data sets generated in the same way as the synthetic data 1, but in a higher dimension. The Iris data consists of 50 samples for each of the three classes present in the data, Iris Versicolor, Iris Virginica and Iris Setosa. Each datum is four-dimensional and consists of measurements of the plant's morphology. The overlap here is very high between the Versicolor class and the Virginica class, while the Setosa class is well separated.
In order to allow the validation of the BLR and CLR models, we extract two data sets from the Iris data.

The first set is shown in Figure 5.3(b) and consists of the Setosa class and the Virginica class, and the second set is shown in Figure 5.3(c) and consists of the classes Versicolor and Virginica. In order to ease the representation, both sets are represented in a three-dimensional space after discarding the fourth dimension. For each data set of the synthetic data 1 and synthetic data 2, we consider the cluster which is closer to the origin of the reference system as the failure class, while the other cluster is considered as the success class. For the two data sets extracted from the Iris data, we consider the Virginica class as the failure class in the first data set, while the Setosa class is considered as the success class, and we consider the Versicolor class as the failure class in the second data set, while the Virginica class is considered as the success class. The computation of the weights and the intercept by the CLR model for each pair of failure and success classes is performed by optimizing the likelihood function using the Fisher scoring algorithm. Before the computation of the weights and the intercept by the BLR model, for each data set of the synthetic data 1 the distributions q_0 and q_1 are chosen as two-dimensional Gaussians whose parameters are computed directly from the failure and success class points, while for each data set of the synthetic data 2 the distributions q_0 and q_1 are chosen as three-dimensional Gaussians whose parameters are also computed directly from the failure and success class points. For each data set extracted from the Iris data, the distributions q_0 and q_1 are chosen as three-dimensional Gaussians whose parameters are computed by the expectation-maximization (EM) algorithm [2]. Tables 5.1, 5.2 and 5.3 report the failure and success classification accuracies and the failure and success cumulative distance errors computed for the synthetic data 1, the synthetic data 2 and the data sets extracted from the Iris data, using weights and intercepts adjusted by the CLR model and the BLR model.
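As a minimal illustration (not from the paper), the quantities reported in these tables can be computed as follows, assuming the cumulative distance errors are read as the mean squared distances of the probabilities to 1, and given the adjusted weights and intercept and the two classes of points:

import numpy as np
from scipy.special import expit  # F(x)

def classification_scores(success_pts, failure_pts, w, v):
    """Return (SCA, FCA, D_E, D_E_bar) for weights w and intercept v."""
    P = expit(success_pts @ w + v)            # success probabilities P_i
    P_bar = expit(-(failure_pts @ w) - v)     # failure probabilities P_bar_j
    sca = 100.0 * np.mean(P > 0.5)            # % of success points correctly classified
    fca = 100.0 * np.mean(P_bar > 0.5)        # % of failure points correctly classified
    d_e = np.sum((P - 1.0) ** 2) / len(success_pts)        # success cumulative distance error
    d_e_bar = np.sum((P_bar - 1.0) ** 2) / len(failure_pts) # failure cumulative distance error
    return sca, fca, d_e, d_e_bar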

Figure 5.1: Synthetic data 1: (a) well separated classes, (b) slightly overlapped classes and (c) overlapped classes.

Figure 5.2: Synthetic data 2: (a) well separated classes, (b) slightly overlapped classes and (c) overlapped classes.

Figure 5.3: Extracted data sets from the Iris data: (a) Iris data represented in a three-dimensional space, (b) the Virginica class and the Setosa class and (c) the Virginica class and the Versicolor class.

                         Well separated classes   Slightly overlapped classes   Overlapped classes
SCA (CLR model)                    75%                        71%                       67%
FCA (CLR model)                    74%                        68%                       64%
D_E (CLR model)
\bar{D}_E (CLR model)
SCA (BLR model)                    100%                       99%                       97%
FCA (BLR model)                    100%                       99%                       96%
D_E (BLR model)
\bar{D}_E (BLR model)

Table 5.1: Comparison between the classification performances of the CLR model and the BLR model which are validated on the synthetic data 1.

                         Well separated classes   Slightly overlapped classes   Overlapped classes
SCA (CLR model)                    73%                        69%                       65%
FCA (CLR model)                    71%                        67%                       61%
D_E (CLR model)
\bar{D}_E (CLR model)
SCA (BLR model)                    100%                       98%                       95%
FCA (BLR model)                    100%                       97%                       95%
D_E (BLR model)
\bar{D}_E (BLR model)

Table 5.2: Comparison between the classification performances of the CLR model and the BLR model which are validated on the synthetic data 2.

                         Setosa-Virginica   Versicolor-Virginica
SCA (CLR model)                 70%                  46%
FCA (CLR model)                 69%                  41%
D_E (CLR model)
\bar{D}_E (CLR model)
SCA (BLR model)                 100%                 72%
FCA (BLR model)                 99%                  69%
D_E (BLR model)
\bar{D}_E (BLR model)

Table 5.3: Comparison between the classification performances of the CLR model and the BLR model which are validated on the extracted data sets from the Iris data.

If we analyze Table 5.1, we can see that for the well separated classes, the average of the SCA (BLR model) and the FCA (BLR model) drops from 100% to 74.5%, which is the average of the SCA (CLR model) and the FCA (CLR model), while the average of the D_E (BLR model) and the \bar{D}_E (BLR model) grows from to , which is the average of the D_E (CLR model) and the \bar{D}_E (CLR model). Then, for the slightly overlapped classes, the average of the SCA (BLR model) and the FCA (BLR model) drops from 99% to 69.5%, which is the average of the SCA (CLR model) and the FCA (CLR model), while the average of the D_E (BLR model) and the \bar{D}_E (BLR model) grows from to 0.2, which is the average of the D_E (CLR model) and the \bar{D}_E (CLR model). Finally, for the overlapped classes, the average of the SCA (BLR model) and the FCA (BLR model) drops from 96.5% to 65.5%, which is the average of the SCA (CLR model) and the FCA (CLR model), while the average of the D_E (BLR model) and the \bar{D}_E (BLR model) grows from 0.13 to , which is the average of the D_E (CLR model) and the \bar{D}_E (CLR model). In Table 5.2, for each data set, the average of the SCA (BLR model) and FCA (BLR model) drops to the CLR model one, and the average of the D_E (BLR model) and \bar{D}_E (BLR model) grows to the CLR model one, at almost the same rates as in Table 5.1. Moreover, comparing Table 5.1 and Table 5.2, we notice that for each data set the former's SCA (BLR model) and FCA (BLR model) average is slightly higher and its D_E (BLR model) and \bar{D}_E (BLR model) average is slightly lower, which is also the case for the CLR model. In Table 5.3, we can see that for the Setosa and Virginica classes, the average of the SCA (BLR model) and the FCA (BLR model) drops from 99.5% to 69.5%, which is the average of the SCA (CLR model) and the FCA (CLR model), while the average of the D_E (BLR model) and the \bar{D}_E (BLR model) grows from to 0.16, which is the average of the D_E (CLR model) and the \bar{D}_E (CLR model). Then, for the Versicolor and Virginica classes, the average of the SCA (BLR model) and the FCA (BLR model) drops from 70.5% to 43.5%, which is the average of the SCA (CLR model) and the FCA (CLR model), while the average of the D_E (BLR model) and the \bar{D}_E (BLR model) grows from 0.18 to 0.36, which is the average of the D_E (CLR model) and the \bar{D}_E (CLR model). Moreover, comparing Table 5.3 Setosa-Virginica and Table 5.2 Well separated classes, we notice that the former's SCA (BLR model) and FCA (BLR model) average is slightly lower and its D_E (BLR model) and \bar{D}_E (BLR model) average is slightly higher. In all three tables, we notice that the average of the SCA (BLR model) and the FCA (BLR model) decreases and the average of the D_E (BLR model) and the \bar{D}_E (BLR model) increases as the overlap between classes becomes higher, which is also the case for the CLR model. Finally, from these tables, we can draw three main conclusions. First, the BLR model is significantly better than the CLR model in terms of classification performance. Second, the BLR model classification performance decreases as the overlap between classes becomes higher, and it is slightly degraded as the dimension increases. Third, the validation of the BLR model on real data (the Iris data) has a negligible effect on its classification performance.

5.2 Contextualized evaluation and comparison of both models

In this subsection, we briefly present the feature vectors that we use for color image representation.
Then, we discuss the choices of the distributions q_0 and q_1, in order to validate the Bayesian logistic regression model in the image retrieval context. Finally, we use the precision and scope as defined in [12] to evaluate the querying method using each model separately.

The choices of the distributions q_0 and q_1 and the querying evaluation are conducted on the WANG and ZuBuD color image databases proposed by [23]. The WANG database is a subset of the Corel database of |DB| = 1000 color images which were selected manually to form 10 sets (e.g. Africa, beach, ruins, food) of 100 images each. The Zurich Building Image Database (ZuBuD) was created by the Swiss Federal Institute of Technology in Zurich; it contains a training part of |DB| = 1005 color images and a query part of 115 color images. The training part consists of 201 building image sets, where each set contains 5 color images of the same building taken from different positions, using different cameras, under different lighting conditions. Before the feature vector extraction, we represent the WANG and ZuBuD database color images in the perceptually uniform LAB color space.

5.2.1 Used feature vectors

The luminance histogram and the weighted histograms, which were described in detail in [18], are used for image color description in this paper. Moreover, image texture description is performed by kurtosis and skewness histograms [10]. Given an M x N pixel LAB color image, its luminance histogram h_L contains the number of pixels of each luminance value, and can be written as follows

h_L(c) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \delta(I_L(i, j) - c) \quad \text{for each } c \in \{0, \ldots, 255\}, \qquad (5.1)

where I_L is the luminance image and \delta is the Kronecker symbol at 0. The weighted histograms are the color histogram constructed after edge region elimination and the multispectral gradient module mean histogram. The former is given by

h^h_k(c) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \delta(I_k(i, j) - c)\, \chi_{[0, \eta]}\big(\lambda_{max}(i, j)\big), \qquad (5.2)

and the latter is given by

\bar{h}^e_k(c) = \frac{h^e_k(c)}{N_{p,k}(c)}, \qquad (5.3)

where N_{p,k}(c) is the number of the edge region pixels and is defined as

N_{p,k}(c) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \delta(I_k(i, j) - c)\, \chi_{]\eta, +\infty[}\big(\lambda_{max}(i, j)\big), \qquad (5.4)

and

h^e_k(c) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \delta(I_k(i, j) - c)\, \lambda_{max}(i, j)\, \chi_{]\eta, +\infty[}\big(\lambda_{max}(i, j)\big), \qquad (5.5)

for each c \in \{0, \ldots, 255\} and k = a, b, where \lambda_{max} represents the multispectral gradient module, \eta is a threshold defined by the mean of the multispectral gradient modules computed over all image pixels, I_a and I_b are the images of the chrominances a (red/green) and b (yellow/blue), respectively, and \chi is the characteristic function.
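For illustration, a minimal Python sketch of how these color description histograms could be computed is given below. It assumes 8-bit channel images I_L, I_a, I_b and a precomputed multispectral gradient module map lambda_max, which are stand-ins for the quantities defined above rather than code from the paper.

import numpy as np

def luminance_histogram(I_L):
    """Equation (5.1): count pixels per luminance value c = 0..255."""
    return np.bincount(I_L.ravel(), minlength=256)

def weighted_histograms(I_k, lam_max):
    """Equations (5.2)-(5.5) for one chrominance image I_k (k = a or b)."""
    eta = lam_max.mean()                      # threshold: mean gradient module
    flat_I, flat_lam = I_k.ravel(), lam_max.ravel()
    non_edge = flat_lam <= eta                # pixels kept after edge elimination
    edge = ~non_edge
    # (5.2): color histogram restricted to non-edge pixels.
    h_h = np.bincount(flat_I[non_edge], minlength=256)
    # (5.4): number of edge pixels per color, and (5.5): summed gradient modules.
    N_p = np.bincount(flat_I[edge], minlength=256)
    h_e = np.bincount(flat_I[edge], weights=flat_lam[edge], minlength=256)
    # (5.3): gradient module mean histogram (0 where no edge pixel has color c).
    h_e_mean = np.divide(h_e, N_p, out=np.zeros(256), where=N_p > 0)
    return h_h, h_e_mean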

The multispectral gradient module mean histogram provides information about the overall contrast in the chrominance, and the edge region elimination avoids overlaps or noise between the color histogram populations caused by the edge pixels. The LAB color image kurtosis and skewness histograms are given by

h^\kappa_k(c) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \delta(I^\kappa_k(i, j) - c), \qquad (5.6)

and

h^s_k(c) = \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \delta(I^s_k(i, j) - c), \qquad (5.7)

respectively, for each c \in \{0, \ldots, 255\} and k = L, a, b, where I^\kappa_L, I^\kappa_a and I^\kappa_b are the kurtosis images of the luminance L and the chrominances a and b, respectively, and I^s_L, I^s_a and I^s_b are their skewness images. They are obtained by local computations of the kurtosis and skewness values at the luminance and chrominance image pixels. Then, a linear interpolation is used to map the kurtosis and skewness values to the range between 0 and 255. Since each used feature vector is a histogram having 256 components, we set J equal to 8 in the following subsections.

5.2.2 The choices of q_0 and q_1

Since from each color image of the WANG and ZuBuD databases we extract N = 11 feature vectors, namely the histograms h_L, h^h_a, h^h_b, \bar{h}^e_a, \bar{h}^e_b, h^\kappa_L, h^\kappa_a, h^\kappa_b, h^s_L, h^s_a and h^s_b given by (5.1), (5.2), (5.3), (5.6) and (5.7), respectively, each database is represented by eleven histogram featurebases. The choices of q_0 and q_1 are performed separately for each of these featurebases. According to the training, there are no mutual direct effects between the realizations of the random variable \widetilde{X}_{0,0} and the realizations of the random vector (X_{0,0}, \ldots, X_{J-1,0}). Therefore, we assume that \widetilde{X}_{0,0} and (X_{0,0}, \ldots, X_{J-1,0}) are independent, and then q_0 is constructed as a product of two independent distributions q_{0,0} and q_{0,1}, i.e. q_0(\widetilde{X}_{0,0}, X_{0,0}, \ldots, X_{J-1,0}) = q_{0,0}(\widetilde{X}_{0,0})\, q_{0,1}(X_{0,0}, \ldots, X_{J-1,0}). Analogously, for the same reason, we assume that q_1(\widetilde{X}_{0,1}, X_{0,1}, \ldots, X_{J-1,1}) = q_{1,0}(\widetilde{X}_{0,1})\, q_{1,1}(X_{0,1}, \ldots, X_{J-1,1}), where q_{1,0} and q_{1,1} are two independent distributions. For each histogram featurebase, the realizations of each random variable of the random vector (X_{0,0}, \ldots, X_{J-1,0}) are positive integers. For this reason, and taking into account that they do not represent frequencies following a multinomial distribution, we suppose that they follow a Poisson distribution, i.e., after considering the independence between the random variables of the random vector (X_{0,0}, \ldots, X_{J-1,0}), we have

q_{0,1}(X_{0,0}, \ldots, X_{J-1,0}) = \prod_{j=0}^{J-1} \frac{\lambda_{j,0}^{X_{j,0}}}{X_{j,0}!}\, e^{-\lambda_{j,0}},

where \lambda_{j,0} \in \mathbb{R}^+ for each j \in \{0, \ldots, J-1\}. Analogously, with the same assumptions, we have

q_{1,1}(X_{0,1}, \ldots, X_{J-1,1}) = \prod_{j=0}^{J-1} \frac{\lambda_{j,1}^{X_{j,1}}}{X_{j,1}!}\, e^{-\lambda_{j,1}},

where \lambda_{j,1} \in \mathbb{R}^+ for each j \in \{0, \ldots, J-1\}. The parameters \{\lambda_{j,0}\}_{j=0}^{J-1} and \{\lambda_{j,1}\}_{j=0}^{J-1}


Reject Inference in Credit Scoring. Jie-Men Mok Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business

More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Topics Exploratory Data Analysis Summary Statistics Visualization What is data exploration?

More information

3F3: Signal and Pattern Processing

3F3: Signal and Pattern Processing 3F3: Signal and Pattern Processing Lecture 3: Classification Zoubin Ghahramani zoubin@eng.cam.ac.uk Department of Engineering University of Cambridge Lent Term Classification We will represent data by

More information

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher

More information

Bayes and Naïve Bayes. cs534-machine Learning

Bayes and Naïve Bayes. cs534-machine Learning Bayes and aïve Bayes cs534-machine Learning Bayes Classifier Generative model learns Prediction is made by and where This is often referred to as the Bayes Classifier, because of the use of the Bayes rule

More information

Basics of Statistical Machine Learning

Basics of Statistical Machine Learning CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

More information

Probabilistic Latent Semantic Analysis (plsa)

Probabilistic Latent Semantic Analysis (plsa) Probabilistic Latent Semantic Analysis (plsa) SS 2008 Bayesian Networks Multimedia Computing, Universität Augsburg Rainer.Lienhart@informatik.uni-augsburg.de www.multimedia-computing.{de,org} References

More information

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

More information

Robert Collins CSE598G. More on Mean-shift. R.Collins, CSE, PSU CSE598G Spring 2006

Robert Collins CSE598G. More on Mean-shift. R.Collins, CSE, PSU CSE598G Spring 2006 More on Mean-shift R.Collins, CSE, PSU Spring 2006 Recall: Kernel Density Estimation Given a set of data samples x i ; i=1...n Convolve with a kernel function H to generate a smooth function f(x) Equivalent

More information

A Learning Based Method for Super-Resolution of Low Resolution Images

A Learning Based Method for Super-Resolution of Low Resolution Images A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 emre.ugur@ceng.metu.edu.tr Abstract The main objective of this project is the study of a learning based method

More information

Factorial experimental designs and generalized linear models

Factorial experimental designs and generalized linear models Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya

More information

Subspace Analysis and Optimization for AAM Based Face Alignment

Subspace Analysis and Optimization for AAM Based Face Alignment Subspace Analysis and Optimization for AAM Based Face Alignment Ming Zhao Chun Chen College of Computer Science Zhejiang University Hangzhou, 310027, P.R.China zhaoming1999@zju.edu.cn Stan Z. Li Microsoft

More information

CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen

CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen CS 591.03 Introduction to Data Mining Instructor: Abdullah Mueen LECTURE 3: DATA TRANSFORMATION AND DIMENSIONALITY REDUCTION Chapter 3: Data Preprocessing Data Preprocessing: An Overview Data Quality Major

More information

Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure

Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission

More information

Local outlier detection in data forensics: data mining approach to flag unusual schools

Local outlier detection in data forensics: data mining approach to flag unusual schools Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential

More information

15.062 Data Mining: Algorithms and Applications Matrix Math Review

15.062 Data Mining: Algorithms and Applications Matrix Math Review .6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

More information

Nominal and ordinal logistic regression

Nominal and ordinal logistic regression Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

More information

Probabilistic user behavior models in online stores for recommender systems

Probabilistic user behavior models in online stores for recommender systems Probabilistic user behavior models in online stores for recommender systems Tomoharu Iwata Abstract Recommender systems are widely used in online stores because they are expected to improve both user

More information

Introduction to Logistic Regression

Introduction to Logistic Regression OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction

More information

Content-Based Image Retrieval

Content-Based Image Retrieval Content-Based Image Retrieval Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Image retrieval Searching a large database for images that match a query: What kind

More information

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris

Class #6: Non-linear classification. ML4Bio 2012 February 17 th, 2012 Quaid Morris Class #6: Non-linear classification ML4Bio 2012 February 17 th, 2012 Quaid Morris 1 Module #: Title of Module 2 Review Overview Linear separability Non-linear classification Linear Support Vector Machines

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

Machine Learning and Pattern Recognition Logistic Regression

Machine Learning and Pattern Recognition Logistic Regression Machine Learning and Pattern Recognition Logistic Regression Course Lecturer:Amos J Storkey Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh Crichton Street,

More information

MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr

MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Machine Learning for Computer Vision 1 MVA ENS Cachan Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Department of Applied Mathematics Ecole Centrale Paris Galen

More information

Statistical Models in Data Mining

Statistical Models in Data Mining Statistical Models in Data Mining Sargur N. Srihari University at Buffalo The State University of New York Department of Computer Science and Engineering Department of Biostatistics 1 Srihari Flood of

More information

Predict Influencers in the Social Network

Predict Influencers in the Social Network Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, lyzhou@stanford.edu Department of Electrical Engineering, Stanford University Abstract Given two persons

More information

Cluster Analysis: Advanced Concepts

Cluster Analysis: Advanced Concepts Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means

More information

Introduction to Online Learning Theory

Introduction to Online Learning Theory Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014

Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014 Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about

More information

Gamma Distribution Fitting

Gamma Distribution Fitting Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

More information

Machine Learning Logistic Regression

Machine Learning Logistic Regression Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.

More information

MACHINE LEARNING IN HIGH ENERGY PHYSICS

MACHINE LEARNING IN HIGH ENERGY PHYSICS MACHINE LEARNING IN HIGH ENERGY PHYSICS LECTURE #1 Alex Rogozhnikov, 2015 INTRO NOTES 4 days two lectures, two practice seminars every day this is introductory track to machine learning kaggle competition!

More information

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

Content-Based Recommendation

Content-Based Recommendation Content-Based Recommendation Content-based? Item descriptions to identify items that are of particular interest to the user Example Example Comparing with Noncontent based Items User-based CF Searches

More information

Image Segmentation and Registration

Image Segmentation and Registration Image Segmentation and Registration Dr. Christine Tanner (tanner@vision.ee.ethz.ch) Computer Vision Laboratory, ETH Zürich Dr. Verena Kaynig, Machine Learning Laboratory, ETH Zürich Outline Segmentation

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld.

Logistic Regression. Vibhav Gogate The University of Texas at Dallas. Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Logistic Regression Vibhav Gogate The University of Texas at Dallas Some Slides from Carlos Guestrin, Luke Zettlemoyer and Dan Weld. Generative vs. Discriminative Classifiers Want to Learn: h:x Y X features

More information

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok

THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan

More information

Supplement to Call Centers with Delay Information: Models and Insights

Supplement to Call Centers with Delay Information: Models and Insights Supplement to Call Centers with Delay Information: Models and Insights Oualid Jouini 1 Zeynep Akşin 2 Yves Dallery 1 1 Laboratoire Genie Industriel, Ecole Centrale Paris, Grande Voie des Vignes, 92290

More information

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms

Data Mining. Cluster Analysis: Advanced Concepts and Algorithms Data Mining Cluster Analysis: Advanced Concepts and Algorithms Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 More Clustering Methods Prototype-based clustering Density-based clustering Graph-based

More information

1 Prior Probability and Posterior Probability

1 Prior Probability and Posterior Probability Math 541: Statistical Theory II Bayesian Approach to Parameter Estimation Lecturer: Songfeng Zheng 1 Prior Probability and Posterior Probability Consider now a problem of statistical inference in which

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard

Algebra 1 2008. Academic Content Standards Grade Eight and Grade Nine Ohio. Grade Eight. Number, Number Sense and Operations Standard Academic Content Standards Grade Eight and Grade Nine Ohio Algebra 1 2008 Grade Eight STANDARDS Number, Number Sense and Operations Standard Number and Number Systems 1. Use scientific notation to express

More information

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set

EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set EM Clustering Approach for Multi-Dimensional Analysis of Big Data Set Amhmed A. Bhih School of Electrical and Electronic Engineering Princy Johnson School of Electrical and Electronic Engineering Martin

More information

Document Image Retrieval using Signatures as Queries

Document Image Retrieval using Signatures as Queries Document Image Retrieval using Signatures as Queries Sargur N. Srihari, Shravya Shetty, Siyuan Chen, Harish Srinivasan, Chen Huang CEDAR, University at Buffalo(SUNY) Amherst, New York 14228 Gady Agam and

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

GLM, insurance pricing & big data: paying attention to convergence issues.

GLM, insurance pricing & big data: paying attention to convergence issues. GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.

More information

Bayesian Image Super-Resolution

Bayesian Image Super-Resolution Bayesian Image Super-Resolution Michael E. Tipping and Christopher M. Bishop Microsoft Research, Cambridge, U.K..................................................................... Published as: Bayesian

More information

Component Ordering in Independent Component Analysis Based on Data Power

Component Ordering in Independent Component Analysis Based on Data Power Component Ordering in Independent Component Analysis Based on Data Power Anne Hendrikse Raymond Veldhuis University of Twente University of Twente Fac. EEMCS, Signals and Systems Group Fac. EEMCS, Signals

More information

Data Exploration Data Visualization

Data Exploration Data Visualization Data Exploration Data Visualization What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include Helping to select

More information

Package EstCRM. July 13, 2015

Package EstCRM. July 13, 2015 Version 1.4 Date 2015-7-11 Package EstCRM July 13, 2015 Title Calibrating Parameters for the Samejima's Continuous IRT Model Author Cengiz Zopluoglu Maintainer Cengiz Zopluoglu

More information

Classification by Pairwise Coupling

Classification by Pairwise Coupling Classification by Pairwise Coupling TREVOR HASTIE * Stanford University and ROBERT TIBSHIRANI t University of Toronto Abstract We discuss a strategy for polychotomous classification that involves estimating

More information

Predict the Popularity of YouTube Videos Using Early View Data

Predict the Popularity of YouTube Videos Using Early View Data 000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050

More information

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 69 Class Project Report Junhua Mao and Lunbo Xu University of California, Los Angeles mjhustc@ucla.edu and lunbo

More information

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators... MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

More information

CONTENT-BASED IMAGE RETRIEVAL FOR ASSET MANAGEMENT BASED ON WEIGHTED FEATURE AND K-MEANS CLUSTERING

CONTENT-BASED IMAGE RETRIEVAL FOR ASSET MANAGEMENT BASED ON WEIGHTED FEATURE AND K-MEANS CLUSTERING CONTENT-BASED IMAGE RETRIEVAL FOR ASSET MANAGEMENT BASED ON WEIGHTED FEATURE AND K-MEANS CLUSTERING JUMI¹, AGUS HARJOKO 2, AHMAD ASHARI 3 1,2,3 Department of Computer Science and Electronics, Faculty of

More information

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São

More information

Local classification and local likelihoods

Local classification and local likelihoods Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor

More information

Simultaneous Gamma Correction and Registration in the Frequency Domain

Simultaneous Gamma Correction and Registration in the Frequency Domain Simultaneous Gamma Correction and Registration in the Frequency Domain Alexander Wong a28wong@uwaterloo.ca William Bishop wdbishop@uwaterloo.ca Department of Electrical and Computer Engineering University

More information

Chapter ML:XI (continued)

Chapter ML:XI (continued) Chapter ML:XI (continued) XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

Data Preprocessing. Week 2

Data Preprocessing. Week 2 Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.

More information

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu

Medical Information Management & Mining. You Chen Jan,15, 2013 You.chen@vanderbilt.edu Medical Information Management & Mining You Chen Jan,15, 2013 You.chen@vanderbilt.edu 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?

More information

The Probit Link Function in Generalized Linear Models for Data Mining Applications

The Probit Link Function in Generalized Linear Models for Data Mining Applications Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

More information

Pa8ern Recogni6on. and Machine Learning. Chapter 4: Linear Models for Classifica6on

Pa8ern Recogni6on. and Machine Learning. Chapter 4: Linear Models for Classifica6on Pa8ern Recogni6on and Machine Learning Chapter 4: Linear Models for Classifica6on Represen'ng the target values for classifica'on If there are only two classes, we typically use a single real valued output

More information

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S.

AUTOMATION OF ENERGY DEMAND FORECASTING. Sanzad Siddique, B.S. AUTOMATION OF ENERGY DEMAND FORECASTING by Sanzad Siddique, B.S. A Thesis submitted to the Faculty of the Graduate School, Marquette University, in Partial Fulfillment of the Requirements for the Degree

More information

The Optimality of Naive Bayes

The Optimality of Naive Bayes The Optimality of Naive Bayes Harry Zhang Faculty of Computer Science University of New Brunswick Fredericton, New Brunswick, Canada email: hzhang@unbca E3B 5A3 Abstract Naive Bayes is one of the most

More information

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC

Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that

More information

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS

DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS DATA MINING CLUSTER ANALYSIS: BASIC CONCEPTS 1 AND ALGORITHMS Chiara Renso KDD-LAB ISTI- CNR, Pisa, Italy WHAT IS CLUSTER ANALYSIS? Finding groups of objects such that the objects in a group will be similar

More information

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP

Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation

More information