Learning Locally-Adaptive Decision Functions for Person Verification
Zhen Li (UIUC), Shiyu Chang (UIUC), Feng Liang (UIUC), Thomas S. Huang (UIUC), Liangliang Cao (IBM Research), John R. Smith (IBM Research)

Abstract

This paper considers the person verification problem in modern surveillance and video retrieval systems. The problem is to identify whether a pair of face or human body images is about the same person, even if the person has not been seen before. Traditional methods usually look for a distance (or similarity) measure between images (e.g., by metric learning algorithms) and make decisions based on a fixed threshold. We show that this is nevertheless insufficient and sub-optimal for the verification problem. This paper proposes to learn a decision function for verification that can be viewed as a joint model of a distance metric and a locally adaptive thresholding rule. We further formulate the inference on our decision function as a second-order large-margin regularization problem, and provide an efficient algorithm in its dual form. We evaluate our algorithm on both human body verification and face verification problems. Our method outperforms not only classical metric learning algorithms, including LMNN and ITML, but also the state-of-the-art in the computer vision community.

1. Introduction

Person verification ("Are you the person you claim to be?") is an important problem with many applications. Modern image retrieval systems often want to verify whether photos contain the same person or the same object. Person verification is also becoming more and more important for social network websites, where it is highly preferred to correctly assign personal photos to users. More importantly, the huge number of surveillance cameras (there are more than 30 million surveillance cameras in the U.S., recording about 4 billion hours of video per week) calls for reliable systems that can identify the same person across different videos, a critical task that cannot rely merely on human labor. So developing an automatic verification system is of great practical interest. (This research was supported in part by a research grant from the Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences.)

There are two main visual cues for person verification: face images and human body figures. Although the human vision system has an amazing ability to perform verification (we can judge whether two faces belong to the same person without even having seen that person before), it is difficult to build a computer-based automatic system for this purpose. For a given query image, the person in the image may not appear in the database, or may have only one or a few images in the database. Furthermore, the query image and the other images in the database are rarely collected in exactly the same environment, which leads to huge intra-person variations in viewpoint, lighting condition, image quality, resolution, etc. Figure 1 provides some examples illustrating the difficulties of the person verification problem.

Figure 1. Example images of George W. Bush showing huge intra-person variations.

We can formally describe the verification problem as follows: for a pair of sample images represented by x, y ∈ R^d, with category labels c(x) and c(y) respectively, we aim to decide whether they are from the same category, i.e., whether c(x) = c(y) or not. Given a set of training samples, our goal is to learn a decision function
f(x, y) such that

f(x, y) > 0 if c(x) = c(y), and f(x, y) < 0 otherwise.  (1)

Note that to infer f we need not know the respective values of c(x) or c(y), which means f has the generalization ability to verify samples from unseen categories.

Learning the decision function f for verification is fundamentally different from learning a classifier in traditional machine learning problems. Traditional machine learning algorithms consider individual samples instead of pairs of samples. The paired setup for verification naturally imposes a symmetry constraint on the decision function, i.e., f(x, y) = f(y, x), a constraint seldom seen in ordinary learning algorithms. Most multi-class classifiers, which model category-specific probability distributions (for generative models) or learn decision boundaries (for discriminative models), are not appropriate for verification. For verification, the interest is in determining whether a pair of samples is from the same category, not in answering which category or categories they belong to. The ability to deal with unseen categories is the key to person verification, since most testing samples are from unseen persons who are not in the training pool.

Recently, metric learning (ML) approaches [26] have been applied to person verification [, 9, 6, ]. The key idea behind ML is to learn a parametric distance metric between two images x and y, which in most cases takes the form (x − y)^T M (x − y), where M is a positive semi-definite matrix. One can then decide whether x and y are from the same class based on a thresholding rule, i.e., (x − y)^T M (x − y) ≤ d. Although ML is very important for many supervised learning applications (e.g., classification) that often deal with complex and high-dimensional features, it has a few limitations, particularly in the verification setting. The objective of many ML algorithms is to ensure that samples from the same class are closer to each other than those from different classes. In other words, it enforces a relative ranking constraint between intra-class and inter-class pairs (in terms of pairwise distances), and this is why ML is often tied to the nearest neighbor classifier in classification tasks. However, for verification, where many test samples may come from unseen classes, nearest neighbor classifiers are not applicable. ML then only leads to an absolute decision rule with a constant threshold d:

f_ML(x, y) = d − (x − y)^T M (x − y).  (2)

This intrinsic mismatch (classification vs. verification, relative ranking vs. absolute discrimination) leaves ML approaches sub-optimal for verification problems.

In Section 2, after showing the sub-optimality of (2), we propose to adjust the decision rule locally, i.e., to consider f(x, y) = d(x, y) − (x − y)^T M (x − y), where d(x, y) is a function of x and y rather than a constant. As a starting point, we assume d(x, y) takes a simple quadratic form, which leads to our general second-order decision function. In Section 3, we formulate the inference on our second-order decision function as a large-margin regularization problem, and provide an efficient algorithm based on its dual. Further, we can interpret our approach as learning a linear SVM in a new feature space that is symmetric with respect to (x, y). With this interpretation, our second-order decision function can easily be generalized to decision functions of higher orders by the kernel trick. In Section 4, we evaluate our proposed algorithm on three person verification tasks and show that in all cases our method achieves state-of-the-art results in comparison with existing works. Finally, we give concluding remarks in Section 5.
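To make the baseline concrete, here is a minimal sketch (our own illustration, not code from the paper) of the fixed-threshold rule (2); the identity matrix stands in for a learned metric M:

```python
import numpy as np

def verify_fixed_threshold(x, y, M, d):
    """Eq. (2): declare 'same person' iff the Mahalanobis
    distance (x - y)^T M (x - y) stays below the constant d."""
    diff = x - y
    return float(d - diff @ M @ diff) > 0.0

# Toy usage: identity metric as a stand-in for a learned M.
x = np.array([0.2, 1.0])
y = np.array([0.3, 0.9])
print(verify_fixed_threshold(x, y, M=np.eye(2), d=0.5))  # True: a close pair
```

The point of the paper is that no single constant d works well once pairs from many unseen classes are mixed together; the sections below replace d with a locally adaptive rule d(x, y).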
2. Bridging Distance Metric and Local Decision Rules

Metric learning (related to feature selection, dimension reduction, subspace projection, etc.) plays a fundamental role in machine learning. It is particularly important for computer vision applications, where the feature representation of images or videos is usually of complex, high-dimensional form [28, 22]. In these cases, the Euclidean norm associated with the original feature space usually does not provide much useful information for the subsequent learning tasks. In most applications we consider here, the sample data are sparse in the high-dimensional feature space, so we focus on learning a global metric, i.e., the matrix M in (2), although learning a local metric has attracted increasing interest in machine learning research.

However, metric learning by itself is insufficient for the verification problem, as discussed in Section 1. The problem is that after metric learning, we still need to make a decision. As shown below, a simple constant threshold as in (2) is sub-optimal, even if the associated metric is correct. A decision rule that can adapt to the local structures of the data [9] is the key to achieving good verification performance. To this end, we consider a joint model that bridges a global distance metric and a local decision rule, and we further show the optimality of our method over ML in the verification setting.

Consider f(x, y) = d(x, y) − (x − y)^T M (x − y), where d(x, y) acts as a local decision rule for a learned metric M. Since the metric itself is quadratic, as a starting point we also assume that d(x, y) takes a simple quadratic form. We will see in Section 3 that this formulation leads to a kernelized large-margin learning problem, and thus can easily be generalized to decision functions of higher orders by the kernel trick [1]. For now, let us focus on the second-order decision rule,
i.e.,

d(x, y) = (1/2) z^T Q z + w^T z + b,

where z^T = [x^T, y^T] ∈ R^{2d}, Q = [Q_xx, Q_xy; Q_yx, Q_yy] ∈ R^{2d×2d}, w^T = [w_x^T, w_y^T] ∈ R^{2d}, and b ∈ R. Due to the symmetry with respect to x and y, we can rewrite d(x, y) as follows:

d(x, y) = (1/2) x^T Ã x + (1/2) y^T Ã y + x^T B y + c^T (x + y) + b
        = (1/4) (x − y)^T (Ã − B)(x − y) + (1/4) (x + y)^T (Ã + B)(x + y) + c^T (x + y) + b,  (3)

where Ã = Q_xx = Q_yy and B = Q_xy = Q_yx are both d × d real symmetric matrices (not necessarily positive semi-definite), c = w_x = w_y is a d-dimensional vector, and b is the bias term. Now we obtain the second-order decision function for verification:

f(x, y) = d(x, y) − (x − y)^T M (x − y)
        = (1/2) x^T A x + (1/2) y^T A y + x^T B y + c^T (x + y) + b,  (4)

by letting A − B = Ã − B − 4M and A + B = Ã + B. Again, A and B are real symmetric and need not be PSD. The above decision function has the following desirable properties:

- Learning globally, acting locally. We bridge a global metric M and a local decision rule in a joint model (4). Interestingly, the number of parameters is of the same order, O(d²), as that of ML.
- Fully informed decision making. The local decision rule in (3) depends not only on x − y, the difference vector usually considered by ML, but also on x + y, which carries information about (x, y) orthogonal to the difference vector and which would otherwise be neglected.
- Kernelizable to higher order. As we will see in Section 3, the decision function in (4) leads to a kernelized large-margin learning problem, and thus can easily be generalized to decision functions of higher orders by the kernel trick [1].

We now show the optimality of our decision function over ML by considering a simple case where two categories of samples in R^d are linearly separable. We show that in the verification setting, the performance of any given metric is inferior to that of our model in this simple case.

Observation. Given two linearly separable classes, the verification error rate of our second-order decision function (4) is always lower than that of a learned metric with a fixed threshold (2). More specifically, in this particular setting, our model can always achieve zero verification error while ML cannot.

Figure 2. Distributions of same-class pairs (red) vs. different-class pairs (blue). (a) Joint distribution of (x, y), with the zero-error decision function given by our model. (b) Distribution of Δ = x − y under metric learning, which incurs a finite verification error.

Proof. Suppose the two classes in R^d satisfy w^T x + b > 0 for class 1 and w^T x + b < 0 for class 2. In verification, we aim to identify whether two samples x and y are from the same class or from different ones.

1. We first show that our decision function in (4) always achieves zero verification error. x and y are from the same class if and only if (w^T x + b) and (w^T y + b) have the same sign. In other words, we can perfectly separate pairs from the same class from those from different classes by checking the sign of

(w^T x + b)(w^T y + b) = x^T (w w^T) y + b w^T (x + y) + b².

This decision function is clearly a special case of (4).
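A quick numeric check of this construction (our own illustration, using the 1-D counterexample introduced in part 2 below):

```python
# 1-D separable classes: sign(w*x + b) with w = 1, b = 0.
# Class 1 lives in [-2, -0.5], class 2 in [0.5, 2] (the counterexample below).
w, b = 1.0, 0.0

def same_class(x, y):
    """Zero-error rule from part 1: (w^T x + b)(w^T y + b) > 0.
    In the notation of Eq. (4): A = 0, B = w w^T, c = b*w, bias = b^2."""
    return (w * x + b) * (w * y + b) > 0

print(same_class(-2.0, -0.5))  # True:  same class, distance 1.5
print(same_class(-0.5, 0.5))   # False: different classes, distance 1.0
# No fixed distance threshold d can separate these two pairs, since the
# intra-class distance (1.5) exceeds the inter-class distance (1.0).
```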
2. We then show that the ML approach in (2) does not always achieve zero verification error. Any Mahalanobis distance between x and y can be regarded as the Euclidean distance in the space transformed by L, namely,

d_M(x, y) = (x − y)^T M (x − y) = (x − y)^T L^T L (x − y) = ‖x′ − y′‖²,

where M = L^T L, x′ = Lx and y′ = Ly. In this new space, the two classes are still linearly separable, since w′^T x′ + b = w^T x + b with w′^T = w^T L^{-1} (assuming M is full rank). Therefore, for the ML method in (2), or simply ‖x′ − y′‖ < d, to achieve zero verification error, the following condition must be satisfied:

max_{c(x)=c(y)} ‖x′ − y′‖ < min_{c(x)≠c(y)} ‖x′ − y′‖.  (5)

Unfortunately, this condition does not always hold. Consider a counterexample in 1-D: class 1 is uniformly distributed on [−2, −0.5] and class 2 on [0.5, 2]. The two classes are indeed separable, but condition (5) is not satisfied, since max_{c(x)=c(y)} ‖x − y‖ = 1.5 while min_{c(x)≠c(y)} ‖x − y‖ = 1. In fact, from Figure 2(b) we see that the ML method (‖x − y‖ < d) inevitably incurs a finite verification error, while our model is able to perfectly separate the two types of pairs in the (x, y) space, as shown in Figure 2(a).

A more realistic example that also violates (5): face images of the same person in different poses are usually more dissimilar than those of different persons in the same pose. Figure 3 shows such an example with selected image pairs from the LFW dataset [16].

3. A Large-Margin Solution with an Efficient Algorithm

3.1. A large-margin formulation

Recall that the objective of a verification problem is to learn a symmetric decision function f(x, y): R^d × R^d → R that takes a pair of samples x, y ∈ R^d as inputs, with the decision rule that f(x, y) > 0 if c(x) = c(y) and f(x, y) < 0 otherwise. Our goal is to find the optimal second-order decision function f(x, y) in (4), parametrized by {A, B, c, b}. This naturally leads to the choice of an SVM-like [4] objective function, as the resulting large-margin model generalizes well to unseen examples. Specifically, assume we are given a dataset of examples with pairwise labels assigned: a sample pair p_i = (x_i, y_i) is labeled either positive (l_i = +1) if x_i and y_i are from the same class, or negative (l_i = −1) otherwise. We further denote by P the set of all labeled sample pairs.

Figure 3. Comparison of intra-person and inter-person distances under a learned metric. (a) Intra-person distances (different poses). (b) Inter-person distances (same pose).

An SVM-like objective function can be formulated as:

min (1/2)(‖A‖_F² + ‖B‖_F² + ‖c‖²) + λ Σ_{i∈P} ξ_i  (6)
s.t. l_i f(x_i, y_i) ≥ 1 − ξ_i, ∀i ∈ P
     ξ_i ≥ 0, ∀i ∈ P.

Here ‖A‖_F² = tr(A^T A) is the squared Frobenius norm, and tr(A) denotes the trace of matrix A. Using the inner product defined on the matrix space, ⟨A, B⟩ = tr(A^T B), we reformulate the decision function (4) as:

f(x, y) = (1/2) tr(A (x x^T + y y^T)) + (1/2) tr(B (x y^T + y x^T)) + c^T (x + y) + b
        = (1/2) ⟨A, x x^T + y y^T⟩ + (1/2) ⟨B, x y^T + y x^T⟩ + ⟨c, x + y⟩ + b
        = ⟨ζ, ψ(x, y)⟩ + b,  (7)

where ζ ∈ R^{2d²+d} is a vectorized representation of the hyper-parameters (excluding b), and ψ(x, y) defines a mapping R^d × R^d → R^{2d²+d}:
ζ = [vec(A); vec(B); c],  (8)

ψ(x, y) = [(1/2) vec(x x^T + y y^T); (1/2) vec(x y^T + y x^T); x + y],  (9)

where vec(·) denotes the vectorization of a matrix. Note that ψ(x, y) can be viewed as a symmetrization of the original feature space (x, y); that is, any function of ψ(x, y) is now a symmetric function of x and y. Similarly, the objective function can be rewritten as:

min (1/2) ⟨ζ, ζ⟩ + λ Σ_{i∈P} ξ_i  (10)
s.t. l_i (⟨ζ, ψ_i⟩ + b) ≥ 1 − ξ_i, ∀i ∈ P
     ξ_i ≥ 0, ∀i ∈ P,

where ψ_i is an abbreviation for ψ(x_i, y_i). This is identical in form to the standard SVM problem [4]. Thus existing SVM solvers could be employed to solve this problem, such as stochastic gradient descent [23], which works on the primal problem directly, or sequential minimal optimization (SMO) [21], which solves the dual problem instead.

3.2. An efficient dual solver

Though it looks straightforward, solving (10) directly is infeasible due to the high dimensionality of 2d² + d. For instance, a moderate image feature of 1000 dimensions leads to more than 2 million parameters to estimate. What's more, direct application of existing SVM solvers may require forming the ψ(x_i, y_i)'s explicitly, which is highly inefficient and prohibitive in memory usage. In this section, we show that the original problem can actually be converted into a kernelized SVM problem that can be solved much more efficiently.

We start with the Lagrange dual of (10):

max_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j l_i l_j ⟨ψ_i, ψ_j⟩  (11)
s.t. Σ_i α_i l_i = 0,
     0 ≤ α_i ≤ λ, ∀i,

where α_i is the Lagrange multiplier corresponding to the i-th constraint. Once the above problem is solved with optimal α_i*'s, the solution to the primal is given by:

ζ* = Σ_i α_i* l_i ψ_i,
b* = l_i − ⟨ζ*, ψ_i⟩, for any i such that α_i* > 0.

The optimal decision function is therefore:

f(x, y) = ⟨ζ*, ψ(x, y)⟩ + b* = Σ_i α_i* l_i ⟨ψ_i, ψ(x, y)⟩ + b*.  (12)

We notice that both solving the dual problem (11) and applying the optimal function (12) involve only the so-called kernel function K(ψ_i, ψ_j) = ⟨ψ_i, ψ_j⟩. By substituting (9) and using the equality vec(A)^T vec(B) = ⟨A, B⟩ = tr(A^T B), we arrive at:

K(ψ_i, ψ_j) = (1/4) tr((x_i x_i^T + y_i y_i^T)(x_j x_j^T + y_j y_j^T)) + (1/4) tr((x_i y_i^T + y_i x_i^T)(x_j y_j^T + y_j x_j^T)) + (x_i + y_i)^T (x_j + y_j)
            = (1/4) (x_i^T x_j + y_i^T y_j)² + (1/4) (x_i^T y_j + y_i^T x_j)² + (x_i + y_i)^T (x_j + y_j).  (13)

Note that the kernel function here is defined on the new space of ψ(x, y), which is symmetric with respect to x and y. More specifically, unlike a traditional kernel function defined between two individual samples, K(ψ_i, ψ_j) is defined between two pairs of samples. We now see that, to evaluate each kernel value K(ψ_i, ψ_j), one only needs to calculate four inner products in R^d: x_i^T x_j, x_i^T y_j, y_i^T x_j, and y_i^T y_j, rather than working in the (2d² + d)-dimensional space. In this way we reduce the complexity of each kernel evaluation, usually the most costly operation in solving large-scale dual SVM problems [7], from O(d²) to O(d). The memory cost is alleviated accordingly, as explicitly constructing the ψ(x, y)'s via (9) is no longer necessary. Based on (13), existing dual SVM solvers such as the SMO algorithm [21, 7] can be applied to solve (11) efficiently.

Moreover, the fact that only inner products are involved in K(ψ_i, ψ_j) enables implicit kernel embedding of the original features, namely,

K(ψ_i, ψ_j) = (1/4) (G(x_i, x_j) + G(y_i, y_j))² + (1/4) (G(x_i, y_j) + G(y_i, x_j))² + G(x_i, x_j) + G(x_i, y_j) + G(y_i, x_j) + G(y_i, y_j),  (14)

where G(·, ·) is a kernel function on the original feature space.
Based on this kernel embedding, we can thus extend our decision function (4) to higher orders by the kernel trick [1]. In practice, however, cubic polynomials and higher-order functions often work less robustly, so in our experiments we mainly use second-order decision functions.
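To make the computational saving concrete, here is a small self-contained sketch (our own, with arbitrary random vectors) of the O(d) kernel evaluation in (13), checked against an explicit construction of ψ from (9):

```python
import numpy as np

def K_pair(xi, yi, xj, yj):
    """Eq. (13): kernel between sample pairs (x_i, y_i) and (x_j, y_j),
    computed from four O(d) inner products."""
    return (0.25 * (xi @ xj + yi @ yj) ** 2
            + 0.25 * (xi @ yj + yi @ xj) ** 2
            + (xi + yi) @ (xj + yj))

def psi(x, y):
    """Eq. (9): explicit (2d^2 + d)-dimensional symmetric feature map."""
    return np.concatenate([
        0.5 * (np.outer(x, x) + np.outer(y, y)).ravel(),
        0.5 * (np.outer(x, y) + np.outer(y, x)).ravel(),
        x + y,
    ])

rng = np.random.default_rng(0)
xi, yi, xj, yj = (rng.standard_normal(5) for _ in range(4))
# The O(d) formula agrees with the explicit O(d^2) inner product.
print(np.allclose(K_pair(xi, yi, xj, yj), psi(xi, yi) @ psi(xj, yj)))  # True
```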
3.3. Regularizations

In practice, especially when there is only a limited amount of training data, we may consider further regularizing the parameters (A and B in particular). For instance, Huang et al. [13] impose various constraints on the learned Mahalanobis matrix (for metric learning), including positive semi-definiteness, low rank, sparsity, etc. While all of these regularizations can be applied in addition to the Frobenius norm used in (6), we find positive/negative semi-definiteness particularly useful in practice. Note that here we have two matrices, A and B, to constrain. Both metric learning and the toy example in Section 2 suggest that we should force A to be positive semi-definite while requiring B to be negative semi-definite. We therefore add two constraints to the objective function in (6): A PSD and B NSD. Gradient projection algorithms can be employed to solve the optimization problem: after each gradient descent step, we project the updated A onto the PSD cone and B onto the NSD cone. Alternatively, we can let A = M M^T and B = −N N^T and optimize over M and N instead. Though the problem in M and N is no longer convex, it does not seem to suffer from severe local minimum issues [24].
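The projection step can be written compactly via an eigendecomposition; below is a minimal sketch (our own reading of the gradient projection described above): clip negative eigenvalues for the PSD projection of A, and mirror the operation for the NSD projection of B.

```python
import numpy as np

def project_psd(S):
    """Nearest (Frobenius-norm) PSD matrix: zero out negative eigenvalues."""
    S = 0.5 * (S + S.T)                      # symmetrize first
    w, V = np.linalg.eigh(S)
    return (V * np.clip(w, 0.0, None)) @ V.T  # V diag(max(w, 0)) V^T

def project_nsd(S):
    """Nearest NSD matrix: the mirror image of the PSD projection."""
    return -project_psd(-S)

# After each gradient step on (6): A <- project_psd(A), B <- project_nsd(B).
A = np.array([[2.0, 1.5],
              [1.5, -1.0]])
print(np.linalg.eigvalsh(project_psd(A)) >= -1e-12)  # all True
```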
4. Experiments

We conduct experiments on three different datasets: Viewpoint Invariant Pedestrian Recognition (VIPeR) [11], Context Aware Vision using Image-based Active Recognition for Re-Identification (CAVIAR4REID) [3], and Labeled Faces in the Wild (LFW) [16]. The first two datasets focus on person verification from human body images, while the last focuses on face verification. In each experiment, we present results comparing with classic metric learning (ML) algorithms as well as other state-of-the-art approaches. We demonstrate that our proposed approach significantly outperforms existing works and achieves state-of-the-art results on all datasets. The image features and the code for the learning algorithm used in our experiments are available at zhenli3/learndecfunc.

4.1. VIPeR

The VIPeR dataset consists of images from 632 pedestrians with resolution 128 × 48. For each person, a pair of images is taken by cameras with widely differing views. Viewpoint changes of 90 degrees or more, as well as huge lighting variations, make this one of the most challenging datasets available for human body verification. Example images are shown in Figure 4.

Figure 4. Example images from the VIPeR dataset. Each column shows two images of the same pedestrian captured by two different cameras.

We follow [28] to extract high-level image features based on simple patch color descriptors. To accelerate the learning process, we further reduce the dimensionality of the final feature representation to 600 using PCA (learned on the training set). We also follow exactly the same setup as in [0, , 3]: each time, half of the 632 people are selected at random to form the training set, and the remaining people are left for testing (so that no person appears in both training and testing). The cumulative matching characteristic (CMC) curve, an estimate of the expectation of finding the correct match within the top n matches, is calculated on the testing set to measure verification performance (see [10] for details on computing the CMC curve). The final results are averaged over ten random runs.

Figure 5(a) compares our proposed method with the classic ML algorithms LMNN [24] and ITML [5], using the same feature. It is apparent that, for the verification problem, the optimal second-order decision function (4) significantly improves over traditional ML approaches with a fixed threshold (2). Note that LMNN performs the worst here; one possible reason is that each class contains only two examples with huge intra-class variations. We are also interested in comparing with other state-of-the-art methods on this dataset, even though different features and/or learning algorithms have been used. Figure 5(b) shows the comparison with PS [3], SDALF [8], ELF [11], and PRSVM [0]. Clearly, our method outperforms all previous works and achieves state-of-the-art performance.

Figure 5. Experimental results on the VIPeR dataset, shown as CMC curves (recognition percentage vs. rank score). (a) Comparison with metric learning algorithms (ITML, LMNN). (b) Comparison with other state-of-the-art algorithms (PS, PRSVM, SDALF, ELF).
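For reference, a compact sketch (our own simplification; the full protocol details live in [10]) of a single-shot CMC computation, given a matrix of verification scores between every probe and gallery image:

```python
import numpy as np

def cmc(score):
    """Cumulative matching characteristic from an n x n score matrix,
    where entry (i, j) scores probe i against gallery j and the true
    match of probe i is gallery i (single-shot setting). Returns, for
    each rank n, the fraction of probes whose true match appears
    among the top-n scored gallery items."""
    n = score.shape[0]
    order = np.argsort(-score, axis=1)            # galleries by descending score
    rank_of_true = np.argmax(order == np.arange(n)[:, None], axis=1)
    hits = np.bincount(rank_of_true, minlength=n)
    return np.cumsum(hits) / n

# Toy example: 4 probes; higher score = more likely the same person.
rng = np.random.default_rng(1)
s = rng.standard_normal((4, 4)) + 2.0 * np.eye(4)  # boost the true matches
print(cmc(s))  # non-decreasing curve ending at 1.0
```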
4.2. CAVIAR4REID

CAVIAR4REID [3], extracted from the CAVIAR dataset, is another famous dataset widely used for person verification tasks. It not only covers a wide range of poses in real surveillance footage, but also includes multiple images per pedestrian with different view angles and resolutions. There are in total 72 pedestrians, each with images recorded by two different cameras in an indoor shopping mall in Lisbon. All the human body images have been cropped with respect to the ground truth, and the resolution varies from 17 × 39 to 72 × 144. Here we extract the same image features as in Section 4.1, and we also use the same training/testing protocol.

Again, we compare with popular ML algorithms as well as other state-of-the-art approaches, as shown in Figures 6(a) and 6(b), respectively. As in Section 4.1, we observe a substantial improvement over traditional ML algorithms, and our method also outperforms state-of-the-art works including PS [3] and SDALF [8]. It should be noted that the curves for both PS and SDALF shown in Figure 6(b) have been extrapolated for the sake of fair comparison: we have to set aside a subset of 36 people for learning the parameters of our decision function (4) or the distance metric, and with only half of the people left for testing, we rescale the horizontal axis of the PS and SDALF curves by 50%.

Figure 6. Experimental results on the CAVIAR4REID dataset, shown as CMC curves (recognition percentage vs. rank score). (a) Comparison with metric learning algorithms (ITML, LMNN). (b) Comparison with other state-of-the-art algorithms (PS, SDALF).

4.3. LFW

Labeled Faces in the Wild (LFW) [16] is a database of face images designed for studying the problem of unconstrained face recognition. The face images were downloaded from Yahoo! News and demonstrate a large variety of pose, expression, lighting, etc. The dataset contains more than 13,000 face images of 5,749 people, among which 1,680 people have two or more distinct photos. We extract the same high-level image features as in Sections 4.1 and 4.2, except that SIFT [18] descriptors are computed for local patches instead of color descriptors, as suggested by [12]. The features are reduced to 500 dimensions using PCA.

We test our algorithm under the standard image-restricted setting, which is particularly designed for verification. In this setting, the dataset is divided into 10 fully independent folds, and it is ensured that no person appears in more than one fold. The identities of the people are hidden; instead, 300 positive and 300 negative image pairs are provided within each fold. Figure 3 shows some examples of positive and negative image pairs. Each time, we learn both the PCA projection and the parameters of our decision function on the 9 training folds, and evaluate on the remaining fold. Pairwise classification accuracy averaged over the 10 runs is reported, as suggested by [16].
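The restricted protocol boils down to a cross-validation loop; below is a skeleton under our reading of it, where fit_pca and train_verifier are hypothetical stand-ins for the feature projection and the large-margin solver of Section 3:

```python
import numpy as np

def evaluate_restricted(folds, fit_pca, train_verifier):
    """LFW image-restricted evaluation: `folds` is a list of 10
    (pairs, labels) tuples, pairs of shape (600, 2, d), labels in {+1, -1}.
    Both PCA and the decision function are fit on 9 folds only."""
    accs = []
    for k in range(len(folds)):
        tr_pairs = np.concatenate([folds[i][0] for i in range(len(folds)) if i != k])
        tr_labels = np.concatenate([folds[i][1] for i in range(len(folds)) if i != k])
        pca = fit_pca(tr_pairs.reshape(-1, tr_pairs.shape[-1]))  # training folds only
        f = train_verifier(pca(tr_pairs), tr_labels)             # e.g. solve (11)
        te_pairs, te_labels = folds[k]
        preds = np.sign(f(pca(te_pairs)))                        # Eq. (12) scores
        accs.append(np.mean(preds == te_labels))
    return float(np.mean(accs))  # pairwise accuracy averaged over the 10 runs
```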
As shown in Table 1, our approach significantly outperforms state-of-the-art works on the LFW dataset.

Table 1. Comparison with state-of-the-art algorithms on the LFW dataset.

Methods                       Accuracy (%)
MERL+Nowak [14]               76.2
LDML [12]                     79.3
LBP + CSML [19]               85.6
CSML + SVM [19]               88.0
Combined b/g samples [25]     86.8
DML-eig combined [27]         85.7
Deep Learning combined [15]   87.8
Our method                    89.6

It should be noted that our verification accuracy of 89.6% outperforms the best reported results on LFW in the category of "no outside data used beyond alignment/feature extraction". This result also significantly improves upon our previous work [17].

5. Conclusion

In this paper, we propose to learn a decision function for the verification problem. Our second-order formulation generalizes traditional metric learning (ML) approaches by offering a locally adaptive decision rule. Compared with existing approaches, including ML, our approach demonstrates state-of-the-art performance on several person verification benchmark datasets: VIPeR, CAVIAR4REID, and LFW.

References

[1] C. Burges. Advances in Kernel Methods: Support Vector Learning. The MIT Press, 1999.
[2] X. Chen, Z. Tong, H. Liu, and D. Cai. Metric learning with two-dimensional smoothness for visual analysis. In CVPR, 2012.
[3] D. Cheng, M. Cristani, M. Stoppa, L. Bazzani, and V. Murino. Custom pictorial structures for re-identification. In BMVC, 2011.
[4] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
[5] J. Davis, B. Kulis, P. Jain, S. Sra, and I. Dhillon. Information-theoretic metric learning. In ICML, 2007.
[6] M. Dikmen, E. Akbas, T. Huang, and N. Ahuja. Pedestrian recognition with a learned metric. In ACCV, 2010.
[7] R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training support vector machines. Journal of Machine Learning Research, 6:1889-1918, 2005.
[8] M. Farenzena, L. Bazzani, A. Perina, V. Murino, and M. Cristani. Person re-identification by symmetry-driven accumulation of local features. In CVPR, 2010.
[9] N. Gilardi and S. Bengio. Local machine learning models for spatial data analysis. Journal of Geographic Information and Decision Analysis, 4(1):11-28, 2000.
[10] D. Gray, S. Brennan, and H. Tao. Evaluating appearance models for recognition, reacquisition, and tracking. In Workshop on Performance Evaluation for Tracking and Surveillance (PETS), 2007.
[11] D. Gray and H. Tao. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In ECCV, 2008.
[12] M. Guillaumin, J. Verbeek, and C. Schmid. Is that you? Metric learning approaches for face identification. In ICCV, 2009.
[13] C. Huang, S. Zhu, and K. Yu. Large scale strongly supervised ensemble metric learning, with applications to face verification and retrieval. Technical report, NEC, 2011.
[14] G. Huang, M. Jones, E. Learned-Miller, et al. LFW results using a combined Nowak plus MERL recognizer. In Workshop on Faces in Real-Life Images at ECCV, 2008.
[15] G. Huang, H. Lee, and E. Learned-Miller. Learning hierarchical representations for face verification with convolutional deep belief networks. In CVPR, 2012.
[16] G. B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled Faces in the Wild: A database for studying face recognition in unconstrained environments. Technical Report 07-49, University of Massachusetts, Amherst, October 2007.
[17] Z. Li, L. Cao, S. Chang, J. R. Smith, and T. S. Huang. Beyond Mahalanobis distance: Learning second-order discriminant function for people verification. In CVPR Workshops, 2012.
[18] D. G. Lowe. Object recognition from local scale-invariant features. In ICCV, 1999.
[19] H. V. Nguyen and L. Bai. Cosine similarity metric learning for face verification. In ACCV, 2010.
[20] K. Okuma, A. Taleghani, N. de Freitas, J. J. Little, and D. G. Lowe. A boosted particle filter: Multitarget detection and tracking. In ECCV, 2004.
[21] J. Platt. Sequential minimal optimization: A fast algorithm for training support vector machines. 1998.
[22] J. Sánchez and F. Perronnin. High-dimensional signature compression for large-scale image classification. In CVPR, 2011.
[23] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML, 2007.
[24] K. Q. Weinberger and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10:207-244, 2009.
[25] L. Wolf, T. Hassner, and Y. Taigman. Similarity scores based on background samples. In ACCV, 2009.
[26] L. Yang and R. Jin. Distance metric learning: A comprehensive survey. Michigan State University, 2006.
[27] Y. Ying and P. Li. Distance metric learning with eigenvalue optimization. Journal of Machine Learning Research, 13:1-26, 2012.
[28] X. Zhou, N. Cui, Z. Li, F. Liang, and T. S. Huang. Hierarchical Gaussianization for image classification. In ICCV, 2009.
