Learning Bilinear Models for Two-Factor Problems in Vision


MITSUBISHI ELECTRIC RESEARCH LABORATORIES

Learning Bilinear Models for Two-Factor Problems in Vision

W. T. Freeman, J. B. Tenenbaum

TR96-37, December 1996

Abstract

In many vision problems, we want to infer two (or more) hidden factors which interact to produce our observations. We may want to disentangle illuminant and object colors in color constancy; rendering conditions from surface shape in shape-from-shading; face identity and head pose in face recognition; or font and letter class in character recognition. We refer to these two factors generically as "style" and "content". Bilinear models offer a powerful framework for extracting the two-factor structure of a set of observations, and are familiar in computational vision from several well-known lines of research. This paper shows how bilinear models can be used to learn the style-content structure of a pattern analysis or synthesis problem, which can then be generalized to solve related tasks using different styles and/or content. We focus on three kinds of tasks: extrapolating the style of data to unseen content classes, classifying data with known content under a novel style, and translating two sets of data, generated in different styles and with distinct content, into each other's styles. We show examples from color constancy, face pose estimation, shape-from-shading, typography and speech.

This paper received an Outstanding Paper prize at CVPR 97. Published in Proceedings IEEE Computer Vision and Pattern Recognition (CVPR 97), Puerto Rico, 1997.

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved. Copyright © Mitsubishi Electric Research Laboratories, Inc., 201 Broadway, Cambridge, Massachusetts 02139.

First printing, TR96-37, March 1997.

Learning bilinear models for two-factor problems in vision

W. T. Freeman
MERL, a Mitsubishi Electric Research Lab
201 Broadway, Cambridge, MA
freeman@merl.com

J. B. Tenenbaum
MIT Dept. of Brain and Cognitive Sciences
Cambridge, MA
jbt@psyche.mit.edu

MERL Technical Report TR96-37. MERL, a Mitsubishi Electric Research Lab, 201 Broadway, Cambridge, MA. November, 1996.

Abstract

In many vision problems, we want to infer two (or more) hidden factors which interact to produce our observations. We may want to disentangle illuminant and object colors in color constancy; rendering conditions from surface shape in shape-from-shading; face identity and head pose in face recognition; or font and letter class in character recognition. We refer to these two factors generically as "style" and "content". Bilinear models offer a powerful framework for extracting the two-factor structure of a set of observations, and are familiar in computational vision from several well-known lines of research. This paper shows how bilinear models can be used to learn the style-content structure of a pattern analysis or synthesis problem, which can then be generalized to solve related tasks using different styles and/or content. We focus on three kinds of tasks: extrapolating the style of data to unseen content classes, classifying data with known content under a novel style, and translating two sets of data, generated in different styles and with distinct content, into each other's styles. We show examples from color constancy, face pose estimation, shape-from-shading, typography and speech.

Figure 1: Example problem with two-factor structure (domain: typography; content: character; style: font). The observed letter is a function of two unseen factors, the character (content) and the font (style).

1 Introduction

A set of observations is often influenced by two or more independent factors. For example, in typography, character and font combine to yield the rendered letter (Fig. 1). We may think of one factor as "content" (the character) and the other as "style" (the font). Many estimation problems fit this form (see also [10, 15]). In speech recognition, the speaker's accent modulates words to produce sounds. In face recognition, a person's image is modulated by both their identity and the orientation of their head. In shape-from-shading, both the shape of the object and the lighting conditions influence the image. In color perception, the unknown illumination color can be thought of as a style which modulates the unknown object surface reflectances to produce the observed eye cone responses. We will generically refer to the two factors as "style" and "content".

In these problems, and others, we want to make inferences about the factors underlying the observations. We want to perceive the true shapes, colors, or faces, independent of the rendering conditions, or we want to recognize the speaker's words independent of the accent. This style/content factorization is an essential problem in vision, and many researchers have addressed related issues (e.g., [10, 22, 4, 13]). A key feature of the style-content factorization, which has not been addressed in these previous papers, is that it is well-suited to learning the structure of analysis or synthesis problems, as we describe below. We demonstrate competence in a vision task by predicting how observations would change were the style or content class changed, or by classifying observations in a new style according to their content.
We explore the learning issues, and emphasize the problems that a learning approach allows us to solve. Learning the model parameters is analogous to learning from observations over the course of one's visual experience: we see how content-class data change when observed under different styles. We use a bilinear model which explicitly represents the problem's factorial structure, fit the model parameters to the observations, and then use the fitted models to solve a particular task. In Section 4 we describe standard techniques for fitting the model parameters to observation data in the complete-matrix form. We identify and solve three example tasks for two-factor problems: extrapolation, classification, and translation.

In the next section, we describe those canonical tasks. Following that, we present our models, show how to learn the model parameters, and present solved examples for each task.

2 Tasks: extrapolation, classification, and translation

The first of our three problem tasks is extrapolation. Given some examples of previously observed content classes in a new style, extrapolate to synthesize other content classes in that new style. Referring to Fig. 2, this involves analyzing the style common to the letters of the bottom row, finding what the letters of a column have in common, and synthesizing that letter in that font. We apply our bilinear model to this example in Section 5.1.

Figure 2: The extrapolation problem.

The second task is classification. We observe examples of a new style, but with no content-class labels (Fig. 3). An example of this is listening to speech by a speaker with an unusual accent. We need to develop a model for how observations change within a style (accent), and how they change across styles for a given content class (word). This information can improve classification performance, as we show in the examples of Section 6.

Figure 3: The classification problem.

We call the third task translation; it is the most difficult, although the most useful. We observe new content classes in a new style (Fig. 4), and we want to translate those observations to a known style or, holding style constant, to known content classes. To solve this problem, we have to build a model not only for style and content independent of each other, but for the problem itself, independent of style and content. We show two examples of this task in Section 7.

Figure 4: The translation problem.

3 Bilinear models

We write an observation vector in style $s$ and content class $c$ as $y^{sc}$, and let $K$ be its dimension. We seek to fit these observations with some model $y^{sc} = f(a^s, b^c, W)$, where $a^s$ and $b^c$ are parameter vectors describing style $s$ and content $c$, and $W$ is a set of parameters for the rendering function $f$ that determines the mapping from the style and content spaces to the observation space. There are many reasons to choose a bilinear form for $f$: Linear models have been successfully applied to many vision problems [23, 7, 24, 2, 13, 14, 19, 16]. These studies suggest that many data sets produced by varying a single style or content factor (with all other factors held constant) can be represented exactly or approximately by linear models. Since a bilinear mapping from style and content to observations is linear in the style component for a fixed content class, and vice versa, bilinear models are naturally promising candidates for modeling data produced by the interaction of individually linear factors. Bilinear models inherit many convenient features of linear models: they are easy to fit (either with closed-form solutions or with efficient iterative solutions with guarantees of convergence [12]). They can be embedded in a probabilistic model with gaussian noise to yield tractable maximum likelihood estimation with missing information, as described in [21] and applied here in Section 4. The models extend easily to multiple factors, yielding multilinear models. Bilinear models are simple, yet seem capable of modeling real image data, so we want to take this approach as far as we can.

We assume $f$ is a bilinear mapping, given as follows:
$$y^{sc}_k = \sum_{i,j} W_{ijk}\, a^s_i\, b^c_j . \qquad (1)$$
The $W_{ijk}$ parameters represent a set of basis functions, independent of style and content, which characterize the interaction between these two factors. Observations in style $s$ and content class $c$ are generated by mixing these basis functions with coefficients given by the (tensor) product of the $a^s_i$ and $b^c_j$ vectors, which represent (respectively) style independently of content and content independently of style. In general, the dimensionalities of the style and content vectors, denoted $I$ and $J$, should be less than or equal to the numbers of styles $N_s$ and content classes $N_c$, respectively; the model is capable of perfectly reproducing the observations when $I = N_s$ and $J = N_c$, and finds increasingly compact (and hence more generalizable) representations of style and content as the dimensionality is decreased. We call this the symmetric model, because it treats the two factors symmetrically.

Note that the symmetric bilinear model can reduce the dimensionality of both style and content, so that, assuming the basis functions $W_{ijk}$ have been learned from experience, there may be fewer unknown degrees of freedom in style and content than there are constraints given by one high-dimensional image. Thus it is possible to do translation problems such as shape-from-shading and color constancy, where both the rendering conditions and the content classes of the new observation are unknown. (It can be shown that if a symmetric bilinear model fits the observations, then, given the appropriate $W$, there is a unique solution for $a^s$ and $b^c$ as long as $K \geq I + J$.)

In many situations, it may be impractical or unnecessary to represent both style and content factors with low-dimensional parameter vectors. For instance, we may observe only a small sample of widely varying styles, which could in principle be expressed as linear combinations of some basis styles given a much larger sample of styles from which to learn this basis, but which cannot be so compactly represented given only the limited sample available. We can obtain a more flexible bilinear model which still preserves separable style and content representations by letting the basis functions depend on style or content, instead of being independent of both factors. For example, if the basis functions are allowed to depend on style, the bilinear model of Eq. (1) becomes $y^{sc}_k = \sum_{i,j} W^s_{ijk} a^s_i b^c_j$, which simplifies, without loss of generality, to the following form by summing out the $i$ index:
$$y^{sc}_k = \sum_{j} A^s_{jk}\, b^c_j . \qquad (2)$$
Style $s$ is now completely represented by the $J \times K$ matrix $A^s_{jk}$. Alternatively, if the basis functions depend on content, then content class $c$ is represented as an $I \times K$ matrix $B^c_{ik}$, according to
$$y^{sc}_k = \sum_{i} B^c_{ik}\, a^s_i . \qquad (3)$$

Often, there is a natural intuitive interpretation of these bilinear models, which we call asymmetric. For example, in our typography example, Eq. (2) provides font-specific basis functions $A^s_{jk}$ (look ahead to Fig. 7 for an illustration), which are combined according to letter-specific coefficients $b^c_j$. In other situations, such as speech recognition, it may be natural to see the style of an observation as a modulation of its underlying content, which is represented as a linear transformation of the content vector in Eq. (2).

The disadvantages of the asymmetric models relative to the symmetric model of Eq. (1) are: (1) there is no explicit parameterization of the rendering function $f$ independent of style and content, so a model trained on observations from a fixed set of styles and content classes may be of no use in analyzing new data generated from novel settings of both factors; (2) the matrix style model in Eq. (2) or the matrix content model in Eq. (3) may be too flexible and overfit the training data. Note that of the three tasks described in Section 2, only translation requires an explicit parameterization of the abstract rendering function independent of both style and content; classification and extrapolation do not. So for these tasks, if overfitting can be controlled by some appropriate kind of regularization, an asymmetric bilinear model of style and content can be just as useful as the symmetric version, and can often be applied when much less data is available for training, and/or when one of the factors cannot be reduced to a linear combination of basis elements.

4 Fitting bilinear models

Our applications involve two phases: an initial training phase and a subsequent testing phase. In the training phase, we fit a bilinear model to a complete matrix of observations in $N_s$ styles and $N_c$ content classes. This gives explicit representations of style independent of content, content independent of style, and, for the symmetric bilinear model, the interaction of style and content. In the testing phase (Figs. 2, 3 and 4), the same model is fit to a new sample of observations which is assumed to have something in common with the training set, in content, in style, or at least in the interaction between content and style. During testing, the parameters ($a^s$, $b^c$, or $W$) corresponding to the assumed commonalities are clamped to their values learned during training. This enables better performance on a classification, extrapolation, or translation task with the test data than would have been possible without the initial training on the set of related observations.

Singular value decomposition (SVD) can be used to fit the parameters of the asymmetric model [12, 13, 10, 21]. This technique works the same regardless of whether we have one or more observations for each style-content pair, as long as we have the same number of observations for each pair.
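Before detailing the SVD fit, a minimal NumPy sketch (ours; the dimensions and random parameters are made up for illustration, and this is not the authors' code) may help fix the notation of Eqs. (1) and (2). It evaluates the symmetric forward mapping and checks that absorbing the style factor into style-specific basis functions $A^s_{jk} = \sum_i W_{ijk} a^s_i$ reproduces the same observation.

```python
import numpy as np

# Made-up dimensions for illustration only.
K, I, J = 16, 3, 4          # observation, style, and content dimensionalities
rng = np.random.default_rng(0)

W = rng.standard_normal((I, J, K))   # symmetric-model basis tensor W_{ijk}
a_s = rng.standard_normal(I)         # style vector a^s
b_c = rng.standard_normal(J)         # content vector b^c

# Symmetric model, Eq. (1): y^{sc}_k = sum_{ij} W_{ijk} a^s_i b^c_j
y_sym = np.einsum('ijk,i,j->k', W, a_s, b_c)

# Asymmetric model, Eq. (2): style-specific basis A^s_{jk} = sum_i W_{ijk} a^s_i,
# then y^{sc}_k = sum_j A^s_{jk} b^c_j
A_s = np.einsum('ijk,i->jk', W, a_s)
y_asym = A_s.T @ b_c

assert np.allclose(y_sym, y_asym)    # the two forms agree by construction
```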

Let the $K$-dimensional column vector $\bar{y}^{sc}$ denote the mean of the observed data generated by the style-content pair $\{a^s, b^c\}$, and stack these vectors into a single $(K N_s) \times N_c$-dimensional measurement matrix
$$Y = \begin{bmatrix} \bar{y}^{11} & \cdots & \bar{y}^{1 N_c} \\ \vdots & & \vdots \\ \bar{y}^{N_s 1} & \cdots & \bar{y}^{N_s N_c} \end{bmatrix} . \qquad (4)$$
We compute the SVD of $Y = U S V^T$, and define the $(K N_s) \times J$-dimensional matrix $\hat{A}$ to be the first $J$ columns of $U$, and the $J \times N_c$-dimensional matrix $\hat{B}$ to be the first $J$ rows of $S V^T$. The model dimensionality $J$ can be chosen in various ways: by a priori considerations, by requiring a sufficiently good approximation to the data (as measured by mean squared error or some more subjective metric), or by looking for a gap in the singular value spectrum. Finally, we can identify $\hat{A}$ and $\hat{B}$ as the desired parameter estimates in stacked form,
$$A = \begin{bmatrix} A^1 \\ \vdots \\ A^{N_s} \end{bmatrix} , \qquad B = \begin{bmatrix} b^1 & \cdots & b^{N_c} \end{bmatrix} , \qquad (5)$$
where each $A^s$, $1 \leq s \leq N_s$, corresponds to the style matrix $A^s_{jk}$, and each $b^c$, $1 \leq c \leq N_c$, corresponds to the content vector $b^c_j$ in Eq. (2). Note that this is formally identical to Tomasi and Kanade's use of the SVD to solve the structure-from-motion problem under orthographic projection [22], but with the camera motion and shape matrices replaced by style and content matrices, respectively. See [21] for how to fit the model to more complicated data patterns (e.g., varying numbers of observations per style and content, or uncertain assignment of observations to styles or content classes).

For the symmetric model, there is an iterative procedure for fitting the model parameters to the data which converges to the least squares model fit [12, 13]. The algorithm iteratively applies the SVD as in the asymmetric case, repeatedly re-ordering the observation matrix elements to alternate the roles of the style and content factors. We refer the reader to the pseudo-code in Marimont and Wandell [13].
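Concretely, the asymmetric fit just described can be written in a few lines. The sketch below (ours; it assumes the mean observations have already been collected into an array, and uses NumPy conventions) stacks them into the measurement matrix of Eq. (4), computes the SVD, and unstacks the style matrices and content vectors of Eq. (5).

```python
import numpy as np

def fit_asymmetric(Y_means, J):
    """Fit the asymmetric bilinear model of Eq. (2) by SVD.

    Y_means: array of shape (Ns, Nc, K), the mean observation for each
             (style, content) pair.  J is the chosen model dimensionality.
    Returns the per-style matrices A (Ns, J, K) and content vectors B (J, Nc).
    """
    Ns, Nc, K = Y_means.shape
    # Stack into the (K * Ns) x Nc measurement matrix of Eq. (4).
    Y = Y_means.transpose(0, 2, 1).reshape(Ns * K, Nc)
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    A_stacked = U[:, :J]                      # first J columns of U
    B = np.diag(s[:J]) @ Vt[:J, :]            # first J rows of S V^T
    # Unstack A into per-style J x K matrices A^s_{jk}.
    A = A_stacked.reshape(Ns, K, J).transpose(0, 2, 1)
    return A, B

# Reconstruction of an observation in style s, content c: y ≈ A[s].T @ B[:, c]
```

The symmetric model can then be fit by alternating this SVD step with the re-ordering of the observation matrix described above.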
5 Extrapolation

5.1 Typography

To show that bilinear models can analyze and synthesize something like style even for a very nonlinear problem, we apply these models to typography. The extrapolation task is to synthesize letters in a new font, given a set of examples of letters in that font.

The letter representation is important. We need to represent shapes of different topologies in comparable forms. This motivates using a particle model to represent shape [20]. Further, we want the letters in our representation to behave like a linear vector space, where linear combinations of letters also look like letters. Beymer and Poggio [3] advocate a dense warp map for related problems. Combining the above, we chose to represent shape by the set of displacements that a set of particles would have to undergo from a reference shape to form the target shape.

With identical particles, there are many possible such warp maps. For our models to work well, we want similar warps to represent similarly shaped letters. To accomplish this, we use a physical model (in the spirit of [17, 18], but with the goal of dense correspondence, rather than modal representation). We give each particle of the reference shape (taken to be the full rectangular bitmap) unit positive charge, and each pixel of the target letter negative charge proportional to its grey level intensity. The total charge of the target letter is set equal to the total charge of the reference shape. We track the electrostatic force lines from each particle of the reference shape to where they intersect the plane of the target letter, set to be in front of and parallel to the reference shape. The force lines therefore land with uniform density over the target letter, resulting in a smooth, dense warp map from each pixel of the reference shape to the letter. The electrostatic forces are easily calculated from Coulomb's law. We call this a "Coulomb warp" representation.

Figure 5 shows two simple shapes of different topologies, and the average of the two shapes in a pixel representation and in a Coulomb warp representation. Averaging the shapes in a pixel representation yields a simple "double exposure" of the two images; averaging in a Coulomb warp representation results in a shape that looks like a shape in between the two.

To render a warp map representation of a shape, we first translate each particle of the reference shape, using a grid at four times the linear pixel resolution. We blur that, then sub-sample to the original font resolution. By allowing non-integer charge values and sub-pixel translations, we can preserve font anti-aliasing information.

Figure 5: Behavior of the Coulomb warp representation under averaging, compared with a pixel representation.
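The Coulomb warp construction is only outlined above; the following is a rough, simplified sketch of the idea (ours, not the authors' implementation): it traces each reference particle along the field of the negative target charges alone, with a fixed Euler step, and ignores details such as the contribution of the other reference charges and the super-resolved rendering grid.

```python
import numpy as np

def coulomb_warp(target, plane_gap=4.0, step=0.25, max_steps=4000):
    """Crude field-line trace from each reference pixel toward a charged target letter.

    target: 2D array of grey levels in [0, 1]; the reference shape is the full
    rectangular bitmap of the same size, placed plane_gap in front of the target.
    Returns an (H, W, 2) array of landing displacements (a dense warp map).
    """
    H, W = target.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Magnitudes of the negative target charges, proportional to grey level,
    # normalized so the total matches the total unit charge of the reference bitmap.
    q = (target / target.sum() * (H * W)).ravel()
    tgt = np.stack([xs.ravel(), ys.ravel(), np.zeros(H * W)], axis=1)  # target at z = 0

    warp = np.zeros((H, W, 2))
    for r in range(H):
        for c in range(W):
            p = np.array([c, r, plane_gap], dtype=float)   # start on the reference plane
            for _ in range(max_steps):
                d = tgt - p                                 # vectors toward the charges
                dist = np.linalg.norm(d, axis=1) + 1e-9
                # Coulomb attraction toward each negative charge, 1/r^2 falloff.
                E = (q[:, None] * d / dist[:, None] ** 3).sum(axis=0)
                p += step * E / (np.linalg.norm(E) + 1e-12)
                if p[2] <= 0.0:                             # crossed the target plane
                    break
            warp[r, c] = p[:2] - np.array([c, r])           # landing displacement (dx, dy)
    return warp
```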

We applied the asymmetric bilinear model, in the representation above, to the extrapolation task of Fig. 2. Our shape representation allowed us to work with familiar and natural fonts, in contrast to earlier work on extrapolation in typography [6, 8]. We digitized 62 characters (uppercase letters, lowercase letters, and the digits 0-9) of six fonts using Adobe Photoshop (uppercase letters shown in the first 5 columns and last column of Fig. 6). We trained the asymmetric model to learn separate style matrices $A^s$ for the first 5 fonts (Chicago, Zaph, Times, Mistral and TimesBold) and content vectors $b^c$ for all 62 characters. Fig. 7 shows the style-specific basis functions for each font. To generate a letter in two different fonts, we use the same linear combination of basis warps, applied to the appropriate basis for each font. Note that the character of the warp basis functions reflects the character of the font: slanted, italic, upright, etc.

For the testing phase, we are presented with the letters of a sixth font, Monaco, with the first third of the uppercase letters omitted. The extrapolation task is to synthesize the missing letters in the Monaco font. For this very nonlinear task, we chose a high model dimensionality of 60. To control such a high-dimensional model and avoid over-fitting, we added a prior model for the font style. The target of the prior was the optimum style from the symmetric model: the linear combination of the training styles $A^s$ which fit the Monaco font best, denoted $A^{s\,\mathrm{OLC}}_{jk}$. We then used a gradient descent procedure to find the font-specific style matrix $A^s_{jk}$ which minimized
$$(1 - \lambda)\Big(y^{sc}_k - \sum_j A^s_{jk}\, b^c_j\Big)^2 + \lambda\,\big(A^s_{jk} - A^{s\,\mathrm{OLC}}_{jk}\big)^2 , \qquad (6)$$
summed over the observed Monaco letters and the matrix and vector components. We set $\lambda = 0.99995$, which controlled overfitting to give the best results.

The next-to-last column of Fig. 6 shows the results. The model has succeeded in learning the style of the new font: each letter looks more like Monaco than like any of the other fonts. It even partially succeeds in generating the extended crossbars of the "I", characteristic of the Monaco font, even though none of the example "I"s in the training set have such a stylistic feature.
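A compact sketch of the regularized fit in Eq. (6) (ours; the shapes, learning rate, and synthetic example are assumptions, and the quadratic objective could also be solved in closed form): given the observed letters of the new font, their fixed content vectors from training, and the prior style matrix $A^{s\,\mathrm{OLC}}$, gradient descent balances the data term against the pull toward the prior.

```python
import numpy as np

def fit_style_regularized(Y, B, A_olc, lam=0.99995, lr=1e-3, iters=5000):
    """Gradient descent on the objective of Eq. (6).

    Y: (N, K) observed letters in the new font; B: (N, J) their content vectors
    (fixed from training); A_olc: (J, K) prior style matrix.  Returns A (J, K).
    """
    A = A_olc.copy()
    for _ in range(iters):
        resid = Y - B @ A                                        # data misfit, (N, K)
        grad = -2.0 * (1 - lam) * (B.T @ resid) + 2.0 * lam * (A - A_olc)
        A -= lr * grad
    return A

# Usage with synthetic data (dimensions made up):
rng = np.random.default_rng(1)
J, K, N = 8, 50, 40
B = rng.standard_normal((N, J))
A_true = rng.standard_normal((J, K))
Y = B @ A_true + 0.01 * rng.standard_normal((N, K))
A_fit = fit_style_regularized(Y, B, A_olc=np.zeros((J, K)), lam=0.5)
```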
6 Classification

6.1 Face pose estimation

We collected a face database of 11 subjects viewed in 15 different poses (Fig. 8). The poses span a grid of three vertical positions (up, level, down) and five horizontal positions (extreme-left, left, straight-ahead, right, extreme-right). The pictures were shifted to align the nose tip position, found manually. The images were blurred and cropped to 22 x 32 pixels.

We tested the model's usefulness for classification by training on the face images of 10 subjects viewed in all 15 poses, and then using the learned pose models to classify the 15 images of the remaining subject's face into the appropriate pose categories.

For training, we took as input the face database with each face labeled according to pose and subject identity, and with one subject removed for later testing. We then fit the asymmetric bilinear model of Eq. (3) to this training data using the SVD procedure described in Section 4, yielding models of subject $c$'s head, $B^c_{ik}$, and of pose $s$, $a^s_i$. The model dimensionality $I = 6$ was chosen as the minimum dimensionality giving an adequate fit to the training data.

For testing, we take as input the 15 images of the remaining subject $c^*$ in different poses, but without pose labels. The goal is now to classify these images into the appropriate pose category, using the pose models $a^s_i$ learned during the training phase. Now, if we had a model $B^{c^*}_{ik}$ of the new subject's head, we could simply combine this head model with the known pose models and then assign each test image to the pose that gives it the highest likelihood, assuming a gaussian likelihood function with mean given by the model predictions from Eq. (3). But we have been given neither $B^{c^*}_{ik}$ nor the appropriate pose labels for the test subject, so we use the EM algorithm to solve this clustering problem. EM iterates between estimating the parameters of a new head model $B^{c^*}_{ik}$ (M-step), and softly assigning each image to the appropriate pose categories (E-step). For details of the EM algorithm applied to classification problems with a new style or content, see [21].

Classification performance is determined by the percentage of test images for which the probability of pose $s$, as given by EM, is greatest for the actual pose class. We repeated the experiment 11 times, leaving each of the 11 subjects out of the training set in turn. The separable mixture model (SMM) of [21] achieved an average performance of 81% correct. In fact, the performance distribution is somewhat skewed, with two or fewer errors on 8 out of 11 subjects, and six or more errors on 3 out of 11 subjects, indicating that the bilinear model learns an essentially faithful representation for most subjects' heads. For comparison, a simple 1-nearest-neighbor matching algorithm yielded 53% correct, with no fewer than 5 errors on any test subject.

In this example, we found it most natural to think of subject identity as the content factor and pose as the style factor, so this task becomes an example of style classification under varying content. Of course, exactly the same techniques can be applied to classifying content under varying styles, and [21] reports one example in the domain of speech recognition. A bilinear model was trained on data from 8 speakers (4 male, 4 female) uttering 11 different vowels, and then used to classify the utterances of 7 new speakers (4 male, 3 female) into the correct vowel category, using the same EM procedure described above. The bilinear model with EM achieved 77% correct performance, compared to 51% for a multi-layer perceptron and 56% for 1-nearest neighbor, the best of many statistical techniques tested on the same dataset.
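The EM loop can be sketched as follows (ours; the initialization, noise scale, and regularizer are assumptions, and the actual separable-mixture-model details are in [21]): the E-step softly assigns each unlabeled image of the new subject to the known poses under a gaussian likelihood, and the M-step re-estimates the new head model by weighted least squares.

```python
import numpy as np

def em_classify_new_content(Y, A_poses, I, iters=50, sigma=1.0):
    """Schematic EM for classifying images of a new subject into known poses.

    Y: (N, K) unlabeled test images.  A_poses: (S, I) the pose vectors a^s
    learned in training (Eq. (3)).  Estimates the new subject's head model
    B (I, K) and soft pose assignments R (N, S).
    """
    N, K = Y.shape
    S = A_poses.shape[0]
    rng = np.random.default_rng(0)
    B = 0.01 * rng.standard_normal((I, K))            # unknown head model B^{c*}
    R = np.full((N, S), 1.0 / S)                      # soft pose assignments

    for _ in range(iters):
        # E-step: gaussian likelihood of each image under each pose's prediction.
        pred = A_poses @ B                            # (S, K) predicted image per pose
        d2 = ((Y[:, None, :] - pred[None, :, :]) ** 2).sum(-1)   # (N, S)
        logp = -0.5 * d2 / sigma ** 2
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)

        # M-step: weighted least squares for B given the soft assignments,
        # minimizing sum_{n,s} R[n,s] * || Y[n] - a^s B ||^2.
        AtA = (A_poses.T * R.sum(axis=0)) @ A_poses   # sum_s (sum_n R[n,s]) a^s a^s^T
        AtY = A_poses.T @ (R.T @ Y)                   # sum_{n,s} R[n,s] a^s^T Y[n]
        B = np.linalg.solve(AtA + 1e-6 * np.eye(I), AtY)

    return R.argmax(axis=1), B                        # hard pose labels and head model
```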

7 Translation

7.1 Color constancy

Translation problems occur frequently in vision. One example is the "color constancy" problem [4, 13, 10, 5]. The spectrum of light reflecting off an object is a function of both the unknown illumination spectrum and the object's unknown reflectance spectrum. The color of the reflected light may vary wildly under different illumination conditions, yet human observers perceive colors to be relatively constant across illuminants. They can effectively translate the observations of unknown colors, viewed under an unknown illuminant, to how they would appear under a canonical illuminant. This is the translation problem of Fig. 4. References [4, 13, 10] analyze the case of full training information, but do not address the translation problem. Our approach is to learn a $W$ matrix for the training set. We then hold that fixed, and fit both the $a$ and $b$ parameters for the new observations. Then we can translate across style or content, as desired.

In general, the color constancy problem is underdetermined [5]. Researchers have proposed using additional constraints or visual cues, such as inter-reflections or specular reflections. A nice property of the bilinear models is that these or other visual constraints can be learned. We show a synthetic example to illustrate this. Methods which exploit specularities to achieve color constancy typically require specularity levels large enough to dominate image chromaticity histograms [11, 9]. Here we show how small, random variations in specularity may also be used to achieve color constancy. We assume that from a localized patch, we observe three samples of light reflected from the same object, each with an unknown fractional specular reflection, uniformly distributed between 0 and 10%. (In a natural image, observations could consist of image values and local filter outputs. We have not yet undertaken the calibrated studies to determine how well this technique works for natural scenes.)

Using a set of programs developed by Brainard [5], we drew 30 random samples from a 3-dimensional linear model for surface reflectances, and 9 random samples from another 3-dimensional linear model for illuminants. We rendered these, using the randomly drawn specularity values, creating the corresponding full training observation matrix. We fit this with the symmetric bilinear model, using a 3-dimensional model for the illuminants and a 5-dimensional model for the surface color and the random fractional specularity. This fitting stage also sets the $W$ matrix.

We then drew 30 new color surfaces, viewed under a new random draw of illumination, with small random specularities added to each color observation. We used the $W$ matrix trained above, and used iterations of the linear fitting to fit $a$ and $b$ values to the observations, describing the illuminant, surfaces, and specularities. There is an ambiguity of scale between $a$ and $b$, so this approach cannot fit absolute intensities, but it finds the chromaticities correctly. With the symmetric bilinear model, we are able to predict the chromaticities of the new surfaces under a canonical illuminant with an rms error of 0.07%. For comparison, the rms difference in observation matrix chromaticities under the two illuminants is 54%.
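A sketch of the translation-phase fit for this example (ours; the shapes and the alternation schedule are assumptions): the interaction tensor $W$ is clamped to its trained value, and the new illuminant vector $a$ and new surface vectors $b^c$ are re-estimated by alternating linear least-squares solves. As noted above, the overall scale of $a$ versus the $b^c$ is not determined.

```python
import numpy as np

def fit_style_content_given_W(Y, W, iters=100, seed=0):
    """Alternating linear fit of a new style a and new content vectors b^c,
    holding the trained interaction tensor W fixed (translation task sketch).

    Y: (Nc, K) observations of Nc new content classes in one new style.
    W: (I, J, K) basis tensor from training.  Returns a (I,) and B (Nc, J).
    """
    I, J, K = W.shape
    Nc = Y.shape[0]
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(I)
    B = rng.standard_normal((Nc, J))

    for _ in range(iters):
        # With a fixed, each content vector solves a linear least-squares problem:
        # y^c ≈ M b^c, where M[k, j] = sum_i W[i, j, k] a[i].
        M = np.einsum('ijk,i->kj', W, a)                 # (K, J)
        B = np.linalg.lstsq(M, Y.T, rcond=None)[0].T     # (Nc, J)
        # With the b^c fixed, the style vector solves one joint least-squares problem:
        # stacked blocks G[c, k, i] = sum_j W[i, j, k] b^c[j].
        G = np.einsum('ijk,cj->cki', W, B).reshape(Nc * K, I)
        a = np.linalg.lstsq(G, Y.reshape(Nc * K), rcond=None)[0]
    return a, B
```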
7.2 Face and lighting translation

We worked with the Weizmann database of face images under controlled variations in illumination (provided by Yael Moses). Images of 24 male faces under four different conditions of illumination were blurred and cropped to 40 x 24 pixels, removing most non-facial features (hair, clothing, etc.) from the images.

We tested the model's effectiveness for style and content translation by training the symmetric bilinear model on 23 of the 24 faces viewed under 3 of the 4 illuminants, leaving out one subject under all illuminants and one illuminant for all subjects. We then applied the learned model to render the novel 24th face under the 3 known illuminants, and the 23 known faces under the new illuminant. Thus this is another example of style-content translation.

For training, we took as input the face database of 24 faces under 4 illuminants, minus one subject under all illuminants and all subjects under one illuminant. We then fit the symmetric bilinear model $y^{s_i c_j} = a^{s_i} W b^{c_j}$ to this training data using the iterated SVD procedure described above, yielding models of the $i$th illuminant, $a^{s_i}$, of the $j$th face, $b^{c_j}$, and of the interaction between illuminant and face, $W$. The dimensionalities of the face and illuminant models were set equal to the maximum values that still allow a unique solution (the number of different faces and illuminants, respectively), in order to give the models of face space and illumination space the largest range possible.

For testing, the input is one image: the single unseen face under the single unseen illuminant. The goal is now to apply the illuminant-face interaction model $W$ learned during training to recover appropriate face and illuminant models $a^{s^*}$ and $b^{c^*}$ for this new image which are commensurable with the face and illuminant models learned for the training set.

Figure 10 shows the results of rendering the novel face under the three old illuminants, and of rendering 7 of the 23 known faces under the new illuminant. The quality of the synthesized results demonstrates that the style-content interaction model $W$ learned during the training phase is sufficient to allow good recovery of both face shape and illuminant from a single image during the test phase.

Atick et al. [1] also showed that shape-from-shading from a single image could be solved by assuming a low-dimensional shape space. There, an explicit shape space is learned directly from 3-D range maps and built into a physical model of shading. In contrast, we learn an implicit shape space, as well as a space of illuminants and a model of the interaction between illuminant and shape, from a set of 2-D images alone.

8 Summary

We address learning two-factor problems in vision. We use both symmetric and asymmetric bilinear models to learn parameters describing both the styles and content classes of a set of observations. We identify and study the problems of extrapolation, classification, and translation, illustrating these with examples from color constancy, face pose estimation, shape-from-shading, typography, and speech.

References

[1] J. J. Atick, P. A. Griffin, and A. N. Redlich. Statistical approach to shape from shading: Reconstruction of 3D face surfaces from single 2D images. Neural Computation, in press.
[2] P. N. Belhumeur and D. J. Kriegman. What is the set of images of an object under all possible lighting conditions? In Proc. IEEE CVPR, pages 270–277.
[3] D. Beymer and T. Poggio. Image representations for visual learning. Science, 272:1905–1909, June.
[4] M. D'Zmura. Color constancy: surface color from changing illumination. J. Opt. Soc. Am. A, 9:490–493.
[5] W. T. Freeman and D. H. Brainard. Bayesian decision theory, the maximum local mass estimate, and color constancy. In Proc. 5th Intl. Conf. on Computer Vision, pages 210–217. IEEE.
[6] I. Grebert, D. G. Stork, R. Keesing, and S. Mims. Connectionist generalization for production: An example from gridfont. Neural Networks, 5:699–710.
[7] P. W. Hallinan. A low-dimensional representation of human faces for arbitrary lighting conditions. In Proc. IEEE CVPR, pages 995–999.
[8] D. Hofstadter. Fluid Concepts and Creative Analogies. Basic Books.
[9] G. J. Klinker, S. A. Shafer, and T. Kanade. The measurement of highlights in color images. Intl. J. Comp. Vis., 2(1):7–32.
[10] J. J. Koenderink and A. J. van Doorn. The generic bilinear calibration-estimation problem. Intl. J. Comp. Vis., in press.
[11] H. Lee. Method for computing the scene-illuminant chromaticity from specular highlights. J. Opt. Soc. Am. A, 3(10):1694–1699.
[12] J. R. Magnus and H. Neudecker. Matrix differential calculus with applications in statistics and econometrics. Wiley.
[13] D. H. Marimont and B. A. Wandell. Linear models of surface and illuminant spectra. J. Opt. Soc. Am. A, 9(11):1905–1913.
[14] H. Murase and S. Nayar. Visual learning and recognition of 3-D objects from appearance. Intl. J. Comp. Vis., 14:5–24.
[15] S. M. Omohundro. Family discovery. In Advances in Neural Information Processing Systems.
[16] A. P. Pentland. Linear shape from shading. Intl. J. Comp. Vis., 1(4):153–162, 1990.
[17] S. E. Sclaroff. Modal matching: a method for describing, comparing, and manipulating digital signals. PhD thesis, MIT.
[18] L. Shapiro and J. Brady. Feature-based correspondence: an eigenvector approach. Image and Vision Computing, 10(5):283–288, June.
[19] A. Shashua. Geometry and photometry in 3D visual recognition. PhD thesis, MIT.
[20] R. Szeliski and D. Tonnesen. Surface modeling with oriented particle systems. In Proc. SIGGRAPH 92 (Computer Graphics, Annual Conference Series), volume 26, pages 185–194.
[21] J. B. Tenenbaum and W. T. Freeman. Separable mixture models: Separating style and content. In Advances in Neural Information Processing Systems.
[22] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: a factorization method. Intl. J. Comp. Vis., 9(2):137–154.
[23] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1).
[24] S. Ullman and R. Basri. Recognition by linear combinations of models. IEEE Pat. Anal. Mach. Intell., 13(10):992–1006.

Figure 6: Example of style extrapolation in the domain of typography. Training data included upper and lower case alphabets, and the digits 0-9.
Figure 7: Warp bases.
Figure 8: A subset of the face images used for pose classification.
Figure 9: A subset of the pose-specific basis faces. Note that in the bilinear model, each basis face plays the same role across poses.
Figure 10: An example of translation.


More information

Interactive Visualization of Magnetic Fields

Interactive Visualization of Magnetic Fields JOURNAL OF APPLIED COMPUTER SCIENCE Vol. 21 No. 1 (2013), pp. 107-117 Interactive Visualization of Magnetic Fields Piotr Napieralski 1, Krzysztof Guzek 1 1 Institute of Information Technology, Lodz University

More information

A Computational Framework for Exploratory Data Analysis

A Computational Framework for Exploratory Data Analysis A Computational Framework for Exploratory Data Analysis Axel Wismüller Depts. of Radiology and Biomedical Engineering, University of Rochester, New York 601 Elmwood Avenue, Rochester, NY 14642-8648, U.S.A.

More information

Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration

Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the

More information

Relating Vanishing Points to Catadioptric Camera Calibration

Relating Vanishing Points to Catadioptric Camera Calibration Relating Vanishing Points to Catadioptric Camera Calibration Wenting Duan* a, Hui Zhang b, Nigel M. Allinson a a Laboratory of Vision Engineering, University of Lincoln, Brayford Pool, Lincoln, U.K. LN6

More information

Object Recognition. Selim Aksoy. Bilkent University saksoy@cs.bilkent.edu.tr

Object Recognition. Selim Aksoy. Bilkent University saksoy@cs.bilkent.edu.tr Image Classification and Object Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr Image classification Image (scene) classification is a fundamental

More information

High-dimensional labeled data analysis with Gabriel graphs

High-dimensional labeled data analysis with Gabriel graphs High-dimensional labeled data analysis with Gabriel graphs Michaël Aupetit CEA - DAM Département Analyse Surveillance Environnement BP 12-91680 - Bruyères-Le-Châtel, France Abstract. We propose the use

More information

Texture. Chapter 7. 7.1 Introduction

Texture. Chapter 7. 7.1 Introduction Chapter 7 Texture 7.1 Introduction Texture plays an important role in many machine vision tasks such as surface inspection, scene classification, and surface orientation and shape determination. For example,

More information

High-Dimensional Image Warping

High-Dimensional Image Warping Chapter 4 High-Dimensional Image Warping John Ashburner & Karl J. Friston The Wellcome Dept. of Imaging Neuroscience, 12 Queen Square, London WC1N 3BG, UK. Contents 4.1 Introduction.................................

More information

Efficient online learning of a non-negative sparse autoencoder

Efficient online learning of a non-negative sparse autoencoder and Machine Learning. Bruges (Belgium), 28-30 April 2010, d-side publi., ISBN 2-93030-10-2. Efficient online learning of a non-negative sparse autoencoder Andre Lemme, R. Felix Reinhart and Jochen J. Steil

More information

Template-based Eye and Mouth Detection for 3D Video Conferencing

Template-based Eye and Mouth Detection for 3D Video Conferencing Template-based Eye and Mouth Detection for 3D Video Conferencing Jürgen Rurainsky and Peter Eisert Fraunhofer Institute for Telecommunications - Heinrich-Hertz-Institute, Image Processing Department, Einsteinufer

More information

Partial Least Squares (PLS) Regression.

Partial Least Squares (PLS) Regression. Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component

More information

Solution of Linear Systems

Solution of Linear Systems Chapter 3 Solution of Linear Systems In this chapter we study algorithms for possibly the most commonly occurring problem in scientific computing, the solution of linear systems of equations. We start

More information

Geometric-Guided Label Propagation for Moving Object Detection

Geometric-Guided Label Propagation for Moving Object Detection MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com Geometric-Guided Label Propagation for Moving Object Detection Kao, J.-Y.; Tian, D.; Mansour, H.; Ortega, A.; Vetro, A. TR2016-005 March 2016

More information

Character Image Patterns as Big Data

Character Image Patterns as Big Data 22 International Conference on Frontiers in Handwriting Recognition Character Image Patterns as Big Data Seiichi Uchida, Ryosuke Ishida, Akira Yoshida, Wenjie Cai, Yaokai Feng Kyushu University, Fukuoka,

More information

Robot Perception Continued

Robot Perception Continued Robot Perception Continued 1 Visual Perception Visual Odometry Reconstruction Recognition CS 685 11 Range Sensing strategies Active range sensors Ultrasound Laser range sensor Slides adopted from Siegwart

More information

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report

Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 269 Class Project Report Automatic 3D Reconstruction via Object Detection and 3D Transformable Model Matching CS 69 Class Project Report Junhua Mao and Lunbo Xu University of California, Los Angeles mjhustc@ucla.edu and lunbo

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

A FUZZY LOGIC APPROACH FOR SALES FORECASTING

A FUZZY LOGIC APPROACH FOR SALES FORECASTING A FUZZY LOGIC APPROACH FOR SALES FORECASTING ABSTRACT Sales forecasting proved to be very important in marketing where managers need to learn from historical data. Many methods have become available for

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Common Core Unit Summary Grades 6 to 8

Common Core Unit Summary Grades 6 to 8 Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations

More information

Analyzing Facial Expressions for Virtual Conferencing

Analyzing Facial Expressions for Virtual Conferencing IEEE Computer Graphics & Applications, pp. 70-78, September 1998. Analyzing Facial Expressions for Virtual Conferencing Peter Eisert and Bernd Girod Telecommunications Laboratory, University of Erlangen,

More information

EECS 556 Image Processing W 09. Interpolation. Interpolation techniques B splines

EECS 556 Image Processing W 09. Interpolation. Interpolation techniques B splines EECS 556 Image Processing W 09 Interpolation Interpolation techniques B splines What is image processing? Image processing is the application of 2D signal processing methods to images Image representation

More information

Colorado School of Mines Computer Vision Professor William Hoff

Colorado School of Mines Computer Vision Professor William Hoff Professor William Hoff Dept of Electrical Engineering &Computer Science http://inside.mines.edu/~whoff/ 1 Introduction to 2 What is? A process that produces from images of the external world a description

More information

Robert Collins CSE598G. More on Mean-shift. R.Collins, CSE, PSU CSE598G Spring 2006

Robert Collins CSE598G. More on Mean-shift. R.Collins, CSE, PSU CSE598G Spring 2006 More on Mean-shift R.Collins, CSE, PSU Spring 2006 Recall: Kernel Density Estimation Given a set of data samples x i ; i=1...n Convolve with a kernel function H to generate a smooth function f(x) Equivalent

More information

Automatic Calibration of an In-vehicle Gaze Tracking System Using Driver s Typical Gaze Behavior

Automatic Calibration of an In-vehicle Gaze Tracking System Using Driver s Typical Gaze Behavior Automatic Calibration of an In-vehicle Gaze Tracking System Using Driver s Typical Gaze Behavior Kenji Yamashiro, Daisuke Deguchi, Tomokazu Takahashi,2, Ichiro Ide, Hiroshi Murase, Kazunori Higuchi 3,

More information

Hands Tracking from Frontal View for Vision-Based Gesture Recognition

Hands Tracking from Frontal View for Vision-Based Gesture Recognition Hands Tracking from Frontal View for Vision-Based Gesture Recognition Jörg Zieren, Nils Unger, and Suat Akyol Chair of Technical Computer Science, Ahornst. 55, Aachen University (RWTH), 52074 Aachen, Germany

More information

Adaptive Face Recognition System from Myanmar NRC Card

Adaptive Face Recognition System from Myanmar NRC Card Adaptive Face Recognition System from Myanmar NRC Card Ei Phyo Wai University of Computer Studies, Yangon, Myanmar Myint Myint Sein University of Computer Studies, Yangon, Myanmar ABSTRACT Biometrics is

More information

Introduction. C 2009 John Wiley & Sons, Ltd

Introduction. C 2009 John Wiley & Sons, Ltd 1 Introduction The purpose of this text on stereo-based imaging is twofold: it is to give students of computer vision a thorough grounding in the image analysis and projective geometry techniques relevant

More information

Projection Center Calibration for a Co-located Projector Camera System

Projection Center Calibration for a Co-located Projector Camera System Projection Center Calibration for a Co-located Camera System Toshiyuki Amano Department of Computer and Communication Science Faculty of Systems Engineering, Wakayama University Sakaedani 930, Wakayama,

More information

MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr

MVA ENS Cachan. Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Machine Learning for Computer Vision 1 MVA ENS Cachan Lecture 2: Logistic regression & intro to MIL Iasonas Kokkinos Iasonas.kokkinos@ecp.fr Department of Applied Mathematics Ecole Centrale Paris Galen

More information

Robust Principal Component Analysis for Computer Vision

Robust Principal Component Analysis for Computer Vision Robust Principal Component Analysis for Computer Vision Fernando De la Torre Michael J. Black Ý Departament de Comunicacions i Teoria del Senyal, Escola d Enginyeria la Salle, Universitat Ramon LLull,

More information

Image Hallucination Using Neighbor Embedding over Visual Primitive Manifolds

Image Hallucination Using Neighbor Embedding over Visual Primitive Manifolds Image Hallucination Using Neighbor Embedding over Visual Primitive Manifolds Wei Fan & Dit-Yan Yeung Department of Computer Science and Engineering, Hong Kong University of Science and Technology {fwkevin,dyyeung}@cse.ust.hk

More information

Metrics on SO(3) and Inverse Kinematics

Metrics on SO(3) and Inverse Kinematics Mathematical Foundations of Computer Graphics and Vision Metrics on SO(3) and Inverse Kinematics Luca Ballan Institute of Visual Computing Optimization on Manifolds Descent approach d is a ascent direction

More information

Tracking in flussi video 3D. Ing. Samuele Salti

Tracking in flussi video 3D. Ing. Samuele Salti Seminari XXIII ciclo Tracking in flussi video 3D Ing. Tutors: Prof. Tullio Salmon Cinotti Prof. Luigi Di Stefano The Tracking problem Detection Object model, Track initiation, Track termination, Tracking

More information