Recognizing focus areas using isophote pupil location

Gijs Kruitbosch
Universiteit van Amsterdam

June 27, 2008

Course: Bachelor Thesis Project 2008
BA Kunstmatige Intelligentie
Universiteit van Amsterdam
Spui 21, 1012 WX Amsterdam
The Netherlands

Supervisors: Theo Gevers and Nicu Sebe
Intelligent Sensory Information Systems
Informatics Institute
Universiteit van Amsterdam
Kruislaan 403, 1098 SJ Amsterdam
The Netherlands

Abstract

In several subject areas within computer science, such as usability research and alternative input devices, eye gaze tracking plays an important role. While there are several existing approaches (e.g. [17, 16, 15]), most of them are either invasive, requiring the attachment of sensors, cameras or controllers to the user's head, or prohibitively expensive, such as those using infrared corneal reflection techniques and stereovision. In this paper, I introduce a method that tracks the eye's gaze using only a webcam. It relies on face detection using boosted cascade classifiers, as proposed in [20], 3D reconstruction using POSIT, as described in [5], and pupil detection using isophote centers, as discussed in [19]. It also uses the isophotes to detect the eye corners, and Lucas Kanade tracking, as developed in [13], to keep track of the eye corners once found. This method was implemented in a proof of concept application using OpenCV. This proof of concept has been evaluated, uncovering several important issues with the approach used. These concern not only the pupil tracking itself, but also the effective localization of a face in 3D. Based on the problems discovered, several approaches for further improvement are suggested. In conclusion, while this implementation did not achieve an accuracy anywhere near sufficient to be usable in general real-world applications, I remain hopeful as to its eventual usefulness in the field.

Acknowledgments

This research could not have happened without the aid of Theo Gevers and Nicu Sebe, my supervisors, or their PhD student Roberto Valenti. Nor would I have been as motivated without the help of Leo Dorst and Andrea Haker. For helping finalize the honours programme in the form of a longer thesis, I am indebted to Bert Bredeweg. Finally, I am grateful to the Mozilla Foundation for facilitating my attendance of the SightCity conference in Frankfurt, where I was able to discuss the issues I faced with several people who had confronted them before.

Contents

1 Introduction
  1.1 Earlier approaches to eye gaze tracking
  1.2 Current approach
2 Research goals
3 Background Theory
  3.1 Human Eyes
  3.2 Pinhole Camera Model
  3.3 3D Transformation
4 Approach
  4.1 Face Detection
  4.2 Pupil Detection
  4.3 Face Location Determination
  4.4 Eyecorner Detection
    4.4.1 Geometrical Constraints
    4.4.2 Aggregating over multiple frames
    4.4.3 Tracking using Lucas Kanade optical flow estimation
  4.5 Eye Gaze Determination
  4.6 Screen Intersect Location
    4.6.1 Ray-plane intersection
    4.6.2 Point in rectangle
    4.6.3 Screen coordinate
  4.7 Synthesis
5 Implementation
  5.1 Limitations
  5.2 Efficiency
6 Discussion
  6.1 Face localization in 3D
  6.2 Eye corner detection
  6.3 Pupil detection
7 Proposed Improvements
  7.1 Alternative face localization
  7.2 Alternative eye corner detection and improvements in pupil detection
8 Conclusion

1 Introduction

Humans have always, to a very large extent, relied on their eyesight when going about their business in the world. Because of this importance as a sensor, detecting the focus of a person's eyes will in many instances give a strong indication of the focus of the person themselves. This focus function works in two directions: visual focus helps someone decide where to focus their attention, and attention helps determine someone's visual focus. While interviews, surveys and introspection can give indirect indications, these approaches are obviously limited. Therefore, it can sometimes be desirable to record the focus of someone's gaze directly, rather than asking them to comment on it. For more background on the complex interactions between attention, thought and the actual visual focus exhibited by the pupil movements, I refer the reader to [7].

1.1 Earlier approaches to eye gaze tracking

Various approaches have been used to determine the focus of someone's gaze. Originally, the most commonly used approach was based on measuring the electric potential differences of the skin area around the eyes. Needless to say, this was quite intrusive, and did not allow for completely free head movement because of the wires attached to the sensors.

A second approach is to have the subject wear special contact lenses or otherwise manipulate the surface of the eye, and to deduce its movement using a coil and electromagnetism. This method is very accurate [23], but has two strong disadvantages: it causes a large amount of discomfort to the subject, and it does not track head movement, making it unsuitable for detecting the actual point of focus (which is, of course, also influenced by the pose of the subject's head).

It is also possible to use photo or video material of the eyes alone to accomplish this task, but this suffers from the same problem as the contact lens method: it is unable to determine head movement and therefore cannot give any indication of the actual point at which the user is focusing. Additionally, this method requires a head-mounted camera, which is also often uncomfortable.

Finally, a more recent and popular method is to capture video material of the user's face and use (infrared) light reflection off the cornea of the eye (for an illustration of the anatomy of the eye, please refer to figure 1). Because of the shape of the eye, several reflections can be visible if a relatively high-resolution camera is used, and the difference in location between these reflections and the pupil location can account for both rotation and translation of the actual eye (for a more in-depth look at the biology of the human eye, please refer to section 3.1). This approach is very accurate, but usually at the expense of requiring head stabilisation, which adds more discomfort for the subject.

1.2 Current approach

Currently, researchers at the Universiteit van Amsterdam are working on pupil location detection based on isophotes. In principle, this approach attempts to use isocenters (the centerpoints of curved isophotes) to locate the pupil. For a more detailed discussion of this approach, I refer the reader to section 4.2. Using various techniques to deduce the other needed data (such as head position, and the position of the pupil relative to the eye corners and face), it would be possible to use this method of eye gaze detection without having to rely on (typically expensive) infrared equipment. This is the approach that I attempted to use for this thesis.
The remainder of this thesis is organized as follows: first I will discuss the research goals of the project (section 2), then some of the basic background theory (section 3), and then a detailed description of the approach I used (section 4), with some notes about the actual implementation in section 5. I will then discuss the functionality and problems of this implementation in section 6. Finally, some ideas for improvements and some conclusions about the project are discussed in sections 7 and 8, respectively.

2 Research goals

In this project, I attempted to investigate the possibilities of using only a webcam to do eye tracking, without using stereovision, infrared lights, headrests, or other tools that are either expensive or invasive for the user. My main research question was:

To what extent is it possible to identify areas of the screen on which the user is focusing, by locating the user's pupil location using an ordinary webcam?

Several auxiliary questions were considered:

- Is it possible to reliably use the detection of the location of the pupil and the face to obtain a gaze direction?
- Using the gaze direction, is it possible to reliably determine which area of the screen is being viewed?
- Can this be done in real time, with an ordinary webcam?

When attempting to answer these questions, the goal was to create an implementation that would do eye tracking with just a webcam. Ideally, it would return a point on the screen that would be accurate to a reasonable degree, and it would work in real time.

3 Background Theory

In this section, I will outline some of the most basic theory that is required to understand the concept, approach and implementation of eye tracking. First I will discuss some of the biological aspects of the human eye, then some theory behind the pinhole camera model used in computer vision, and finally some background on the 3D transformations between the world models of the two viewing systems involved here: the webcam and the subject looking at the computer.

3.1 Human Eyes

In humans, the eyes are one of the most relied-upon senses. In the human eye, light enters through the pupil and is projected by the lens onto the retina. The muscles around the eye are able to adjust the optical power (focal length and so on) of the eye by controlling the curvature of the lens. Using these muscles, humans are able to focus their eyes. On the retina, several types of cells (traditionally called "rods" and "cones") function as light receptors. The cones are best able to distinguish colour in high-intensity light, while the rods are best able to distinguish between dim and achromatic light, roughly corresponding to humans' day- and night-time vision, respectively. When observing objects in situations with sufficient light, humans attempt to use their fovea (see figure 1), an area in the retina with a high density of cones and no rods. This area provides the sharpest vision.

In order to focus on objects in different locations in the world, humans are able to move their eyes while keeping their head in the same position. Note that this is not as self-evident as it seems, for many animals are not able to accomplish this. For these movements, humans use six different muscles, allowing the eye six degrees of freedom: translation in three directions, and rotation about the three different axes. I will not detail the specifics of the movements of the eyes here, but will merely point out that several different movements of the eyes may be distinguished: saccades, in which the eye shifts focus abruptly from one point to another; smooth pursuits, in which the eye tracks a moving object; and fixation, the adjustment of the eye while focusing on a particular stationary point (where the pupil need not necessarily remain entirely still). A rather old, but still quite useful, introduction to the topic of eye movements may be found in [2].

In terms of range, the horizontal field of view that both eyes are capable of viewing spans 114 degrees [10]. This is smaller than the field of view of the individual eyes, as each eye is capable of viewing a small area in the field of view that the other eye cannot reach.

Figure 1: Schematic drawing of the human eye.

3.2 Pinhole Camera Model

In terms of optics, the pinhole camera model is an idealized model of the traditional pinhole camera. In the model, the pinhole is treated as a single point in space, without any lenses being involved. This allows one to treat the transformation from 3D points in space to 2D points on the image plane as a simple projection, which can be represented as a matrix multiplication, as shown in equation 1:

x_2d = P X_3d    (1)

where P is the projection (camera) matrix. Of course, the fact that a normal camera really does have a lens, and the fact that the resulting 2D image in a digital camera is discrete (that is, pixellated) rather than continuous, mean that one has to be careful when applying this model. A more detailed description of why this is the case, what distortion effects can be caused by various lenses, and how to remedy them, is outside the scope of this paper, but may be found in existing literature, e.g. [6] and [9].
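To make the projection in equation 1 concrete, the following minimal C++ sketch applies the pinhole model in its scalar form, x = f·X/Z and y = f·Y/Z, which is the form used for the projections later in this thesis. The focal length and the 3D point are illustrative values, not measurements from the actual setup.

    #include <cstdio>

    // Minimal pinhole projection: a 3D point (X, Y, Z) in camera coordinates is
    // mapped to image coordinates (x, y) = (f * X / Z, f * Y / Z).
    struct Point3 { double X, Y, Z; };
    struct Point2 { double x, y; };

    Point2 projectPinhole(const Point3 &p, double f) {
        // Assumes p.Z > 0, i.e. the point lies in front of the camera.
        return Point2{ f * p.X / p.Z, f * p.Y / p.Z };
    }

    int main() {
        const double f = 500.0;                 // focal length in pixels (illustrative)
        Point3 nose = { 0.05, -0.02, 0.60 };    // a point roughly 60 cm from the camera
        Point2 img = projectPinhole(nose, f);
        std::printf("projected to (%.2f, %.2f)\n", img.x, img.y);
        return 0;
    }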

I will use the pinhole camera model in the attempt to build the eye tracker, without accounting for any lens discrepancies myself. This explicitly runs the risk of not being very accurate, but there are several factors which I decided made endeavouring to compensate for the inaccuracies of the pinhole model unprofitable:

- In an average webcam, the noise and light conditions will usually cause problems in effectively identifying features. The discrepancy caused by lens distortion would, in comparison, be relatively minor.
- The average person will use their computer more or less centrally focused on the screen, above which the webcam will usually be attached. In the middle of the image, the distortions not captured by the pinhole model are smaller, and so they are even less important.
- The resolution of a webcam is so small that any distortion is consequently also small. This is, in fact, another point on which using the pinhole model is not entirely realistic, as the difference between the discrete webcam image and the continuous image the pinhole model assumes becomes even more pronounced. However, there is no way to solve the basic problem that a webcam does not have a very high resolution without violating the assumptions of my research goals (section 2), where I explicitly specified that I wanted to use basic equipment.

3.3 3D Transformation

It is important to realize that in the theory and implementation of the eye tracker, I will deal with two 3D coordinate systems:

- the camera coordinate system;
- the face (subject) coordinate system;

and two 2D coordinate systems:

- the computer screen coordinate system;
- the webcam image coordinate system.

It is important to always keep in mind in which coordinate system calculations are made. Roughly, one can say that the webcam's observation of the subject goes from 2D points in the webcam image, using 3D model coordinates that correspond to the size of a human face, to 3D points in space (in the camera coordinate system) for the face and eye positions. For the actual gaze, we start with a 3D vector in the face model coordinates that is transformed to a 3D vector (point) in space, using the deduced transformation to the camera coordinate system. The ray along this vector is then intersected with the screen plane, obtaining a 3D intersection point. This point then needs to be transformed into a 2D point in screen coordinates. I will explain the specifics of transforming between these different coordinate systems in the next section.

4 Approach

My eventual goal is to use the sequence of frames produced by the webcam to deduce a point at which the subject that is visible in these frames is looking. In principle, there are two factors involved that determine the point on the screen a subject is looking at. These are the position and rotation of the head relative to the screen, and the direction of the gaze of the user's eyes. So, from the 2D webcam image, it is necessary to deduce the 3D position and rotation of the head, and the direction of the gaze of the eyes.

Figure 2: The 3D coordinate systems: the camera (red dot at the origin) coordinate system and X, Y, Z axes, with the face (in green) and the screen (in blue). The face shows its eyes (red) and center (green) with its own coordinate system (grey, unlabelled).

However, it is also necessary to determine the position of the screen relative to the webcam, in order to deduce where on the screen the user is looking. First, I will focus on the detection of the face and pupils, and how we can use these to determine the location of the user's head in 3D. Then I will focus on how we can deduce the direction of the gaze of the user's eyes. Finally, I will consider how we can combine this information to make an informed decision about where the user is actually looking.

4.1 Face Detection

First, we must detect the user's head. In this case, the pupil detection code which I received for my work already used a cascade of boosted classifiers working with Haar-like features, as introduced in the seminal paper by Viola and Jones [20], but there are certainly other possibilities. The detector I used is built into the OpenCV graphics library. It uses learned classifications of simple Haar-like features. Haar-like features represent oriented contrasts in small image regions, such as "dark region left, light region right". Using multiple such features (a cascade) allows one to represent complex structures such as the human face. The face detector is first trained on a set of positive examples (faces) and negative examples (random image noise). The face detector returns a rectangle that encases the face in the 2D image, and therefore gives us rough 2D positions of the face. This is illustrated in figure 3.

Figure 3: The face (red square) as detected by the boosted cascade classifier.
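As an illustration of this step, the sketch below runs OpenCV's boosted cascade face detector on a single webcam frame. It is not the code of the proof of concept itself: it uses the current OpenCV C++ API rather than the 2008-era C API, and the cascade file name is a placeholder that must point to a local copy of a trained frontal-face cascade.

    #include <opencv2/objdetect.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/videoio.hpp>
    #include <vector>
    #include <cstdio>

    int main() {
        // Placeholder path: a trained frontal-face cascade as shipped with OpenCV.
        cv::CascadeClassifier faceCascade;
        if (!faceCascade.load("haarcascade_frontalface_default.xml")) return 1;

        cv::VideoCapture cam(0);                 // default webcam
        cv::Mat frame, gray;
        if (!cam.read(frame)) return 1;

        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::equalizeHist(gray, gray);            // improve contrast for the classifier

        std::vector<cv::Rect> faces;
        faceCascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(80, 80));

        if (!faces.empty())
            std::printf("face at (%d, %d), size %dx%d\n",
                        faces[0].x, faces[0].y, faces[0].width, faces[0].height);
        return 0;
    }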

4.2 Pupil Detection

Just as for the face detection procedure, the code I received already contained a method to detect pupil locations. The pupil detection uses a region in roughly the upper half of the face to look for the eyes and their pupils. In this region, it uses isophotes: curves of equal intensity in the image. It then calculates the curvature of these isophotes, and finally tries to find the centers of these curvatures. For each curvature, a vote is registered for its center, weighted by the curvedness. Because isophote density is greater around the edges of objects, and the curvedness is influenced by this density, isophotes along object edges have a larger say in where isocenters are found. The votes are summed, and the point with the highest number of votes is used to determine the isocenter using a simple mean-shift algorithm. Using this method turns out to be a very workable way of finding the pupil positions. For a more extensive discussion of this method, I refer the reader to the paper by Valenti et al. [19].

4.3 Face Location Determination

Now that we know the location of the face, and two points on the face (the pupils), we can attempt to locate the face in 3D. For this we need one additional parameter: the focal length of the camera. We use the center of the detected face as representing a point, in the nose, at roughly the same depth as the eyes. Now that we have 3 points and the focal length, we are able to use the POSIT algorithm.

POSIT reconstructs the 3D position of a rigid body model (one where the distances between the points in the model do not change), given the 3D model points, the 2D image points corresponding to these model points, and the focal length used to produce the image. The POSIT algorithm assumes that the differences between the Z coordinates of the object model points are very small compared to the distance between the camera and the object model points. This assumption allows for the use of a Scaled Orthographic Projection (SOP) rather than a true perspective projection. So, the approximate image coordinates x_i and y_i of an object point i with camera world coordinates X_i, Y_i and Z_i in a scaled orthographic projection are:

x_i = f X_i / Z_0,  y_i = f Y_i / Z_0

(where Z_0 is the distance from the camera to a reference point on the object), compared to the normal perspective projection:

x_i = f X_i / Z_i,  y_i = f Y_i / Z_i

This simplification allows the calculation of an approximate pose. Then, a simplification of the original object model can be used, where the different object points are at the same Z coordinate (but still on their original line of sight from the camera). If we use this deformation of the object model with the approximate pose we calculated earlier, using the scaled orthographic projection model, we should get back the same image points. If we do, we've found a correct pose. If not, we repeat the previous steps with the newly found image points. It can be shown that iteration using this method converges on the actual pose of the object, given that the distance between the camera and the object is sufficiently large compared to the distances between the object points themselves. For a full explanation of the algorithm and the proofs associated with it, please refer to the original paper [5].
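To give a feel for the SOP approximation that POSIT starts from, the short sketch below compares the scaled orthographic projection x = f·X/Z_0 with the true perspective projection x = f·X/Z_i for a few object points whose depth offsets are small relative to the camera distance. All numbers are illustrative, not taken from the actual face model.

    #include <cstdio>

    // Compare the true perspective projection x = f * X / Z with the scaled
    // orthographic projection x = f * X / Z0 used by POSIT as its approximation.
    int main() {
        const double f  = 500.0;   // focal length in pixels (illustrative)
        const double Z0 = 600.0;   // reference depth of the object in mm (illustrative)

        // Object points: X coordinate and depth offset from Z0, in mm. The offsets
        // are small compared to Z0, which is exactly POSIT's working assumption.
        const double X[3]  = { -35.0, 35.0, 10.0 };  // e.g. two pupils and a nose point
        const double dZ[3] = {   5.0, -5.0, 15.0 };

        for (int i = 0; i < 3; ++i) {
            double persp = f * X[i] / (Z0 + dZ[i]);  // true perspective projection
            double sop   = f * X[i] / Z0;            // scaled orthographic projection
            std::printf("point %d: perspective %7.2f px, SOP %7.2f px, error %5.2f px\n",
                        i, persp, sop, sop - persp);
        }
        return 0;
    }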

4.4 Eyecorner Detection

When using isocenters to deduce the image points corresponding to the pupils, the pupils are the most strongly present amongst the different isocenters, but the eye corners are also usually found. Hence, using the same algorithm described in section 4.2, it is possible to find several candidate points for the eye corners. Below, I will outline how it is possible to distinguish between the eye corners and other isocenters. This can be done using some geometrical constraints, by aggregating data from several frames, and by tracking the actual eye corner once it has been found.

4.4.1 Geometrical Constraints

To distinguish the actual eye corners from the other isocenters, some simple geometrical constraints are used. A formula representing the line segment connecting the two pupils is deduced, and the y coordinate of the eye corner is limited to being close to this line (where "close" is defined relative to the size of the image and the vertical distance between the two pupils). The x coordinate is limited to being at most half the distance between the two pupils away from the nearest pupil. This provides an effective window, no matter where the eyes are looking. For an illustration of these restrictions, please see figure 4.

Figure 4: Illustration of the geometrical constraints (blue) on the isocenters that were recognized as eye corners (green). The pupils are shown in red, and the red square is the face detected by the boosted cascade classifier as described in section 4.1.

Furthermore, the distance between the two eye corners of each eye needs to be approximately the same. Ideally, the latter constraint would take the rotation of the face into account, and set tighter boundaries on the sizes of the eyes (the distances between their corners), but that subtlety has not been represented in the approach taken here due to the relative inaccuracy of the rotation and translation information found using POSIT (please also see section 6 for a more complete discussion of the weaknesses of this system). Because of this inaccuracy, it was not deemed useful to try to represent the rotation in the scale of the eyes: it would most likely lead to a loss in accuracy in the eye corner detection due to the rotation often being incorrect.
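The sketch below expresses these constraints as a simple filter over candidate isocenters. It is only an illustration of the idea: the function name and the vertical tolerance parameter are mine, and the thesis implementation defines "close to the pupil line" relative to the image size and the vertical distance between the pupils rather than as a single fixed number.

    #include <opencv2/core.hpp>
    #include <algorithm>
    #include <cmath>

    // A candidate isocenter is a plausible eye corner only if it lies close to the
    // line through the two pupils and within half the inter-pupil distance of the
    // nearest pupil (section 4.4.1).
    bool isPlausibleEyeCorner(const cv::Point2f &cand,
                              const cv::Point2f &leftPupil,
                              const cv::Point2f &rightPupil,
                              float verticalTolerance) {
        cv::Point2f d = rightPupil - leftPupil;
        float interPupil = std::sqrt(d.x * d.x + d.y * d.y);
        if (interPupil <= 0.0f) return false;

        // Perpendicular distance from the candidate to the line through the pupils.
        float lineDist = std::fabs(d.y * (cand.x - leftPupil.x) -
                                   d.x * (cand.y - leftPupil.y)) / interPupil;
        if (lineDist > verticalTolerance) return false;

        // Horizontal constraint: within half the inter-pupil distance of a pupil.
        float dl = std::fabs(cand.x - leftPupil.x);
        float dr = std::fabs(cand.x - rightPupil.x);
        return std::min(dl, dr) <= 0.5f * interPupil;
    }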

Figure 5: Occasionally, isocenters are found in the eyebrows and other odd locations.

4.4.2 Aggregating over multiple frames

Using the constraints outlined above, several isocenters remain as valid eye corner options. To eliminate more false positives, the isocenters found over several frames are compared, and only those are retained which remain consistent across several frames. It is important to apply some leniency in comparing isocenters across frames: they need not be exactly the same, due to image noise and slight head adjustments. In practice, I found that looking for an isocenter that was within the geometrical constraints, and which reoccurred within a continuous sequence of 5 frames, was an effective approach. Here, a reoccurrence was deemed to exist if an isocenter was found less than 5 pixels away from the first frame's isocenter (approximately 0.78% of the image's width). This approach removes almost all of the incidental isocenters occurring in odd locations such as the eyebrows or cheeks, some examples of which can be seen in figure 5. However, it has the additional disadvantage of losing track of the eye corners when the face changes position, or if an eye corner does not appear as an isocenter all the time. In order to mitigate this, tracking of the eye corner features is used.
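As a sketch of this aggregation step, the small helper class below keeps the candidates from the last five frames and accepts a corner only if a candidate reoccurred within five pixels of it in every stored frame. The class and its interface are illustrative; the thesis implementation is organized differently.

    #include <opencv2/core.hpp>
    #include <cmath>
    #include <deque>
    #include <vector>

    // Keep the isocenter candidates of the last `windowSize` frames and accept a
    // corner only if a nearby candidate occurred in each of them (section 4.4.2).
    class CornerAggregator {
    public:
        CornerAggregator(int windowSize = 5, float tolerancePx = 5.0f)
            : windowSize_(windowSize), tolerancePx_(tolerancePx) {}

        // Call once per frame with all isocenter candidates found in that frame.
        void addFrame(const std::vector<cv::Point2f> &candidates) {
            history_.push_back(candidates);
            if ((int)history_.size() > windowSize_) history_.pop_front();
        }

        // True if `corner` has a candidate within tolerance in every stored frame.
        bool isStable(const cv::Point2f &corner) const {
            if ((int)history_.size() < windowSize_) return false;
            for (const auto &frame : history_) {
                bool found = false;
                for (const auto &c : frame)
                    if (std::hypot(c.x - corner.x, c.y - corner.y) <= tolerancePx_) {
                        found = true;
                        break;
                    }
                if (!found) return false;
            }
            return true;
        }

    private:
        int windowSize_;
        float tolerancePx_;
        std::deque<std::vector<cv::Point2f>> history_;
    };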

4.4.3 Tracking using Lucas Kanade optical flow estimation

The tracking is done using the Lucas Kanade method for optical flow estimation. A full discussion of this algorithm is outside the scope of this thesis, but I will try to explain the basic procedure in this section. For a full discussion, I refer the reader to the seminal paper by Lucas and Kanade, 1981 [13].

The Lucas Kanade method, like other optical flow estimation algorithms, tries to deduce the movement that occurred between two images. Suppose, for simplicity, that we are considering a one-dimensional image F that was translated by some h to produce image G. In order to find h, the algorithm takes the derivative of F. It then assumes that F can be approximated linearly for reasonably small h, so that:

G(x) = F(x + h) ≈ F(x) + h F'(x)

We can formulate the difference between F(x + h) and G over the entire curve as an L2 norm:

E = Σ_x [F(x + h) - G(x)]²

Then, to find the h which minimizes this difference norm, we can set:

0 = ∂E/∂h ≈ ∂/∂h Σ_x [F(x) + h F'(x) - G(x)]² = Σ_x 2 F'(x) [F(x) + h F'(x) - G(x)]

Using the above, we can deduce:

h ≈ ( Σ_x F'(x) [G(x) - F(x)] ) / ( Σ_x F'(x)² )

which can be implemented in an iterated fashion using:

h_0 = 0,
h_{k+1} = h_k + ( Σ_x F'(x + h_k) [G(x) - F(x + h_k)] ) / ( Σ_x F'(x + h_k)² )

Usually, a weighting function is used to account for the fact that at some points, the assumption that F is locally linear holds better than at others. A weighting function allows those points to play a larger part in determining h (and, conversely, gives less weight to points where the assumption of linearity, i.e. F''(x) being close to 0, does not hold). This approach can be generalized to multiple dimensions, using vectors for x and h, and a gradient operator rather than the derivative. The algorithm can also be generalized to take into account shear, rotation, and so on.

In this case, we use the algorithm to obtain the coordinates of the eye corners in some frame a, given the coordinates of the eye corners in an earlier frame b (obtained through the isocenter procedure with aggregation and geometrical limits), along with the image data of frames a and b. This allows the eye corner features to be tracked using the Lucas Kanade method until the approach outlined above finds the eye corners again once the head stabilizes.
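In the proof of concept, the tracking is needed only at the level of "given these corner points in the previous frame, where are they now". A sketch of that step using OpenCV's pyramidal Lucas Kanade implementation is shown below; it uses the current C++ API (the 2008 implementation used the C API of that era), and the window size and pyramid depth are illustrative choices.

    #include <opencv2/video/tracking.hpp>
    #include <opencv2/core.hpp>
    #include <vector>

    // Track previously found eye corners from `prevGray` to `currGray` using
    // pyramidal Lucas Kanade optical flow. Points whose status flag is 0 could
    // not be tracked and are dropped.
    std::vector<cv::Point2f> trackCorners(const cv::Mat &prevGray,
                                          const cv::Mat &currGray,
                                          const std::vector<cv::Point2f> &prevCorners) {
        std::vector<cv::Point2f> curr, tracked;
        if (prevCorners.empty()) return tracked;

        std::vector<unsigned char> status;
        std::vector<float> err;
        cv::calcOpticalFlowPyrLK(prevGray, currGray, prevCorners, curr,
                                 status, err, cv::Size(15, 15), 3);

        for (size_t i = 0; i < curr.size(); ++i)
            if (status[i]) tracked.push_back(curr[i]);
        return tracked;
    }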

4.5 Eye Gaze Determination

When the position of the pupil as well as the positions of the eye corners of at least one eye are all known, it is possible to deduce a gaze vector. This can be done using the distance between the pupil and the eye corners, and correlating this with the visual field of the average human.

Figure 6: Determining the relative position of the pupil (point D) compared to the eye corners (A and C).

In order to do this accurately even if the face (and hence the eye) is rotated, the difference vector from the pupil to the eye corner left of it in the image is split into a vector (AB) along the line (AC) that intersects both eye corners, and a vector (BD) perpendicular to this line (see figure 6). In other words, we project the pupil point from the 2D image coordinate space to a new 2D coordinate space, in which the line segment between the eye corners lies on the x axis, and the left eye corner is the origin. The size of these vectors is compared to the size of the eye, and a corresponding rotation is applied to a unit vector along the Z axis.

Formally, given a horizontal field of view of 114 degrees, centered around the vector that is orthogonal to the front of the face, the horizontal angle of the view α in degrees is given by:

α = 114 · (AB / AC) - 57

where AB is the distance from the pupil to the left eye corner along the line AC, and AC is the distance between the two eye corners (see figure 6).

Similarly, for the vertical angle of the gaze, a view of 90 degrees is used. This value is not necessarily very accurate, but there has been little to no research in this area, and no conclusive data was available. The vertical size of the eye is deemed to be roughly 2/5 of the horizontal size, with the gaze at the center of the view if the pupil is positioned on the line segment between the two eye corners. The same principle is applied, so the vertical angle β in degrees is given by:

β = 45 · DB / (0.4 · AC)

(where DB is the distance from the pupil (D) to the line through the eye corners (AC); DB is taken to be negative if the pupil is below the line segment connecting the two eye corners).

Using these rotations with the unit vector along the Z axis, we are able to determine a gaze vector g. An example of some transformation angles and the resulting vector is visualized in figure 7. Using the rotation angles α and β, it is possible to build two simple rotation matrices, and multiply the unit vector with them. For the rotation about the Y axis (the horizontal rotation) this is:

R_1 = [  cos α   0   sin α ]
      [    0     1     0   ]
      [ -sin α   0   cos α ]

And for the rotation about the X axis (the vertical rotation):

R_2 = [ 1     0        0   ]
      [ 0   cos β   -sin β ]
      [ 0   sin β    cos β ]

This then allows us to compute a gaze vector g from the simple unit vector v:

v = [0, 0, 1]^T

g = R_1 R_2 v

Using this gaze vector, we can determine where the user's gaze intersects with the screen.
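A small numeric sketch of this step is given below: it builds the gaze vector g = R_1 R_2 [0, 0, 1]^T directly from the two angles, with the two rotations written out component-wise. The angle values in main are illustrative.

    #include <cmath>
    #include <cstdio>

    struct Vec3 { double x, y, z; };

    // Build the gaze vector g = R1 * R2 * [0, 0, 1]^T from the horizontal angle
    // alpha (rotation about the Y axis) and vertical angle beta (about the X axis).
    Vec3 gazeVector(double alphaDeg, double betaDeg) {
        const double kDegToRad = 3.14159265358979323846 / 180.0;
        double a = alphaDeg * kDegToRad;
        double b = betaDeg * kDegToRad;

        // R2 * [0, 0, 1]^T = [0, -sin(beta), cos(beta)]^T
        Vec3 v = { 0.0, -std::sin(b), std::cos(b) };

        // Apply R1 (rotation about the Y axis by alpha).
        return Vec3{ std::cos(a) * v.x + std::sin(a) * v.z,
                     v.y,
                     -std::sin(a) * v.x + std::cos(a) * v.z };
    }

    int main() {
        Vec3 g = gazeVector(10.0, -5.0);   // illustrative angles in degrees
        std::printf("gaze vector: (%.3f, %.3f, %.3f)\n", g.x, g.y, g.z);
        return 0;
    }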

Figure 7: The transformation of the vector [0, 0, 1] to gaze vector g using angles α and β for rotation about the Y and X axes, respectively. Also compare with figure 2 to understand which 3D coordinate model is used.

4.6 Screen Intersect Location

Now that we have a vector in the face coordinate system, and know the transformation matrix to go from the face coordinate system to the camera coordinate system, we can use this to obtain the same vector in the camera coordinate system. We can then compute an intersection with the screen by determining the point at which the vector intersects the plane in which the screen lies (assuming that the screen is flat).

4.6.1 Ray-plane intersection

The following is a classic way of doing ray-plane intersection in 3D. It has been described extensively in the literature; for more background, please refer to [8]. Consider the plane in which our screen lies as defined by the classic plane equation:

0 = Ax + By + Cz + D    (2)

Here, A, B, and C are the components of the plane's unit normal. In our case, this unit normal is:

n = [0, 0, 1]^T    (3)

That is, we assume that the screen is positioned in such a way that the normal of its plane is the same as the camera Z axis. In other words, the camera, sitting at the origin, gazes along the Z axis, with the screen and the camera in the X-Y plane. This is not at all required, and arbitrary translations and rotations could be applied to the formulas given here. However, very many laptops these days come with a camera preinstalled on top of the screen, and therefore satisfy this precise criterion already. Because the plane we are interested in is the X-Y plane, we also know that the distance to the origin D is 0. From equations 2 and 3, this means our plane equation is the following:

0 = 0·x + 0·y + 1·z + 0 = z    (4)

This fits our intuition that a point is in the plane iff its z coordinate is zero. For the ray, we define an origin v_0 = [x_0, y_0, z_0] and a direction v_d = [x_d, y_d, z_d]. Now we can parametrize the ray as:

v(t) = v_0 + t·v_d    (5)

Substituting this into equation 2 produces:

0 = A(x_0 + t·x_d) + B(y_0 + t·y_d) + C(z_0 + t·z_d) + D
0 = Ax_0 + By_0 + Cz_0 + D + t(Ax_d + By_d + Cz_d)
-(Ax_0 + By_0 + Cz_0 + D) = t(Ax_d + By_d + Cz_d)
t = -(Ax_0 + By_0 + Cz_0 + D) / (Ax_d + By_d + Cz_d)

All these variables are known, so we can compute t, substitute it into v(t), and obtain an intersection point. In our specific case, the equation is actually much simpler, as substituting equation 5 into equation 4 produces:

0 = z_0 + t·z_d

which clearly saves a lot of tedious computation. Having obtained the intersection point (if any) of the ray with the plane, we need to assess where this point is on the screen.
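The general case translates directly into code. The sketch below implements the ray-plane intersection for an arbitrary plane (A, B, C, D) and, in main, uses the z = 0 screen plane from equation 4; the gaze origin and direction values are illustrative.

    #include <cstdio>

    struct Vec3 { double x, y, z; };

    // Intersect the ray v(t) = v0 + t * vd with the plane A*x + B*y + C*z + D = 0.
    // Returns false if the ray is parallel to the plane.
    bool intersectRayPlane(const Vec3 &v0, const Vec3 &vd,
                           double A, double B, double C, double D, Vec3 &hit) {
        double denom = A * vd.x + B * vd.y + C * vd.z;
        if (denom == 0.0) return false;                    // ray parallel to the plane
        double t = -(A * v0.x + B * v0.y + C * v0.z + D) / denom;
        hit = Vec3{ v0.x + t * vd.x, v0.y + t * vd.y, v0.z + t * vd.z };
        return true;
    }

    int main() {
        // Screen plane as in the text: the X-Y plane, i.e. A = B = D = 0, C = 1 (z = 0).
        Vec3 eye  = { 0.00, 0.10, 0.60 };    // gaze origin in metres (illustrative)
        Vec3 gaze = { 0.05, -0.02, -1.0 };   // gaze direction towards the screen
        Vec3 hit;
        if (intersectRayPlane(eye, gaze, 0, 0, 1, 0, hit))
            std::printf("intersection at (%.3f, %.3f, %.3f)\n", hit.x, hit.y, hit.z);
        return 0;
    }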

4.6.2 Point in rectangle

In our case, we only care about points in the rectangle that is the screen. So, we first test whether the point is inside this rectangle. In order to do this, we perform a naive projection that retains the topology of the rectangle by dropping one of the coordinates from our vectors (this approach to quickly project vectors to 2D is also outlined in [8]). We determine the dominant coordinate in the normal of the plane in which the rectangle lies. In the specific case outlined above, where the screen is in the viewing plane, that is, the camera Z axis is the normal of the screen plane, this is the z coordinate (recall that our normal was [0, 0, 1], from equation 4). This coordinate is then removed from all of our points. When dropping one of the coordinates, we end up with 2D coordinates for the four corners of the screen, c_0 ... c_3, where c_n = (c_nx, c_ny), and the point p = (p_x, p_y). In order to deduce whether the point is in this (arbitrarily rotated) rectangle, the following algorithm was used:

1. For each corner c_n, define:
   - c_m as the next point in clockwise order, so m = (n + 1) mod 4;
   - l_n as the line through c_n and c_m;
   - f_n as the linear formula describing the line l_n;
   - d_n as the vertical distance between the line l_n and p, or, if l_n is vertical, the horizontal distance.

   (Refer to figure 8 for a visual representation of this situation. Note that the screen orientation and proportions in the figure are not realistic: a normal computer screen would be wider than it is high. However, for the sake of the example, these dimensions are more convenient, because d_1 and/or d_3 would otherwise be overly long.)

2. Compute f_n(x):

   f_n(x) = ((c_ny - c_my) / (c_nx - c_mx)) · x + (c_ny - ((c_ny - c_my) / (c_nx - c_mx)) · c_nx)

3. For point p = (p_x, p_y), compute d_n = f_n(p_x) - p_y, or, if l_n is vertical, compute d_n = c_nx - p_x. This is the vertical (or, if l_n is vertical, horizontal) distance between point p and l_n. The sign tells whether p is above or below (or to the left or the right of) the line.

4. p is inside the rectangle if and only if the signs of d_0 and d_2 are opposite and the signs of d_1 and d_3 are opposite.

5. If any of d_0 ... d_3 are 0, the point is on that line, but not necessarily between the line's defining screen corners. To check for the latter property, check the other pair of values. If those two values have opposite signs, or one of them is 0, the point is inside the rectangle.

Figure 8: Diagram showing the similar triangles formed by l_0 ... l_3 (black), d_0 ... d_3 (blue and green), and the normals from l_0 ... l_3 to p (in red).
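A compact sketch of this test is given below. It follows steps 1-4 literally (signed vertical distance per edge, horizontal distance for vertical edges, opposite signs for opposite edges) and leaves out the boundary handling of step 5 for brevity; the corner order and the example point in main are illustrative.

    #include <cstdio>

    struct P2 { double x, y; };

    // Signed vertical distance between p and the line through edge (a, b); for a
    // vertical edge, the signed horizontal distance is used instead (step 3).
    double edgeDistance(const P2 &a, const P2 &b, const P2 &p) {
        if (a.x == b.x)
            return a.x - p.x;                            // vertical edge
        double slope = (a.y - b.y) / (a.x - b.x);
        double fp = slope * p.x + (a.y - slope * a.x);   // f_n(p_x)
        return fp - p.y;
    }

    // p is inside the rectangle iff the distances for opposite edges have opposite
    // signs (step 4). Boundary cases (step 5) are omitted for brevity.
    bool insideRectangle(const P2 corners[4], const P2 &p) {
        double d[4];
        for (int n = 0; n < 4; ++n)
            d[n] = edgeDistance(corners[n], corners[(n + 1) % 4], p);
        return ((d[0] > 0) != (d[2] > 0)) && ((d[1] > 0) != (d[3] > 0));
    }

    int main() {
        P2 screen[4] = { {0.0, 0.0}, {0.29, 0.0}, {0.29, 0.13}, {0.0, 0.13} };
        P2 p = { 0.10, 0.05 };
        std::printf("inside: %s\n", insideRectangle(screen, p) ? "yes" : "no");
        return 0;
    }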

4.6.3 Screen coordinate

Because the topology of the rectangle was retained, we can use basic geometry with d_0 ... d_3 to compute where the point is compared to the corners, and so to compute the screen coordinates. First, note that the screen coordinates are proportional to the normals from lines l_0 ... l_3 to p. Hence, we are interested in the proportions of these normals. Fortunately, the triangles formed by the lines l_0 ... l_3, the line segments corresponding to d_0 ... d_3, and the normals from lines l_0 ... l_3 to p are similar, because point p forms the intersection between these straight lines, and the opposite sides of the rectangle are parallel. Hence, the proportion of the different d_0 ... d_3 values is equivalent to that of the normals from l_0 ... l_3, which allows us to easily calculate the screen coordinates of the point p. This concept should be clear from figure 8.

4.7 Synthesis

I have now treated all the different parts needed to go from the webcam image to a position on the screen. I will recap briefly to explain how the different parts fit together. The steps to go from the webcam image to the screen coordinate are as follows:

1. Detect the face (subsection 4.1). This returns the 2D image coordinates of the face.
2. Detect the pupils (subsection 4.2) in the area of the face where the eyes should be found, using the coordinates obtained in the previous step. This returns the 2D image coordinates of the pupils.
3. Using the position of the eyes and the center of the face, compute the location of the face (subsection 4.3). This returns a transformation from the 3D model coordinate system of the face to the 3D camera coordinate system of the webcam, composed of a rotation matrix and a translation vector.
4. Using residual data from locating the pupils, determine the location of the eye corners (subsection 4.4). This returns the 2D image coordinates of the eye corners.
5. With the information about both the pupils and the eye corners, compute the gaze of the eye in terms of the model of the face (subsection 4.5), and convert the vector coordinates to the coordinate system of the camera rather than the model of the face. This returns a 3D vector that represents the user's gaze in the camera coordinate system.
6. Calculate the intersection point of the user's gaze found in the previous step with the screen, and its coordinates in the screen coordinate system (subsection 4.6).

In the next section, I will detail how these steps were implemented in my proof of concept application.

5 Implementation

I have implemented the method outlined in section 4 using C++ and the OpenCV computer vision library [1]. In order to show the screen coordinate at which the user is looking, I have also used the Macintosh-specific Carbon API, so as to obtain a transparent overlay window [11] as big as the screen, on which the application draws a clearly visible red dot. The current implementation is therefore Mac-specific; however, it would take very little effort to port the Mac-specific code to the Windows platform. All the OpenCV code, and the actual algorithms and synthesis of the data described in section 4, should be cross-platform.

5.1 Limitations

The current implementation has some limitations in terms of generalizability, beyond the platform-specificity and apart from the problems found with the approach implemented (which are discussed in section 6). For one, the implementation currently hardcodes the size of the screen to be 29 by 13 centimeters, and the camera view axis to be perpendicular to this, with the camera centered at 1 centimeter above the screen.

These distances correspond to those found in the Apple Macbook laptop. They would need to be adjusted for other laptops or setups.

Another limitation is the fact that the focal length is currently hardcoded. It would be better to automatically deduce this parameter using some form of calibration. However, this was not the focus of this research, and no time has been invested in doing this. Because the Apple Macbook has a fixed-focus camera, hardcoding this value was not deemed to be a problem in normal usage and testing of the application.

Finally, the size of the human face was hardcoded to values which were found experimentally using measurements of the author's face. Clearly, these would need to be adjusted to well-referenced averages as found by studies of human anatomy. Unfortunately, the author has, despite serious effort, been unable to locate such well-referenced averages.

5.2 Efficiency

The current proof of concept was not written primarily with efficiency in mind. Hence, its performance could easily be improved upon. However, in the current setup, processing one frame takes approximately 130 milliseconds on average, using a webcam image with a 640 by 480 pixel resolution, on an Apple Macbook with a 2.0 GHz Intel Core 2 Duo processor (but with a single-core implementation). This boils down to approximately 7.7 frames per second. While this is definitely not stunning, it is not as unreasonable as it might have been, considering the number of different algorithms and tasks implemented and in use.

6 Discussion

Unfortunately, the implementation does not perform as well as might have been hoped. There are several different problems that interfere with the accuracy of the eye tracker, each of which I will consider in turn. The different problems, when singled out, are not always major, but the combination of all of them means that in its current state, the eye tracker cannot be used for serious applications.

6.1 Face localization in 3D

The most obvious and visible problem is that of doing face localization in 3D; that is to say, using POSIT to obtain a transformation matrix from the original face model to the real world. The implementation uses the pupils and the center of the face in order to do this. In practice, there are several problems with this approach:

- The center of the detected face shifts as the face turns. This means that it is actually not possible to make the point track an actual feature on the face, such as the tip of the nose. While it may correspond to this feature in one face position, it will no longer do so when the face turns even a few degrees.
- The face detector does not work when turning too far to the left or right. This is inherent in using this boosted cascade classifier, and therefore unavoidable when using that approach, but the effects were stronger than anticipated. Depending on the direction from which the largest amount of light originates, it is sometimes not possible to turn more than approximately 10 degrees in the opposite direction, as the side of the face being turned away is also in shadow, and the face is therefore no longer recognized.
- Both the center of the face and the pupils move irrespective of actual facial movement. The center of the face moves because the rectangle indicated as the face by the face detector shifts a few pixels every frame, even if the actual user's face remains quite still. The pupils move, of course, due to eye movement, but also because of noise in the webcam image influencing the isocenter detector. The combination of these movements means that the assumption that these three points can be treated as a rigid body is violated. As a result, the POSIT estimation of the pose also shifts very frequently.
- The POSIT pose estimation is unreliable. The cause of this lies partially in the previous point, but quite apart from that, the three points given to it seem to be insufficient for it to make a reliable estimate of the rotation of the face in particular. While the translation vector it deduces is usually reasonably correct, the rotation is not. It is not known exactly what causes this problem. It may be a problem in the OpenCV implementation, but this seems unlikely given its ubiquitous usage.

6.2 Eye corner detection

Another problem is that of locating the eye corners. For one thing, the theory behind this is unclear: to date, there has been no decisive explanation as to why the eye corners are present as isocenters. Several possible reasons include the curvature of the face around the eyes, the curvature of the opposite side of the pupil, and the shape of the tear glands. The latter especially might have an effect, given that the inner eye corners are detected more often than the outer ones.

Regardless of how it works, the fact remains that this approach also introduces a problem: the eye corners that are detected are often already a few pixels off at the moment they are detected as isocenters. This is not very significant when attempting to determine the horizontal position of the pupil, but for the vertical position it makes much more of a difference, as the visible area of the eye in the vertical direction is simply much smaller, and the effect of these few pixels is all the more noticeable.

The next problem is that the Lucas Kanade tracking is prone to letting the corners glide along the bottom of the eyes, especially when they were not entirely in the eye's corner to begin with. This can easily be explained by the fact that the window used by the tracking is so small that, when the point it is tracking is not exactly in the corner, it will only pick up the difference between the eye and the bottom eyelid as the defining characteristic of that feature. This characteristic occurs in roughly the same way along the entire bottom edge of the eye. A similar thing happens if the corner is initially detected a little bit outside of the eye, in which case the tracking moves it along the side of the head (until it passes the geometrical constraint boundary, at which point it is reset).

6.3 Pupil detection

The pupil detection, too, sometimes produces wrong results, selecting one of the eye's corners instead of the pupil as the most prominent isocenter. This leads to a wrong estimate of the location of the pupil, which massively throws off the gaze direction towards one of the two corners of the eye. Clearly, this should not be allowed to happen.

7 Proposed Improvements

In order to fix the problems outlined in the previous section, several solutions are proposed here. Some of the problems might also be solved by using camera equipment with a higher resolution, or stereo vision, but the first brings extra monetary costs, and the second has been treated extensively in the literature (e.g.
[12, 14]), and in addition poses new problems related to calibration and correlation of the two images. Disregarding these options, however, there are still several ways in which the current result may be improved.

7.1 Alternative face localization

In order to improve the localization of the face, several steps may be taken:

- Use a different pose estimation algorithm. POSIT is not the only algorithm available for 3D reconstruction, and the origins of POSIT date back to 1989 [4]. By now, various refinements and alternatives have been proposed, e.g. [21, 3, 24].
- Use a more stable feature set, such as the average of the two eye corners for each eye, and the average over the last 3 frames for the nose. Using these averages, the points fed into POSIT will vary less, which should help stabilize the rotation matrix found by POSIT.
- Use POSIT with more features, such as the mouth or the ears (if available). This would allow a more robust estimation of the rotation by POSIT, which would also help the accuracy of the transformation matrix obtained through POSIT.
- Use a different classifier for the face position in 2D that is not as noisy as the boosted cascade classifier. There are many possible ways of doing this (e.g. [18, 22]). Using something other than the boosted cascade classifier may prove profitable in terms of stability, though care must be taken to retain the speed of the current implementation.

7.2 Alternative eye corner detection and improvements in pupil detection

In order to reliably detect eye corners, several other well-known options are available instead of the makeshift isocenter approach used for this thesis, e.g. [25]. Additionally, some methods of face detection use actual face models, from which it would be possible to infer the eye corner positions as well. Alternatively, because we know the position of at least one point in the eye (namely the pupil, found using the isocenter approach), it would be possible to use more naive (and faster) methods to infer the corners of the eyes from there, such as edge detection in the area between the two pupils.

Finally, it may be possible to use a more geometrically oriented approach to finding the right isocenters corresponding to the corners and the pupil. Because we know the relative ordering of the three centers, and can estimate the distance between the two eye corners from the size of the detected face as well as from data from previous frames, it may be possible to use these to score different combinations of isocenters more elaborately, and select the optimal combination. This approach may improve over the current situation because it combines the finding of the different points, ensuring a little more consistency in the data.

8 Conclusion

I have outlined and implemented a method to do eye tracking using just an ordinary webcam. The approach uses a boosted cascade classifier for face detection, and isocenters for locating the pupils and eye corners. The pupil and face data are combined to do 3D reconstruction and obtain a transformation matrix from the face model to the camera model. The pupil and eye corner data are combined to obtain a gaze direction, which is transformed using the aforementioned matrix so as to obtain a vector for the user's gaze in the camera model. Using this vector, it is possible to calculate an intersection point on the screen, and display this point to the user. This method proved to suffer from various problems, including severe problems in determining the 3D rotation of the user's face, and in accurately localizing the eye corners and pupils.
Several suggestions to resolve these issues have been proposed, such as using alternative algorithms for 3D reconstruction, using more data points, or using alternative methods to process the isocenter data so as to obtain more accurate locations for the pupils and eye corners.

Although the aim of obtaining a fully functional eye tracker using only a webcam was not achieved within the timeline of this thesis, promising steps have been made in the development of such a system. Using the suggestions for improvement outlined in the previous section, I am confident that it would in fact be possible to do eye tracking using just a webcam.

References

[1] G. Bradski. The OpenCV Library. Dr. Dobb's Journal, November.
[2] R. H. S. Carpenter. Movements of the eye. Pion, London.
[3] P. David, D. DeMenthon, R. Duraiswami, and H. Samet. Simultaneous pose and correspondence determination using line features. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, volume 2.
[4] D. F. DeMenthon and L. S. Davis. New exact and approximate solutions of the three-point perspective problem. University of Maryland Tech Notes, October.
[5] D. F. DeMenthon and L. S. Davis. Model-based object pose in 25 lines of code. International Journal of Computer Vision, 15.
[6] Frédéric Devernay and Olivier Faugeras. Straight lines have to be straight. Machine Vision and Applications, 13:14-24.
[7] Andrew T. Duchowski. Eye Tracking Methodology. Springer, second edition.
[8] A. S. Glassner. An Introduction to Ray Tracing. Morgan Kaufmann.
[9] J. Heikkila and O. Silven. A four-step camera calibration procedure with implicit image correction. In Proceedings of the IEEE conference on Computer Vision and Pattern Recognition.
[10] Ian P. Howard and Brian J. Rogers. Binocular Vision and Stereopsis. Oxford University Press, USA.
[11] Apple Inc. Using overlay windows. In Quartz Programming Guide for QuickDraw Developers, chapter 7. Apple Inc.
[12] Shinjiro Kawato and Nobuji Tetsutani. Detection and tracking of eyes for gaze-camera control. Image and Vision Computing, 22, October.
[13] Bruce D. Lucas and Takeo Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of Imaging Understanding Workshop.
[14] Y. Matsumoto and A. Zelinsky. An algorithm for real-time stereo vision implementation of head pose and gaze direction measurement. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition, page 499.
[15] C. H. Morimoto, D. Koons, A. Amir, and M. Flickner. Pupil detection and tracking using multiple light sources. Image and Vision Computing, 18, March.
[16] Takehiko Ohno and Naoki Mukawa. A free-head, simple calibration, gaze tracking system that enables gaze-based interaction. In Proceedings of the 2004 symposium on Eye tracking research & applications, San Antonio, Texas. ACM.


Face Model Fitting on Low Resolution Images Face Model Fitting on Low Resolution Images Xiaoming Liu Peter H. Tu Frederick W. Wheeler Visualization and Computer Vision Lab General Electric Global Research Center Niskayuna, NY, 1239, USA {liux,tu,wheeler}@research.ge.com

More information

In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data.

In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data. MATHEMATICS: THE LEVEL DESCRIPTIONS In mathematics, there are four attainment targets: using and applying mathematics; number and algebra; shape, space and measures, and handling data. Attainment target

More information

Wii Remote Calibration Using the Sensor Bar

Wii Remote Calibration Using the Sensor Bar Wii Remote Calibration Using the Sensor Bar Alparslan Yildiz Abdullah Akay Yusuf Sinan Akgul GIT Vision Lab - http://vision.gyte.edu.tr Gebze Institute of Technology Kocaeli, Turkey {yildiz, akay, akgul}@bilmuh.gyte.edu.tr

More information

Robot Perception Continued

Robot Perception Continued Robot Perception Continued 1 Visual Perception Visual Odometry Reconstruction Recognition CS 685 11 Range Sensing strategies Active range sensors Ultrasound Laser range sensor Slides adopted from Siegwart

More information

We can display an object on a monitor screen in three different computer-model forms: Wireframe model Surface Model Solid model

We can display an object on a monitor screen in three different computer-model forms: Wireframe model Surface Model Solid model CHAPTER 4 CURVES 4.1 Introduction In order to understand the significance of curves, we should look into the types of model representations that are used in geometric modeling. Curves play a very significant

More information

A PHOTOGRAMMETRIC APPRAOCH FOR AUTOMATIC TRAFFIC ASSESSMENT USING CONVENTIONAL CCTV CAMERA

A PHOTOGRAMMETRIC APPRAOCH FOR AUTOMATIC TRAFFIC ASSESSMENT USING CONVENTIONAL CCTV CAMERA A PHOTOGRAMMETRIC APPRAOCH FOR AUTOMATIC TRAFFIC ASSESSMENT USING CONVENTIONAL CCTV CAMERA N. Zarrinpanjeh a, F. Dadrassjavan b, H. Fattahi c * a Islamic Azad University of Qazvin - nzarrin@qiau.ac.ir

More information

For example, estimate the population of the United States as 3 times 10⁸ and the

For example, estimate the population of the United States as 3 times 10⁸ and the CCSS: Mathematics The Number System CCSS: Grade 8 8.NS.A. Know that there are numbers that are not rational, and approximate them by rational numbers. 8.NS.A.1. Understand informally that every number

More information

NEW MEXICO Grade 6 MATHEMATICS STANDARDS

NEW MEXICO Grade 6 MATHEMATICS STANDARDS PROCESS STANDARDS To help New Mexico students achieve the Content Standards enumerated below, teachers are encouraged to base instruction on the following Process Standards: Problem Solving Build new mathematical

More information

L 2 : x = s + 1, y = s, z = 4s + 4. 3. Suppose that C has coordinates (x, y, z). Then from the vector equality AC = BD, one has

L 2 : x = s + 1, y = s, z = 4s + 4. 3. Suppose that C has coordinates (x, y, z). Then from the vector equality AC = BD, one has The line L through the points A and B is parallel to the vector AB = 3, 2, and has parametric equations x = 3t + 2, y = 2t +, z = t Therefore, the intersection point of the line with the plane should satisfy:

More information

The Olympus stereology system. The Computer Assisted Stereological Toolbox

The Olympus stereology system. The Computer Assisted Stereological Toolbox The Olympus stereology system The Computer Assisted Stereological Toolbox CAST is a Computer Assisted Stereological Toolbox for PCs running Microsoft Windows TM. CAST is an interactive, user-friendly,

More information

A QUICK GUIDE TO THE FORMULAS OF MULTIVARIABLE CALCULUS

A QUICK GUIDE TO THE FORMULAS OF MULTIVARIABLE CALCULUS A QUIK GUIDE TO THE FOMULAS OF MULTIVAIABLE ALULUS ontents 1. Analytic Geometry 2 1.1. Definition of a Vector 2 1.2. Scalar Product 2 1.3. Properties of the Scalar Product 2 1.4. Length and Unit Vectors

More information

Exam 1 Sample Question SOLUTIONS. y = 2x

Exam 1 Sample Question SOLUTIONS. y = 2x Exam Sample Question SOLUTIONS. Eliminate the parameter to find a Cartesian equation for the curve: x e t, y e t. SOLUTION: You might look at the coordinates and notice that If you don t see it, we can

More information

11.1. Objectives. Component Form of a Vector. Component Form of a Vector. Component Form of a Vector. Vectors and the Geometry of Space

11.1. Objectives. Component Form of a Vector. Component Form of a Vector. Component Form of a Vector. Vectors and the Geometry of Space 11 Vectors and the Geometry of Space 11.1 Vectors in the Plane Copyright Cengage Learning. All rights reserved. Copyright Cengage Learning. All rights reserved. 2 Objectives! Write the component form of

More information

CS 4204 Computer Graphics

CS 4204 Computer Graphics CS 4204 Computer Graphics 3D views and projection Adapted from notes by Yong Cao 1 Overview of 3D rendering Modeling: *Define object in local coordinates *Place object in world coordinates (modeling transformation)

More information

Jiří Matas. Hough Transform

Jiří Matas. Hough Transform Hough Transform Jiří Matas Center for Machine Perception Department of Cybernetics, Faculty of Electrical Engineering Czech Technical University, Prague Many slides thanks to Kristen Grauman and Bastian

More information

LINES AND PLANES CHRIS JOHNSON

LINES AND PLANES CHRIS JOHNSON LINES AND PLANES CHRIS JOHNSON Abstract. In this lecture we derive the equations for lines and planes living in 3-space, as well as define the angle between two non-parallel planes, and determine the distance

More information

Face Locating and Tracking for Human{Computer Interaction. Carnegie Mellon University. Pittsburgh, PA 15213

Face Locating and Tracking for Human{Computer Interaction. Carnegie Mellon University. Pittsburgh, PA 15213 Face Locating and Tracking for Human{Computer Interaction Martin Hunke Alex Waibel School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 Abstract Eective Human-to-Human communication

More information

PASSIVE DRIVER GAZE TRACKING WITH ACTIVE APPEARANCE MODELS

PASSIVE DRIVER GAZE TRACKING WITH ACTIVE APPEARANCE MODELS PASSIVE DRIVER GAZE TRACKING WITH ACTIVE APPEARANCE MODELS Takahiro Ishikawa Research Laboratories, DENSO CORPORATION Nisshin, Aichi, Japan Tel: +81 (561) 75-1616, Fax: +81 (561) 75-1193 Email: tishika@rlab.denso.co.jp

More information

Chapter 19. General Matrices. An n m matrix is an array. a 11 a 12 a 1m a 21 a 22 a 2m A = a n1 a n2 a nm. The matrix A has n row vectors

Chapter 19. General Matrices. An n m matrix is an array. a 11 a 12 a 1m a 21 a 22 a 2m A = a n1 a n2 a nm. The matrix A has n row vectors Chapter 9. General Matrices An n m matrix is an array a a a m a a a m... = [a ij]. a n a n a nm The matrix A has n row vectors and m column vectors row i (A) = [a i, a i,..., a im ] R m a j a j a nj col

More information

Math 215 HW #6 Solutions

Math 215 HW #6 Solutions Math 5 HW #6 Solutions Problem 34 Show that x y is orthogonal to x + y if and only if x = y Proof First, suppose x y is orthogonal to x + y Then since x, y = y, x In other words, = x y, x + y = (x y) T

More information

Orthogonal Projections

Orthogonal Projections Orthogonal Projections and Reflections (with exercises) by D. Klain Version.. Corrections and comments are welcome! Orthogonal Projections Let X,..., X k be a family of linearly independent (column) vectors

More information

Section 1.4. Lines, Planes, and Hyperplanes. The Calculus of Functions of Several Variables

Section 1.4. Lines, Planes, and Hyperplanes. The Calculus of Functions of Several Variables The Calculus of Functions of Several Variables Section 1.4 Lines, Planes, Hyperplanes In this section we will add to our basic geometric understing of R n by studying lines planes. If we do this carefully,

More information

Study of the Human Eye Working Principle: An impressive high angular resolution system with simple array detectors

Study of the Human Eye Working Principle: An impressive high angular resolution system with simple array detectors Study of the Human Eye Working Principle: An impressive high angular resolution system with simple array detectors Diego Betancourt and Carlos del Río Antenna Group, Public University of Navarra, Campus

More information

Anamorphic Projection Photographic Techniques for setting up 3D Chalk Paintings

Anamorphic Projection Photographic Techniques for setting up 3D Chalk Paintings Anamorphic Projection Photographic Techniques for setting up 3D Chalk Paintings By Wayne and Cheryl Renshaw. Although it is centuries old, the art of street painting has been going through a resurgence.

More information

(a) We have x = 3 + 2t, y = 2 t, z = 6 so solving for t we get the symmetric equations. x 3 2. = 2 y, z = 6. t 2 2t + 1 = 0,

(a) We have x = 3 + 2t, y = 2 t, z = 6 so solving for t we get the symmetric equations. x 3 2. = 2 y, z = 6. t 2 2t + 1 = 0, Name: Solutions to Practice Final. Consider the line r(t) = 3 + t, t, 6. (a) Find symmetric equations for this line. (b) Find the point where the first line r(t) intersects the surface z = x + y. (a) We

More information

521493S Computer Graphics. Exercise 2 & course schedule change

521493S Computer Graphics. Exercise 2 & course schedule change 521493S Computer Graphics Exercise 2 & course schedule change Course Schedule Change Lecture from Wednesday 31th of March is moved to Tuesday 30th of March at 16-18 in TS128 Question 2.1 Given two nonparallel,

More information

How To Fuse A Point Cloud With A Laser And Image Data From A Pointcloud

How To Fuse A Point Cloud With A Laser And Image Data From A Pointcloud REAL TIME 3D FUSION OF IMAGERY AND MOBILE LIDAR Paul Mrstik, Vice President Technology Kresimir Kusevic, R&D Engineer Terrapoint Inc. 140-1 Antares Dr. Ottawa, Ontario K2E 8C4 Canada paul.mrstik@terrapoint.com

More information

Introduction. www.imagesystems.se

Introduction. www.imagesystems.se Product information Image Systems AB Main office: Ågatan 40, SE-582 22 Linköping Phone +46 13 200 100, fax +46 13 200 150 info@imagesystems.se, Introduction Motion is the world leading software for advanced

More information

Processing the Image or Can you Believe what you see? Light and Color for Nonscientists PHYS 1230

Processing the Image or Can you Believe what you see? Light and Color for Nonscientists PHYS 1230 Processing the Image or Can you Believe what you see? Light and Color for Nonscientists PHYS 1230 Optical Illusions http://www.michaelbach.de/ot/mot_mib/index.html Vision We construct images unconsciously

More information

Factoring Patterns in the Gaussian Plane

Factoring Patterns in the Gaussian Plane Factoring Patterns in the Gaussian Plane Steve Phelps Introduction This paper describes discoveries made at the Park City Mathematics Institute, 00, as well as some proofs. Before the summer I understood

More information

Introduction to Lensometry Gregory L. Stephens, O.D., Ph.D. College of Optometry, University of Houston 2010

Introduction to Lensometry Gregory L. Stephens, O.D., Ph.D. College of Optometry, University of Houston 2010 Introduction to Lensometry Gregory L. Stephens, O.D., Ph.D. College of Optometry, University of Houston 2010 I. Introduction The focimeter, lensmeter, or Lensometer is the standard instrument used to measure

More information

Tracking Moving Objects In Video Sequences Yiwei Wang, Robert E. Van Dyck, and John F. Doherty Department of Electrical Engineering The Pennsylvania State University University Park, PA16802 Abstract{Object

More information

Arrangements And Duality

Arrangements And Duality Arrangements And Duality 3.1 Introduction 3 Point configurations are tbe most basic structure we study in computational geometry. But what about configurations of more complicated shapes? For example,

More information

Copyright 2011 Casa Software Ltd. www.casaxps.com. Centre of Mass

Copyright 2011 Casa Software Ltd. www.casaxps.com. Centre of Mass Centre of Mass A central theme in mathematical modelling is that of reducing complex problems to simpler, and hopefully, equivalent problems for which mathematical analysis is possible. The concept of

More information

Epipolar Geometry. Readings: See Sections 10.1 and 15.6 of Forsyth and Ponce. Right Image. Left Image. e(p ) Epipolar Lines. e(q ) q R.

Epipolar Geometry. Readings: See Sections 10.1 and 15.6 of Forsyth and Ponce. Right Image. Left Image. e(p ) Epipolar Lines. e(q ) q R. Epipolar Geometry We consider two perspective images of a scene as taken from a stereo pair of cameras (or equivalently, assume the scene is rigid and imaged with a single camera from two different locations).

More information

One-Way Pseudo Transparent Display

One-Way Pseudo Transparent Display One-Way Pseudo Transparent Display Andy Wu GVU Center Georgia Institute of Technology TSRB, 85 5th St. NW Atlanta, GA 30332 andywu@gatech.edu Ali Mazalek GVU Center Georgia Institute of Technology TSRB,

More information

Polarization of Light

Polarization of Light Polarization of Light References Halliday/Resnick/Walker Fundamentals of Physics, Chapter 33, 7 th ed. Wiley 005 PASCO EX997A and EX999 guide sheets (written by Ann Hanks) weight Exercises and weights

More information

GeoGebra. 10 lessons. Gerrit Stols

GeoGebra. 10 lessons. Gerrit Stols GeoGebra in 10 lessons Gerrit Stols Acknowledgements GeoGebra is dynamic mathematics open source (free) software for learning and teaching mathematics in schools. It was developed by Markus Hohenwarter

More information

Solutions to old Exam 1 problems

Solutions to old Exam 1 problems Solutions to old Exam 1 problems Hi students! I am putting this old version of my review for the first midterm review, place and time to be announced. Check for updates on the web site as to which sections

More information

Ultra-High Resolution Digital Mosaics

Ultra-High Resolution Digital Mosaics Ultra-High Resolution Digital Mosaics J. Brian Caldwell, Ph.D. Introduction Digital photography has become a widely accepted alternative to conventional film photography for many applications ranging from

More information

Automatic Labeling of Lane Markings for Autonomous Vehicles

Automatic Labeling of Lane Markings for Autonomous Vehicles Automatic Labeling of Lane Markings for Autonomous Vehicles Jeffrey Kiske Stanford University 450 Serra Mall, Stanford, CA 94305 jkiske@stanford.edu 1. Introduction As autonomous vehicles become more popular,

More information

A Learning Based Method for Super-Resolution of Low Resolution Images

A Learning Based Method for Super-Resolution of Low Resolution Images A Learning Based Method for Super-Resolution of Low Resolution Images Emre Ugur June 1, 2004 emre.ugur@ceng.metu.edu.tr Abstract The main objective of this project is the study of a learning based method

More information

Taking Inverse Graphics Seriously

Taking Inverse Graphics Seriously CSC2535: 2013 Advanced Machine Learning Taking Inverse Graphics Seriously Geoffrey Hinton Department of Computer Science University of Toronto The representation used by the neural nets that work best

More information

RESEARCH ON SPOKEN LANGUAGE PROCESSING Progress Report No. 29 (2008) Indiana University

RESEARCH ON SPOKEN LANGUAGE PROCESSING Progress Report No. 29 (2008) Indiana University RESEARCH ON SPOKEN LANGUAGE PROCESSING Progress Report No. 29 (2008) Indiana University A Software-Based System for Synchronizing and Preprocessing Eye Movement Data in Preparation for Analysis 1 Mohammad

More information

Instructions for Creating a Poster for Arts and Humanities Research Day Using PowerPoint

Instructions for Creating a Poster for Arts and Humanities Research Day Using PowerPoint Instructions for Creating a Poster for Arts and Humanities Research Day Using PowerPoint While it is, of course, possible to create a Research Day poster using a graphics editing programme such as Adobe

More information

2.2 Creaseness operator

2.2 Creaseness operator 2.2. Creaseness operator 31 2.2 Creaseness operator Antonio López, a member of our group, has studied for his PhD dissertation the differential operators described in this section [72]. He has compared

More information

Spatial location in 360 of reference points over an object by using stereo vision

Spatial location in 360 of reference points over an object by using stereo vision EDUCATION Revista Mexicana de Física E 59 (2013) 23 27 JANUARY JUNE 2013 Spatial location in 360 of reference points over an object by using stereo vision V. H. Flores a, A. Martínez a, J. A. Rayas a,

More information

Mean-Shift Tracking with Random Sampling

Mean-Shift Tracking with Random Sampling 1 Mean-Shift Tracking with Random Sampling Alex Po Leung, Shaogang Gong Department of Computer Science Queen Mary, University of London, London, E1 4NS Abstract In this work, boosting the efficiency of

More information

Solutions to Practice Problems

Solutions to Practice Problems Higher Geometry Final Exam Tues Dec 11, 5-7:30 pm Practice Problems (1) Know the following definitions, statements of theorems, properties from the notes: congruent, triangle, quadrilateral, isosceles

More information

Monitoring Head/Eye Motion for Driver Alertness with One Camera

Monitoring Head/Eye Motion for Driver Alertness with One Camera Monitoring Head/Eye Motion for Driver Alertness with One Camera Paul Smith, Mubarak Shah, and N. da Vitoria Lobo Computer Science, University of Central Florida, Orlando, FL 32816 rps43158,shah,niels @cs.ucf.edu

More information

Shape Measurement of a Sewer Pipe. Using a Mobile Robot with Computer Vision

Shape Measurement of a Sewer Pipe. Using a Mobile Robot with Computer Vision International Journal of Advanced Robotic Systems ARTICLE Shape Measurement of a Sewer Pipe Using a Mobile Robot with Computer Vision Regular Paper Kikuhito Kawasue 1,* and Takayuki Komatsu 1 1 Department

More information

Inner Product Spaces

Inner Product Spaces Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and

More information

9 Multiplication of Vectors: The Scalar or Dot Product

9 Multiplication of Vectors: The Scalar or Dot Product Arkansas Tech University MATH 934: Calculus III Dr. Marcel B Finan 9 Multiplication of Vectors: The Scalar or Dot Product Up to this point we have defined what vectors are and discussed basic notation

More information

A Short Introduction to Computer Graphics

A Short Introduction to Computer Graphics A Short Introduction to Computer Graphics Frédo Durand MIT Laboratory for Computer Science 1 Introduction Chapter I: Basics Although computer graphics is a vast field that encompasses almost any graphical

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

Solving Geometric Problems with the Rotating Calipers *

Solving Geometric Problems with the Rotating Calipers * Solving Geometric Problems with the Rotating Calipers * Godfried Toussaint School of Computer Science McGill University Montreal, Quebec, Canada ABSTRACT Shamos [1] recently showed that the diameter of

More information

INTRODUCTION TO RENDERING TECHNIQUES

INTRODUCTION TO RENDERING TECHNIQUES INTRODUCTION TO RENDERING TECHNIQUES 22 Mar. 212 Yanir Kleiman What is 3D Graphics? Why 3D? Draw one frame at a time Model only once X 24 frames per second Color / texture only once 15, frames for a feature

More information

Geometry 1. Unit 3: Perpendicular and Parallel Lines

Geometry 1. Unit 3: Perpendicular and Parallel Lines Geometry 1 Unit 3: Perpendicular and Parallel Lines Geometry 1 Unit 3 3.1 Lines and Angles Lines and Angles Parallel Lines Parallel lines are lines that are coplanar and do not intersect. Some examples

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

Geometry of Vectors. 1 Cartesian Coordinates. Carlo Tomasi

Geometry of Vectors. 1 Cartesian Coordinates. Carlo Tomasi Geometry of Vectors Carlo Tomasi This note explores the geometric meaning of norm, inner product, orthogonality, and projection for vectors. For vectors in three-dimensional space, we also examine the

More information

Geometric Transformations

Geometric Transformations Geometric Transformations Definitions Def: f is a mapping (function) of a set A into a set B if for every element a of A there exists a unique element b of B that is paired with a; this pairing is denoted

More information

Build Panoramas on Android Phones

Build Panoramas on Android Phones Build Panoramas on Android Phones Tao Chu, Bowen Meng, Zixuan Wang Stanford University, Stanford CA Abstract The purpose of this work is to implement panorama stitching from a sequence of photos taken

More information

A HYBRID APPROACH FOR AUTOMATED AREA AGGREGATION

A HYBRID APPROACH FOR AUTOMATED AREA AGGREGATION A HYBRID APPROACH FOR AUTOMATED AREA AGGREGATION Zeshen Wang ESRI 380 NewYork Street Redlands CA 92373 Zwang@esri.com ABSTRACT Automated area aggregation, which is widely needed for mapping both natural

More information

Lecture 14: Section 3.3

Lecture 14: Section 3.3 Lecture 14: Section 3.3 Shuanglin Shao October 23, 2013 Definition. Two nonzero vectors u and v in R n are said to be orthogonal (or perpendicular) if u v = 0. We will also agree that the zero vector in

More information

TWO-DIMENSIONAL TRANSFORMATION

TWO-DIMENSIONAL TRANSFORMATION CHAPTER 2 TWO-DIMENSIONAL TRANSFORMATION 2.1 Introduction As stated earlier, Computer Aided Design consists of three components, namely, Design (Geometric Modeling), Analysis (FEA, etc), and Visualization

More information

Solution Guide III-C. 3D Vision. Building Vision for Business. MVTec Software GmbH

Solution Guide III-C. 3D Vision. Building Vision for Business. MVTec Software GmbH Solution Guide III-C 3D Vision MVTec Software GmbH Building Vision for Business Machine vision in 3D world coordinates, Version 10.0.4 All rights reserved. No part of this publication may be reproduced,

More information

Making Machines Understand Facial Motion & Expressions Like Humans Do

Making Machines Understand Facial Motion & Expressions Like Humans Do Making Machines Understand Facial Motion & Expressions Like Humans Do Ana C. Andrés del Valle & Jean-Luc Dugelay Multimedia Communications Dpt. Institut Eurécom 2229 route des Crêtes. BP 193. Sophia Antipolis.

More information

Section 8.8. 1. The given line has equations. x = 3 + t(13 3) = 3 + 10t, y = 2 + t(3 + 2) = 2 + 5t, z = 7 + t( 8 7) = 7 15t.

Section 8.8. 1. The given line has equations. x = 3 + t(13 3) = 3 + 10t, y = 2 + t(3 + 2) = 2 + 5t, z = 7 + t( 8 7) = 7 15t. . The given line has equations Section 8.8 x + t( ) + 0t, y + t( + ) + t, z 7 + t( 8 7) 7 t. The line meets the plane y 0 in the point (x, 0, z), where 0 + t, or t /. The corresponding values for x and

More information

a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2.

a 11 x 1 + a 12 x 2 + + a 1n x n = b 1 a 21 x 1 + a 22 x 2 + + a 2n x n = b 2. Chapter 1 LINEAR EQUATIONS 1.1 Introduction to linear equations A linear equation in n unknowns x 1, x,, x n is an equation of the form a 1 x 1 + a x + + a n x n = b, where a 1, a,..., a n, b are given

More information

Problem Set 5 Due: In class Thursday, Oct. 18 Late papers will be accepted until 1:00 PM Friday.

Problem Set 5 Due: In class Thursday, Oct. 18 Late papers will be accepted until 1:00 PM Friday. Math 312, Fall 2012 Jerry L. Kazdan Problem Set 5 Due: In class Thursday, Oct. 18 Late papers will be accepted until 1:00 PM Friday. In addition to the problems below, you should also know how to solve

More information

Field Application Note

Field Application Note Field Application Note Reverse Dial Indicator Alignment RDIA Mis-alignment can be the most usual cause for unacceptable operation and high vibration levels. New facilities or new equipment installations

More information

Vectors 2. The METRIC Project, Imperial College. Imperial College of Science Technology and Medicine, 1996.

Vectors 2. The METRIC Project, Imperial College. Imperial College of Science Technology and Medicine, 1996. Vectors 2 The METRIC Project, Imperial College. Imperial College of Science Technology and Medicine, 1996. Launch Mathematica. Type

More information

HANDS-FREE PC CONTROL CONTROLLING OF MOUSE CURSOR USING EYE MOVEMENT

HANDS-FREE PC CONTROL CONTROLLING OF MOUSE CURSOR USING EYE MOVEMENT International Journal of Scientific and Research Publications, Volume 2, Issue 4, April 2012 1 HANDS-FREE PC CONTROL CONTROLLING OF MOUSE CURSOR USING EYE MOVEMENT Akhil Gupta, Akash Rathi, Dr. Y. Radhika

More information

Section 2.4: Equations of Lines and Planes

Section 2.4: Equations of Lines and Planes Section.4: Equations of Lines and Planes An equation of three variable F (x, y, z) 0 is called an equation of a surface S if For instance, (x 1, y 1, z 1 ) S if and only if F (x 1, y 1, z 1 ) 0. x + y

More information