Image Forensic Analyses that Elude the Human Visual System




Hany Farid^a and Mary J. Bravo^b
^a Department of Computer Science, Dartmouth College, Hanover, NH 03755, USA
^b Psychology Department, Rutgers University, Camden, NJ 08102, USA

ABSTRACT
While historically we may have been overly trusting of photographs, in recent years there has been a backlash of sorts, and the authenticity of photographs is now routinely questioned. Because these judgments are often made by eye, we wondered how reliable the human visual system is in detecting discrepancies that might arise from photo tampering. We show that the visual system is remarkably inept at detecting simple geometric inconsistencies in shadows, reflections, and perspective distortions. We also describe computational methods that can be applied to detect the inconsistencies that seem to elude the human visual system.

Keywords: Photo Forensics

1. INTRODUCTION
In an attempt to quell rumors regarding the health of North Korea's leader Kim Jong-Il, the North Korean government released a series of photographs in the spring of 2008 showing a healthy and active Kim Jong-Il. Shortly after their release, the BBC and UK Times reported that the photographs might have been doctored. One of the reported visual discrepancies was a seemingly incongruous shadow. Because such claims of inauthenticity are often made by eye, we wondered how reliable the human visual system is in detecting discrepancies that might arise from photo tampering. In many ways the visual system is remarkable, capable of hyperacuity,[1] rapid scene understanding,[2] and robust face recognition.[3] In other arenas, however, the visual system can be quite inept. For example, the visual system can be insensitive to inconsistencies in lighting,[4] viewing position,[5] and certain judgments of lightness and color.[6] There is also some evidence that observers cannot reliably interpret shadows,[7] reflections,[8] and perspective distortion.[9] These last three cues can provide evidence of photo tampering, and so, as the example above illustrates, it is important to understand how well observers can utilize these cues. Here we report three experiments that compare the performance of human observers with that of computational methods in detecting inconsistencies in shadows, perspective, and reflections.

Contact: farid@cs.dartmouth.edu and mbravo@camden.rutgers.edu.
"Fake photo revives Kim rumours," BBC, November 2, 2008.
"Kim Jong Il: digital trickery or an amazing recovery from a stroke?," UK Times, November 7, 2008.

2. SHADOWS
2.1 Human Performance
Figure 1(a) is a rendered 3-D scene illuminated by a single light that produces cast shadows on the ground plane and back wall. Panel (b) of this figure is the same scene with the light moved to a different location. Panel (c) is a composite created by combining the back wall from panel (a) and the ground plane from panel (b) to create a scene with shadows that are inconsistent with a single light.

One hundred and forty rendered scenes were created such that the cast shadows were either consistent or inconsistent with a single light. For the consistent scenes, the light was positioned either on the left or right side of the room and in one of nine different locations that varied in distance from the ground plane and from the back wall. For the inconsistent scenes, the back walls from scenes with different lighting were interchanged. Twenty observers were each given unlimited time to judge whether the original and composite scenes were consistent with a single light. Their performance was

nearly perfect (95.5%) for inconsistent scenes that were the combination of lights from opposite sides of the room (i.e., the cast shadows ran in opposite directions). For all other cases, however, observer accuracy was near chance (52.8%). The average response time was 4.9 seconds, indicating that observers spent a reasonable amount of time inspecting each scene.

Figure 1. The cast shadows in panels (a) and (b) are consistent with a single light. The scene in panel (c) is a composite of the back wall from panel (a) and the ground plane from panel (b). In this case, the cast shadows are inconsistent with a single light. The yellow lines in panel (d) connect points on each object to their projected shadow. The intersection of these lines is the 2-D location of the light.

2.2 Geometry of Shadows
Although observers have considerable difficulty detecting inconsistencies in cast shadows, there is a simple image-based technique for making this judgment. Since light travels in a straight line, a point on a shadow, its corresponding point on the object, and the light source all lie on a single line. Therefore, the light source will always lie on the line that connects any point on a shadow with its corresponding point on the object, regardless of the scene geometry. In an image, the projections of these lines will always intersect at the 2-D projection of the light position.

In practice, there are some limitations to this geometric analysis of light position. Care must be taken to select appropriately matched points on the shadow and the object; this is best achieved when the object has a distinct shape (e.g., the tip of a cone). If the dominant light is the sun, then the lines may be nearly parallel, making the computation of their intersection vulnerable to numerical instability. For example, Figure 2(a) is an authentic image where the sun is directly above the vehicle.
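The intersection of the shadow-to-object lines can be computed as a small least-squares problem: each matched pair defines a line, and the light position is the point that minimizes the summed squared perpendicular distance to all of the lines. A minimal sketch (NumPy is assumed to be available; the function name and sample points are illustrative, not taken from the paper):

```python
import numpy as np

def light_position_2d(shadow_pts, object_pts):
    """Least-squares intersection of the lines joining each shadow point
    to its corresponding object point; returns the 2-D image location of
    the projected light source."""
    A = np.zeros((2, 2))
    b = np.zeros(2)
    for s, o in zip(np.asarray(shadow_pts, float), np.asarray(object_pts, float)):
        d = (o - s) / np.linalg.norm(o - s)  # unit direction of the shadow-to-object line
        P = np.eye(2) - np.outer(d, d)       # projector onto the line's normal direction
        A += P
        b += P @ s
    # Normal equations: the minimizer of sum_i ||P_i (x - s_i)||^2.
    return np.linalg.solve(A, b)
```

When the lines are nearly parallel, as in the overhead-sun case above, the 2x2 system becomes ill-conditioned and the estimate unstable, which is exactly the limitation noted in the text.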
The yellow lines, connecting shadow and object, are nearly parallel, making it difficult to determine if these lines intersect at a single point. On the other hand, the image in Figure 2(b) must be a fake because the lines clearly diverge. Even if the intersection is difficult to

compute, this analysis can still be employed to determine if an object's shadow is inconsistent with a single light. We also note that this simple geometric analysis could be used to condition the estimation of light direction as described in [12, 13].

Figure 2. An (a) authentic and (b) fake image. The yellow lines, connecting shadow and object, should intersect at the location of the light. The diverging lines on the right reveal that the cast shadows are inconsistent with a single light.

3. PLANAR PERSPECTIVE
3.1 Human Performance
Figure 3(a) is a rendered scene with three planar surfaces texture-mapped with the same 2-D image. Panel (b) of this figure is the same scene with the texture map on the left-most panel skewed relative to the central panel.

Three planar surfaces were placed in the configuration shown in Figure 3, with only one of the three planes texture-mapped with one of six images of familiar objects. The image was texture-mapped with no distortion or with horizontal skew. The horizontal skew was created by applying the affine matrix [1 s; 0 1] to the image, where the amount of skew s varied from ±0.2 to ±0.8. For the left plane in Figure 3, a negative skew counteracted some of the planar perspective distortion, while a positive skew exaggerated it; for the right plane, the reverse held. Twenty observers were each shown examples of undistorted and skewed images, and instructed that their task would be to determine if an image was skewed. In the center-panel condition there was minimal perspective distortion, and we expected all observers to accurately detect skew in the image. Nonetheless, five observers performed below 70% overall, and their data were eliminated from this analysis. The remaining observers showed the expected pattern of results for the center panel, reliably detecting large positive and negative skews (center panel of Figure 4).
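The horizontal skew itself is a one-line coordinate transform. A minimal sketch of the affine map [1 s; 0 1] acting on point coordinates (NumPy assumed; the actual stimuli would be produced by resampling an image under this map, which is omitted here):

```python
import numpy as np

def skew_points(pts, s):
    """Apply the horizontal-skew affine matrix [[1, s], [0, 1]]:
    each point (x, y) maps to (x + s*y, y)."""
    S = np.array([[1.0, s], [0.0, 1.0]])
    return (S @ np.asarray(pts, float).T).T

# Corners of a unit square under a skew of s = 0.5.
corners = [[0, 0], [1, 0], [1, 1], [0, 1]]
skewed = skew_points(corners, 0.5)
```

The top edge (y = 1) shifts horizontally by s while the bottom edge (y = 0) is fixed; this shear is the distortion the observers were asked to detect.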
Of interest is how these observers performed when the panel was viewed obliquely and the image was subjected to perspective distortion (left and right panels). Here the pattern of results was asymmetrical, depending on whether the skew in the image exaggerated or counteracted the perspective distortion. When the skew exaggerated the perspective distortion (positive image skew, left panel; negative image skew, right panel), performance was good. When the skew counteracted the perspective distortion, performance was at or below chance. That is, observers were unable to detect even a large skew when the effects of perspective distortion counteracted the skew. These results suggest that observers underestimate, or possibly even ignore, the effects of

Figure 3. The images on the planes in panel (a) are the same. The image on the left-most plane in panel (b) is a skewed version of the image on the central plane, as shown in the rectified images below.

perspective projection. Averaging over all conditions, observer accuracy for the left and right panels was 63.0% and 64.4%, compared to 82.6% for the center panel. The average response time was 2.8 seconds, indicating that observers spent a reasonable amount of time inspecting each scene.

Figure 4. Skew estimation accuracy for each of the three planar panels (left, center, and right, respectively) shown in Figure 3.

3.2 Geometry of Planar Perspective
Observers routinely underestimate the amount of distortion caused by planar perspective projection. From a computational point of view, however, the perspective transformation of a planar surface is relatively easy to model and estimate.[14, 15] The mapping from points in 3-D world coordinates to 2-D image coordinates can be expressed by the projective imaging equation x = P X, where the 3×4 matrix P embodies the projective transform, the vector X is a 3-D world point in homogeneous coordinates, and the vector x is a 2-D image point, also in homogeneous coordinates. In the case when all of the world points X lie on a single plane, the transform reduces to a 3×3 planar projective transform H, also known as a homography:

x = H X,   (1)

where the world points X and image points x are now represented by 2-D homogeneous vectors. In order to estimate the homography H, we begin with a cross-product formulation of Equation (1):

\[
\vec{x} \times (H \vec{X}) =
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \times
\left[
\begin{pmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix}
\right] = \vec{0}. \qquad (2)
\]

Evaluating the cross product yields:

\[
\begin{pmatrix}
x_2 (h_7 X_1 + h_8 X_2 + h_9 X_3) - x_3 (h_4 X_1 + h_5 X_2 + h_6 X_3) \\
x_3 (h_1 X_1 + h_2 X_2 + h_3 X_3) - x_1 (h_7 X_1 + h_8 X_2 + h_9 X_3) \\
x_1 (h_4 X_1 + h_5 X_2 + h_6 X_3) - x_2 (h_1 X_1 + h_2 X_2 + h_3 X_3)
\end{pmatrix} = \vec{0}. \qquad (3)
\]

This constraint is linear in the unknown elements of the homography, h_i.
Re-ordering the terms yields the following system of linear equations:

\[
\begin{pmatrix}
0 & 0 & 0 & -x_3 X_1 & -x_3 X_2 & -x_3 X_3 & x_2 X_1 & x_2 X_2 & x_2 X_3 \\
x_3 X_1 & x_3 X_2 & x_3 X_3 & 0 & 0 & 0 & -x_1 X_1 & -x_1 X_2 & -x_1 X_3 \\
-x_2 X_1 & -x_2 X_2 & -x_2 X_3 & x_1 X_1 & x_1 X_2 & x_1 X_3 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} h_1 \\ h_2 \\ \vdots \\ h_9 \end{pmatrix}
= A \, \vec{h} = \vec{0}. \qquad (4)
\]
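In practice, the system A h = 0 above is solved by stacking the two independent rows of the constraint for each of four or more correspondences and taking the least-squares solution. A minimal sketch of this standard direct-linear-transform estimate (NumPy assumed; the function name is illustrative):

```python
import numpy as np

def estimate_homography(world_pts, image_pts):
    """Estimate the 3x3 homography H from >= 4 point correspondences by
    stacking the two independent rows of the linear constraint per point
    and taking the unit vector h that minimizes ||A h||^2."""
    rows = []
    for (X1, X2), (x1, x2) in zip(world_pts, image_pts):
        X = np.array([X1, X2, 1.0])  # homogeneous world point (X3 = 1)
        # Two independent rows of the constraint, with x3 = 1.
        rows.append(np.concatenate([np.zeros(3), -X, x2 * X]))
        rows.append(np.concatenate([X, np.zeros(3), -x1 * X]))
    A = np.array(rows)
    # Last right singular vector of A = minimal-eigenvalue eigenvector of A^T A.
    h = np.linalg.svd(A)[2][-1]
    return h.reshape(3, 3)
```

With exactly four points in general position the solution is exact; with more points the last right singular vector of A minimizes the least-squares error over all unit vectors h.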

Figure 5. The ace o each cigarette box in panel (a) is rectiied using the known dimensions o the box ace. Shown in panels (b) and (c) are the rectiied images, revealing an inconsistency in the cartoon character and text. A matched set o world X and image x coordinates appears to provide three constraints on the eight unknown elements o H (the homography is deined up to an unknown scale actor, reducing the number o unknowns rom nine to eight). The rows o the matrix A, however, are not linearly independent (the third row is a linear combination o the irst two rows). As such, this system provides only two constraints in the eight unknowns. Thereore, a total o our or more points with known world and image coordinates are required to estimate the homography. From these points, standard least-squares techniques can be used to solve or h: the minimal eigenvalue eigenvector o A T A is the unit vector h that minimizes the least-squares error A h 2. The inverse homography H is applied to the image to remove planar perspective distortion. Figure 5(a) is a orgery o our creation two boxes o Marlboro cigarettes were doctored to read Marlboro kids with an image o the cartoon character Tweety Bird. On both boxes, the kids text and the character were manually adjusted to give the appearance o correct perspective. Figure 5(b) and (c) are the results o planar rectiication based on the known shape o the rectangle on the ront o the box ( /6 3 /8 inches, determined by measuring an actual box o cigarettes). Note that ater rectiication the text and character on the boxes are inconsistent with one another, clearly revealing the image to be a ake. 4. Human Perormance 4. REFLECTIONS Figure 6(a) is a rendered 3-D scene containing a red cone and a mirror. Panel (b) o this igure is the same scene with the cone displaced relative to the mirror. Panel (c) is a composite created by replacing the correct relection in panel (a) with that rom panel (b) to create a physically impossible scene. 
Three-dimensional rendered scenes were generated such that the reflection was either consistent or inconsistent with the scene geometry (Figure 6). The scenes were rendered with the viewer in one of three locations relative to the reflective mirror: either 0° (nearly fronto-parallel) or ±60° relative to the mirror. For each viewing direction, the object (the red cone) was moved to one of three locations along the ground plane. The inconsistent scenes were generated by combining the reflection from one scene with the object from another, always taken from scenes with the same viewing direction. Twenty observers were each presented with these scenes (4 consistent and 28 inconsistent) and given unlimited time to determine if the reflection in each was correct. The average accuracy over all viewing conditions was only 55.7%, slightly better than chance. The average response time was 7.6 seconds, indicating that observers spent a reasonable amount of time inspecting each scene.

4.2 Geometry of Reflections
Observers were largely unable to predict the location of an object's reflection in a planar mirror. This failure might not seem surprising given that the task requires knowledge of 3-D scene geometry. However, if the reflective

surface is planar with known dimensions, there is sufficient information in the image to constrain the relationship between an object and its reflection. The law of reflection states that a light ray reflects from a surface at an angle of reflection θr equal to the angle of incidence θi, measured with respect to the surface normal. Assuming unit-length vectors, the direction from the reflection to the object, Z, can be described in terms of the view direction V and surface normal N as:

Z = 2 cos(θi) N − V = 2 (Vᵀ N) N − V.   (5)

Figure 6. The reflections in the planar mirror in panels (a) and (b) are consistent with the scene geometry. The scene in panel (c) is a composite of the object from panel (a) and the reflection from panel (b). In this case, the reflection and scene geometry are inconsistent. The yellow lines in panel (d) constrain the location of the object relative to the reflection.

In order to estimate the view direction and surface normal in a common coordinate system, we must first determine the homography H that maps the reflective planar surface from world to image coordinates. This estimation is the same as that described in Section 3. Once estimated, the homography is factored as:

H = λ K ( r1  r2  t ),   (6)

where λ is a scale factor, the matrix K embodies the internal camera parameters, and the rigid-body transformation from world to camera coordinates is specified by a translation vector t and a rotation matrix

R whose first two columns are r1 and r2. The full rotation matrix is ( r1  r2  r1 × r2 ). If we assume that the camera has unit aspect ratio and zero skew (i.e., square pixels), and that the principal point is the image center, then the intrinsic matrix simplifies to:

\[
K = \begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad (7)
\]

where f is the focal length. Substituting this intrinsic matrix into Equation (6) gives:

\[
H = \lambda \begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix} (\, \vec{r}_1 \;\; \vec{r}_2 \;\; \vec{t} \,). \qquad (8)
\]

Left-multiplying by K⁻¹ yields:

\[
K^{-1} H = K^{-1} \begin{pmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{pmatrix}
= \lambda \, (\, \vec{r}_1 \;\; \vec{r}_2 \;\; \vec{t} \,). \qquad (9)
\]

Because r1 and r2 are the first two columns of a rotation (orthonormal) matrix, their inner product, r1ᵀ r2, is zero, leading to the following constraint:

\[
\begin{pmatrix} h_1 & h_4 & h_7 \end{pmatrix}
\begin{pmatrix} 1/f^2 & 0 & 0 \\ 0 & 1/f^2 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} h_2 \\ h_5 \\ h_8 \end{pmatrix}
= \frac{h_1 h_2 + h_4 h_5}{f^2} + h_7 h_8 = 0. \qquad (10)
\]

The focal length is estimated by solving the above constraint for f:

\[
f = \sqrt{ - \frac{h_1 h_2 + h_4 h_5}{h_7 h_8} }. \qquad (11)
\]

The additional constraint that r1 and r2 are each unit length, r1ᵀ r1 = r2ᵀ r2 = 1, can also be used to estimate the focal length.[14] The scale factor λ is determined by enforcing unit norm on the columns of the rotation matrix:

λ = 1 / ‖K⁻¹ h1‖,   (12)

where h1 is the first column of the matrix H. With the homography factored, the desired view direction and surface normal can each be estimated in a common coordinate system. The view direction, in camera coordinates, is given by:

V = λ K⁻¹ H ( X  Y  1 )ᵀ,   (13)

where (X, Y) is the location of the reflection in world coordinates. These coordinates are determined by first applying H⁻¹ to the image, in order to planar-rectify the reflective surface. A point (X, Y) on the reflection is then selected. Without loss of generality, we assume that the world plane is positioned at unit length from the

origin (i.e., Z = 1). The surface normal in world coordinates is ( 0  0  1 )ᵀ (i.e., along the Z-axis and facing the origin), and in camera coordinates:

N = R ( 0  0  1 )ᵀ.   (14)

With V and N estimated, the direction Z to the object is then determined from Equation (5). Note that all of these directions and normals are specified in the camera coordinate system. As such, they can each be projected into image coordinates and used to determine if an object and its reflection are consistent. Specifically, in the original image, a line is drawn from a point on the reflection to K(V + Z), Figure 6(d). Note, however, that this constraint does not uniquely define the location of the object, as the reflection is consistent with an object anywhere along the constraint line.

Figure 7. Flamingos from Miami's MetroZoo seek shelter from Hurricane Georges (Joe Cavaretta / Associated Press / Sept. 1998). On the right is a magnified view of the flamingos and their reflection. The yellow lines show that the positions of the reflections are consistent with the scene geometry.

Figure 7 is a seemingly improbable, albeit authentic, image. The right panel is a magnified view of the flamingos and their reflection. Because the tiles surrounding the mirror are square, they were used to estimate the aspect ratio of the mirror. This known aspect ratio of 1.65 was used to estimate the homography H, from which the object location of a reflection was estimated. Shown in Figure 7 are two such estimates, where the yellow lines connect points in the mirror with their corresponding real-world locations. Any inconsistencies in these locations could be used as evidence of tampering.

5. DISCUSSION
The human visual system is, at times, remarkably inept at detecting simple geometric inconsistencies that might result from photo tampering. We described three experiments showing that the human visual system is unable to detect inconsistencies in shadows, reflections, and planar perspective distortions.
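Two of the computational steps in the reflection analysis of Section 4.2 reduce to a few lines each: the focal-length estimate from the homography, and the law-of-reflection direction Z = 2(VᵀN)N − V. A minimal sketch (NumPy assumed; function names are illustrative, and the full pipeline of factoring H and rectifying the mirror is omitted):

```python
import numpy as np

def focal_length(H):
    """Focal length from the orthogonality of the first two columns of
    the rotation matrix embedded in the homography H (elements h1..h9)."""
    h = H.ravel()
    return np.sqrt(-(h[0] * h[1] + h[3] * h[4]) / (h[6] * h[7]))

def reflection_direction(V, N):
    """Law of reflection: direction from a reflection to its object,
    given unit-length view direction V and surface normal N."""
    return 2.0 * (V @ N) * N - V
```

The focal-length formula is invariant to the overall scale of H, so the unknown factor λ does not affect the estimate.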
At the same time, we have described computational methods that can be applied to detect the inconsistencies that seem to elude the human visual system. These results suggest that care should be taken when making judgments of photo authenticity based solely on visual inspection.

ACKNOWLEDGMENTS
Thanks to Christopher Kapsales for his help with the data collection. This work was supported by a gift from Adobe Systems, Inc., a gift from Microsoft, Inc., and a grant from the National Science Foundation (CNS-0708209).

REFERENCES
[1] Westheimer, G. and McKee, S., "Visual acuity in the presence of retinal-image motion," Journal of the Optical Society of America 65, 847–850 (1975).
[2] Potter, M., "Short-term conceptual memory for pictures," Journal of Experimental Psychology: Human Learning and Memory 2, 509–522 (1976).
[3] Sinha, P., Balas, B., Ostrovsky, Y., and Russell, R., "Face recognition by humans: Nineteen results all computer vision researchers should know about," Proceedings of the IEEE 94(11), 1948–1962 (2006).
[4] Ostrovsky, Y., Cavanagh, P., and Sinha, P., "Perceiving illumination inconsistencies in scenes," Perception 34, 1301–1314 (2005).
[5] Vishwanath, D., Girshick, A., and Banks, M., "Why pictures look right when viewed from the wrong place," Nature Neuroscience 8(10), 1401–1410 (2005).
[6] Adelson, E. H., "Lightness perception and lightness illusions," in [The New Cognitive Neurosciences, 2nd Edition], 339–351, MIT Press (2000).
[7] Jacobson, J. and Werner, S., "Why cast shadows are expendable: Insensitivity of human observers and the inherent ambiguity of cast shadows in pictorial art," Perception 33(11), 1369–1383 (2004).
[8] Bertamini, M., Spooner, A., and Hecht, H., "Predicting and perceiving reflections in mirrors," Journal of Experimental Psychology: Human Perception and Performance 29, 982–1002 (2003).
[9] Bravo, M. and Farid, H., "Texture perception on folded surfaces," Perception 30(7), 819–832 (2001).
[10] Ritschel, T., Okabe, M., Thormählen, T., and Seidel, H.-P., "Interactive reflection editing," ACM Trans. Graph. (Proc. SIGGRAPH Asia 2009) 28(5) (2009).
[11] Zhang, W., Cao, X., Zhang, J., Zhu, J., and Wang, P., "Detecting photographic composites using shadows," in [IEEE International Conference on Multimedia and Expo], 1042–1045 (2009).
[12] Johnson, M. and Farid, H., "Exposing digital forgeries by detecting inconsistencies in lighting," in [ACM Multimedia and Security Workshop] (2005).
[13] Johnson, M. and Farid, H., "Exposing digital forgeries in complex lighting environments," IEEE Transactions on Information Forensics and Security 2(3), 450–461 (2007).
[14] Hartley, R. and Zisserman, A., [Multiple View Geometry in Computer Vision], Cambridge University Press (2004).
[15] Johnson, M. and Farid, H., "Metric measurements on a plane from a single image," Tech. Rep. TR2006-579, Department of Computer Science, Dartmouth College (2006).