Transparency and Occlusion


Barton L. Anderson
University of New South Wales

One of the great computational challenges in recovering scene structure from images arises from the fact that some surfaces in a scene are partially obscured by nearer surfaces or media. Both occluding and transparent surfaces interrupt the projection of more distant surfaces, and may be considered two ends of a continuum. Occluding surfaces completely obscure the surfaces that they occlude, whereas transparent surfaces only partially obscure the surfaces they overlay. The degree to which a transparent surface obscures an underlying surface depends on its transmittance (i.e., the proportion of light from the underlying layer that it lets through). Thus, when the transmittance is zero, the near surface is an opaque occluder; when it is greater than zero, some light from the underlying layer is transmitted. In order for the visual system to recover scene structure in contexts in which the transmittance of a near layer falls between zero and one, it must decompose the image into a layered representation that specifies the presence of multiple surfaces (or a surface and intervening media) along the same line of sight. This form of decomposition has been termed scission.

In addition to the relationship between transparency and occlusion, the physical transformations induced by transparent surfaces are intimately related to the physical transformations that are caused by changes in illumination. This suggests the possibility that the decomposition that underlies the perception of transparency may also underlie the separation of illumination from surface reflectance, and hence, the computation of surface lightness and/or color. Thus, the topic of transparency is intimately related to two apparently distinct domains: the computation of occlusion relationships and the computation of surface lightness.
In this paper, I will describe some recent evidence that reveals the intimate relationship between scission and the perception of surface opacity, lightness, and depth, and the impact this research has on theoretical frameworks in vision more broadly.

The computation of transparency

Transparency emerged as a fundamental area of vision research with the seminal work of the Italian psychologist Metelli (1970; 1974a,b). Metelli developed a model of transparency based on the physical setup he used to generate transparent images, namely a disk with a missing sector (an episcotister) that rotated in front of a two-toned background (see Fig. 1). Metelli derived a simple set of equations that described the relationship between the reflectance of the underlying background surfaces, the transmittance of the transparent filter (i.e., the size of the missing sector), and the reflectance of the transparent layer. Metelli argued that perceived transparency was well predicted by the physical constraints that must be satisfied to produce a transparent surface, and thus embraced an inverse optics approach to modeling visual perception. In particular, Metelli used his episcotister display to derive a physical model for the transmittance (α) and reflectance (t) of the transparent surface, which was just a weighted sum of the contributions of the light transmitted from the underlying layer and that reflected by the episcotister:

$$p = \alpha a + (1-\alpha)\,t \tag{1}$$

$$q = \alpha b + (1-\alpha)\,t \tag{2}$$

where p is the luminance of the region in which the transparent layer overlaps background a, q is the luminance of the region in which the transparent layer overlaps background b, and α is the transmittance of the transparent layer (i.e., the proportion of the episcotister taken up by the open sector). These equations can be solved to derive separate expressions for the transmittance and reflectance of the transparent surface:

$$\alpha = \frac{p-q}{a-b} \tag{3}$$

$$t = \frac{aq-bp}{a+q-b-p} \tag{4}$$

To make physical sense, α is restricted to be between 0 and 1, which imposes two basic constraints on the images that are consistent with transparency. First, the luminance difference between the regions of transparency (p-q) must have the same sign as the luminance difference between the regions in plain view (a-b), to ensure that α is positive. Second, the magnitude of the luminance difference of the regions of transparency (p-q) must be less than or equal to the luminance difference of the surfaces in plain view (a-b), to ensure that the transmittance does not exceed 1.

Perhaps the most salient restriction on the applicability of these equations is that they can only be used to describe transparent filters with a uniform reflectance and transmittance. They make no predictions for displays containing unbalanced transparent filters or media, i.e., forms of transparency that are not uniform in reflectance and/or transmittance. Note that Metelli's model is a purely generative model, i.e., his equations described the (simplified) physics of his episcotister display. Thus, whether such equations could be used to predict when, whether, and how transparency was perceived is an issue that required psychological experimentation.
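As a worked illustration of equations (1)-(4), the following minimal Python sketch generates the two transparent-region values from a pair of backgrounds and then recovers α and t from them. The function names and the numerical values are hypothetical and chosen only for illustration; the consistency check covers only the two constraints on α stated above.

```python
def metelli_forward(a, b, alpha, t):
    """Equations (1) and (2): values of the two regions seen through the layer."""
    p = alpha * a + (1 - alpha) * t
    q = alpha * b + (1 - alpha) * t
    return p, q


def metelli_solve(a, b, p, q):
    """Equations (3) and (4): recover transmittance alpha and reflectance t."""
    if a == b:
        raise ValueError("the two background regions must differ (a != b)")
    alpha = (p - q) / (a - b)
    denom = (a + q) - (b + p)
    # denom == 0 corresponds to alpha == 1, where t cannot be recovered.
    t = None if denom == 0 else (a * q - b * p) / denom
    return alpha, t


def consistent_with_transparency(a, b, p, q):
    """True when alpha lies in [0, 1]: same polarity, no increase in difference."""
    alpha, _ = metelli_solve(a, b, p, q)
    return 0.0 <= alpha <= 1.0


# Hypothetical values: backgrounds of 80 and 20, a layer with alpha = 0.5, t = 40.
p, q = metelli_forward(80.0, 20.0, 0.5, 40.0)
alpha, t = metelli_solve(80.0, 20.0, p, q)
print(p, q, alpha, t)
print(consistent_with_transparency(80.0, 20.0, q, p))  # polarity reversed
```

Swapping p and q reverses the polarity of the difference in the transparent regions, so the recovered α is negative and the display fails the first constraint.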
Nearly three decades of research into transparency perception seemed to provide compelling evidence that Metelli's model successfully predicted when transparency was and was not perceived (Metelli, 1974a,b, 1985; Metelli, Da Pos, & Cavedon, 1985; Gerbino, Stultiens, Troost, & de Weert, 1990; Kasrai & Kingdom, 2001). However, we have recently argued that this apparent success stems from the particular methodologies employed to assess his model. Although Metelli derived separate expressions for the transmittance and reflectance of a transparent surface, until recently, no experiments assessed whether these expressions accurately captured human perception of transparent surfaces. Rather, most experiments required observers to adjust (or otherwise judge) the luminance of a test patch in a display until it appeared to form a uniform transparent filter; they did not measure whether Metelli's equations could quantitatively predict the perceived transmittance and reflectance of a transparent filter.

To provide a more direct quantitative test of Metelli's model, we (Singh & Anderson, 2002) recently performed a number of experiments to directly assess whether his equations correctly predicted the perception of transparency. In these experiments, observers were required to separately match the transmittance and reflectance of a transparent filter. Observers viewed displays containing a central sinewave grating surrounded by a higher contrast sinewave grating of the same frequency, orientation, and phase. To enhance the perception of transparency, binocular disparity was added to the edges of the central patch, giving rise to a clear percept of a homogeneous transparent filter lying on top of a high contrast grating.

Metelli's model predicts that the perceived transmittance of a transparent filter should depend only on the ratio of the luminance difference (i.e., the luminance range) of the regions of transparency to that of the regions in plain view (see equation 3). In one of our experiments, the luminance range of the region in plain view (the high contrast grating) and the luminance range of the region of transparency were held constant; only the mean luminance of the region of transparency was varied. Observers adjusted the luminance range of a matching pattern to match the perceived transmittance of the transparent filter at each mean luminance value. Because the luminance ranges of the region of transparency and the region in plain view are constant, Metelli's equations predict that the transmittance settings in this experiment should be independent of mean luminance. This was not at all what was observed experimentally. Rather, observers' matches depend very strongly on the mean luminance of the transparent region (see Fig. 2). In particular, observers significantly underestimate the transmittance of light filters, and systematically overestimate the transmittance of dark filters. The results of this experiment reveal that human observers are not simply inverting the physics of transparency to compute the properties of transparent surfaces.
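To make the prediction concrete, the sketch below computes the transmittance given by equation 3 for a central grating of fixed luminance range at three mean luminances, alongside the ratio of the Michelson contrasts, (Lmax - Lmin)/(Lmax + Lmin), of the central and surrounding gratings. The luminance values and function names are hypothetical, and the contrast ratio is used here only as an illustrative contrast-based quantity, not as the model fitted to the experimental data.

```python
def grating_extremes(mean, lum_range):
    """Peak and trough luminance of a grating with the given mean and range."""
    return mean + lum_range / 2, mean - lum_range / 2


def michelson(l_max, l_min):
    """Michelson contrast of a pattern with the given luminance extremes."""
    return (l_max - l_min) / (l_max + l_min)


def compare_predictions(surround, centre):
    """Return (equation-3 transmittance, Michelson contrast ratio)."""
    s_max, s_min = surround
    c_max, c_min = centre
    metelli_alpha = (c_max - c_min) / (s_max - s_min)
    contrast_ratio = michelson(c_max, c_min) / michelson(s_max, s_min)
    return metelli_alpha, contrast_ratio


# Hypothetical display: surround in plain view, centre seen through the filter.
surround = grating_extremes(mean=50.0, lum_range=80.0)
for centre_mean in (20.0, 50.0, 80.0):
    centre = grating_extremes(mean=centre_mean, lum_range=20.0)
    alpha, ratio = compare_predictions(surround, centre)
    print(f"mean={centre_mean:5.1f}  eq3 alpha={alpha:.3f}  contrast ratio={ratio:.3f}")
```

With these values the equation 3 estimate is the same at every mean luminance, whereas the contrast ratio falls as the mean luminance of the central region rises, which is the direction of the departures described above (higher apparent transmittance for dark filters, lower for light filters).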
The theoretical importance of this finding should not be underestimated, as one of the main theoretical views of visual processing assumes that the visual system extracts the properties of the world by performing computations that invert the image formation process. These findings provide a striking example where human observers have a very clear sense of a surface property (the transmittance of a transparent layer) that almost always generates the physically incorrect answer. Why, then, does the visual system err in this manner? What information is the visual system using to compute these properties that causes it to give such incorrect responses?

One of the main problems with Metelli's model is that it assumes that transmittance is computed on the basis of the ratio of luminance differences between the regions of transparency and the regions in plain view [1]. However, the earliest stages of cortex do not seem to have access to raw luminance values. Rather, the information about image structure seems to be largely transformed into a contrast code. Indeed, when the data from this experiment are plotted in terms of Michelson contrast, it becomes evident that the visual system uses contrast to scale the transmittance of a transparent surface, even though this yields the physically incorrect answer. In our sinusoidal displays, Michelson contrast provides a good measure of perceived contrast, and so works well for these displays. However, Michelson contrast fails to provide an adequate measure of contrast in more complex spatial patterns containing, e.g., random patches of achromatic surfaces. Despite these shortcomings, recent work has shown that observers' transparency judgments can be well described by the perceived contrast of images (Robillotto, Khang, & Zaidi, 2002; Robillotto & Zaidi, 2004), suggesting that transparency computations are indeed based on some (yet to be discovered) measure of image contrast.

[1] Strictly speaking, Metelli's model is formulated on the basis of reflectance differences. However, Gerbino et al. (1990) showed that Metelli's equations could be rewritten as differences in luminance values, and that the form of these equations was identical to equations (1-4).

In sum, there is now clear evidence that the visual system uses the relative contrast of image regions to compute the opacity of transparent surfaces. In what follows, I will consider the implications that this discovery has had for understanding when the decomposition into a layered representation is initiated, and the consequences that this decomposition can have not just for the properties of the transparent surface, but for the underlying surface as well.

Anchoring perceived transmittance

In addition to its quantitative failures, Metelli's model was not easily generalized to a variety of naturally occurring forms of transparency. Metelli's model only described conditions of balanced transparency, i.e., conditions where the transmittance and reflectance of the transparent layer are uniform. However, many naturally occurring forms of transparency, such as that induced by smoke and fog, are typically unbalanced, particularly in transmittance. Clearly, a more general framework is needed to understand the wide variety of ways that transparent surfaces and media can transform image structure. How does the visual system determine whether a scene is in plain view or viewed through a transparent layer or medium when the properties of the transparent layer vary continuously? If image contrast is the currency through which the properties of transparent surfaces are computed, then any general theory of transparency perception will have to use image contrast as one of its main ingredients. However, all natural scenes generate variations in image contrast, and the properties of the underlying surfaces are unknown. So how can the visual system determine whether a given image arose from a scene in plain view or from a higher contrast scene viewed through a (contrast reducing) transparent layer?
The image data are always consistent with both possibilities, so something more is needed to understand how the visual system determines when transparency is and is not perceived. I have recently proposed that the visual system employs a transmittance anchoring principle to determine when transparency is (and is not) inferred (Anderson, 1999, 2003a). The intuitive content of this principle is that the visual system assumes the fewest surfaces necessary to account for the image data. More specifically, it states that the visual system treats the highest contrast region along surfaces and contours as a transmittance anchor that is in plain view. All other contrast values along such contours and/or surfaces are compared to this anchor region, and decreases in contrast along surfaces are used to infer the presence of transparency. In particular, this theory asserts that if there are reductions in contrast along surfaces or contours that are geometrically continuous, then scission is initiated and transparency is perceived. The magnitude of the contrast reduction is used to compute the transmittance of the overlying (transparent) layer (in proportion to this reduction; i.e., the greater the contrast reduction, the more opaque the transparent surface will appear). In the limit, where the contrast of the underlying surface goes to zero, the transmittance of the overlying surface goes to zero, and the near layer becomes an occluder. Note also that if a change in mean luminance occurs without a corresponding reduction in contrast, the transformation is consistent with a change in illumination, and hence both regions of the image would be correctly predicted to appear in plain view. Thus, the contrast relationships along surfaces and contours could be used to provide information about both transparent media and changes in illumination.

We have recently shown that this theory correctly predicts the perception of transparency in both balanced displays and spatially inhomogeneous media (Anderson, Singh, & Meng, 2006). In addition, we have shown that transmittance anchoring also has a temporal component: if the contrast of a texture is modulated in time, the highest contrast region in the spatio-temporal sequence is treated as a region in plain view, and the spatio-temporal reductions in contrast appear as the intrusion of transparent media. Thus, the transmittance anchoring principle appears to provide a foundation on which to predict when transparency is and is not perceived, and the perceived transmittance of the transparent layer.

Scission and the perception of lightness

The decomposition of an image region into multiple layers can also have a significant impact on perceived lightness. In the preceding, I have focused on how variations in image contrast can be used to determine whether transparent surfaces are present, and if so, how the opacity of the transparent layer is computed. However, when multiple surfaces are present along the same line of sight, the visual system must also compute properties of the underlying surface, such as its lightness. Given the close relationship between the physical transformations induced by transparent surfaces and the transformations induced by changes in illumination, there are good a priori reasons to suspect that a process of scission may play a critical role in the perception of surface lightness. Although a number of authors have suggested the possibility that scission may play a critical role in lightness perception (Anderson, 1997; Bergström, 1977; Gilchrist, 1977, 1979; Adelson, 1993), other authors have questioned whether an explicit decomposition actually underlies lightness perception.
More recently, we have been developing a new paradigm that explicitly reveals the role that scission can play in the perception of surface lightness (Anderson, 1999; 2003a,b; Anderson & Winawer, 2005). Traditional lightness studies typically employ homogeneous targets to be judged, and measure the effect different contexts have on their perceived lightness or brightness. In such paradigms, it can often be difficult to determine whether scission is occurring, and if so, whether it is playing any causal role in the perceived lightness of a figure. However, in the paradigm we have developed, if scission occurs, it is phenomenologically very explicit, and the effects it has on perceived lightness can be directly experienced.

To see the role that scission can play in lightness perception, consider the image depicted in Fig. 3. This figure appears to contain white chess pieces viewed through dark smoke, and black chess pieces viewed through white fog. However, the image regions containing the chess pieces in the top and bottom of the image are actually identical; the only difference between the top and bottom images is the overall lightness of the surrounds. These images were carefully constructed to satisfy the constraints of transparency. In the top image, all of the boundaries separating the chess pieces from the surround are lighter inside the chess pieces than outside (so that the boundary separating the regions is of constant polarity), whereas the opposite polarity holds for the bottom figure. In addition to the shift in polarity between the top and bottom figure, there are also significant differences in the way the magnitude of contrast varies along the borders separating the chess pieces and their surrounds in the two images.

Consider, e.g., the king in the two images. In the top figure, the greatest contrast between the king and the surround occurs along the bottom right of the piece, whereas the lowest contrast occurs along the top. Thus, the transmittance anchoring principle states that the bottom right of this figure, which is white, should appear in plain view, whereas the reductions in contrast that occur along the boundaries separating the king from its surround should signal the presence of a transparent medium that varies in opacity (being most opaque where the contrast of the boundary is lowest, which here occurs along the top of the king). A similar analysis holds for the bottom figure, except now the polarity and magnitude relationships are reversed. In this image, the highest contrast region along the border separating the king from its surround occurs along its top, and hence, the transmittance anchoring principle predicts that this portion of the king (which is dark) should appear in plain view, and lower contrast regions along the contour should appear partially obscured by transparent media. In the bottom image, the lowest contrast region of the king-surround border occurs along the lower right of the image, and thus, the opacity of the transparent layer should be greatest in this region. This is consistent with what observers report.

In sum, the perception of transparency and lightness in Fig. 3 reveals the close relationship between the perception of transparency, occlusion, and lightness. The perception of transparency involves the decomposition of an image into multiple layers, and the properties of the layers depend critically on exactly how luminance is partitioned between them.
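The transmittance anchoring analysis of the king can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the contrast values along the border are hypothetical, the function name is invented for illustration, and the linear mapping from relative contrast to transmittance is one simple reading of "in proportion to this reduction", not a fitted model.

```python
def anchored_transmittance(border_contrasts):
    """Apply a simple transmittance anchoring rule to one contour.

    border_contrasts maps a segment name to its signed contrast
    (positive when the inside of the figure is lighter than the surround).
    Returns a dict of relative transmittance per segment, or None when the
    polarity is not constant or no contrast is present, in which case this
    sketch assumes that scission is not initiated.
    """
    values = list(border_contrasts.values())
    constant_polarity = all(c >= 0 for c in values) or all(c <= 0 for c in values)
    anchor = max(abs(c) for c in values)
    if not constant_polarity or anchor == 0:
        return None
    # The highest contrast segment is the anchor (transmittance 1, plain view);
    # lower contrast segments are assigned proportionally lower transmittance.
    return {name: abs(c) / anchor for name, c in border_contrasts.items()}


# Hypothetical contrasts for the king in the top image (lighter inside).
king_top = {"top": 0.1, "left": 0.3, "bottom_right": 0.6}
print(anchored_transmittance(king_top))

# Hypothetical contrasts for the king in the bottom image (darker inside).
king_bottom = {"top": -0.6, "left": -0.3, "bottom_right": -0.1}
print(anchored_transmittance(king_bottom))
```

With these values the bottom right of the king is the anchor in the top image and the top of the king is the anchor in the bottom image, and a segment whose contrast falls to zero is assigned a transmittance of zero, the occlusion limit.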
There is a growing body of data suggesting that the contrast relationships that occur along surfaces and contours play a critical role in determining when scission is initiated, as well as in determining how surface properties such as lightness and opacity are attributed to the layers that are formed when this decomposition occurs. Such results reveal severe limitations of inverse optics models of perception, since perceived properties such as the transmittance of transparent surfaces are almost always physically incorrect. Moreover, there is currently no single measure of image contrast that adequately captures perceived contrast in arbitrary images, which impedes the ability to predict the precise conditions that lead to scission and the quantitative consequences that scission should have for perceptual experience. It is therefore of critical importance to develop a physical measure of contrast that captures the human experience of contrast, so that it can be used to develop and assess theories of transparency perception. Finally, although phenomena such as those depicted in Fig. 3 reveal that scission can have a dramatic effect on perceived lightness, more research is needed to determine whether an explicit decomposition of images into layers is responsible for the perception of lightness (and color) in all images.

References

Adelson, E. H. (1993). Perceptual organization and the judgment of brightness. Science, 262.
Anderson, B. L. (1997). A theory of illusory lightness and transparency in monocular and binocular images. Perception, 26.
Anderson, B. L. (1999). Stereoscopic surface perception. Neuron, 26.
Anderson, B. L. (2003a). The role of occlusion in the perception of depth, lightness, and opacity. Psychological Review, 110.
Anderson, B. L. (2003b). The role of perceptual organization in White's illusion. Perception, 32.
Anderson, B. L., Singh, M., & Meng, J. (2006). The perceived opacity of inhomogeneous surfaces and media. Vision Research, 46.
Bergström, S. S. (1977). Common and relative components of reflected light as information about the illumination, colour, and three-dimensional form of objects. Scandinavian Journal of Psychology.
Gerbino, W., Stultiens, C. I., Troost, J. M., & de Weert, C. M. (1990). Transparent layer constancy. Journal of Experimental Psychology: Human Perception and Performance, 16.
Gilchrist, A. L. (1977). Perceived lightness depends on perceived spatial arrangement. Science, 195(4274).
Gilchrist, A. L. (1979). The perception of surface blacks and whites. Scientific American, 240.
Kasrai, R., & Kingdom, F. (2001). Precision, accuracy, and range of perceived achromatic transparency. Journal of the Optical Society of America A, 18.
Metelli, F. (1970). An algebraic development of the theory of perceptual transparency. Ergonomics, 13.
Metelli, F. (1974a). Achromatic color conditions in the perception of transparency. In: MacLeod, R. B., Pick, H. L. (Eds.), Perception: Essays in Honor of J. J. Gibson. Cornell University Press, Ithaca, NY.
Metelli, F. (1974b). The perception of transparency. Scientific American, 230.
Metelli, F., Da Pos, O., & Cavedon, A. (1985). Balanced and unbalanced, complete and partial transparency. Perception and Psychophysics, 38.
Robillotto, R., Khang, B., & Zaidi, Q. (2002). Sensory and physical determinants of perceived achromatic transparency. Journal of Vision, 2.
Robillotto, R., & Zaidi, Q. (2004). Perceived transparency of neutral density filters across dissimilar backgrounds. Journal of Vision, 4.
Singh, M., & Anderson, B. L. (2002). Perceptual assignment of opacity to translucent surfaces: The role of image blur. Perception, 31.

Figure Captions

Fig. 1: The episcotister setup used by Metelli to derive his transparency model. A disc with a missing sector of size α is rotated at high speed over a two-toned background. The proportion of open to solid regions of the disc determines the transmittance of the episcotister, and the light reflected by the solid portion of the episcotister (of relative size 1-α) contributes additional luminance in regions of transparency.

Fig. 2: Results from Experiment 1 in Singh and Anderson (2002). Observers adjusted a matching pattern to appear equal in transmittance to a test patch with a fixed luminance range and mean luminance. Dashed lines in the left figure indicate the predictions of Metelli's model; the matches made to four different luminance ranges with varying mean luminance are given by the filled symbols for a typical observer. The data exhibit strong and consistent departures from these predictions. When the data are re-plotted in terms of Michelson contrast (right figure), the data are independent of mean luminance, demonstrating that observers use perceived contrast to determine the transmittance of transparent layers.

Fig. 3: A figure demonstrating how the contrast relationships along contours can be used to determine the perceived transmittance and lightness of surfaces in a complex scene. The regions within the boundaries of the chess pieces are composed of identical textures in the top and bottom image, but are decomposed in complementary ways. In the top figure, the texture is decomposed into dark clouds that obscure light chess pieces, and in the bottom, the texture appears as light mist that obscures dark chess pieces. Note that the highest contrast contour segments in both the top and bottom figure appear in plain view, and that these segments occur in different regions of the two images. See text for details.
