Is auditory streaming a bistable percept?

Daniel Pressnitzer
Equipe Audition, LPE-CNRS UMR 858, Département d'Etudes Cognitives, ENS, 29 rue d'Ulm, 7523 Paris cedex 5, France, e-mail: Daniel.Pressnitzer@ens.fr

Jean-Michel Hupé
Centre de Recherche Cerveau Cognition (CerCo), CNRS UMR 5549, Faculté de Médecine de Rangueil-Bat A3 33, route de Narbonne, 362 Toulouse Cedex, e-mail: Jean-Michel.Hupe@cerco.ups-tlse.fr

The physical world can present us with ambiguous stimuli. Perceptual systems, however, must make decisions. In the visual modality, it has been shown that the competing interpretations of an ambiguous stimulus are usually mutually exclusive. When the ambiguous stimulus is observed for long enough, though, the two interpretations can alternate spontaneously. This has been termed bistability. Auditory scenes can also provide ambiguous cues. For instance, there might be one or more active sources in a scene, and we must decide how many are actually present. We studied whether such scenes would give rise to auditory bistability. We used the classic stimulus where a high tone A alternates with a low tone B, in repeated ABA- sequences. Listeners report hearing the sequence either as a single stream ABA-ABA or as two separate streams A-A-A-A and -B-B-. When listeners heard 4-minute sequences with a 5-semitone difference and a 2-ms tone duration, we found spontaneous alternations between the one-stream and two-stream percepts. The dynamics of the alternations had the characteristics associated with bistability: a log-normal distribution of percept durations, and an absence of correlation between successive durations. The ratio of one-stream to two-stream percepts could be altered by voluntary intention. We thus propose that bistability exists in the auditory modality. We also compared the frequency of switches and the effect of voluntary intention across modalities, by measuring perceptual alternations for auditory and visual bistable stimuli in the same subjects.
1 Introduction

Presenting sensory systems with stimuli designed to be ambiguous is a powerful method to probe the organizational mechanisms necessarily involved in conscious perception. In this paper, we demonstrate that ambiguous auditory stimulation can lead to perceptual bistability, i.e. the spontaneous alternation between two mutually exclusive conscious interpretations of an unchanging sensory stimulation. We compare the dynamics of such auditory bistability with the better-known visual bistability, within the same observers, and the susceptibility of both modalities to volitional control.

The study of bistable perception has found its place in neuroscience because it uncouples, to some extent, the conscious perception of the observer from the characteristics of the physical stimulation. Bistability has been described with different visual stimulations: ambiguous figures such as the Necker cube, binocular rivalry, moving plaids [1, 2, 3]. A number of competing theories attempt to explain the existence of bistable perception. Some are based on sensory fatigue or inhibition of peripheral neural channels, and are thus specific to the modality or even the type of bistability studied, while others posit a central and therefore amodal switching mechanism (see [3] for a review).

In spite of the value of bistable perception for the study of the brain functions involved in perceptual organisation, it has so far been described systematically only with stimuli in the visual modality. There are reports of alternating auditory interpretations of an unchanging stimulation, such as in the verbal transformation effect [4, 5]. It is unclear, however, what similarities or differences exist between such phenomena and visual bistability.
The study of bistable perception in a sensory modality other than vision is important to investigate whether the rules governing the alternation of perceptual states are indeed general principles of brain function, or specific to the visual system.

2 Auditory streaming as a bistable stimulus

Auditory scenes can be just as ambiguous as visual scenes. For instance, there might be one or more active sound sources in a scene, and the observer must decide how many are actually present. We chose a simplified version of such scenes to study auditory bistability. We used a stimulus where a high tone A alternates with a low tone B, in repeated ABA- sequences. Listeners report hearing the sequence either as a single stream ABA-ABA or as two separate streams A-A-A-A and -B-B-. This stimulus was introduced by Van Noorden [6] as a canonical paradigm to study the mechanisms involved in the organisation of complex auditory scenes.

Forum Acusticum 2005 Budapest

The proportion of one-stream versus two-stream percepts has been reported for a range of stimulus parameters [7]. However, all these data pertain to short stimuli or to only one perceptual judgement per stimulus. Recently, brain imaging studies have started collecting continuous judgements of listeners on long streaming stimuli and have observed spontaneous alternations of percepts [8, 9]. As these alternations were used by these authors to study auditory perceptual organisation, it is of interest to assess their commonalities and differences with visual bistability. To this end, we recorded continuous perceptual judgements during long exposures to the auditory streaming stimulus and analysed their temporal dynamics in detail.

To compare auditory perceptual alternations with visual bistability, we also measured perceptual judgements for a visual bistable stimulus, in the same group of observers. We chose visual plaids as the comparison stimulus. Visual plaids are made of a network of crossing lines that are seen moving through a circular aperture. Such a stimulus can evoke a percept of a single plaid moving upward, or of two separate gratings sliding laterally in opposite directions on top of each other. A number of previous studies have established the characteristics of the dynamics of the perception of plaids [2]. Note that there is a formal correspondence between the visual and auditory percepts in terms of organisation of the sensory scene. The decision has to be made between grouping the scene into a single object (one stream or one plaid) or splitting it between two objects (two streams or two gratings).

A spontaneous alternation between percepts could be a sufficient criterion to accept that a given physical stimulation produces perceptual bistability.
Leopold and Logothetis (1999), however, have proposed three characteristics of the alternations that are found in all instances of visual bistability: exclusivity, randomness, and inevitability. Exclusivity means that the two or more perceptual interpretations are mutually exclusive. Randomness characterises the statistical distribution of the time spent in each percept, requiring for instance short-term independence between percept durations. Inevitability indicates that the observer has only limited volitional control over the perceptual alternations.

In the following sections we address these three criteria in turn, with both auditory and visual stimulation. Exclusivity is estimated by allowing observers to report an undetermined percept and by measuring the time spent in such an undetermined state. Randomness is assessed by statistical analyses of the durations of the percepts. Inevitability is addressed by manipulating the voluntary intention of observers.

3 Methods

3.1 Auditory stimuli

The auditory stimuli consisted of 4-minute-long sequences where a high-frequency tone A alternated with a low-frequency tone B, in an ABA- pattern. The frequency of tone A was 587 Hz and that of tone B was 440 Hz (5 semitones difference). The duration of each tone was 2 ms. The silence (-) that completes the ABA- pattern was also 2 ms. Listeners initially adjusted the loudness of the tones to a comfortable hearing level and kept the level constant during the experiment.

3.2 Visual stimuli

The visual stimuli consisted of two rectangular-wave gratings presented through a circular aperture. Each grating consisted of a set of dark stripes at a ±6 angle from the horizontal. The intersection regions were transparent. As the gratings were dark on a lighter background, they appeared as figures moving over the background. A red fixation point was added in the middle of the circular aperture and subjects were instructed to fixate this point throughout stimulus presentation.
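As a rough illustration of the auditory stimulus described above, the following Python sketch checks the 5-semitone relation between the two tone frequencies and assembles one ABA- cycle. It is not the code used in the experiment: the function names, the sampling rate `FS` and the tone duration `TONE_DUR_S` are placeholder assumptions chosen for the example, and onset and offset ramps are omitted.

```python
import math

def semitones_up(freq_hz, n_semitones):
    """Frequency n_semitones above freq_hz on the equal-tempered scale."""
    return freq_hz * 2.0 ** (n_semitones / 12.0)

def tone(freq_hz, dur_s, fs):
    """Pure-tone samples, without the ramps a real stimulus would need."""
    n = round(dur_s * fs)
    return [math.sin(2.0 * math.pi * freq_hz * i / fs) for i in range(n)]

def aba_cycle(f_a, f_b, tone_dur_s, fs):
    """One ABA- cycle: tone A, tone B, tone A, then a silence of one tone duration."""
    a = tone(f_a, tone_dur_s, fs)
    b = tone(f_b, tone_dur_s, fs)
    silence = [0.0] * len(a)
    return a + b + a + silence

F_B = 440.0                  # low tone B (Hz), as in the text
F_A = semitones_up(F_B, 5)   # high tone A, 5 semitones above B (close to 587 Hz)
TONE_DUR_S = 0.1             # hypothetical tone duration, for illustration only
FS = 8000                    # hypothetical sampling rate (Hz)

triplet = aba_cycle(F_A, F_B, TONE_DUR_S, FS)
print(round(F_A, 2), len(triplet))
```

Repeating `triplet` for the desired total duration would give the long sequence; the grouped percept corresponds to hearing the galloping ABA- rhythm, the split percept to hearing the A tones and the B tones as two separate regular streams.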
3.3 Procedure

Listeners were instructed to report their conscious perception of each stimulus continuously during stimulus presentation. They started with auditory presentation and were asked to decide whether they heard one or two streams. A third, undetermined response type was available if they heard something else, or were not sure about their perception at a given instant. Responses were collected via 3 buttons on a computer keyboard. In the first run, subjects were instructed to pay close attention to the stimulus. We will refer to this condition as the Attend task. In the subsequent two presentations, they were instructed either to try to hear a one-stream percept, or to try to hear a two-stream percept (in random order of presentation). We will refer to these two instructions as the Group and Split tasks, respectively. Judgements with visual presentation of plaids were then performed, with an identical procedure and the same three tasks (Attend, Group, Split). Judgements were collected continuously at a sampling rate of 2 Hz. The default response when the trial started was undetermined.

3.4 Subjects

Twenty-three subjects participated in the experiment (average age: 23), with no self-reported hearing problem and normal or corrected eyesight. They gave informed consent to participate in the experiments.

Figure 1: Ratios of percept durations in the Attend task, for auditory and visual stimuli (percent of total duration for the Undetermined, Grouped and Split responses). Means across subjects and 95% confidence intervals. In both modalities, the overall times spent in the grouped and split interpretations are similar. Very little time is spent in the undetermined state, indicating exclusivity of the two interpretations.

4 Exclusivity of bistable percepts

Bistability implies the spontaneous alternation of two distinct perceptual interpretations of a given physical stimulus. All listeners reported spontaneous alternations between the one-stream and two-stream percepts while listening to the 4-minute-long auditory stimulus. This was also the case for the visual stimulus, as expected [2].

The overall duration spent in each percept was calculated for the first part of the experiment, the Attend task, where subjects were simply instructed to pay close attention to the stimuli. Figure 1 shows that, in the auditory case, the times spent in the grouped and split percepts (one or two streams) were similar, and that very little time was spent in the undetermined perceptual state. Undetermined responses accounted for less than 3% of total stimulus presentation time, even though this was the default response when stimulus presentation began. Results are similar in the visual modality, with less than .5% of the time spent in the undetermined state and an equivalent time spent in the grouped and split percepts. The same analysis performed on the tasks with specific intentions showed an even smaller proportion of time spent in the undetermined state (not shown).
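The quantities discussed above (time spent in each response state, and the durations of individual percepts) can be derived from a continuously sampled response trace. The Python sketch below shows one possible way to do this; the labels `U`, `G` and `S`, the helper names, the toy trace and the value of `RATE_HZ` are all assumptions made for the example, not details of the original analysis.

```python
from itertools import groupby

def percept_runs(samples, rate_hz):
    """Collapse a sampled response trace into (label, duration in seconds) runs."""
    return [(label, sum(1 for _ in group) / rate_hz)
            for label, group in groupby(samples)]

def time_proportions(samples):
    """Fraction of the total presentation time spent in each response state."""
    total = len(samples)
    return {label: samples.count(label) / total for label in set(samples)}

# Toy trace: 'U' undetermined (the default at trial onset), 'G' grouped, 'S' split.
RATE_HZ = 2.0  # hypothetical sampling rate for this example
trace = ["U"] + ["G"] * 6 + ["S"] * 4 + ["G"] * 5 + ["S"] * 8

runs = percept_runs(trace, RATE_HZ)
props = time_proportions(trace)
print(runs)
print(props)
```

In this toy trace the undetermined state only covers the first sample, which mirrors the situation reported above where the undetermined response is the default at onset but takes up a negligible share of the total time.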
With the stimulus parameters that we chose and with our group of observers, both modalities thus show one basic feature of perceptual bistability: spontaneous alternations between percepts are observed and the two percepts are mutually exclusive, as indicated by the negligible amount of time spent in the undetermined state.

Figure 2: Distribution of percept durations. (a,b): Histograms (number of occurrences against normalized percept duration) of the durations of subjective percepts in the auditory (black) and visual (gray) modalities, for all subjects, normalized by the average percept duration of each subject. There is no significant difference between the normalized distributions for the two modalities, and both can be fitted by a log-normal model (see text for details). (c,d): Duration of percept N+1 (s) plotted against the duration of percept N (s). The log-duration of a percept is independent of the log-duration of the previous percept, in both modalities. Note that percept durations in the auditory modality are usually longer, so there are fewer of them.

5 Randomness of durations

The distribution of percept durations during visual bistability has been shown to follow a random law that can be fitted with a gamma distribution or a log-normal distribution [10]. Statistical independence is expected between successive percepts, which is a first indication that bistability is not simply the result of sensory fatigue, as fatigue would be expected to carry over to the next percept [2].

The distributions of percept durations for the auditory and visual modalities are illustrated in Figure 2. Panels a and b display the histograms of percept durations normalized by the mean duration for each subject. Both distributions are skewed toward longer durations, as is expected for gamma or log-normal distributions. A Kolmogorov-Smirnov test indicated that the two normalized distributions, for auditory and visual stimulation, were not significantly different from each other (p >.).
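To make the comparison of normalized distributions concrete, here is a minimal Python sketch of the two steps involved: dividing each subject's durations by that subject's mean, and computing the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical cumulative distributions). The function names and toy data are assumptions for illustration, and the conversion of the statistic into a p-value is deliberately left out.

```python
def normalize_by_mean(durations):
    """Divide each duration by the mean duration of the same subject."""
    mean = sum(durations) / len(durations)
    return [d / mean for d in durations]

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Step both empirical CDFs past every sample equal to the current value.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d

# Toy data: same distribution shape, but a different time scale in each modality.
auditory = normalize_by_mean([4.0, 8.0, 12.0])
visual = normalize_by_mean([1.0, 2.0, 3.0])
print(ks_statistic(auditory, visual))
```

The toy data illustrate why the normalization matters: percepts that last longer overall in one modality can still have the same distribution shape once each subject's mean duration is factored out.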
In order to test whether the durations followed a log-normal distribution, we transformed the data onto a log scale and performed an analysis of variance. For this analysis, we pooled the results for the three different tasks in a given modality in order to increase the power of the statistical test. The analysis was performed with subjects as a random factor, task and percepts as fixed factors, and interactions were taken into account. The residuals of the model were not statistically different from a normal distribution (Kolmogorov-Smirnov test, p>.2 for auditory and p>. for visual). This indicates that the distributions of percept durations were indeed log-normal.

The statistical independence of the durations of successive percepts can be estimated by displaying a scatterplot of the duration of a given percept as a function of the duration of the previous percept, for all stimulus presentations. This is shown in Figure 2, panels c and d. No correlation is visible between successive percept durations. These scatterplots also show the tendency of auditory percepts to last longer, but this trend could be due to the particular stimulus parameters that we chose.

The data obtained with our group of observers in the visual modality are consistent with previous reports using the same stimulus [2]. The auditory modality shows similar, random temporal dynamics, where percept durations can be fitted with a log-normal distribution and display no correlation between successive percepts.

Figure 3: Effect of volitional control on auditory and visual bistability (proportion of time spent in the grouped percept for the Attend, Group and Split tasks). Means across subjects and 95% confidence intervals. Subjects could influence the amount of time spent in a given perceptual state according to their intention. The effect is visible in both modalities, with a significantly stronger magnitude in the auditory modality.

6 Inevitability of alternations

We investigated the influence of volitional control on the temporal dynamics of bistability. We asked subjects to try to maintain a given perceptual interpretation, for both auditory and visual stimuli. The influence of intention on the time spent in each percept is shown in Figure 3, panel a.
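Returning briefly to the independence of successive percept durations examined with the scatterplots of Figure 2 (panels c and d), the same idea can be expressed numerically as the correlation between the log-duration of percept N and that of percept N+1. The Python sketch below is one way to compute it; the function names and toy sequences are assumptions for illustration, and this is not presented as the analysis actually run on the data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def lag1_log_correlation(durations):
    """Correlation between log-duration of percept N and of percept N+1."""
    logs = [math.log(d) for d in durations]
    return pearson_r(logs[:-1], logs[1:])

# Two deliberately non-random toy sequences, as sanity checks.
alternating = [1.0, 4.0, 1.0, 4.0, 1.0, 4.0, 1.0]   # short, long, short, ...
growing = [1.0, 2.0, 4.0, 8.0, 16.0]                 # each percept twice as long
print(lag1_log_correlation(alternating), lag1_log_correlation(growing))
```

Strictly alternating or steadily growing sequences give correlations at the extremes of the range, whereas independent successive durations, as reported for both modalities, would give a value close to zero.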
For both modalities, volitional control had a significant effect (all tests of significance were performed with post-hoc comparisons in the ANOVA model). In the auditory modality, when subjects tried to hear one stream, the proportion of one-stream percepts increased compared to the Attend task. When they tried to hear two streams, the proportion of one-stream percepts decreased. The same pattern of responses was observed in the visual modality. The effect of volitional control was, however, significantly stronger for the auditory modality than for the visual modality. The limited effect of intention shows that bistable alternations are indeed inevitable in the two modalities: alternations persist even when observers try to hold a single percept.

7 Conclusion

The temporal dynamics of auditory streaming have characteristics very similar to those found in visual bistability. The one-stream and two-stream percepts are mutually exclusive, their durations follow a log-normal distribution with short-term independence, and volitional control influences the alternations but does not abolish them. We therefore propose that auditory streaming is a case of perceptual bistability.

When measured in the same group of subjects, we found strong similarities between auditory and visual bistability (measured with plaids). The normalized distributions of percept durations were statistically indistinguishable in the two modalities. Volitional control produced the same pattern of effects in the two modalities. However, differences were also found, as the magnitude of the effect of volitional control was stronger in the auditory case. When comparing the effect of volitional control between different forms of visual bistability, Meng and Tong [11] argued that volitional control should have the same effect if there were a unique, central brain mechanism responsible for the alternations.
Our results thus point to the idea that similar perceptual organisation mechanisms are responsible for bistability in the auditory and in the visual modality, but also that at least a subset of these mechanisms may be implemented independently in the different sensory modalities.

References

[1] D.A. Leopold and N.K. Logothetis, Multistable phenomena: changing views in perception. Trends in Cognitive Sciences, Vol. 3, pp. 254-263 (1999)
[2] J.M. Hupé and N. Rubin, The dynamics of bi-stable alternation in ambiguous motion displays: a fresh look at plaids. Vision Research, Vol. 43, pp. 531-548 (2003)
[3] G.M. Long and T.C. Toppino, Enduring interest in perceptual ambiguity: alternating views of reversible figures. Psychological Bulletin, Vol. 130, pp. 748-768 (2004)
[4] R. Warren and R. Gregory, An auditory analogue of the visual reversible figure. American Journal of Psychology, Vol. 71, pp. 612-613 (1958)
[5] M. Sato et al., Multistable representation of speech forms: a functional MRI study of verbal transformations. NeuroImage, Vol. 23, pp. 1143-1151 (2004)
[6] L.P.A.S. Van Noorden, Temporal Coherence in the Perception of Tone Sequences. Doctoral dissertation, Eindhoven University of Technology (1975)
[7] A.S. Bregman, Auditory scene analysis. Cambridge, MA: MIT Press (1990)
[8] R. Cusack, The Intraparietal Sulcus and Perceptual Organization. Journal of Cognitive Neuroscience, Vol. 17, pp. 641-651 (2005)
[9] A. Gutschalk et al., Neuromagnetic Correlates of Streaming in Human Auditory Cortex. Journal of Neuroscience, Vol. 25, pp. 5382-5388 (2005)
[10] Y.H. Zhou et al., Perceptual dominance time distributions in multistable visual perception. Biological Cybernetics, Vol. 90, pp. 256-263 (2004)
[11] M. Meng and F. Tong, Can attention selectively bias bistable perception? Differences between binocular rivalry and ambiguous figures. Journal of Vision, Vol. 4, pp. 539-551 (2004)