Categorization of complex visual images by rhesus monkeys. Part 2: single-cell study

European Journal of Neuroscience, Vol. 11, pp. 1239–1255, 1999 European Neuroscience Association

Rufin Vogels
Laboratorium voor Neuro- en Psychofysiologie, KULeuven, Campus Gasthuisberg, Herestraat, B-3000 Leuven, Belgium

Keywords: inferior temporal cortex, invariances, macaques, object recognition

Abstract
In order to investigate the neural coding of ordinate-level visual categories, single-cell recordings were made in the anterior temporal cortex of two rhesus monkeys performing a categorization of colour images of trees versus images of other objects. Neurons showed a high average degree of selectivity for these complex colour images. Although most neurons responded to trees and non-trees, about a quarter responded in a category-specific manner, e.g. to trees but not non-trees, and about one-tenth responded almost exclusively to exemplars of the trained category. The responses of these neurons were largely invariant for stimulus transformations, e.g. changes in position or size, and decreased with the degree of image scrambling, mimicking the behavioural results. However, the responses of single neurons were insufficiently stimulus invariant to accommodate the entire range of variability present in the features of exemplars within the same category. This strong within-category selectivity challenges the idea that a prototype is represented at the single neuron level, but suggests that ordinate-level categorization is based on a population of neurons, each selective for a limited set of exemplars.

Introduction
The retinal images of objects in the visual world are highly variable due to changes in illumination and the relative position of the object with respect to the observer's eye. Despite such variations in the retinal image of a particular object, primates are nonetheless able to identify that object.
Furthermore, primates can express the same response to different objects belonging to the same behaviourally defined category. For instance, humans can use the same label for physically different objects belonging to the same ordinate-level category (Rosch et al., 1976). Also, as shown in the companion paper, rhesus monkeys can be trained to generalize from a learned set of complex images to novel, physically dissimilar images. Images of exemplars of an ordinate class will differ more than the various retinal images of a single object. Indeed, exemplars of natural categories show considerable variation in stimulus features, e.g. different trees differ in shape, textures (branch patterns, leaves) and colour. Furthermore, there is considerable overlap in the presence or absence of simple stimulus features for exemplars of different categories (e.g. some trees can have the same colour as some automobiles). Thus, categorization of complex natural images shows considerable invariance to changes in simple features and is based on the use of at least a combination of single low-level features (see Vogels, 1999).
How is this categorization achieved at the neural level? Because several lines of evidence highlight the importance of temporal cortical areas for object recognition and identification (Ungerleider & Mishkin, 1982; Logothetis & Scheinberg, 1996), it is likely that this part of the cortex is involved in the categorization of complex visual images. Neurons of the inferior temporal (IT) cortex, the adjoining cortex of the superior temporal sulcus (STS) and perirhinal cortex, show considerable invariance in their stimulus selectivities with regard to variations of the position, size and defining cue of the stimulus (for reviews, see Vogels & Orban, 1996; Tanaka, 1996), which may underlie the invariance of behavioural responses to these stimulus transformations. During the course of the present experiments, Logothetis & Pauls (1995) reported that IT neurons respond differentially to different views of the same object, suggesting that these neurons code for a single or at least limited set of views of a single object. The neurons showing these view-dependent responses were highly selective for the similar-looking wireframe stimuli, thus showing stimulus selectivity at the subordinate level. However, it is possible that other neurons show less selectivity for subordinate differences between exemplars of an ordinate-level category, and respond more strongly to all exemplars of a particular category (e.g. trees) than to exemplars of other categories (non-trees). This then corresponds to a coding of a category of objects at the single neuronal level and would imply that these neurons respond similarly to those feature combinations characterizing the different exemplars of the category. On the other hand, it is possible that neurons show considerable selectivity for exemplars of the same category and that an ordinate category is coded by the activity of a set of neurons each of which responds to a limited set of exemplars. These two possibilities relate to two models of the categorization process, as formulated by cognitive psychologists (for reviews, see Smith & Medin, 1981; Komatsu, 1992) and students of animal behaviour (see Roberts & Mazmanian, 1988; Herrnstein, 1990; Mackintosh, 1995; Huber & Lenz, 1996): prototype and exemplar-based models.
Correspondence: R. Vogels, as above. E-mail: Rufin.Vogels@med.kuleuven.ac.be
Received 1 May 1998, revised 22 September 1998, accepted 4 November 1998
A prototype model assumes that the subject abstracts the central tendency of the different exemplars of the category (e.g. the average tree) during learning and that categorization is based on the resemblance of the stimulus to that abstracted prototype representation. A single-unit implementation of prototype coding would consist of units that respond to all exemplars of a category, i.e. little within-category stimulus selectivity, but respond less or not at all to exemplars of other categories, i.e. high category specificity. The responses to exemplars of the same category do not need to be all-or-none, but can be graded, e.g. depending on the typicality of the exemplar as a member of the category. Exemplar-based models state that a set of exemplars of a category is represented, and that categorization is then based on the resemblance of the stimulus to one of these exemplars. Such an exemplar-based coding can be implemented by having a population of different units each of which responds to a limited number of exemplars of a category. In such a scheme, each neuron will respond to a small set of exemplars, i.e. those exemplars sharing the particular feature (combinations) that the neuron is selective for. The category will then be represented implicitly by the activity in this population of neurons, with each neuron showing within-category selectivity.
The present study addresses the neural coding of categories by measuring the selectivity of single anterior temporal neurons for images that are categorized by the animals, and examining the within-category and category specificity of the stimulus selectivity of the neurons. The single-cell recordings were made for the two monkeys whose training and behavioural performance are documented in the companion paper (Vogels, 1999). The monkeys categorized the same tree and non-tree images as those used in that behavioural study during the single-unit recordings. Although a limited number of neurons was recorded using the fish and non-fish images of Vogels (1999), we will only report the results of the tree categorization, as both overall performance level and generalization from old to novel exemplars were better for the tree than the fish categorization.
Indeed, it is possible that performance in the fish categorization task was partially based on rote learning of individual exemplars (see Vogels, 1999). In order to correlate categorization performance and single-unit responses, neuronal responses and performance were also measured simultaneously for transformed and scrambled stimuli, using probe tests as described in the companion paper. The recordings were made in the anterior part of the IT, lower bank and fundus of the STS, and parts of the perirhinal cortex. Because the perirhinal cortex and STS receive input from the more posterior and lateral parts of IT (Seltzer & Pandya, 1978, 1989; Baizer et al., 1991; Seltzer & Pandya, 1994; Suzuki & Amaral, 1994; Saleem & Tanaka, 1996), the recordings included higher-order cortices which may have more complex response properties than those of more posterior and lateral IT regions. Some portions of the results have been published in abstract form (Vogels, 1996).

Materials and methods

Subjects and surgery
The monkeys Kees and Plato, from the accompanying behavioural study (Vogels, 1999), served as subjects. Subsequent to that behavioural study, a recording chamber was implanted on the anterior dorsolateral part of the skull, allowing a dorsal approach to the anterior temporal cortex. The positioning of the recording chamber was guided by magnetic resonance anatomical images that were made before surgery. Implantation of the chamber was performed using aseptic surgical techniques and under deep anaesthesia (achieved with Nembutal, 30 mg/kg, after sedation with Ketamine, 10 mg/kg). Recordings were started after a 2-week recovery time. The animals were already used to head fixation (see Vogels, 1999). The recordings were made on a daily basis, 5 days a week. The monkeys were water deprived for the previous 20 h, but received dry food ad libitum, supplemented by fruits as necessary. The monkeys were allowed to work until satiated (usually 3–4 h).
At the end of the recording sessions, several penetrations were made in the cortex of Plato with metal reference wires at selected positions in the recording chamber. Subsequently, the monkey was killed with an overdose of Nembutal and perfused with fixative. The brain tissue was cut and sections were stained with Cresyl violet. Reconstruction of the recorded area was accomplished by identifying the tracks of the reference wires and recent electrode penetrations. The depth of the neuron with respect to the pattern of white and grey matter transitions, as noted during the recordings and visualized on the stained sections, as well as with respect to the ventral part of the skull, was used to assign the neurons to the cortical convexity or the STS. The microdrive depth readings of the skull were obtained during the recording sessions. After finishing the recordings reported in this paper, Kees participated in other experiments, and subsequent recordings were made with tungsten and stainless steel electrodes, coated with the fluorescent dye carbocyanine (DiI), at selected guiding tube positions. By passing an anodal current (≈15 µA for ≈30 s) through the stainless steel electrodes, we made electrolytic lesions and iron deposits at locations with visually responsive neurons at depths similar to those recorded in the present study. Subsequently, penetrations with metal reference wires were made at several positions with different depths, after which the animal was killed with an overdose of barbiturates and perfused with a solution of formaldehyde (10%) and potassium ferrocyanide (3%) to stain the iron deposits (Prussian blue reaction). The brain was cut (60 µm sections) and the sections were examined with a fluorescence microscope to locate the DiI-stained penetrations.
Sections were then stained with Cresyl violet and recording positions were reconstructed using the locations of (i) the tracks of the reference wires, (ii) identified DiI-stained and other recent penetrations, (iii) the microlesions, and (iv) the depth readings of the neurons and location with respect to the pattern of grey/white matter transitions as observed during the recordings. The procedures conformed to the guidelines established by NIH for the care and use of laboratory animals.

Apparatus
The apparatus is identical to that used in the behavioural study (Vogels, 1999). In short, eye movements were measured using the scleral search coil technique and sampled (sampling rate 200 Hz) by a PC. The same PC controlled the experiment, stored the behavioural data and controlled juice delivery. The stimuli were displayed on a 21-inch Phillips computer screen (refresh rate: 74 Hz) by a second PC, connected to the first one. Single-cell recordings were made with commercial parylene and kapton-coated tungsten electrodes (World Precision Instruments; impedance 1–2 MΩ). The electrode was lowered into the brain within a stainless steel 23G guiding tube. This tube was guided through a second tube that was positioned above the dura and fixed to a Narishige hydraulic microdrive. Single spikes were isolated from the amplified and filtered electrode signal with a Real Time Waveform Discriminator (SPS-8701). The times of spike occurrence, stimulus and behavioural events were stored and displayed by a third PC. Also, the average response strength for each stimulus was computed and displayed on-line.

Stimuli
The same stimuli as those used in the behavioural study (Vogels, 1999) were used during recording. These 410 stimuli were colour images of trees or other artificial and natural objects. The average luminance (21 cd/m²) and size (6.8° × 8.5°) of the different stimuli were matched for the trees and non-trees (see table 1 in Vogels, submitted). Stimuli were presented centrally during fixation of a red or white spot. In some tests, images were transformed or scrambled, which was performed on-line during the intertrial interval preceding stimulus presentation. The following five stimulus transformations were presented: twofold size increase, twofold size decrease, 90° clockwise planar rotation, position shifted 6.5° down on the vertical meridian, and an achromatic grey-level transformation. Scrambling was accomplished by randomly repositioning rectangular image sections, as explained in Vogels (1999). The relative position of the scrambled parts of a given image was identical in every trial in which a given scrambled stimulus was shown. Thus, the same stimulus was used in the repeated trials of a particular scrambled image, allowing averaging of the neuron's responses in those trials.

Task and experimental protocols
The categorization task as described by Vogels (1999) was used: the monkeys were trained to categorize the stimuli by making a saccadic eye movement to one of two target spots, which were presented simultaneously with the stimulus. The subject had to make a saccadic eye movement to the right-hand or left-hand target spot upon presentation of a tree or non-tree image, respectively. Correct responses were immediately rewarded. Trials in which the monkey broke off fixation before or during stimulus presentation without making a saccade to a target spot were aborted and excluded from further analysis. The stimulus and target spots were turned off as the eye trace entered the target window, or, in the case of aborted trials, 100 ms after leaving the fixation window.
If the monkey had not left the fixation window within 2500 ms after stimulus onset, an extremely rare event, the stimulus was turned off and the intertrial interval started. The intertrial interval was 2 s. The procedure differed in two ways from that used in the categorization training (Vogels, 1999). Firstly, no time-out was imposed after incorrect trials, and secondly, trials in which the reaction time was less than 150 ms (13 trials or 0.02% of all trials with saccade to a target spot) were rewarded if they happened to be correct, but were treated as aborted trials and discarded for further analysis. Standard test For each daily recording session, 60 stimuli were randomly selected from the stimulus set, with the restriction that half of the stimuli were tree images while the remaining 30 were non-tree images. This random stimulus sampling procedure implies that cells of a particular penetration were tested with identical stimuli, while cells of different penetrations were tested with only partially overlapping stimulus sets. The rationale for not testing all neurons with the same stimuli but instead using different stimulus sets in the different sessions was to reduce the number of exposures to the same stimuli, thus limiting the learning of individual exemplars (rote learning, see Vogels, 1999). The monkey performed the categorization task while we searched for responsive neurons. Once a responsive neuron was encountered, at least four unaborted trials of each of the 60 stimuli (median 4; first quartile, 4; third quartile, 5) were presented in an interleaved fashion. It was found that the responses of the neurons to these stimuli were extremely consistent from trial to trial, so that this number of trials was sufficient to measure the stimulus selectivity reliably. Stimulus transformation test After completing the standard test, we selected four stimuli, two exemplars of the category (i.e. trees) and two exemplars not belonging to that category (i.e. 
two non-trees). These four stimuli were selected on the basis of the on-line computed response strength of the neuron for all 60 test stimuli, and always included the exemplar eliciting the largest response and an exemplar for which the response was negligible. The five transformed images of each of these four exemplars were presented interleaved with the four original, nontransformed images and 16 other, randomly chosen, exemplars (half trees, half non-trees). The latter images were included in order to prevent any response bias. Thus, 40 different stimuli were presented, 20 of which were transformed images. The transformed images were presented in probe trials (Vogels, 1999), with each response, whether correct or not, being rewarded so that we were able to measure the spontaneous categorization of these transformed stimuli. For the untransformed images, only correct responses were rewarded. At least four, usually five, blocks of the 40 trials were run for a given neuron. Image scrambling test The procedure for this test was basically the same as for the stimulus transformation test. Four stimuli were selected and scrambled to one of five different degrees (number of scrambled image parts ranging from 4 to 1024; see Vogels, 1999). The five scrambled versions of each of the four images were shown interleaved with the four original, unscrambled images and 16 other, randomly chosen exemplars. The scrambled images were presented in probe trials. The neuron was tested with at least four, usually five, blocks of the 40 images. Data analysis Quantification and statistical tests of responses Spikes were counted in two windows of the same duration, one immediately preceding stimulus onset, and a second one starting 50 ms after stimulus onset (stimulus time window). Net responses were calculated by trial-wise subtraction of the activity within the prestimulus time window from that within the stimulus window. 
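The trial-wise net response described above can be illustrated with a minimal Python sketch. It assumes that spike times are expressed in ms relative to stimulus onset (0 ms) and that windows are half-open intervals; both conventions, and the function names `count_spikes` and `net_response`, are illustrative assumptions and are not taken from the original analysis software.

```python
from typing import Sequence


def count_spikes(spike_times_ms: Sequence[float], start_ms: float, stop_ms: float) -> int:
    """Count spikes in the half-open window [start_ms, stop_ms) (assumed convention)."""
    return sum(1 for t in spike_times_ms if start_ms <= t < stop_ms)


def net_response(spike_times_ms: Sequence[float], window_ms: float,
                 stim_delay_ms: float = 50.0) -> int:
    """Net response of one trial: stimulus-window count minus prestimulus count.

    Spike times are relative to stimulus onset (0 ms). Both windows have the
    same duration: the prestimulus window ends at stimulus onset, and the
    stimulus window starts 50 ms after onset, as stated in the text.
    """
    baseline = count_spikes(spike_times_ms, -window_ms, 0.0)
    evoked = count_spikes(spike_times_ms, stim_delay_ms, stim_delay_ms + window_ms)
    return evoked - baseline


# Hypothetical trial: two prestimulus spikes, four spikes in the stimulus window.
trial = [-80.0, -10.0, 60.0, 75.0, 90.0, 130.0, 200.0]
print(net_response(trial, window_ms=120.0))
```

Averaging such trial-wise values over the repeated presentations of an image gives the net response per stimulus.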
The time windows were identical for all trials of a test for a particular neuron. However, the duration of the window for a particular test depended on the shortest behavioural reaction time in that test and thus varied among neurons. Thus, for each test of a given neuron, the shortest saccadic response latency was determined and this value was then used to limit the upper bound of the stimulus time window of that test. This upper bound was defined as the largest multiple of 10 that is less than the measured saccadic latency (e.g. if the shortest saccadic latency observed in a particular test was 186 ms, then the upper bound of the time window was 180 ms). The upper bounds of the stimulus time window varied from 150 ms to 200 ms among neurons. The median upper bound for Plato's neurons (170 ms: first quartile, 160 ms; third quartile, 170 ms) was shorter than that of the other monkey (190 ms: first quartile, 180 ms; third quartile, 200 ms). These upper bounds, defined using the shortest saccadic latency found in a test of a particular neuron, were about 50 ms shorter than the average saccadic latency in each animal (see Results and Fig. 1). The median duration of the time window was 120 ms and 140 ms, respectively, in Plato and Kees. This small difference in the length of the time window reflects the difference in average reaction times between the two monkeys (see Results and Fig. 1). Using the thus-defined upper bounds of the stimulus time window guarantees that changes in visual stimulation, arising from the saccade, could not contaminate the spike counts. Indeed, the activity of the cell can only be affected by saccade-induced visual stimulation at a time no shorter than the saccadic latency plus the visual response latency of the neuron (above 70 ms in these neurons, see Results).
The effects of extra-retinal, saccade-related factors, e.g. saccade programming, could in principle contaminate the spike count. In order to determine a possible contribution of such extra-retinal influences, the following analysis was carried out on the net responses calculated using the time windows defined above. If presaccadic extra-retinal factors affected the response of the neurons, one would expect differences in the average responses between trials in which the saccade occurred at or just after the upper bound of the stimulus time window and trials in which the saccade for the same image occurred much later. Thus, we selected, if possible, two trials for a given neuron, a short saccadic latency (ssl) trial and a long saccadic latency (lsl) trial. In a ssl trial, the saccadic latency had to be shorter than the upper bound of the stimulus time window plus 20 ms, while in the lsl trial, the saccade latency had to be at least 50 ms longer than in the ssl trial. Such trial pairs were available for 61 neurons, using the following selection criteria: (i) the net number of spikes in one of the two trials should be at least three; (ii) if the saccadic latency in two or more trials of an image differed by at least 50 ms from the ssl of that image, then the trial with the longest latency was chosen as the lsl trial; and (iii) if a ssl occurred for more than one image of a test, obeying the above criteria, the trial pair with the shortest ssl was selected. For the 61 trial pairs thus selected, the median difference in saccadic latency between ssl and lsl was 76 ms. The mean net number of spikes, averaged for the 61 neurons, was 4.08 and 3.80 spikes for the ssl and lsl trials, respectively, a difference not statistically significant [paired t-test; t(60) = 0.51; NS], and the variability in the two samples was also very similar (SDs of 2.72 and 2.93 for ssl and lsl trials, respectively). This suggests little if any contamination of the spike counts from effects of saccade-related extra-retinal factors.
Analysis of variance (ANOVA; split-plot design; Kirk, 1968) was used to test the significance of the responses to any stimulus by comparing the spike counts in the time windows before and after stimulus onset. All neurons reported in this paper showed a significant main effect from this response variable (P < 0.05).

FIG. 1. Distribution of saccadic latencies for Plato and Kees in the trials of the neuronal tests. The reaction times are plotted for tree (solid line) and non-tree images (stippled line) separately. The number of trials was 24 236 (half tree, half non-tree images) and 35 140 in Plato and Kees, respectively.

Stimulus selectivity measures
The degree of stimulus selectivity of a neuron was quantified in two ways. A first measure of the degree of stimulus selectivity consisted of the number of stimuli (NSt; maximal 60) for which the response was at least one-third of the maximal response of the neuron. We did not simply use the number of stimuli which elicited a statistically significant response, because: (i) the latter depends on the number of trials in which each stimulus is tested in a given neuron; and (ii) we found that in some cases responses to some stimuli were statistically significant but negligible when compared to the maximal response of the cell. Conceptually, one can compare this first measure of stimulus selectivity to bandwidth indices used to describe the tuning width for continuous stimulus dimensions, for example orientation, e.g. width at half-height. The second quantitative measure of stimulus selectivity is the Sparseness index introduced by Rolls & Treves (1990). This index was used in the present study to compare our results to those obtained in studies of face-selective IT neurons (Rolls & Tovee, 1995).
It is a measure of the proportion of effective stimuli based on the response to each of the 60 stimuli and is independent of the number of stimuli used (insofar as these stimuli are a representative sample of all possible stimuli). The sparseness is related to the length of the tail of the distribution of net firing rates for the different stimuli (Treves & Rolls, 1991). A low value indicates that there is a long tail to the distribution, equivalent to only a few images with high responses. The Sparseness index, using n stimuli, is computed as follows:

$$\mathrm{Sparseness} = \frac{\left[\sum_{i=1}^{n} (R_i/n)\right]^2}{\sum_{i=1}^{n} (R_i^2/n)}$$

where R_i is the net response to stimulus i (with negative net responses clipped to 0, as described by Rolls & Tovee, 1995). The Sparseness index can range from 0.017, corresponding to responses larger than 0 for one stimulus only (high stimulus selectivity), to 1.00, indicating equal responses for all stimuli (no selectivity).

Category specificity index
In order to measure the degree of category specificity, an index T was computed for each of the neurons. This index was defined as the proportion of tree stimuli the neuron responded to relative to the total number of stimuli to which the neuron responded. Again, we used one-third of the maximal response of the neuron as the criterion for responsiveness. Thus, if a neuron responded with at least one-third of its maximal response to 10 stimuli, and eight of those stimuli were trees, then the T index for that particular neuron would be 80.

Transformation test analysis
For each neuron tested, the net response to the best non-tree was subtracted from the response to the best tree image and then divided by the sum of the two responses. This stimulus selectivity index was computed for each type of stimulus transformation of these two images. For each stimulus transformation, we computed the Spearman rank correlation between the stimulus selectivity indices for the untransformed and transformed images for all the neurons tested, and this correlation was taken as a measure of the invariance of the stimulus selectivity for that kind of stimulus transformation.

Results
The monkeys performed very well during the recordings: the proportion of correct responses, averaged over all recording sessions, was 99% (SD 4%) and 97% (SD 5%) for Kees and Plato, respectively. None of the images was consistently categorized into the wrong class, as each stimulus was correctly categorized with a score of at least 50%, with only three stimuli (one tree image and two non-tree images) and eight stimuli (seven tree images and one non-tree image) having a correct proportion of less than 80% in Kees and Plato, respectively. Note that a consistent misclassification would yield a proportion of correct responses of 0% (e.g. a tree image consistently categorized as a non-tree will give a proportion of correct responses of 0%, rather than 50%, for that stimulus). Distributions of the saccadic response latencies of all trials of the neuronal tests are shown in Fig. 1 for each monkey, and tree and non-tree images separately. The reaction times were shorter than those observed during the categorization training (see table 3 in Vogels, 1999), which reflects the more extensive practice of the task. As during the categorization training (see Vogels, 1999), the average reaction time was shorter in Plato (mean: 234 ms) than in Kees (mean: 248 ms). In both monkeys, the distribution of the reaction times for the tree and non-tree images differed significantly (two-tailed t-test; P < 0.001 in each monkey). The difference in average reaction time for tree (mean: 255 ms) and non-tree images (mean: 259 ms) was small for Kees, but statistically significant given the very large number of observations.
For the other monkey, reaction times were on average longer for the tree (mean: 240 ms) than the non-tree images (mean: 228 ms), fitting the difference in reaction times observed during the categorization training of this monkey (Vogels, 1999). In this animal, the distribution of the reaction times is broader for the tree than the non-tree images. The present report is based on the analysis of 219 (90 and 129 in Plato and Kees, respectively) single temporal cortical neurons that responded significantly to any of the images of the tree stimulus set. Figure 2 shows the range of recording sites for the two monkeys. The recording sites of Plato were superimposed upon a lateral view of a standard rhesus monkey brain using the coronal sections of the standard brain and those of this animal. The temporal sulcal pattern of Plato was similar to that of the standard brain map. The recording sites of Kees are shown on a drawing of his actual brain. Based on a comparison of the recovered tracks and their microdrive position readings, the error in electrode position was estimated to be less than 1.5 mm. More anterior sites were explored in Plato than in Kees, while in the latter animal, recordings were also made in the posterior part of visual area TE. The recordings were obtained from the STS and temporal cortex ventral to the STS. Drawings of actual frontal sections of each of the monkeys at different anterior/posterior levels (mm anterior to the auditory meatus) are shown in Fig. 2, the tracks indicating the medial/lateral range explored at these levels. Twelve per cent of the neurons recorded in Plato were in the ventral part of the temporal pole, while all other neurons were recorded at least 3 mm more posteriorly (Table 1). These posterior penetrations explored the fundus and medial part of the lower bank of the STS, and the ventral part of the inferior temporal cortex, with the most medial recordings located in the perirhinal cortex. 
The most anterior recordings in Kees (Table 2) overlapped the region of the posterior recordings of Plato (Fig. 2). As for Plato, most neurons in Kees were located in the fundus and lower bank of the STS, although the possibility cannot be excluded that a few neurons were in the upper bank of the STS. The other cells were recorded in the ventral part of IT lateral to or in the anterior middle temporal sulcus (AMTS). The most ventral medial positions in Kees, with A–P levels larger than 17, encroached upon the perirhinal cortex.

Stimulus selectivity
Figure 3 shows the distribution of the NSt and Sparseness indices, summarizing the image selectivity of these neurons. The median NSt of the population was 10 (first quartile, 5; third quartile, 23), indicating that on average only 17% of the images evoked a response stronger than one-third of the maximal response of the neuron. Also note that 17% of the neurons were extremely selective, responding to fewer than four of the 60 images. In our sample of neurons, the Sparseness indices ranged from 0.02 to 0.96 with a median of 0.34 (quartiles, 0.20–0.59; Fig. 3). There was excellent correlation between the Sparseness index and NSt, as demonstrated by the Spearman rank correlation of 0.94 (n = 219; P < 0.0001). The degree of stimulus selectivity depended on the anterior–posterior level of the penetrations (Tables 1 and 2). For this analysis, the neurons of adjacent guiding tube positions were classified into four and five anterior–posterior groups for Plato and Kees, respectively, corresponding to the sections illustrated in Fig. 2. In Kees, the degree of Sparseness decreased significantly with increasing anterior–posterior level (Kees: F(4,122) = 6.6; P < 0.0001), as did the NSt (Kees: F(4,122) = 6.6; P < 0.0001). The Sparseness was significantly different (Mann–Whitney U-test; P < 0.05) between neurons of the STS and IT cortex ventral to this sulcus at only a single anterior–posterior level in this same animal (Table 2).
Selectivity at these levels was greater for neurons in the ventral cortical convexity. For the NSt measure, none of the differences between neurons of the STS and ventral IT was significant, although the same trend was present as with the Sparseness index. None of these effects concerning the anatomical distribution of the neurons was significant in the other animal, although a trend towards an increase in stimulus selectivity is also apparent in this monkey (Table 1). It should be noted that in both animals the observed degree of selectivity encompassed a wide range at most recording positions, implying that the above-described differences between the various parts of IT are average trends.

Degree of category specificity

In order to relate the stimulus selectivity to the learned tree category, we determined whether the neurons differentiate exemplars of this category from exemplars of other categories. To express this quantitatively, the category specificity index T was computed for each neuron. Overall, 71% of the neurons had a T index between 10 and 90 (Fig. 4), indicating responses to exemplars belonging to different categories. An example of such a neuron that responded to trees as well as to non-trees (T = 54) is shown in Fig. 5A. For this and related figures, we ranked the 60 stimuli according to their net response, indicating the tree and non-tree stimuli by closed and open bars, respectively. Seventeen per cent of the neurons responded almost exclusively to non-trees (T < 10; Fig. 4). An example of such a neuron (T = 0) is shown in Fig. 5C. More importantly, T > 90 for 12% of the neurons, indicating a strong preference for tree exemplars (category specific). Examples of such category-specific neurons are shown in Figs 6–8. Thus, although most neurons responded to exemplars belonging to different categories, about a quarter of the neurons responded differentially to exemplars of the trained category versus other stimuli,

with one-tenth of neurons strongly preferring tree over non-tree exemplars.

FIG. 2. Recording sites in Plato and Kees. The anterior–posterior ranges of the recording sites are indicated on the drawings of the lateral view of the brain. Coronal sections at the levels indicated in the lateral brain drawings show tracks of electrodes and reference wires. The most medial and anterior recordings were in the rhinal cortex, others were located in the ventral part of TE, lower bank and fundus of the STS. (a) rhinal sulcus; (b) AMTS; (c) STS; (d) lateral sulcus.

TABLE 1. Stimulus selectivity at different anterior–posterior levels for Plato

A/P level        16                 18                 20                 24
STS
  n              16                 18                 18
  Sparseness     0.42 (0.25–0.69)   0.49 (0.21–0.59)   0.32 (0.26–0.56)
  NSt            12 (7–26)          17 (7–21)          8 (4–18)
Convexity
  n              6                  5                  16
  Sparseness     0.33 (0.16–0.53)   0.63 (0.20–0.67)   0.29 (0.10–0.35)
  NSt            10 (3–19)          22 (9–27)          8 (4–12)
Total
  n              22                 23                 34                 11
  Sparseness     0.40 (0.25–0.71)   0.49 (0.31–0.63)   0.32 (0.21–0.50)   0.32 (0.12–0.54)
  NSt            12 (7–27)          17 (9–29)          8 (4–16)           10 (4–22)

Number (n), median Sparseness index and median NSt of neurons recorded in the STS and cortical convexity at four posterior–anterior levels (mm anterior to the Horsley–Clarke 0). The first and third quartiles are in parentheses.

Fifteen pairs of neurons were recorded within 100 µm of one another, based on the microdrive readings. These 30 cells were highly stimulus selective, with a median NSt and Sparseness index of 11 (first quartile, 3; third quartile, 18) and 0.32 (0.19–0.52), respectively. In order to examine whether neighbouring neurons have similar categorical specificity, we computed the Spearman rank correlation for the T indices of the 15 neuron pairs. The resulting correlation coefficient of 0.05 was not significantly different from 0. The lack of correlation does not appear to be due to small variations in the T indices, because the quartile ranges were 67 and 83 for the first and second member, respectively.
These results suggest that the categorical specificities of neighbouring neurons are unrelated.

Degree of within-category selectivity

The monkeys gave the same response to exemplars of the learned category. Thus, the question arises whether a similar stimulus equivalence is present at the single-cell level: do some single neurons respond to all tree exemplars? Obviously, this question is meaningful only for those neurons that are stimulus selective. Therefore, we analysed the selectivity for the tree exemplars for those

neurons responding to 10 or fewer non-trees, reasoning that other neurons are not sufficiently stimulus selective and/or are activated by too many non-trees to participate in the representation of the tree images. Also, neurons that responded to none of the tree images were excluded. It was found that for each of the 124 neurons (57% of the total number of neurons) selected in this way, there was at least one tree image for which the net response was zero or below zero.

TABLE 2. Stimulus selectivity at different anterior–posterior levels for Kees

A/P level        12                 14.5               17                 19.5               22
STS
  n              10                 5                  12                 17                 30
  Sparseness     0.26 (0.12–0.42)   0.83 (0.63–0.84)   0.49* (0.39–0.79)  0.29 (0.16–0.55)   0.22 (0.13–0.34)
  NSt            8 (2–19)           43 (21–45)         20 (8–39)          10 (3–19)          4 (2–11)
Convexity
  n              19                 5                  11                 8                  10
  Sparseness     0.60 (0.35–0.76)   0.34 (0.9–0.38)    0.24* (0.18–0.59)  0.20 (0.16–0.33)   0.21 (0.7–0.14)
  NSt            26 (9–37)          10 (2–14)          9 (6–13)           7 (3–11)           4 (1–8)
Total
  n              29                 10                 23                 25                 40
  Sparseness     0.38 (0.21–0.72)   0.68 (0.27–0.79)   0.42 (0.24–0.71)   0.28 (0.16–0.41)   0.21 (0.12–0.33)
  NSt            17 (6–33)          26 (8–40)          11 (8–33)          8 (3–17)           4 (3–11)

Number (n), median Sparseness index and median NSt of neurons recorded in the STS and cortical convexity at five posterior–anterior levels (mm anterior to Horsley–Clarke 0). The first and third quartiles are indicated in parentheses. *Significant difference between STS and convexity (Mann–Whitney U-test, P < 0.05).

FIG. 3. Distributions of stimulus selectivity. (A) Distribution of the Sparseness index. (B) Distribution of the number of stimuli to which the neuron responded with at least one-third of its maximal response (NSt). Minimum, 1; maximum, 60. Note the non-linear abscissa. The number of neurons in (A) and (B) is 219.

FIG. 4. Distribution of the T-index (n = 219), expressing the proportion of trees among the stimuli to which a neuron responded.
In fact, for these 124 neurons, the median number of trees for which the net response was smaller than 0.5 spikes/s was 13 (first quartile, 7; third quartile, 18), indicating that the average neuron did not respond to 43% of the trees tested (13 of 30). This demonstrates that these neurons show a strong within-category selectivity, i.e. are responsive to only a limited subset of a learned category. The median Sparseness index for the tree images alone was 0.31 (minimum, 0.05; maximum, 0.83) for this group of neurons. Overall, the number of tree images to which a neuron responded with at least one-third of its maximal response correlated (Spearman rank correlation, R = 0.45; P < 0.0001, n = 219) with the number of non-tree images to which it responded (Fig. 9). Note that neurons responding to many trees (broad within-category selectivity) also respond to many non-trees (small category specificity), implying that the degree of within-category selectivity correlates with the degree of category specificity. Thus, neurons responding to a few or none of the non-tree images (T > 90), which are indicated by filled circles in Fig. 9, all show within-category selectivity.

FIG. 5. Examples of selectivity in single anterior temporal neurons. (A) Stimulus-selective neuron responding to trees as well as non-trees. Sparseness index = 0.33. The 60 images are ranked according to the net response. Filled and open bars indicate responses to exemplars of the trained category (trees) and to other exemplars (non-trees), respectively. The SEs of the responses to the first 15 ranked stimuli are indicated by the filled diamonds. (B) Grey-level reproductions of the five best images of the neuron shown in (A). (C) Stimulus-selective neuron responding to non-trees. Sparseness index = 0.16. Same conventions as in (A). (D) The five non-tree images producing the best responses of the neuron in (C).

FIG. 6. Example of a category-specific neuron with a large within-category selectivity. (A) Net responses for the 60 images (tree, filled; non-tree, open bars). Same conventions as in Fig. 5. Sparseness index = 0.09. (B) Black-and-white reproductions of the four best tree images and best non-tree image. (C) Effect of scrambling the tree image 1 shown in (B) upon the net response. (D) Net response for the different transformations of tree image 1. ST, standard, untransformed image; LA, size increase; SM, size decrease; PO, position shifted; RO, rotated image; AC, achromatic image.

Category-specific neurons

Although each of the category-specific neurons showed within-category selectivity, i.e. responding only to a subset of the tree images, the activation of these neurons can signal the presence of a member of a category. Given their potential importance for categorization, the properties of these neurons will be described in more detail in the next sections.

Anatomical distribution

Eight category-specific neurons were found in Plato between anterior–posterior levels 18 and 24 anterior, with three of them in the STS. The other 18 category-specific neurons were recorded in Kees and had anterior–posterior positions ranging from 12 to 22 anterior with 12 of them in the STS. Thirteen (72%) of the category-specific neurons in Kees were found at A–P coordinate 19 or higher, which, however, was not significantly larger (χ², NS) than expected given that 51% of the neurons recorded in this monkey were located in this part of the cortex.

Stimulus selectivity

The selectivities of these category-specific neurons were examined in more detail in an attempt to determine the image features necessary

and sufficient for activation. This was performed by comparing the responses to the different tree and non-tree images and, if available, the neuron's responses in the stimulus transformation and scrambling tests.

FIG. 7. Example of a highly selective, category-specific neuron. (A) Reproductions of the five highest ranked stimuli. (B) Net responses for the 60 images (tree, green; non-tree, red). The numbers at the abscissa indicate the images of (A) and (C). Same conventions as in Fig. 5. Sparseness index = 0.07. (C) Reproductions of the five lowest ranked tree images. (D) Effect of scrambling of the tree image 1 in (A) upon the net response. (E) Net response for the different transformations of tree image 1. ST, standard, untransformed image; LA, size increase; SM, size decrease; PO, position shifted; RO, rotated image; AC, achromatic image.

Figure 6 shows a sharply tuned tree-selective neuron, the response of which was also strongly affected by scrambling (Fig. 6C) but not by eliminating colour from its preferred image (Fig. 6D). The orientation

of the image was critical, as shown by the strong effect of image rotation. For three other neurons, shape selectivity could explain the differences in responses to the various images.

FIG. 8. Example of a category-specific neuron responding to many tree images. (A) Reproductions of the five highest ranked tree stimuli. (B) Net responses for the 60 images (tree, filled; non-tree, open). The numbers at the abscissa indicate the images of (A), (C) and (D). Same conventions as in Fig. 5. Sparseness index = 0.37. (C) Reproductions of tree images with rank 6–10. (D) Reproductions of the five lowest ranked tree images. The original images were in colour.

Figure 7 illustrates the responses of another strongly stimulus-selective neuron. Scrambling reduced the response sharply, even in the least scrambled (2 × 2) image, suggesting that the neuron is sensitive to form information. Rotating the image did not affect the response of the cell, while all other transformations did so significantly (F(5,24) = 2.67; P < 0.05). Note the threefold reduction in response for the grey-level image, suggesting that colour contributes to the

selectivity of this neuron. Such combinations of shape and colour features were critical for five other category-specific neurons as well.

FIG. 9. Correlation of the number of tree images and the number of non-tree images to which a neuron responds with at least one-third of its maximal response. The plotted data points were allowed to deviate by up to 0.5 units from the observed, integer, data points so that all the data points are visible. The filled symbols indicate the category-specific neurons.

A neuron showing less stimulus selectivity is presented in Fig. 8. It responded best to trees with a large area of foliage on a whitish background. Eight other neurons responded to a wide variety of shapes, suggesting, by exclusion of shape features, the contribution of texture cues. These examples illustrate the within-category selectivity of the category-specific neurons, and also the difficulty in determining the critical features they respond to. Indeed, for some category-specific neurons it was impossible to determine their critical features from their responses to the 60 images. Also, other investigators have noted that for some inferior temporal neurons it is very difficult to find a feature common to the complex images to which the neuron responds (Desimone et al., 1984; Mikami et al., 1994). This analysis suggests that the neurons do not respond to trees as such, but to shape, colour and texture features, or combinations of these features that happen to be present in the tree image. Because different exemplars of the same category can vary greatly in the presence of particular features, these neurons will show within-category selectivity.

Time course of responses

If these neurons contribute to the categorization, their category-selective responses should occur before the behavioural response.
Given our choice of a time window which ends at or before the shortest behavioural reaction time, it is evident that the category-specific response occurred before the behavioural response. Nonetheless, in order to determine in more detail the time course of the neuronal responses relative to stimulus onset and occurrence of the behavioural response, population histograms for the 26 category-specific neurons were computed as follows. For each of these neurons, we computed a peri-event time histogram for the tree images (average activity for 30 tree images) and one for the non-tree images (average activity for 30 non-tree images). These two histograms of each neuron were then normalized with respect to the maximal number of spikes in any one of the 10-ms bins of the two histograms, and the normalized histograms of the 26 neurons were averaged. Two kinds of population histograms, triggered on stimulus onset (Fig. 10A) and on the beginning of the saccade (Fig. 10B), were computed. The population histograms show that the striking difference in average response for tree and non-tree images is already present at the onset of the neuronal response (Fig. 10A) and well before the behavioural response (Fig. 10B). In a second analysis, population histograms of the activity for the best and worst tree, i.e. the tree image eliciting the largest and smallest response in a neuron, respectively, were computed. Thus, for each category-specific neuron, histograms triggered either on the stimulus or on the saccade were computed for the best and worst tree image, and the histograms of the 26 neurons were then averaged after normalization. Only correct trials (98% of all trials) were used. As shown in Fig. 10C, the within-category stimulus selectivity is extremely strong: the population response to the worst tree image is virtually absent.
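The normalization and averaging steps described above can be sketched as follows. This is only an illustration under stated assumptions: each neuron is represented by two histograms of spike counts in 10-ms bins (one for tree images, one for non-tree images), already aligned on the chosen trigger (stimulus onset or saccade onset) and of equal length. The function names and the toy data are hypothetical.

```python
def normalize_pair(tree_hist, nontree_hist):
    """Divide both histograms of one neuron by the largest bin count found in either."""
    peak = max(max(tree_hist), max(nontree_hist))
    if peak == 0:
        # A silent neuron contributes flat, zero-valued histograms.
        return [0.0] * len(tree_hist), [0.0] * len(nontree_hist)
    return [b / peak for b in tree_hist], [b / peak for b in nontree_hist]


def population_histograms(neurons):
    """Average the normalized histograms over neurons.

    `neurons` is a list of (tree_hist, nontree_hist) pairs with identical binning.
    Returns the population histogram for trees and the one for non-trees.
    """
    n_neurons = len(neurons)
    n_bins = len(neurons[0][0])
    pop_tree = [0.0] * n_bins
    pop_nontree = [0.0] * n_bins
    for tree_hist, nontree_hist in neurons:
        tree_norm, nontree_norm = normalize_pair(tree_hist, nontree_hist)
        for i in range(n_bins):
            pop_tree[i] += tree_norm[i] / n_neurons
            pop_nontree[i] += nontree_norm[i] / n_neurons
    return pop_tree, pop_nontree


if __name__ == "__main__":
    # Two hypothetical neurons, three 10-ms bins each.
    toy = [([0, 10, 5], [0, 2, 1]), ([0, 4, 2], [0, 4, 0])]
    print(population_histograms(toy))
```

Because both histograms of a neuron share one normalization factor, the relative size of its tree and non-tree responses is preserved, while neurons with high and low firing rates contribute equally to the population average.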
The within-category selectivity is also present in the period after the short time windows used to calculate the net response (see Materials and methods), i.e. 200 ms after stimulus onset, and thus is not the result of missing late responses to the worst images. The same holds for the category specificity (Fig. 10A). The population histogram using the saccade as trigger (Fig. 10D) clearly shows that the neural activity in these category-specific neurons strongly depends on which tree image was presented, given the same behavioural response (only correct trials were averaged). If the strong responses to the tree images merely reflected programming or execution of the saccade, one would have expected a response in the worst image condition too, as the same behavioural response was emitted (i.e. a rightwards saccade) to those images as to the best images.

Comparison of behavioural categorization and neural responses

Spontaneous categorization performance and single neuron responses were measured simultaneously for scrambled and transformed images, allowing a comparison of the neuronal responses to the behavioural categorization responses. The results of such a comparison will be described first for the category-specific neurons tested with the scrambled images. The effect of stimulus transformations will be reported in two parts, first for the category-specific neurons, then for all neurons tested with the transformed images. Indeed, given our finding that temporal cortical neurons are insufficiently stimulus invariant to respond similarly to all exemplars of the same category, it will be of interest to determine whether the neurons tolerate the tested transformations of a single exemplar.

Effect of image scrambling

The responses of eight category-specific neurons were recorded in the scrambling test. The average normalized response of these eight neurons for the best tree and non-tree image is shown in Fig. 11A as a function of the degree of scrambling.
The average response of the neurons decreases significantly (ANOVA on net responses; F(5,35) = 12.17; P < 0.0001) with increasing degrees of image scrambling. The fourfold scrambling is sufficient to reduce the average response by 56%, and the response at the largest degree of scrambling was a mere 10% of the average response to the unscrambled shape, which is similar to the average response to the unscrambled non-trees. Figure 11B shows the proportion of correct responses in the behavioural categorization of these same images. There is a large degradation in performance once the tree image is scrambled, but one which falls off more steeply than the neuronal responses. However, it should be noted that scrambling showed an effect in some neurons (e.g. that of Fig. 6) as strong as that obtained behaviourally.

Effect of stimulus transformations: category-specific cells

Because single-cell as well as behavioural responses were measured simultaneously in the stimulus transformation test, one can compare

the degree of behavioural invariance for these stimulus transformations to the degree of neuronal invariance for the same transformations.

FIG. 10. Population histograms of the 26 category-specific neurons. The response of each neuron was normalized before computation of the population histograms. (A) Population PSTHs comparing the activity for the tree images (solid line) and non-tree images (stippled line). For each neuron, the responses to all the tested tree images and all the tested non-tree images were averaged before normalization. The leftmost vertical line indicates stimulus onset (0 ms on abscissa), while the right vertical line shows the occurrence of the mean saccadic response, averaged for tree and non-tree images. The mean reaction time in these trials was 250 and 249 ms for the tree and non-tree images, respectively. (B) Population histogram of activity for tree images (solid line) and non-tree images (stippled line), triggered on the occurrence of the saccadic response (vertical line, 0 ms on abscissa). (C) Population PSTHs comparing the activity for the best (solid line) and worst tree image (stippled line). The leftmost vertical line indicates stimulus onset, while the right vertical line shows the occurrence of the mean saccadic response. The mean reaction time in these trials was 250 and 255 ms for the best and worst tree images, respectively. (D) Same as (C), except triggered by the occurrence of the saccadic response (vertical line). The difference in the level of spontaneous activity between the histograms of (A) and (C) is due to the averaging of the responses to different images in the case of the tree and non-tree conditions (A) before the normalization. Because the neurons did not respond to all tree images, their responses averaged over these images will be less, relative to the spontaneous activity level, than when the response to the best image only is averaged (C).
The mean normalized response of the eight category-specific neurons for which stimulus transformation data are available is shown in Fig. 12A. Normalization was performed with respect to the net response to the untransformed tree for each neuron. On average, the neurons still responded significantly to the transformed images, and a comparison of the responses to the exemplars of the trained category (open bars in Fig. 12A) with the responses to other exemplars (filled bars) shows that, on average, the neurons are capable of signalling whether a tree image was presented, even if that image has been transformed. The behavioural results showed that categorization was more or less invariant for changes in size and orientation, or for the presence or absence of colour. However, changing the stimulus position strongly affected the categorization performance. Indeed, the proportions of correct responses for the position-shifted stimuli were not significantly different from chance (50%).

Effect of stimulus transformations: all cells tested

A total of 43 neurons was tested for the effect of the different stimulus transformations, the results of which are shown in Table 3 and Fig. 13. Figure 13 plots, for each of the stimulus transformations, the distribution of responses normalized for each neuron with respect to the response to the normal stimulus, and Table 3 shows the median of these distributions. As for the category-specific cells, this larger population of neurons shows, at least on average, considerable invariance in their response to these stimulus transformations, with position and colour transformations having the largest effects. Table 3 also shows the Spearman rank correlation coefficients of the stimulus selectivities of the 43 neurons for transformed and untransformed images (see Materials and methods). A high correlation indicates that the degree of selectivity and stimulus preference of the neurons is similar for transformed and untransformed images, i.e.
that stimulus selectivity is transformation invariant, while different selectivities would result in low correlation coefficients. The observed correlations were all statistically different from 0, and ranged between 0.70 and 0.91. The position and colour transformations yielded the lowest correlation coefficients, indicating that the stimulus selectivity can depend on the position and colour content of the image. Table 3 also shows the proportion of correct categorizations of these same images, obtained during the recordings. These behavioural results are

very similar to those obtained in the smaller (sub)sample of the category-specific neurons (Fig. 12). These results show that, as a population, the neurons show sufficient stimulus transformation invariances to contribute to categorization behaviour which is, for some transformations, invariant.

FIG. 11. Effect of image scrambling: category-specific neurons and behavioural response. (A) Mean normalized response of eight category-specific neurons for scrambled tree (solid line) and non-tree images (stippled line). Normalization was performed by setting the response for the unscrambled tree image to 1 for each neuron. (B) Proportion of correct responses for the tree and non-tree images, the neural responses to which are shown in (A). Error bars indicate SEs of the mean in (A) and (B).

FIG. 12. Stimulus transformation invariance of selectivity of category-specific neurons. (A) Averaged normalized responses to the best tree (open bars) and non-tree image (filled bars). The normalized responses were averaged for eight category-specific neurons. The responses were normalized with respect to the net response to the untransformed tree image. SEs of the mean are indicated. (B) Average proportion of correct behavioural categorizations for the same images. The chance level is 50%. Same conventions as in (A). ST, standard, untransformed image; LA, size increase; SM, size decrease; PO, position shifted; RO, rotated image; AC, achromatic image.

TABLE 3. Results of stimulus transformation invariance test

        Resp    R       %
LA      0.88    0.90    83
SM      0.82    0.84    76
PO      0.74    0.73    55
RO      0.91    0.91    94
AC      0.79    0.70    94

Medians of response (Resp) normalized with respect to the response to the untransformed image (n = 43). R, Spearman rank correlation coefficient of stimulus selectivities (all P < 0.0001). %, Proportion of correct behavioural categorizations. LA, larger image; SM, smaller image; PO, position shifted; RO, 90° rotated image; AC, achromatic image.

FIG. 13. Distribution of neuronal responses to transformed images (n = 43). The response of each neuron was normalized with respect to its response to the untransformed image, a value above 1 indicating a larger response to the transformed image compared to the untransformed image. The data are for the image producing the best response. (A) Size increase. (B) Size decrease. (C) Position shifted. (D) Rotated image. (E) Achromatic image.

Discussion

Neurons in the anterior temporal cortex responded selectively to the complex stimuli being categorized during the recordings. Some of these neurons responded in a category-specific way, and were activated almost exclusively by exemplars of the trained category. However, these category-specific neurons as well as the other neurons were selective for exemplars of the same trained category. Thus, the responses of single temporal neurons are insufficiently invariant to accommodate the wide range of variation present in the features of different exemplars of a given category, and thus do not explicitly represent a category prototype. The neurons were stimulus selective before the occurrence of the behavioural response and, as a population, showed selectivity for transformed images which the monkey could categorize behaviourally. Thus, although individual temporal neurons do not represent all possible members of a trained category, a population of such neurons can contribute to the categorization using an exemplar-based code. Before discussing how this can be accomplished, we will compare the present results on the stimulus selectivity of temporal neurons to those of previous studies in the same areas.

Stimulus selectivity

The results from both animals combined represent a relatively wide range of anterior–posterior positions which were explored in the temporal cortex. The most anterior recordings were obtained from the ventral part of the temporal pole, which is part of the perirhinal cortex (Suzuki & Amaral, 1994). Neurons of this anterior perirhinal region as well as others located somewhat more posteriorly showed a relatively high degree of stimulus selectivity. This fits a similar observation by Nakamura et al. (1994) who also used complex colour images as stimuli. In agreement with the present study, the same group also reported that the stimulus selectivity of anterior temporal neurons was greater than that of more posteriorly located neurons of the STS (Mikami et al., 1994). The high stimulus selectivity together with anatomical studies showing that the anterior part of the temporal cortex receives input from posterior IT (Suzuki & Amaral, 1994; Saleem & Tanaka, 1996) suggest that the anterior temporal cortex is concerned with further processing of visual, object-related, information (Nakamura & Kubota, 1996). The present results indicate that neurons of this higher-order cortex show strong selectivity for exemplars of the same, trained category, and thus do not represent categories explicitly. Other neurons in the present study were located in the STS, which receives input from the lateral part of IT (Seltzer & Pandya, 1978, 1989; Baizer et al., 1991; Seltzer & Pandya, 1994), and thus also represents a further stage of visual processing. The presence of a high within-category selectivity for neurons in these higher order areas suggests that single STS neurons do not represent single, trained ordinate-level categories. Rolls & Tovee (1995) measured the responses of 14 STS neurons to 68 achromatic images. The mean Sparseness index reported in their study, 0.33 (SD 0.22), tends to be lower than the average Sparseness index found in our sample of neurons located at similar, posterior, recording positions in the STS (0.40). This difference could be due to the small sample size of the Rolls and Tovee study, but more likely reflects differences in the selection of neurons. Indeed, Rolls and Tovee's sample consists of neurons selective for faces, which implies strong stimulus selectivity. Rolls & Tovee (1995) found a mean Sparseness index of 0.60 for different face stimuli in their sample of face-selective neurons, which is much larger than the average Sparseness of 0.40 for the tree images for the category-specific neurons of the present study. This suggests that the within-category selectivity is greater for the trained, nonsocial, tree category than for the face category. However, differences between face selectivity and the selectivity for exemplars of nonsocial categories may only be a matter of degree, as Rolls and Tovee reported that typical face neurons show selectivity for different faces.

Within-category selectivity and categorization

The present results suggest that a single temporal neuron is insufficiently stimulus invariant to respond to the wide range of features present in different exemplars of a learned, natural category. Most neurons respond to exemplars of different categories, probably because these exemplars share the single or particular combinations of features the neuron prefers. The apparent category specificity of some neurons is then due to the responses to complex features that were not present in the sample of non-trees tested. It is possible that the selectivity of some of these category-specific neurons is the result of the categorization training, whereby neurons become more selective for complex features common in tree exemplars but uncommon in non-tree exemplars.
Our finding of exemplar-specific responses might be the result of the animals not using a single prototype for the categorization. The behavioural results during the categorization training (Vogels, 1999) indicate that the categorization of the tree stimuli was not merely based upon learning of each of the individual exemplars, but that some integration of the different stimuli was taking place. However, as extensively discussed in the human categorization literature (see Smith & Medin, 1981 for review), this does not imply that the categorization is based on a single prototype. Indeed, it is possible that the monkeys learned more than one prototype and based their categorization on the similarity of the stimulus to one of these subprototypes. Furthermore, during later training the monkey was exposed repeatedly to several of the exemplars, allowing learning of some of the exemplars. Thus, it could be that the subject categorized the stimuli according to their resemblance to a set of learned exemplars. The same exemplar-based categorization is possible in most real-world cases of categorization, as most individuals are repeatedly exposed to some exemplars, allowing learning of the individual exemplars. The finding of the within-category selectivity agrees well with such multiple-prototype or exemplar-based views (see Komatsu, 1992 for review), and suggests that the categorization behaviour is based on the activity of units each responding to a limited number of exemplars, with any given single unit allowing a certain degree of stimulus generalization. This is comparable to the suggested distributed, view-dependent representation of subordinate objects (Logothetis & Pauls, 1995): IT neurons only respond to a restricted set of views of an object, although behaviourally one can learn to respond in a view-invariant way.

It should be noted here that the consistent non-tree responses to the novel scrambled stimuli during the recording sessions indicate that both animals were still categorizing as such, as random choices to these novel exemplars would have otherwise occurred. Also, we tested one animal (Kees) with novel tree and novel non-tree stimuli, intermixed with old stimuli, after the recording sessions, and these new stimuli were categorized correctly by this animal, indicating that he was not simply responding to a set of learned stimuli.

Correlation of neuronal responses and spontaneous categorization behaviour

Qualitatively, behavioural performance matched neuronal responses in the scrambling test, as both decreased as a function of the degree of scrambling. Yet, the decline in behavioural performance was steeper than that of the corresponding average neuronal responses, with behavioural responses more strongly affected than the neuronal responses for tree images with the smallest degree of scrambling. However, one would expect some neurons not responding to the unscrambled tree images to be activated by novel features in scrambled images, and that these neurons would signal a stimulus other than a tree image.
Thus, scrambling the image would reduce the response of neurons responsive to the unscrambled exemplars of a category and at the same time increase the response of other neurons responsive to features not present in the exemplars of that category. The contributions to the categorization decision by the neurons responding specifically to these scrambled images would cause a stronger behavioural scrambling effect than expected from mere reduced responses in neurons preferring the unscrambled tree images.

The effect of the stimulus transformations on the neuronal stimulus selectivity did not correlate perfectly with that observed behaviourally, e.g. changes in stimulus position had a much stronger effect on behavioural categorization than on the neuronal responses. However, the neurons' responses showed a considerable degree of invariance for those stimulus transformations that had little or no effect on the behavioural categorization, i.e. changes in size, colour or orientation. Indeed, we found, in agreement with previous studies (for review see Tanaka, 1996; Vogels & Orban, 1996), that the stimulus selectivity of temporal cortical neurons is largely invariant for changes in the stimulus size, although their response strength can be modulated by changes in size. On average, a 90° image rotation had little effect on the response or stimulus selectivity, in agreement with Miyashita & Chang (1988). However, image rotation showed a strong effect in some neurons (e.g. Fig. 6), as observed in other studies of the shape selectivity of neurons (Tanaka et al., 1991; Logothetis & Pauls, 1995). Eliminating colour had one of the largest effects of the various stimulus transformations, in agreement with the presence of colour selectivity in temporal neurons (Desimone et al., 1984; Tanaka et al., 1991; Komatsu et al., 1992).
However, many neurons were invariant for the achromatic transformation and, on average, the stimulus preference was largely invariant for the colour-to-grey-level transformation, indicating that, as a population, these neurons can signal the presence of the exemplars when these are presented achromatically. Thus, the results of the transformation test are in agreement with the hypothesis that these neurons contribute to the categorization task. This invariance for the stimulus transformations can be contrasted with the strong variations in response to different exemplars of the trained category. This is likely to be due to the fact that different exemplars of the same category can differ greatly in their stimulus properties, while if size, position or colour of a particular image is changed, some stimulus properties (e.g. shape) will remain invariant.

Neural coding of categories

Although the activity of single temporal neurons is insufficient to code for natural categories of objects, a population of such neurons can provide sufficient information to categorize these complex images. Indeed, preliminary results of a modelling study suggest that the information provided by the neurons of the present study is sufficient to explain the monkey's categorization behaviour. A neural network, consisting of input units that have stimulus selectivities identical to the biological neurons observed in the present study, learned the tree categorization and generalized from learned to novel images. The present results are in agreement with a population coding model of categories in which a category is represented by the activity patterns of a population of neurons, each responsive to distinct but overlapping sets of exemplars.
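The idea of a population code built from exemplar-selective units can be illustrated with a toy readout. The sketch below is not the network used in the modelling study. It assumes hypothetical firing rates for five units, uses cosine similarity as one possible measure of similarity between population activity profiles, and assigns a novel stimulus to the category whose stored exemplars it resembles most on average. The function names `cosine_similarity` and `categorize` are illustrative only.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two population activity vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def categorize(novel, exemplars_by_category):
    """Pick the category whose stored exemplars are, on average,
    most similar to the novel population vector (assumed readout)."""
    scores = {
        label: float(np.mean([cosine_similarity(novel, e) for e in exemplars]))
        for label, exemplars in exemplars_by_category.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical firing rates (spikes/s); rows are exemplars, columns are units.
# Units 0-2 each prefer a different tree exemplar (within-category selectivity);
# units 3-4 prefer features found in non-tree images.
trees = np.array([
    [30.0, 5.0, 2.0, 1.0, 0.0],
    [4.0, 25.0, 8.0, 0.0, 2.0],
    [2.0, 6.0, 28.0, 1.0, 1.0],
])
non_trees = np.array([
    [1.0, 0.0, 2.0, 27.0, 5.0],
    [0.0, 3.0, 1.0, 6.0, 31.0],
])
stored = {"tree": trees, "non-tree": non_trees}

# A novel stimulus that partially drives two of the tree-preferring units.
novel_tree = np.array([12.0, 18.0, 3.0, 1.0, 1.0])
label, scores = categorize(novel_tree, stored)
print(label, scores)
```

In this toy example no single unit responds to all tree exemplars, yet the pattern across units is enough to place the novel stimulus in the tree category.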
The categorization of a novel stimulus will be based on: (i) the similarity of its population activity profile to the population activity elicited by other exemplars of a learned category; and (ii) the dissimilarity of its population activity profile to the activity produced by exemplars of other categories. Learning processes can then determine which units are most useful for between-category discrimination and within-category generalization, allowing abstraction of the category. Furthermore, learning can fine-tune the feature selectivity of the units so that between-category discrimination and within-category generalization are enhanced. The latter process will increase the incidence of category-specific units. The information from the different exemplar-selective units needs to be linked to a single behavioural response, or, when generalizing to human beings, to units involved in lexical retrieval (Damasio et al., 1996). One possible locus for the integration of the information from temporal neurons selective for different exemplars of a given category is the striatum. Indeed, it has been shown recently that single IT neurons project to different rod-like modules of the striatum, each of which is innervated by a large number of axons of IT neurons (Cheng et al., 1997), allowing integration of the outputs of multiple IT neurons.

Acknowledgements

The assistance of Dr W. Spileers with the eye surgery and A. Coeman with the daily training of the monkeys is kindly acknowledged. A. Coeman, G. Van Parrijs and G. Meulemans assisted with figure preparation, and P. Kayenbergh provided technical support. M. De Paep maintained all the software and hardware. Dr L. Arckens' help with the fluorescent tracer microscopy is gratefully acknowledged. I also thank Drs G.A. Orban, W. VanDuffel, A. Rosier and S. Raiguel for critical reading and discussions of this material. This research was supported by the Geneeskundige Stichting Koningin Elizabeth and G.0712.96.
Abbreviations

AMTS, anterior middle temporal sulcus; IT, inferior temporal cortex; lsl, long saccadic latency; NSt, number of effective stimuli; ssl, short saccadic latency; STS, superior temporal sulcus; T, proportion of effective tree stimuli

References

Baizer, J.S., Ungerleider, L.G. & Desimone, R. (1991) Organization of visual inputs to the inferior temporal and posterior parietal cortex in macaques. J. Neurosci., 11, 168–190.

Cheng, K., Saleem, K.S. & Tanaka, K. (1997) Organization of corticostriatal and corticoamygdalar projections arising from the anterior inferotemporal area TE of the macaque monkey: a Phaseolus vulgaris Leucoagglutinin study. J. Neurosci., 17, 7902–7925.
Damasio, H., Grabowski, T.J., Tranel, D., Hichwa, R.D. & Damasio, A.R. (1996) A neural basis for lexical retrieval. Nature, 380, 499–505.
Desimone, R., Albright, T.D., Gross, C.G. & Bruce, C. (1984) J. Neurosci., 4, 2051–2062.
Herrnstein, R.J. (1990) Levels of categorization. In Edelman, G.M., Gall, W.E. & Cowan, W.M. (eds), Signal and Sense: Local and Global Order in Perceptual Maps. Wiley-Liss, New York, USA, pp. 385.
Huber, L. & Lenz, R. (1996) Categorization of prototypical stimulus classes by pigeons. Quart. J. Exp. Psychol., 49B, 111–133.
Kirk, R. (1968) Experimental Design Procedures for the Behavioral Sciences. Brooks Cole, Belmont, USA.
Komatsu, L.K. (1992) Recent views of conceptual structure. Psychol. Bull., 112, 500–526.
Komatsu, H., Ideura, Y., Kaji, S. & Yamane, S. (1992) Color selectivity of neurons in the inferior temporal cortex of the awake macaque monkey. J. Neurosci., 12, 408–424.
Logothetis, N.K. & Pauls, J. (1995) Psychophysical and physiological evidence for viewer-centered object representations in the primate. Cerebr. Cortex, 3, 270–288.
Logothetis, N.K. & Scheinberg, D.L. (1996) Visual object recognition. Ann. Rev. Neurosci., 19, 577–621.
Mackintosh, N.J. (1995) Categorization by people and pigeons: the twenty-second Bartlett memorial lecture. Quart. J. Exp. Psychol., 4B, 193–214.
Mikami, A., Nakamura, K. & Kubota, K. (1994) Neuronal responses to photographs in the superior temporal sulcus of the rhesus monkey. Behav. Brain Res., 60, 1–13.
Miyashita, Y. & Chang, H.S. (1988) Neuronal correlate of pictorial short-term memory in the primate temporal cortex. Nature, 331, 68–70.
Nakamura, K. & Kubota, K. (1996) The primate temporal pole: its putative role in object recognition and memory. Behav. Brain Res., 77, 53–77.
Nakamura, K., Matsumoto, K., Mikami, A. & Kubota, K. (1994) Visual response properties of single neurons in the temporal pole of behaving monkeys. J. Neurophysiol., 71, 1206–1221.
Roberts, W.A. & Mazmanian, D.S. (1988) Concept learning at different levels of abstraction by pigeons, monkeys and people. J. Exp. Psychol. Anim. Behav. Proc., 14, 247–260.
Rolls, E.T. & Tovee, M.J. (1995) Sparseness of the neuronal representation of stimuli in the primate temporal visual cortex. J. Neurophysiol., 73, 713–726.
Rolls, E.T. & Treves, A. (1990) The relative advantages of sparse versus distributed encoding for associative neuronal networks in the brain. Network, 1, 407–412.
Rosch, E., Mervis, C.B., Gray, W.D., Johnson, D.M. & Boyes-Braem, P. (1976) Basic objects in natural categories. Cognit. Psychol., 8, 382–439.
Saleem, K.S. & Tanaka, K. (1996) Divergent projections from the anterior inferotemporal area TE to the perirhinal and entorhinal cortices in the macaque monkey. J. Neurosci., 16, 4757–4775.
Seltzer, B. & Pandya, D.N. (1978) Afferent cortical connections and architectonics of the superior temporal sulcus and surrounding cortex in the rhesus monkey. Brain Res., 149, 1–24.
Seltzer, B. & Pandya, D.N. (1989) Intrinsic connections and architectonics of the superior temporal sulcus in the rhesus monkey. J. Comp. Neurol., 290, 451–471.
Seltzer, B. & Pandya, D.N. (1994) Parietal, temporal, and occipital projections to cortex of the superior temporal sulcus in the rhesus monkey: a retrograde tracer study. J. Comp. Neurol., 343, 445–463.
Smith, E.E. & Medin, D.L. (1981) Categories and Concepts. Harvard University Press, Cambridge, MA, USA.
Suzuki, W.A. & Amaral, D.G. (1994) Perirhinal and parahippocampal cortices of the macaque monkey: cortical afferents. J. Comp. Neurol., 350, 497–533.
Tanaka, K. (1996) Inferotemporal cortex and object vision. Annu. Rev. Neurosci., 19, 109–139.
Tanaka, K., Saito, H., Fukada, Y. & Moriya, M. (1991) Coding of visual images of objects in the inferotemporal cortex of the macaque monkey. J. Neurophysiol., 66, 170–189.
Treves, A. & Rolls, E.T. (1991) What determines the capacity of autoassociative memories in the brain? Network, 2, 371–397.
Ungerleider, L.G. & Mishkin, M. (1982) Two cortical visual systems. In Ingle, D.J. (ed.), Analysis of Visual Behavior. MIT Press, Cambridge, MA, USA, pp. 549–586.
Vogels, R. (1996) Representation of natural visual categories in anterior temporal cortex. Soc. Neurosci. Abstr., 22, 1937.
Vogels, R. & Orban, G.A. (1996) Coding of stimulus invariances by inferior temporal neurons. Prog. Brain Res., 112, 195–211.
Vogels, R. (1999) Categorization of complex visual images by rhesus monkeys. Part 1: behavioural study. Eur. J. Neurosci., 11, 1223–1238.