Prosody Perception in Cochlear Implant Users: EEG evidence


Department of Neurology, Hannover Medical School
Center for Systemic Neuroscience, University of Veterinary Medicine Hannover

Prosody Perception in Cochlear Implant Users: EEG evidence

Thesis
Submitted in partial fulfilment of the requirements for the degree
DOCTOR OF PHILOSOPHY (PhD)
awarded by the University of Veterinary Medicine Hannover

by
Deepashri Agrawal
Akola, India

Hannover, 2012


Supervisor: Prof. Dr. Reinhard Dengler

Supervision Group:
Prof. Dr. Reinhard Dengler
Prof. Dr. Andrej Kral
Prof. Dr. Elke Zimmermann
Prof. Dr. Stefan Debener

1st Evaluation:
Prof. Dr. Reinhard Dengler, Department of Neurology, Hannover Medical School, Hannover
Prof. Dr. Andrej Kral, Institute for AudioNeurotechnology, Hannover Medical School, Hannover
Prof. Dr. Elke Zimmermann, Department of Zoology, University of Veterinary Medicine, Hannover
Prof. Dr. Stefan Debener, Department of Psychology, Carl von Ossietzky University, Oldenburg

2nd Evaluation:
Prof. Dr. Thomas F. Münte, Universitätsklinikum Schleswig-Holstein, Klinik für Neurologie, Lübeck

Date of final exam: 5/10/2012


Parts of the thesis have been published or submitted for publication previously in:

Deepashri Agrawal, Lydia Timm, Filipa Campos Viola, Stefan Debener, Andreas Büchner, Reinhard Dengler, Matthias Wittfoth. ERP evidence for the recognition of emotional prosody through simulated cochlear implant strategies. BMC Neuroscience (in press).

Agrawal D., Thorne J.D., Viola F.C., Debener S., Timm L., Büchner A., Dengler R., and Wittfoth M. Electrophysiological responses to emotional prosody perception: a comparison of cochlear implant users. NeuroImage Clinical (under review).

Results of this thesis were presented in the form of posters or presentations at the following conferences:

Agrawal D., Timm L., Dengler R. and Wittfoth M. Emotions transferred by cochlear implants: an ERP study exploring the possibilities. International Conference of Auditory Implantable Devices, Baltimore, USA (2012).

Agrawal D., Timm L., Dengler R. and Wittfoth M. Emotions transferred by cochlear implants: an EEG study. DGKN, Köln, Germany (2012).

Agrawal D., Timm L., Dengler R. and Wittfoth M. Emotion perception in CI: an ERP study. Human Brain Mapping, Quebec City, Canada (2011).


Contents

1 Introduction
  1.1 Emotions
  1.2 Cochlear Implants
    Advanced Combination Encoder (ACE)
    MP3000
  1.3 Electroencephalography (EEG)
    N100 component
    P200 component
    Late positive component
    Oscillatory activity
  1.4 EEG in the neuroscience of emotion
  1.5 Research questions
2 Manuscript I
  Abstract
  References
3 Manuscript II
  Summary
  Introduction
  Materials and methods
    Participants
    Stimuli
    Procedure
    ERP procedure and analysis
    Data processing
    Statistical analysis
  Results
    Accuracy data
    Reaction times
    Event-related potentials
    Time-frequency analysis
    Correlation between accuracy rate and gamma band activity
  Discussion
    Behavioural findings
    ERPs
    Time-frequency results
  References
4 Overall Discussions
  Conclusion and future directions
5 Summary
6 Zusammenfassung
7 References
8 Acknowledgement

List of Figures

1.1 Block diagram illustrating ACE
1.2 Schematic illustration of MP3000
3.1 Pitch contours of the three prosodies (Manuscript II)
3.2 Accuracy rate (Manuscript II)
3.3 Reaction time (Manuscript II)
3.4 ERP waveforms for the three emotional prosodies for NH controls, ACE users and MP3000 users (Manuscript II)
3.5 Induced gamma power plots for ACE users and MP3000 users (Manuscript II)


List of Tables

3.1 Demographic data of CI users (Manuscript II)
3.2 Acoustic parameters of emotional sentences (Manuscript II)
3.3 Peak N100 and P200 mean latencies and amplitudes, with standard deviations in parentheses (Manuscript II)


Chapter 1
Introduction

1.1 Emotions

According to Bharata (NS VI. 31), in Sanskrit: "NAHI RASADATE KASCIDAPYARTHAH PRAVARTATE" (No meaning can proceed from speech in the absence of emotions). Emotions are needed for the expression of feelings, and they can be conveyed in many ways, including words, facial expressions, gestures and body language. Such actions can provide important information about an individual's emotional state. Speech features fall into two categories: segmental and suprasegmental. In brief, segmental features concern the phonetic structure of speech, while suprasegmental features concern its melody or intonation, that is, its prosody. Kochanski and Shih (2002) listed four functions of prosody: (i) to convey lexical meaning (e.g. in tonal languages such as Mandarin Chinese); (ii) to convey non-lexical information through intonation (questions versus declarative sentences); (iii) to relay discourse functions (new information in a discourse is often accented while old information is de-accented); and, most importantly, (iv) to express emotions (e.g. excitement is expressed by means of high pitch and fast speed) (KOCHANSKI u. SHIH 2002). Impairment in the identification of emotional prosody has a negative influence on social and emotional functioning and leads to poor interpretation of the emotional states of others. The importance of emotional prosody has been evidenced in studies

of populations in which prosody is underdeveloped or disturbed, such as individuals with Parkinson's disease (SCHRODER et al. 2006; PAULMANN et al. 2008), schizophrenia (BOZIKAS et al. 2006), autism (PAUL et al. 2005), or basal ganglia damage (VAN LANCKER SIDTIS et al. 2006). Due to their inability to perceive subtle changes in acoustic features, hearing-impaired individuals are disadvantaged when trying to understand the affective states of others. Some profoundly hearing-impaired individuals are able to perceive the acoustic changes that occur in the frequency, time and intensity components of speech signals, whereas others are only able to perceive changes in the time and intensity components (ERBER 1972; MOST u. SHURGI 1993). For this latter group of individuals, a cochlear implant has the potential to restore some degree of hearing, as well as to aid speech perception.

1.2 Cochlear Implants

Clark (2003) describes the modern cochlear implant (CI) as being: "A bionic ear, which restores useful hearing in severely to profoundly deaf people when the organ of hearing situated in the inner ear (cochlea) has not developed or is destroyed by disease or injury. This device bypasses the inner ear and provides information to the hearing centres through direct stimulation of the hearing nerve." The benefits for hearing-impaired individuals receiving CIs are quite remarkable; most are able to hear speech and even conduct ordinary conversations over the telephone. The implant system consists of a microphone, a speech processor, a transmitter, a receiver and an electrode array, which is located inside the cochlea. The speech processor is responsible for decomposing the input audio signal into different frequency

bands or channels while at the same time delivering the most appropriate stimulation pattern to the electrodes. CIs have been developed to compensate for hearing loss, and they use algorithms that map sounds to electrode stimulation patterns in a speech coding strategy. Such algorithms are designed to support the segmental aspects of speech in order to maximise intelligibility; as a result, CIs are less supportive of the suprasegmental features of speech, which are important for prosodic perception. Thus, current speech processing strategies are inadequate in their delivery of prosodic information. Several speech coding strategies have been developed based on a speech model that extracts fundamental frequencies and formants from speech (e.g. F0/F1, F0/F1/F2 and MPEAK). In contrast, more recent strategies are based on hearing models in which information is passed through digital filter banks in order to generate stimulating pulsatile sequences. Examples of such speech encoding strategies include spectral peak (SPEAK), continuous interleaved sampling (CIS) and the advanced combination encoder (ACE). The fundamental aim of these strategies is to increase temporal resolution by concentrating on the most perceptually relevant spectral components and neglecting the least significant ones. Although remarkable developments have been made in speech coding algorithms, they do not focus on prosody perception, and therefore much improvement is possible. The work presented in this thesis concentrates on ACE and its new variant, the MP3000 speech coding strategy.

Advanced Combination Encoder (ACE)

ACE operates by mapping the signal power spectrum to electrodes, where only the N out of M (N < M) electrodes with the largest amplitudes are activated. Figure 1.1 gives a schematic representation of the processing stages in the ACE speech coding strategy. Signals from the microphone (audio) are pre-emphasised to amplify the high

frequency components with the help of digital filters.

Figure 1.1: Block diagram illustrating ACE (courtesy of Nogueira et al. 2005).

After passing through adaptive gain control (AGC), which limits the distortion of loud sounds by selectively reducing the amplification, the output is digitised and sent through a filter bank, where the spectral envelopes are then estimated. Each band of the filter bank corresponds to one electrode implanted inside the cochlea. Most systems have 22 electrodes, with the basal electrode corresponding to band 22 and the apical electrode corresponding to band 1. Finally, signal amplitudes are mapped to the corresponding electrodes and acoustic amplitudes are compressed into the dynamic range of the CI recipient, which is bounded by the threshold and the maximum comfortable loudness level for electrical stimulation. More specifically, out of the 22 bands, typically the 8 to 12 bands with the largest envelope amplitudes are selected for stimulation in each cycle; this type of selection works well as it captures the perceptually relevant features of speech, such as the formant peaks. In most cases, the maximum selection criterion performs spectral peak selection.
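The N-of-M maxima selection at the heart of ACE can be sketched in a few lines of Python with NumPy. This is a didactic simplification, not the clinical implementation; the function name and the 22-band frame are invented for the example:

```python
import numpy as np

def ace_select_bands(envelopes, n=8):
    """N-of-M maxima selection: keep only the n bands with the largest
    envelope amplitudes and zero out the rest (one stimulation frame)."""
    frame = np.zeros_like(envelopes)
    largest = np.argsort(envelopes)[-n:]   # indices of the n largest amplitudes
    frame[largest] = envelopes[largest]
    return frame

# One frame of envelope samples for a 22-band filter bank, with two
# formant-like spectral peaks standing out of a low-level background.
rng = np.random.default_rng(0)
env = rng.uniform(0.01, 0.2, size=22)
env[5], env[14] = 1.0, 0.8
frame = ace_select_bands(env, n=8)         # only 8 of 22 electrodes stimulated
```

Because the selection is purely by amplitude, the formant-like peaks always survive, which is why the strategy supports segmental intelligibility well.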

MP3000

MP3000 (also known as PACE, psychoacoustic ACE) is an ACE variant incorporating a psychoacoustic model (NOGUEIRA et al. 2005). The psychoacoustic masking model is used for the selection of the N bands out of the total (M) bands, as presented schematically in Figure 1.2. In this strategy, a digitised signal sampled at 16 kHz is sent through the filter bank, which is implemented using a fast Fourier transform (FFT). Subsequently, the envelope is estimated for each spectral band of the audio signal, and bands are selected based on the psychoacoustic masking model, in contrast to the peak-picking algorithm used in ACE. Thus, the strategy selects not the bands with the largest envelope amplitudes, but the bands that are most important in terms of hearing perception. Finally, the selected bands are mapped onto the electrode array. In MP3000, the signal amplitudes that deviate most from the estimated masking thresholds are retained, on the basis that signal amplitudes smaller than the masking threshold are not audible and can therefore be discarded. As this strategy has the advantage of utilising only the useful information, it is hypothesised in this thesis that it is well suited for speech perception as well as prosody perception. Several recent studies have investigated prosody perception in CI users. A study on mood perception in CI users reported that intensity is the primary cue for prosody recognition, followed by fundamental frequency and then the spectral and voice characteristics (HOUSE 1994). Another study examined the effects of altering the fundamental frequency (F0) on the perception of prosody and speaker gender in both normal-hearing individuals and CI users (MEISTER et al. 2009).
The authors reported that CI users showed poorer overall hearing performance, but performed similarly to normal-hearing controls when asked to differentiate between a statement and a question, as well as when asked to identify whether they heard a male or female voice.

Figure 1.2: Block diagram illustrating MP3000 (courtesy of Nogueira et al. 2005).

Although CI users prefer different acoustic cues when differentiating emotions, they find recognition confusing, especially when the acoustic cues representing the emotions are similar. Studies comparing two speech coding strategies on prosody perception with objective measures such as EEG are scarce. Hence, the goal of the work presented here is to explore prosody perception in CI users using objective measures.

1.3 Electroencephalography (EEG)

A good understanding of the underlying processes of prosody perception is hindered by a lack of adequate techniques for the on-line measurement of psychological processes, and researchers often find it difficult to determine the cortical events underpinning a given task from behavioural results alone. Electroencephalography (EEG), an objective means of measuring brain activity, can be used to better understand the basic neural mechanisms involved in the processing of affective prosody. EEG measures the bioelectric activity of the brain non-invasively via electrodes placed on the surface of the scalp, with a temporal resolution better than 1 ms and a spatial resolution of about 2.5 cm at the cortical surface. The most useful application of EEG is the event-related potential (ERP) technique, where ERPs represent

transient changes in EEG voltage, reflecting systematic brain activity which, in turn, is triggered by an internal or external sensory stimulus or motor response (DE ZUBICARAY et al. 2006). Thus, ERPs are small voltage variations resulting from the brain's response to a presented stimulus, and they can therefore be regarded as manifestations of specific psychological processes. ERP signals are small in amplitude in comparison to the EEG signals in which they are embedded, and thus must be discriminated from noise (the background EEG). This is best achieved by averaging, where samples that are time-locked to the repeated occurrence of a particular event are averaged together, with the number of trials required depending on the signal-to-noise ratio. In this way non-time-locked potentials are greatly reduced, leaving only the ERPs. Various stimuli can be used to evoke ERPs, such as visual, auditory, motor, pain and electric pulse stimuli. ERPs evoked by external auditory stimuli are known as auditory evoked potentials (AEPs) and are recognised as positive and negative waves or peaks in the EEG signal following stimulus onset. These peaks are generally described in terms of their characteristic distribution over the scalp, their polarity and their latency (e.g. the P200 is a positive peak occurring 200 ms after the onset of the stimulus). ERPs comprise both exogenous and endogenous components. Exogenous components are influenced by the physical features of a stimulus and are almost unaffected by changes in cognitive state (HILLYARD u. MUNTE 1984). In contrast, endogenous components are thought to reflect the cognitive state of the participants (DONCHIN u. HEFFLEY 1979; DESMEDT u. DEBECKER 1979). However, there are reports in the literature indicating that some components share characteristics of both groups, e.g. the N100 and P200 (SHIBASAKI u. MIYAZAKI 1992), depending on the stimulus properties.
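The time-locked averaging described above can be illustrated with a short simulation on synthetic data. This is a didactic sketch only: the amplitudes, trial count and P200-like waveform are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
fs = 500                                   # sampling rate in Hz
t = np.arange(-0.1, 0.5, 1 / fs)           # epoch window: -100 ms to +500 ms

# A P200-like deflection (positive peak ~200 ms post-onset), identical in
# every trial, buried in background EEG that is much larger than the ERP.
erp_true = 10e-6 * np.exp(-((t - 0.2) ** 2) / (2 * 0.03 ** 2))
epochs = erp_true + 20e-6 * rng.standard_normal((200, t.size))

# Averaging time-locked epochs attenuates the non-time-locked background
# (its amplitude shrinks roughly with the square root of the trial count),
# leaving the ERP, whose peak can then be identified.
erp = epochs.mean(axis=0)
peak_latency = t[np.argmax(erp)]
```

With 200 trials the background is attenuated by a factor of about 14, so the simulated P200 emerges clearly from noise ten times its size.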
There are several components that are important in neuropsychological research. The work of this thesis concentrates on the

early components, namely the N100 and P200, and the late positivity (a positive component occurring between 500 ms and 1200 ms).

N100 component

The N100 component was originally investigated in a dichotic listening paradigm and is one of the most easily identified components, regardless of the specific analysis approach employed. In adults the N100 peaks between 80 and 120 ms after the onset of a stimulus and is distributed mostly over the fronto-central region of the scalp (HILLYARD et al. 1973). Generally, this component is assumed to reflect selective attention to basic stimulus characteristics and intentional discrimination processing (e.g. VOGEL u. LUCK 2000). The latency and amplitude of the peak depend upon the stimulus modality: auditory stimuli elicit a larger N100 with shorter latency than visual stimuli (HUGDAHL et al. 1995). In combination with the subsequent P200 evoked potential, it is often described as the N100-P200 or N1-P2 complex.

P200 component

The P200, like the N100, has long been considered an obligatory cortical potential because it has low inter-individual variability and high reliability (ROTH et al. 1975; SANDMAN u. PATTERSON 2000). Functionally speaking, it has been shown that the P200 component increases whenever participants are asked to attend to a particular stimulus characteristic, for example frequency, time or colour (HILLYARD u. MUNTE 1984), and is therefore often assumed to reflect selective attention processes. In auditory ERPs, the P200 is known to be influenced by stimulus pitch (PANTEV et al. 1996) and intensity (FJELL u. WALHOVD 2003). Several researchers have reported that the P200 is an index of the extraction of emotional salience from acoustic cues, whether or not they contain linguistic information (SAUTER u. EIMER 2010;

PINHEIRO et al. 2011; LIU et al. 2012). The above-mentioned characteristics make this component optimal for the investigation of prosody recognition in CI users.

Late positive component

The late positive component or late positive complex (LPC) is a positive-going ERP occurring around 600 ms after stimulus onset. This component has two functionally distinct peaks: one is associated with memory processes and the other is related to language. Although both peaks have roughly similar topographies, they appear to come from different sources in the brain. In the past decade, researchers have found that the LPC is strongly modulated by the emotional intensity of a stimulus: emotional stimuli of either positive or negative valence elicit a larger (i.e. more positive) LPC than neutral stimuli (KEIL et al. 2002; HAJCAK et al. 2010). Because it reflects memory as well as emotional language processes, this component is well suited to the work of this thesis, which examines prosody recognition abilities.

Oscillatory activity

Oscillatory cortical activities in the human brain do not form a homogeneous class of responses; instead they arise from a diverse range of mechanisms with correspondingly diverse levels of significance. While the examination of ERPs has provided useful insights into the nature and timing of the neuronal events that sub-serve perceptual and cognitive processes, little attention has been paid to the raw EEG data from which ERPs are derived. The EEG signal reflects neural oscillations and synchronisations, and these oscillations represent a mechanism of inter-neuronal communication and of binding the information processed in distributed brain regions. They can be studied using time-frequency or spectral analysis, whereby the frequencies of the EEG signal are decomposed into amplitude and phase components, thus characterising temporal changes (on a millisecond time scale) with respect to task events. Such analyses reveal that the EEG does not simply reflect random background noise; rather, there are event-related changes in the magnitude and phase of EEG oscillations at specific frequencies that support their role in event processing (MAKEIG et al. 2004). Oscillatory responses to a sensory or cognitive event are usually classified according to the natural frequencies of the brain: delta, 0.5 to 3 Hz; theta, 3.5 to 7 Hz; alpha, 8 to 13 Hz; beta, 14 to 35 Hz; and gamma, 35 to 70 Hz. Each frequency band can be associated with a specific cortical activity.

Gamma band activity

Gamma-band activity refers to those oscillations that correspond to the higher frequency range of the temporal spectrum, typically above 35 Hz. This activity is distributed diffusely over the entire brain (reflecting parallel processing) and is thought to be crucial for mutual information transfer between networks in the brain (BASAR u. GUNTEKIN 2008). It may therefore also be significant for emotional processing, because sub-cortical limbic systems and networks must be connected to neo-cortical modules. A useful approach to the classification of this activity is the frequently cited nomenclature introduced by Galambos (1992). Galambos distinguished between (i) spontaneous gamma rhythms, which are not related to any stimulus; (ii) evoked gamma band responses, which are elicited by, and precisely time-locked to, the onset of an external stimulus; (iii) emitted gamma band oscillations, which are time-locked to a stimulus that has been omitted; and (iv) induced gamma band rhythms, which are initiated by, but not time- or phase-locked to, a stimulus (GALAMBOS u. MAKEIG 1992). The work presented in this thesis focuses on the last of these phenomena. To estimate induced oscillations, a time-frequency decomposition is applied to each trial, and the resultant power is averaged across trials.
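The separation of trial-averaged power into an evoked part (the power of the average) and an induced remainder can be sketched with synthetic single-channel data. The trial counts, frequencies and amplitudes below are chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
fs, n_trials = 500, 100
t = np.arange(500) / fs                    # 1 s of data per trial, 1 Hz FFT bins

# Synthetic trials: a phase-locked 10 Hz response (evoked) plus a 40 Hz
# gamma burst whose phase is random from trial to trial (induced), plus noise.
trials = np.empty((n_trials, t.size))
for k in range(n_trials):
    evoked_part = np.sin(2 * np.pi * 10 * t)          # same phase every trial
    induced_part = np.sin(2 * np.pi * 40 * t + rng.uniform(0, 2 * np.pi))
    trials[k] = evoked_part + induced_part + 0.5 * rng.standard_normal(t.size)

freqs = np.fft.rfftfreq(t.size, 1 / fs)
total_power = np.mean(np.abs(np.fft.rfft(trials, axis=1)) ** 2, axis=0)
evoked_power = np.abs(np.fft.rfft(trials.mean(axis=0))) ** 2   # power of the average
induced_power = total_power - evoked_power                     # what averaging discards

i10 = np.argmin(np.abs(freqs - 10))        # 10 Hz: survives averaging (evoked)
i40 = np.argmin(np.abs(freqs - 40))        # 40 Hz: visible only trial-wise (induced)
```

Because the 40 Hz burst has random phase, it cancels almost completely in the average and appears only in the trial-wise (induced) power, which is exactly why induced gamma activity is invisible in conventional ERPs.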
The power of the average is the evoked response, while those components of the trial-averaged power that are unexplained by the power of the average are

referred to as induced responses. The induced activity reflects large synchronous neuronal assemblies that go unnoticed in evoked responses. Such oscillations are termed induced because their self-organised emergence is not evoked directly by the stimulus but induced vicariously through nonlinear and possibly autonomous mechanisms (DAVID et al. 2006). Induced gamma band responses may be related to the computational operation by which the cerebral cortex links consistent relations among incoming signals. In humans, induced gamma band responses have been reported in various processing modalities, including the visual cortex (LUTZENBERGER et al. 1995; TALLON-BAUDRY et al. 1996), the auditory cortex (MAKEIG 1993) and the sensorimotor cortex (KRISTEVA-FEIGE et al. 1993), by means of non-invasive EEG or magnetoencephalography (MEG) measurements. Although gamma oscillations are well documented in visual and auditory perception research, there is a dearth of literature on prosody perception. Thus, there is a need to investigate these oscillations in prosody perception, especially in populations with altered hearing abilities.

1.4 EEG in the neuroscience of emotion

Various studies have reported that ERPs are extremely useful for studying normal (KUTAS u. HILLYARD 1980) and impaired (HAGOORT et al. 1996) semantic speech comprehension, as well as the processing of linguistic prosody (STEINHAUER et al. 1999). Only a few studies have employed ERPs to investigate the recognition of emotional voice quality in particular, and affective prosody in general. Twist et al. (1991) investigated ERPs elicited by emotional prosodic stimuli in an oddball task presented to patients with right- and left-sided brain damage. Neutral single-syllable words served as frequent stimuli and words with unexpected intonation as rare (target) stimuli; participants were instructed to press a button on the occurrence of rare stimuli.
The study found that, in response to the target stimuli, the P300 exhibited both a diminished amplitude and a delayed latency for patients with right-brain damage compared to either patients with left-brain damage or healthy controls (TWIST et al. 1991). Bostanov and Kotchoubey (2004) investigated the recognition of affective prosody using emotional exclamations (e.g. "Wow", "Oooh") in a passive oddball paradigm. They found an N300 to contextually incongruous exclamations (BOSTANOV u. KOTCHOUBEY 2004) and took the N300 to be an indicator of semantically inappropriate words, similar to the well-known N400. Kotz et al. (2004) revealed differences in the P200 component between the valences tested; the P200 amplitude was largest in response to positive stimuli. In addition to this early component, they also found a difference between valences at a later stage (largest negativity for neutral stimuli 400 ms after stimulus onset) (KOTZ et al. 2004). Wambacq et al. (2004) investigated the voluntary and non-voluntary processing of emotional prosody, revealing a timing difference between the two conditions: emotional prosody was processed 360 ms post-stimulus onset in the voluntary processing condition (revealed by a P360), but 200 ms earlier in the non-voluntary condition (revealed by a P160) (WAMBACQ u. JERGER 2004). Paulmann and colleagues performed a series of studies (PAULMANN u. KOTZ 2008a; PAULMANN et al. 2011) focusing on the importance of ERPs in prosodic differentiation, conflict evaluation and lateralisation. Thus, it is apparent that ERPs are a very reliable objective technique for prosody evaluation. ERPs are gaining importance in CI research as the demand for objective measures increases. Several studies have investigated speech perception in children and adults with CIs (see "Late Auditory Event-Related Potentials in Children with Cochlear Implants: A Review", JOHNSON 2009, for more information).
Sharma and Dorman (1999) found a double-peaked N 100 in response to voice onset time; the inter-peak latencies approximated the voice onset time between a voiceless consonant and the onset of vowel vocalisation in children with CIs (SHARMA u. DORMAN 1999). Other studies

in adults have examined how the N100, P200 and N200 waveforms reflect speech discrimination skills (TAYLOR u. BALDEWEG 2002), but there are no studies combining CIs, prosody and EEG. This introduction has summarised current research into a number of aspects of prosody perception, revealing certain aspects that are, to date, unaddressed. As previously noted, many previous studies have focused on visual emotions, but there is a lack of work on auditory emotion perception (prosody perception). Furthermore, prosody perception in individuals with CIs is poorly understood. Thus, there is a need for a systematic investigation of the neural correlates of emotional prosody perception in CI users.

1.5 Research questions

The work presented in this thesis attempts to answer the following research questions:

i. Are CI simulations an appropriate tool for investigating the effects of different strategies on prosody perception?
ii. Can CI users perceive emotional prosody?
iii. Are ERPs a reliable measure for exploring prosody recognition?
iv. Is the gamma band activity in response to prosodic stimuli modulated according to the acoustic properties? If so, is there a characteristic time-course and topography of this modulation?
v. Most importantly, is the MP3000 strategy better than the ACE strategy for prosody perception?

In order to examine these questions, two studies were conducted. The first study, described in Manuscript I, focuses on the abilities of normal-hearing (NH) subjects presented with original stimuli as well as CI simulations, in order to investigate the differences between the two speech coding strategies (ACE and MP3000) with respect to prosodic features. The second study, presented in Manuscript II, focuses on the ability of CI users to identify prosodic stimuli and compares the ACE and MP3000 strategies using EEG.

Chapter 2
Manuscript I

ERP evidence for the recognition of emotional prosody through simulated cochlear implant strategies

Deepashri Agrawal 1, Lydia Timm 1, Filipa C. Viola 2, Stefan Debener 2, Andreas Büchner 3, Reinhard Dengler 1 & Matthias Wittfoth 1

1 Department of Neurology, Hannover Medical School, Hannover, Germany
2 Department of Psychology, Carl von Ossietzky University, Oldenburg, Germany
3 Department of Otolaryngology, Hannover Medical School, Hannover, Germany

This manuscript has been accepted for publication and can be accessed as: BMC Neuroscience 2012, 13:113.

Note: In this chapter, the MP3000 strategy is referred to as PACE (PACE is the actual name, and MP3000 is the commercial name for the same strategy).

2.1 Abstract

Background: Emotionally salient information in spoken language can be provided by variations in speech melody (prosody) or by emotional semantics. Emotional prosody is essential to convey feelings through speech. In sensori-neural hearing loss, impaired speech perception can be improved by cochlear implants (CIs). The aim of this study was to investigate the performance of normal-hearing (NH) participants on the perception of emotional prosody with

vocoded stimuli. Semantically neutral sentences with emotional (happy, angry and neutral) prosody were used. Sentences were manipulated to simulate two CI speech-coding strategies: the advanced combination encoder (ACE) and the newly developed psychoacoustic advanced combination encoder (PACE). Twenty NH adults were asked to recognise emotional prosody from ACE and PACE simulations. Performance was assessed using behavioural tests and event-related potentials (ERPs).

Results: Behavioural data revealed superior performance with original stimuli compared to the simulations. For the simulations, better recognition was observed for happy and angry prosody than for neutral prosody. Irrespective of stimulus type (simulated or unsimulated), a significantly larger P200 event-related potential was observed after sentence onset for happy prosody than for the other two emotions. Furthermore, the P200 amplitude was significantly more positive for the PACE strategy than for the ACE strategy.

Conclusions: The results suggest the P200 peak as an indicator of active differentiation and recognition of emotional prosody. The larger P200 peak amplitude for happy prosody indicates the importance of fundamental frequency (F0) cues in prosody processing. The advantage of PACE over ACE highlights the role of the psychoacoustic masking model in improving prosody perception. Taken together, the study emphasises the importance of vocoded simulations for better understanding the prosodic cues that CI users may be utilising.

Keywords: Emotional prosody, Cochlear implants, Simulations, Event-related potentials.

29 2.2 References 1. Ross ED: The aprosodias. Functional-anatomic organization of the affective components of language in the right hemisphere. Arch Neurol 1981, 38(9): Murray IR, Arnott JL: Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. J Acoust Soc Am 1993, 93(2): Schroder C, Mobes J, Schutze M, Szymanowski F, Nager W, Bangert M, Munte TF, Dengler R: Perception of emotional speech in Parkinson's disease. Mov Disord 2006, 21(10): Nikolova ZT, Fellbrich A, Born J, Dengler R, Schroder C: Deficient recognition of emotional prosody in primary focal dystonia. Eur J Neurol 2011, 18(2): Chee GH, Goldring JE, Shipp DB, Ng AH, Chen JM, Nedzelski JM: Benefits of cochlear implantation in early-deafened adults: the Toronto experience. J Otolaryngol 2004, 33(1):26-31.

30 6. Kaplan DM, Shipp DB, Chen JM, Ng AH, Nedzelski JM: Early-deafened adult cochlear implant users: assessment of outcomes. J Otolaryngol 2003, 32(4): Donaldson GS, Nelson DA: Place-pitch sensitivity and its relation to consonant recognition by cochlear implant listeners using the MPEAK and SPEAK speech processing strategies. J Acoust Soc Am 2000, 107(3): Sandmann P, Dillier N, Eichele T, Meyer M, Kegel A, Pascual-Marqui RD, Marcar VL, Jancke L, Debener S: Visual activation of auditory cortex reflects maladaptive plasticity in cochlear implant users. Brain 2012, 135(Pt 2): Mohr PE, Feldman JJ, Dunbar JL, McConkey-Robbins A, Niparko JK, Rittenhouse RK, Skinner MW: The societal costs of severe to profound hearing loss in the United States. Int J Technol Assess Health Care 2000, 16(4): Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M: Speech recognition with primarily temporal cues. Science 1995, 270(5234):

11. Buechner A, Brendel M, Krueger B, Frohne-Buchner C, Nogueira W, Edler B, Lenarz T: Current steering and results from novel speech coding strategies. Otol Neurotol 2008, 29(2):
12. Nogueira W, Vanpoucke F, Dykmans P, De Raeve L, Van Hamme H, Roelens J: Speech recognition technology in CI rehabilitation. Cochlear Implants Int 2010, 11 Suppl 1:
13. Loizou PC: Signal-processing techniques for cochlear implants. IEEE Eng Med Biol Mag 1999, 18(3):
14. Nogueira W, Buechner A, Lenarz T, Edler B: A psychoacoustic "NofM"-type speech coding strategy for cochlear implants. EURASIP Journal on Applied Signal Processing, Special Issue on DSP in Hearing Aids and Cochlear Implants 2005, 127(18):
15. Lai WK, Dillier N: Investigating the MP3000 coding strategy for music perception. In 11. Jahrestagung der Deutschen Gesellschaft für Audiologie: 2008; Kiel, Germany; 2008:
16. Weber J, Ruehl S, Buechner A: Evaluation der Sprachverarbeitungsstrategie MP3000 bei Erstanpassung [Evaluation of the MP3000 speech processing strategy at initial fitting]. In 81st Annual Meeting of the German Society of Oto-Rhino-Laryngology, Head and

Neck Surgery. Wiesbaden: German Medical Science GMS Publishing House;
17. Kutas M, Hillyard SA: Event-related brain potentials to semantically inappropriate and surprisingly large words. Biol Psychol 1980, 11(2):
18. Steinhauer K, Alter K, Friederici AD: Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nat Neurosci 1999, 2(2):
19. Schapkin SA, Gusev AN, Kuhl J: Categorization of unilaterally presented emotional words: an ERP analysis. Acta Neurobiol Exp (Wars) 2000, 60(1):
20. Kotz SA, Meyer M, Alter K, Besson M, von Cramon DY, Friederici AD: On the lateralization of emotional prosody: an event-related functional MR investigation. Brain Lang 2003, 86(3):
21. Pihan H, Altenmuller E, Ackermann H: The cortical processing of perceived emotion: a DC-potential study on affective speech prosody. Neuroreport 1997, 8(3):

22. Kotz SA, Paulmann S: When emotional prosody and semantics dance cheek to cheek: ERP evidence. Brain Res 2007, 1151:
23. Hillyard SA, Picton TW: On and off components in the auditory evoked potential. Percept Psychophys 1978, 24(5):
24. Rosburg T, Boutros NN, Ford JM: Reduced auditory evoked potential component N100 in schizophrenia – a critical review. Psychiatry Res 2008, 161(3):
25. Anderson L, Shimamura AP: Influences of emotion on context memory while viewing film clips. Am J Psychol 2005, 118(3):
26. Zeelenberg R, Wagenmakers EJ, Rotteveel M: The impact of emotion on perception: bias or enhanced processing? Psychol Sci 2006, 17(4):
27. Grandjean D, Sander D, Pourtois G, Schwartz S, Seghier ML, Scherer KR, Vuilleumier P: The voices of wrath: brain responses to angry prosody in meaningless speech. Nat Neurosci 2005, 8(2):
28. Grandjean D, Sander D, Lucas N, Scherer KR, Vuilleumier P: Effects of emotional prosody on auditory extinction for voices in patients with spatial neglect. Neuropsychologia 2008, 46(2):

29. Scherer KR: Vocal communication of emotion: a review of research paradigms. Speech Communication 2003, 40:
30. Luo X, Fu QJ: Frequency modulation detection with simultaneous amplitude modulation by cochlear implant users. J Acoust Soc Am 2007, 122(2):
31. Seither-Preisler A, Patterson R, Krumbholz K, Seither S, Lutkenhoner B: Evidence of pitch processing in the N100m component of the auditory evoked field. Hear Res 2006, 213(1-2):
32. Schirmer A, Kotz SA: Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends Cogn Sci 2006, 10(1):
33. Pinheiro AP, Galdo-Alvarez S, Rauber A, Sampaio A, Niznikiewicz M, Goncalves OF: Abnormal processing of emotional prosody in Williams syndrome: an event-related potentials study. Res Dev Disabil 2011, 32(1):
34. Garcia-Larrea L, Lukaszevicz AC, Mauguiere F: Revisiting the oddball paradigm. Non-target vs. neutral stimuli and the evaluation of ERP attentional effects. Neuropsychologia 1992, 30:

35. Alain C, Woods DL, Covarrubias D: Activation of duration-sensitive auditory cortical fields in humans. Electroencephalogr Clin Neurophysiol 1997, 104(6):
36. Picton TW, Goodman WS, Bryce DP: Amplitude of evoked responses to tones of high intensity. Acta Otolaryngol 1970, 70(2):
37. Meyer M, Baumann S, Jancke L: Electrical brain imaging reveals spatiotemporal dynamics of timbre perception in humans. Neuroimage 2006, 32(4):
38. Shahin A, Bosnyak DJ, Trainor LJ, Roberts LE: Enhancement of neuroplastic P2 and N1c auditory evoked potentials in musicians. J Neurosci 2003, 23(13):
39. Paulmann S, Pell MD, Kotz SA: How aging affects the recognition of emotional speech. Brain Lang 2008, 104(3):
40. Kotz SA, Meyer M, Paulmann S: Lateralization of emotional prosody in the brain: an overview and synopsis on the impact of study design. Prog Brain Res 2006, 156:

41. Alter K, Rank E, Kotz SA, Toepel U, Besson M, Schirmer A, Friederici AD: Affective encoding in the speech signal and in event-related brain potentials. Speech Communication 2003, 40:
42. Johnstone T, van Reekum CM, Oakes TR, Davidson RJ: The voice of emotion: an fMRI study of neural responses to angry and happy vocal expressions. Soc Cogn Affect Neurosci 2006, 1(3):
43. Spreckelmeyer KN, Kutas M, Urbach T, Altenmuller E, Munte TF: Neural processing of vocal emotion and identity. Brain Cogn 2009, 69(1):
44. Lang SF, Nelson CA, Collins PF: Event-related potentials to emotional and neutral stimuli. J Clin Exp Neuropsychol 1990, 12(6):
45. Qin MK, Oxenham AJ: Effects of simulated cochlear-implant processing on speech reception in fluctuating maskers. J Acoust Soc Am 2003, 114(1):
46. Laneau J, Wouters J, Moonen M: Relative contributions of temporal and place pitch cues to fundamental frequency discrimination in cochlear implantees. J Acoust Soc Am 2004, 116(6):

47. Drennan WR, Rubinstein JT: Music perception in cochlear implant users and its relationship with psychophysical capabilities. J Rehabil Res Dev 2008, 45(5):
48. Wittfoth M, Schroder C, Schardt DM, Dengler R, Heinze HJ, Kotz SA: On emotional conflict: interference resolution of happy and angry prosody reveals valence-specific effects. Cereb Cortex 2010, 20(2):
49. Swanson B, Mauch H: Nucleus MATLAB Toolbox Software User Manual.
50. Boersma P, Weenink D: Praat: doing phonetics by computer.
51. Jasper H: Progress and problems in brain research. J Mt Sinai Hosp N Y 1958, 25(3):
52. Delorme A, Makeig S: EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods 2004, 134(1):
53. Debener S, Thorne J, Schneider TR, Viola FC: Using ICA for the analysis of multi-channel EEG data. In Simultaneous EEG and fMRI.

Edited by Ullsperger M, Debener S. New York, NY: Oxford University Press; 2010:
54. Viola FC, Thorne J, Edmonds B, Schneider T, Eichele T, Debener S: Semi-automatic identification of independent components representing EEG artifact. Clin Neurophysiol 2009, 120(5):

Chapter 3

Manuscript II

Electrophysiological responses to emotional prosody perception in cochlear implant users

Agrawal D. 1, Thorne J.D. 2, Viola F.C. 2, Debener S. 2, Timm L. 1, Büchner A. 3, Dengler R. 1 & Wittfoth M. 1, 4

1 Department of Neurology, Hannover Medical School, Hannover, Germany
2 Department of Psychology, Carl von Ossietzky University, Oldenburg, Germany
3 Department of Otolaryngology, Hannover Medical School, Hannover, Germany
4 NICA (NeuroImaging and Clinical Applications), Hannover, Germany

3.1 Summary

This EEG study investigated the ability of cochlear implant (CI) users to recognise emotional prosody. Two CI speech processing strategies were compared: the ACE (advanced combination encoder) and the newly developed MP3000. Semantically neutral sentences spoken with three different emotional prosodies (neutral, angry, happy) were presented to 20 post-lingually deafened CI users and age-matched normal-hearing controls. Event-related potentials (ERPs) were recorded to study the N100 and P200 responses as well as the late positive potential (LPP). Event-related spectral power modulations were also calculated. CI users tested with the MP3000 strategy showed a higher proportion of correctly recognised prosodic information than users of the ACE strategy. Our ERP results demonstrated that emotional prosody elicited significant N100 and P200 peaks, whereas the LPP did not differ across strategies. Furthermore, the P200 amplitude in response to happy prosodic information was

significantly more positive for the MP3000 strategy compared to the ACE strategy. On spectral power analysis, two typical gamma activities were observed in the MP3000 users only: (i) an early gamma activity in the ms time window, reflecting bottom-up attention regulation; and (ii) a late gamma activity between ms post-stimulus onset, probably reflecting top-down cognitive control. Our study suggests that the MP3000 strategy is better than ACE with regard to emotional prosody perception, as confirmed by behavioural and electrophysiological responses. It could be shown that spectral analysis is a useful tool that can reveal differences between two CI processing strategies in the recognition of prosody-specific features of language.

Key Words: Emotional prosody, Cochlear implants, ERP, P200 peak, Gamma band power

3.2 Introduction

In spoken language, emotionally salient information can be communicated by variations in speech melody (emotional prosody) or by emotional semantics (verbal emotional content). Emotional prosody is the expression of emotion through variations of pitch, intensity and duration (Scherer, 2003). Individuals with severe to profound hearing loss have a limited dynamic range in these parameters, and their prosody recognition is therefore affected. Cochlear implants (CIs) are thought to improve not only language perception per se, but also specific aspects of language. However, it is still an open question whether such improvements include the recognition of emotional prosodic information. CIs encode sound electronically, bypassing the damaged cochlea and stimulating the auditory nerve directly. Speech coding strategies are extremely important in CI processing, as they decompose audio signals into different frequency bands and deliver the stimulation pattern to the electrodes, maximising the user's communicative potential.
A number of speech processing strategies mimicking firing patterns inside the normal cochlea have been developed over the past two decades (Loizou, 1999). The advanced combination encoder (ACE) was developed in the 1990s. This strategy separates speech signals into a number of sub-bands

(M) and derives the envelope information from each band signal. A subset of these (N) with the largest amplitudes is then selected for stimulation (N out of M). In 2005 a new strategy, the psychoacoustic advanced combination encoder (PACE), commercially known as MP3000 (the term used in this manuscript), was developed. This strategy is based on a psychoacoustic masking model which discards redundant signal components, thus saving valuable bandwidth for those components that are usually perceived by normal-hearing (NH) individuals. The strategy is similar to the MP3 compression algorithm (Nogueira et al., 2005). There are reports highlighting the performance of CI users in understanding speech, from phonemes to sentences, in quiet as well as in noisy environments. For example, Fu et al. (2005) reported that CI users' voice gender identification was nearly perfect (94% correct) when large differences in the fundamental frequency (F0) existed between male and female talkers. House and colleagues used semantically neutral Swedish utterances with four target emotions (angry, happy, sad, and neutral), and found that CI users' mean vocal emotion recognition increased from 44% to 51% one year after processor activation (House, 1994b). Comparably, another study (Luo and Fu, 2007) investigated the ability of NH listeners and CI users to recognise vocal emotions. CI users performed poorly with their own conventional processor, but their performance improved significantly as the number of channels was increased. Although other studies have focused on prosodic features, the dependent measures in these were speech recognition tests. It is difficult to interpret such outcomes when there is no behavioural score change, which creates the need for objective measurements. In the last decade, studies have used event-related potentials (ERPs) to study emotion recognition.
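The N-of-M band selection described above for ACE can be illustrated with a toy sketch. This is an illustration of the selection principle only, not Cochlear's implementation; the envelope values below are made up.

```python
import numpy as np

def n_of_m_select(envelopes: np.ndarray, n: int) -> np.ndarray:
    """ACE-style N-of-M selection: per analysis frame, keep the n bands with
    the largest envelope amplitude and zero out the rest.
    envelopes: array of shape (M bands, frames)."""
    out = np.zeros_like(envelopes)
    for frame in range(envelopes.shape[1]):
        keep = np.argsort(envelopes[:, frame])[-n:]  # indices of the n largest bands
        out[keep, frame] = envelopes[keep, frame]
    return out

# Toy example: 6 analysis bands, 2 frames; stimulate 2 of 6 per frame.
env = np.array([[0.1, 0.9],
                [0.8, 0.2],
                [0.3, 0.7],
                [0.6, 0.1],
                [0.2, 0.4],
                [0.5, 0.3]])
sel = n_of_m_select(env, n=2)
```

In the first frame only bands 1 and 3 (envelopes 0.8 and 0.6) survive; in the second, bands 0 and 2. The MP3000/PACE idea replaces this purely amplitude-based pick with a masking-model-based one.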
There is evidence that ERPs are an important objective measure of auditory emotional prosody differentiation and recognition (Pinheiro et al., 2011). ERP differences across emotions can be found as early as 200 ms after stimulus onset during both visual and auditory emotional processing (e.g., Schapkin et al., 2000; Agrawal et al., 2012; Kotz et al., 2006). Traditional ERP methodology reveals the phase-locked (evoked) neural activity elicited by a particular cognitive process. Time-frequency (TF) analysis, on the other hand,

can reveal the non-phase-locked (induced) neural activity that is hidden in standard averaged ERPs. In particular, we hypothesised that non-phase-locked brain activity should yield complementary effects not observable in the classic ERPs reported above (Makeig et al., 2004; Tallon-Baudry et al., 1999). Additionally, non-phase-locked brain activity in the gamma-band range should indicate facilitated prosody recognition modulated by strategy (Hannemann et al., 2007; Lenz et al., 2007). However, although frequency-domain (spectral) analysis has been applied to auditory ERPs by various researchers in memory and speech perception paradigms (Fuentemilla et al., 2006; Muller et al., 2009), it has yet to be used for emotional prosody analysis in CI users. The present study was designed to understand more clearly the differences between CI users and NH participants, particularly in recognising emotional prosody. We hypothesised that auditory prosody recognition in NH individuals would be superior to that of participants with CIs. Based on the differences between the two speech coding strategies and the advantage of the MP3000 strategy in improving spatial recognition cues, we further hypothesised that MP3000 might perform better than the ACE strategy in identifying prosody. In the current study, prosodic information was presented using neutral, angry and happy tones of voice. Differences were assessed for behavioural responses, ERPs and TF measures. The present study is especially important given the dearth of empirical studies on emotional prosody recognition with CI devices.

3.3 Materials and methods

Participants

Forty right-handed native German speakers (22 females, 18 males), aged years (mean = 41.5 years, SD = 7), participated in the experiment. The first group of participants (Group I) consisted of 20 CI users (mean = 42.1 years, SD = 7.01) wearing a Nucleus CI system, as detailed in Table 3.1.
Subjects had used their implants continuously for at least 12 months and had at least 20% speech-in-noise perception scores on the Oldenburg sentence test (Wagener

et al., 1999) prior to the study. Furthermore, subjects were divided into two subgroups with the aim of comparing the two speech-coding strategies (ACE vs. MP3000). The first subgroup (Group IA) consisted of ten individuals (mean age = 42.1 years, SD = 8.2) with ACE as their default strategy, whose speech processors were programmed with the MP3000 strategy for the experiment. Similarly, the remaining ten participants (Group IB) were CI users (mean = 41.1 years, SD = 7.3) with MP3000 as their default and ACE as the experimental strategy. A control group (Group II) comprised age- and gender-matched NH participants (age range: years; mean = 41 years, SD = 7.1). All participants had normal intelligence and reported no history of psychological or neurological problems. In order to test for depression, Beck's Depression Inventory (BDI) was used (Beck et al., 1996). None of the subjects had clinically relevant symptoms of a depressive episode. The study was carried out in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of Hannover Medical School. All participants gave written informed consent prior to the recording and received monetary compensation for their participation.

Stimuli

The stimulus material consisted of 150 semantically neutral German sentences with neutral, happy and angry prosody (50 each), spoken by a trained female German speaker. Stimuli were recorded at a sampling rate of 44.1 kHz with a 16-bit digitiser (Kotz et al., 2003; Kotz and Paulmann, 2007; Wittfoth et al., 2010). All sentences started with personal pronouns (for example, "Sie hat die Zeitung gelesen"; "She has read the newspaper"). The stimulus material was analysed prosodically using Praat (Boersma, 2005). Table 3.2 details the extracted fundamental frequency (F0), intensity and duration of the sentences. Differences in the pitch contours are depicted in Figure 3.1.
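The acoustic analysis here was done in Praat. For illustration, a per-frame F0 estimate of the kind that feeds the Table 3.2 means can be approximated with a plain autocorrelation pitch tracker. This is a rough sketch, not Praat's algorithm, demonstrated on a synthetic 220 Hz tone; the sampling rate and search range are assumed values.

```python
import numpy as np

def estimate_f0(frame: np.ndarray, fs: int, fmin: float = 75.0, fmax: float = 500.0) -> float:
    """Crude autocorrelation pitch estimate: the lag of the autocorrelation
    peak inside the plausible F0 range is taken as one glottal period."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags >= 0
    lo, hi = int(fs / fmax), int(fs / fmin)                        # lag search range
    lag = lo + int(np.argmax(ac[lo:hi]))
    return fs / lag

fs = 16000
t = np.arange(1024) / fs
f0 = estimate_f0(np.sin(2 * np.pi * 220 * t), fs)   # should be close to 220 Hz
```

A mean F0 per sentence, as in Table 3.2, would then be the average of such per-frame estimates over voiced frames.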

Table 3.1: Demographic data of CI users

Patient   Age   Gender   Duration of CI use   Type of strategy
1         39    M        3 yrs                ACE
2         51    F        3 yrs                ACE
3         35    F        3 yrs                ACE
4         48    M        4 yrs                ACE
5         42    F        3 yrs                ACE
6         49    M        2 yrs                ACE
7         47    F        2 yrs                ACE
8         29    F        3 yrs                ACE
9         52    M        4 yrs                ACE
10        –     M        5 yrs                ACE
11        –     F        3 yrs                MP3000
12        –     F        4 yrs                MP3000
13        –     M        2 yrs                MP3000
14        –     M        2 yrs                MP3000
15        –     F        3 yrs                MP3000
16        –     F        4 yrs                MP3000
17        –     F        3 yrs                MP3000
18        –     M        3 yrs                MP3000
19        –     M        4 yrs                MP3000
20        –     F        3 yrs                MP3000

Procedure

Testing was carried out in a sound-treated chamber. Subjects were seated in a comfortable armchair facing a computer monitor placed at a distance of one metre. The stimuli were programmed with Presentation Version 14.1 (Neurobehavioural Systems) and were presented in random order via loudspeakers positioned to the left and right of the monitor, at a listening level which participants indicated to be sufficiently audible. The task was a three-choice measure, with each of the three emotions corresponding to one of the keys on the response box. Stimuli were presented at a fixed presentation rate with an inter-trial interval of 2500 ms. Participants were asked to decide as accurately as possible whether the presented sentence was spoken with neutral, happy, or angry prosody. The matching of buttons to responses was counterbalanced across subjects within each response group. The

experiment included two randomised runs of approximately 13 minutes each. CI users had their speech processors programmed for the ACE and MP3000 strategies. The MP3000 map was optimised to ensure that the overall loudness was similar to that of the ACE map. The study had a cross-over design: for CI users of Group IA, run A involved the use of their conventional speech coding strategy (ACE), whereas in the second run, run B, they used the experimental strategy (MP3000). Similarly, for CI users of Group IB, run A entailed the use of their conventional strategy and run B the experimental strategy. To control for familiarity effects, the runs were counterbalanced across subjects. Only the responses given after the completion of a sentence were included in later analyses.

Figure 3.1: Pitch contours of the three prosodies. Praat-generated pitch contours of neutral (dotted line), angry (solid line) and happy (dashed line) prosody for the original (unsimulated) sentence "Sie hat die Zeitung gelesen". The maximum difference in pitch across emotions can be seen between 200 and 1200 ms from the start of the sentence.

ERP procedure and analysis

The EEG signals were acquired using a Brain Vision amplifier system (BrainProducts, Germany). Thirty electrodes were attached to an elastic cap (BrainProducts, Germany) at positions (Jasper, 1958) and were referenced to the tip of the nose. In order to control for horizontal and vertical eye movements, a unipolar electro-oculogram was recorded using two electrodes: one placed at the outer canthus, and one below the right eye. The impedances of the active electrodes were kept below 10 kΩ. EEG and electro-oculograms were analogue filtered ( Hz) and recorded with a sampling rate of 250 Hz. The EEG was recorded continuously on-line and stored for off-line processing.

Table 3.2: Acoustic parameters of the emotional sentences (standard deviations in parentheses)

Strategy   Stimulus   Mean Duration (sec.)   Mean F0 (Hz)   Mean Intensity (dB)
Original   Neutral    1.60 (0.3)             – (23.0)       68.6 (1.0)
Original   Angry      1.70 (0.3)             – (25.0)       70.0 (0.9)
Original   Happy      1.80 (0.4)             – (24.6)       67.3 (0.9)

Data processing

(a) Preprocessing and artefact rejection

The recorded brain activity was analysed offline using EEGLAB (Delorme and Makeig, 2004) open source software (version s) running under the MATLAB environment. The data were band-pass filtered from 1 to 30 Hz for ERP peak amplitude computations and from 1 to 100 Hz for TF analysis. Trials with non-stereotypical artefact exceeding three standard deviations of an inbuilt probability function (jointprob.m) were removed. ICA was performed with the Infomax ICA algorithm on the continuous data (Debener et al., 2010; Debener, 2010), under the assumption that the recorded activity is a linear sum of independent components (ICs) arising from brain and non-brain artefact sources. For systematic removal of components representing ocular and cardiac artefact, the EEGLAB plug-in CORRMAP (Viola et al., 2009), which enables semi-automatic component identification, was used. After artefact attenuation by back-projection of all but the artifactual ICs, the cleaned data were inspected for CI-related artefacts. Furthermore, ICA topographies representing CI artefacts were identified by the

centroid on the side of the implanted device and by the CI pedestal in the time course of the respective component, and were removed. For the CI users, two to three electrodes were removed to avoid direct contact with the CI electromagnetic coil attached to the mastoid. These missing channels were spherically interpolated. After preprocessing, the mean number of artefact-free data epochs that were eventually available across subjects did not differ significantly between emotions.

(b) Peak analysis

The cleaned data were selectively averaged for each emotion condition from the onset of the stimulus, including a 200 ms pre-stimulus baseline, with an 1800 ms ERP time window. The auditory N100 was identified as the negative peak between 100 and 200 ms, the P200 component as the positive peak in the 200 to 300 ms time window, and the late positive potential (LPP) as a broad positive peak in the 500 to 1200 ms time window. In all cases, the baseline-to-peak value was taken as the magnitude of the response. Visual inspection of grand-average waveforms showed that the distribution of ERP effects was predominantly fronto-central. Therefore, only Cz was selected for further analysis. Grand averages for each condition were computed by averaging the single-subject ERP averages for each emotion.

(c) Time-frequency analysis

TF analysis of single-trial data was performed using EEGLAB with the inbuilt function newtimef (Makeig et al., 2004). Wavelet analysis was used to decompose the signal in the time and frequency domains. Epoched data were transformed into two-dimensional TF planes by convolving the waveforms with a Morlet wavelet at a width of three cycles for low frequencies, increasing to fifteen cycles for high frequencies. The percentage of power change in each window relative to the power in the baseline (from -200 to 0 ms pre-stimulus onset) was calculated.
Difference plots allowed TF representations across two conditions to be compared. Induced power was derived by averaging single-trial power across trials (Tallon-Baudry et al., 1996). Evoked power was derived from the TF representation of the averaged

signal. The TF analysis spanned the frequency bands of theta (4–8 Hz), alpha (9–15 Hz), beta (16–34 Hz), and gamma (35–60 Hz). In order to examine the spatial distribution in each frequency band, values were averaged between 0 and 1200 ms in selected time windows. Based on both visual inspection and analyses of power changes, windows that showed similar patterns of effects were clustered. This resulted in an early (0–400 ms) and a late ( ms) time window. On the basis of the acoustic analysis of the prosodies, differences were found predominantly in the late window. In order to increase statistical power, and in agreement with previous work (Paulmann et al., 2011), the statistical analysis was limited to the fronto-central scalp region, where the early evoked gamma response could easily be identified. The obtained gamma power values were subjected to statistical analysis.

Statistical analysis

The statistical analysis of this crossover design is based on the intention-to-treat population including all randomised patients. The statistical evaluation of the accuracy scores, reaction times and ERPs was performed (SPSS 14.0, SPSS Inc., Chicago, Illinois, USA) using a repeated-measures ANOVA with the independent variables Prosody (neutral, angry, happy) and Subject group (NH, CI). For comparisons of strategy, a repeated-measures ANOVA was performed with the factors Prosody (neutral, angry, happy), Strategy (ACE, MP3000), Shift (default, experimental) and Run (run A, run B), to rule out any adaptation effect. To assess the significance of the TF results, a repeated-measures ANOVA with the factors Prosody (neutral, angry, happy) and Strategy (ACE, MP3000) was performed for the early as well as the late time window. Finally, to assess the relationship between oscillatory power and behavioural scores, a Pearson correlation analysis was performed comparing gamma band power with accuracy rate. All hypotheses were tested at a significance level of 5% (two-sided).
In order to correct for sphericity violations, the Greenhouse-Geisser correction was applied where relevant.
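The wavelet decomposition with baseline-percentage normalisation described above can be sketched in Python. This is a simplified, single-frequency version of the kind of computation EEGLAB's newtimef performs, not its implementation; the signal, frequency and window values below are synthetic illustrations.

```python
import numpy as np

def morlet_power(signal: np.ndarray, fs: float, freq: float, n_cycles: float = 3.0) -> np.ndarray:
    """Power over time at one frequency, via convolution with a complex
    Morlet wavelet of the given number of cycles."""
    sigma_t = n_cycles / (2 * np.pi * freq)              # Gaussian width in seconds
    tw = np.arange(-4 * sigma_t, 4 * sigma_t, 1 / fs)    # wavelet support
    wavelet = np.exp(2j * np.pi * freq * tw) * np.exp(-tw**2 / (2 * sigma_t**2))
    wavelet /= np.sqrt((np.abs(wavelet) ** 2).sum())     # unit-energy normalisation
    return np.abs(np.convolve(signal, wavelet, mode="same")) ** 2

# Synthetic epoch at 250 Hz: -200 ms baseline, 40 Hz "gamma" burst from 200 ms on.
fs = 250.0
t = (np.arange(350) - 50) / fs                           # -0.2 s .. ~1.2 s
sig = np.where(t >= 0.2, 1.0, 0.1) * np.sin(2 * np.pi * 40 * t)
power = morlet_power(sig, fs, freq=40.0)
baseline = power[t < 0].mean()                           # -200..0 ms pre-stimulus
pct_change = 100 * (power - baseline) / baseline         # percent change vs. baseline
```

A full analysis would repeat this over a grid of frequencies (with 3 cycles at the low end up to 15 at the high end) and average the resulting percent-change maps across trials for induced power.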

3.4 Results

Accuracy data

The group mean performance accuracy scores (% correct) are depicted in Figure 3.2. An ANOVA revealed a significant main effect of group [F(1, 38) = 53.32, p = 0.001], showing that the performance of NH controls (97 ± 1%) was better than that of CI users (67 ± 18%) by an average of 30% (collapsed across prosody). However, differences in accuracy rates between NH controls and CI users for each emotion separately could not be observed. The comparison of performance across strategies revealed a significant main effect of strategy [F(1, 38) = 5.156, p = 0.029], demonstrating that performance accuracy was higher with MP3000 (72 ± 17%) than with ACE (61 ± 15%) by an average of 9% (collapsed across prosody). A significant interaction between strategy and prosody was also observed [F(1, 38) = , p = 0.001]. Follow-up paired t-tests revealed higher accuracy for happy prosody recognition with MP3000 [t(19) = 3.164, p = 0.005] compared to ACE. However, the accuracy scores between the two strategies were comparable for angry and neutral prosody. No other significant differences could be observed. Note that half of the subjects used ACE as their default strategy and MP3000 as the experimental one, while the other half used MP3000 as their default strategy and ACE as the experimental one. Our results revealed no significant difference in accuracy rates [p > 0.05] between the groups, indicating that subjects with default ACE were comparable to subjects with experimental ACE. Similar results were obtained for the MP3000 strategy, suggesting that effects were not biased by duration of use.

Reaction times

The group mean reaction times are depicted in Figure 3.3. Analysis of the reaction time data revealed a significant main effect of group [F(2, 18) = 9.090, p = 0.001], showing that the reaction time of CI users (840 ms) was longer than that of NH controls (520 ms) by an average of 320 ms (collapsed across prosody).
Furthermore, a significant main effect

of prosody [F(2, 38) = , p = 0.001] was found, although there was no interaction. Breakdown analysis of the main effects revealed that reaction times for the recognition of happy (480 ms) [t(39) = 3.418, p = 0.020] and angry (500 ms) [t(39) = 3.536, p = 0.022] prosody were significantly shorter than those for neutral prosody (600 ms). No significant difference was found between happy and angry prosody. Similarly, for prosody recognition in CI users, a significant main effect of prosody was observed [F(2, 38) = , p = 0.001]. Breakdown analysis showed that subjects took less time to recognise happy (820 ms) [t(39) = 5.081, p = 0.001] and angry (810 ms) [t(39) = 2.672, p = 0.011] prosody compared to neutral prosody (910 ms), collapsed across strategies. The data yielded no significant difference in reaction times between happy and angry prosody. There was no significant interaction between strategy used and prosody recognition. The same procedure described above was used to check whether there was an effect of default vs. experimental strategy use. Analysis revealed no significant differences between the two [p > 0.05], either for ACE or for MP3000.

Figure 3.2: Accuracy rate. Accuracy scores (in percent) for NH controls vs. CI users, and for ACE vs. MP3000 use, for neutral, angry and happy emotional prosody recognition.
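Follow-up paired comparisons like those reported above can be computed with SciPy's ttest_rel. The sketch below uses synthetic per-subject reaction times; the means, spreads and sample size are illustrative values only, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-subject mean reaction times (ms) for 20 CI users:
# neutral prosody made slower than happy, mirroring the pattern reported above.
happy = rng.normal(loc=820, scale=40, size=20)
neutral = happy + rng.normal(loc=90, scale=20, size=20)   # ~90 ms slower on average

t_stat, p_val = stats.ttest_rel(happy, neutral)           # paired (within-subject) test
```

Because the same subjects contribute to both conditions, the paired test removes between-subject variability, which is why it is the appropriate follow-up to a within-subject ANOVA effect.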

Figure 3.3: Reaction time. Post-offset reaction times (in seconds) for NH controls vs. CI users, and for ACE vs. MP3000 use, for neutral, angry and happy emotional prosody recognition.

Event-related potentials

Figure 3.4 depicts the ERP waveforms for the different subject groups across the three emotional prosodies. Mean latencies and amplitudes for the N100 and P200 peaks are presented in Table 3.3.

N100

An ANOVA on N100 latency revealed significant differences between the NH controls and CI users [F(1, 38) = 6.080, p = 0.002]. We found no significant main effect of prosody and no interaction between prosody and subject group. The comparison of strategies revealed no significant effect of prosody, and there was no significant interaction between strategy and prosody. For the amplitude analysis, the ANOVA revealed a significantly greater N100 amplitude for NH controls than for CI users [F(1, 38) = , p = 0.003]. However, the interaction between prosody and group did not reach significance. The comparison of strategies showed no significant main effect of prosody or strategy, and no interaction.

P200

A significant main effect of subject group [F(1, 38) = , p = 0.001] was observed. Pairwise comparisons revealed that P200 latency was shorter in NH controls than in CI users by an average of 35 ms (collapsed across prosodies). No significant main effect of prosody was observed, and there was no group-by-prosody interaction. The comparison of strategies revealed no significant effects on P200 latency measures. A significant group effect [F(1, 38) = , p = 0.001] indicated a reduced P200 peak amplitude in CI users compared to NH controls (by an average of 3.8 μV, collapsed across prosodies). On comparison of strategies, a significant effect of strategy use [F(1, 18) = 12.395, p = 0.006] was observed, indicating that the P200 peak amplitude was significantly larger with MP3000 than with ACE. Follow-up t-tests revealed that MP3000 use yielded larger amplitudes than ACE for happy [t(19) = 4.240, p = 0.001] and neutral [t(19) = 2.240, p = 0.037] prosody, but not for angry prosody.

Table 3.3: Mean N100 and P200 peak latencies and amplitudes (standard deviations in parentheses)

N100 peak latency (ms)
Subjects        Neutral       Angry         Happy
Control (NH)    137 (11.5)    138 (13.5)    140 (10.1)
ACE             145 (22.0)    156 (23.4)    151 (22.0)
MP3000          – (22.0)      155 (22.2)    154 (21.8)

N100 peak amplitude (μV)
Control (NH)    3.90 (1.8)    3.90 (1.5)    4.00 (1.9)
ACE             2.31 (1.3)    2.82 (1.3)    2.60 (1.2)
MP3000          – (1.8)       2.50 (1.3)    2.55 (1.4)

P200 peak latency (ms)
Control (NH)    240 (16.6)    240 (20.2)    234 (10.0)
ACE             259 (25.0)    270 (26.1)    270 (24.1)
MP3000          – (25.0)      271 (25.6)    271 (25.2)

P200 peak amplitude (μV)
Control (NH)    5.90 (1.5)    6.00 (1.5)    6.20 (1.8)
ACE             2.23 (1.2)    2.38 (0.9)    2.34 (0.9)
MP3000          – (1.1)       2.31 (0.8)    2.81 (0.9)
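The baseline-to-peak measurement behind the values in Table 3.3 can be sketched as a window-constrained extremum search on the averaged waveform. The ERP below is synthetic (Gaussian deflections placed at plausible N100 and P200 latencies); the windows follow the peak-analysis section above.

```python
import numpy as np

def baseline_to_peak(erp: np.ndarray, times: np.ndarray, window, polarity: int):
    """Amplitude and latency of the extreme point inside a latency window
    (polarity -1 for a negative peak such as N100, +1 for P200), measured
    relative to the mean of the pre-stimulus baseline."""
    erp = erp - erp[times < 0].mean()                   # baseline correction
    idx = np.where((times >= window[0]) & (times <= window[1]))[0]
    peak = idx[np.argmax(polarity * erp[idx])]
    return erp[peak], times[peak]

# Synthetic average ERP sampled at 250 Hz: negativity near 140 ms, positivity near 240 ms.
fs = 250.0
times = (np.arange(250) - 50) / fs                      # -0.2 s .. ~0.8 s
erp = (-4e-6 * np.exp(-((times - 0.14) / 0.02) ** 2)
       + 6e-6 * np.exp(-((times - 0.24) / 0.03) ** 2))
n100_amp, n100_lat = baseline_to_peak(erp, times, (0.10, 0.20), polarity=-1)
p200_amp, p200_lat = baseline_to_peak(erp, times, (0.20, 0.30), polarity=+1)
```

Applied to real data, this would be run per subject and condition on the Cz average, and the per-subject values fed into the ANOVAs reported above.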

Figure 3.4: ERP waveforms for the three emotional prosodies for NH controls, ACE users and MP3000 users. Average ERP waveforms recorded at the Cz electrode for all three emotional stimuli [neutral (black), angry (red) and happy (blue)], from 100 ms before to 600 ms after sentence onset, with the respective scalp topographies at the N100 and P200 peaks (x-axis: latency in milliseconds; y-axis: amplitude in μV). Top left: N100-P200 waveform for NH controls; middle: ACE users; right: MP3000 users. Bottom: scalp topographies for N100 and P200 (left: NH controls; middle: ACE users; right: MP3000 users).

Time-frequency analysis

Mean evoked and induced powers were calculated in two time windows (0-400 ms and 600-1200 ms) for four frequency bands: theta (4-8 Hz), alpha (9-15 Hz), beta (16-34 Hz), and gamma (35-60 Hz). There were no significant differences in either alpha or beta activity.

Theta

Baseline-corrected spectral theta power evoked by prosody showed a clear peak in both the early and late time windows. Statistical comparison across groups revealed a significant main effect of group in the early window [F(1, 19) = 9.779, p = 0.003]: theta activity was larger in NH controls than in CI users. However, the main effect of prosody did not reach significance. The comparison of theta power across strategies revealed no significant differences in the early time window. In the late window, a significant main effect of strategy was observed [F(1, 18) = 10.02, p = 0.005], reflecting higher theta power when participants used the MP3000 strategy. Again, the main effect of prosody did not reach significance.

Gamma

Figure 3.5 depicts the baseline-corrected TF plots of induced gamma activity at Cz for CI users. Recurring regions of gamma enhancement were observable in the TF plots, with bursts of induced gamma activity after stimulus onset. ANOVA revealed that induced gamma activity was significantly larger in NH controls than in CI users irrespective of prosody, in both the early [F(1, 18) = 6.20, p = 0.005] and late [F(1, 18) = 5.312, p = 0.010] time windows. No interaction was observed between prosody and group. On comparison across strategies, a significant main effect of strategy on induced gamma activity in the early time window was observed [F(1, 38) = 8.172, p = 0.020]. In this window, induced gamma power showed a prominent peak from 180 to 250 ms after stimulus onset in MP3000 users, whereas ACE users showed almost no gamma-band change,

reflecting a lack of prosody recognition. The interaction between prosody and strategy was also significant [F(1, 38) = 3.779, p = 0.042]. Follow-up comparisons indicated that gamma activity induced by happy prosody was significantly stronger for MP3000 users than for ACE users [t(19) = 2.789, p = 0.021]; neutral and angry prosody showed no significant differences. Similarly, in the late window, induced gamma activity increased with prosody recognition ability: MP3000 users displayed higher induced gamma power than ACE users [F(1, 38) = 8.881, p = 0.020]. This activity peaked at 1050 ms after stimulus onset. A significant interaction of strategy and prosody was observed [F(1, 38) = 4.241, p = 0.033]. Follow-up t-tests revealed that gamma activity induced by happy prosody was stronger for MP3000 than for ACE [t(19) = 2.430, p = 0.025], but not for neutral or angry prosody. No other comparisons yielded significant results. Taken together, the results in both the early and late time windows revealed that ACE and MP3000 users differed significantly, in particular in that induced gamma power was more pronounced for MP3000 users in response to happy prosody.

Correlation between accuracy rate and gamma band activity

NH individuals showed a significant positive correlation between gamma power and accuracy rate [r = 0.636, p = 0.019]: high induced gamma power was associated with better prosody recognition. A similar correlation was observed in the CI group [r = 0.642, p = 0.036].

3.5 Discussion

In this crossover study, significant differences in emotion recognition were found between subject groups, reflected in both behavioural and electrophysiological measures. Comparisons between CI and NH listeners indicated that CI users had difficulty recognising prosody. This could

Figure 3.5: Induced gamma power plots for ACE users and MP3000 users. Average gamma power computed at the Cz electrode for ACE users and MP3000 users for all three emotional prosodies (neutral, angry and happy), in the 25 to 70 Hz range. Highlighted areas and scalp plots represent induced gamma power in the early (0-400 ms) and late (600-1200 ms) time windows at bootstrap significance (p < 0.01).

be attributed to the limited dynamic range of the implant, through which users have only limited access to pitch, voice quality, vowel articulation and spectral envelope cues, features that are thought to be essential for emotional voice recognition. Although CI users were poor at recognising emotional prosody, they performed above chance, suggesting that they are able to perceive some essential cues.

Behavioural findings

In general, all participants took longer to recognise neutral prosodic sentences than sentences spoken with angry or happy prosody. A possible explanation might be that in emotional judgments of prosody, non-ambiguous emotional associations are readily available, resulting in faster recognition, whereas neutral stimuli may elicit positive or negative associations that otherwise would not exist (Grandjean et al., 2008). Here, the reaction times simply reflect a longer decision time for neutral prosody. Happy prosody was recognised with the highest accuracy of the three prosodies. There is evidence to suggest that negative stimuli are less expected and take more effort to process than positive stimuli, and that happy emotions are more socially salient (Johnstone et al., 2006; Lang and Bradley, 2009). Thus the high accuracy might be due to the social importance of happy emotions and the additional pitch cues. Moreover, NH subjects listening to CI simulations were significantly less accurate and took more time to recognise emotional prosodic information compared with original stimuli, as observed in our recent simulation study (Agrawal et al., 2012).

ERPs

In agreement with the behavioural results, CI users exhibited prolonged ERP latencies and reduced amplitudes compared with NH controls.
This finding is supported by previous research (Luo and Fu, 2007), which has reported that the amount of sensory information transmitted through a CI is less than through an intact cochlea, resulting in reduced synchronisation of the neuronal activity required to generate auditory evoked potentials (Groenen et al., 2001;

Sandmann et al., 2009). Although CI users had prolonged latencies and reduced amplitudes of the ERPs, the structure of these potentials was similar to those recorded from NH controls. This implies that, despite the limited input provided by CIs, the central auditory system processes prosodic stimuli in a relatively normal fashion (Koelsch et al., 2004). Furthermore, the amplitude of the P200 component was larger in MP3000 users than in ACE users. This could be because the MP3000 strategy avoids repetitive stimulation of neuronal ensembles (Buechner et al., 2008) by selecting components that are more widely dispersed across the spectrum, avoiding clustering of stimulated channels (Nogueira et al., 2005). The bands selected by the psychoacoustic model in MP3000 thus extract the most meaningful components of the audio signal based on normal cochlear physiology: only the relevant components are transferred, and the redundant components are masked. Since this strategy is based on a normal-hearing principle, the extraction of fundamental-frequency (F0) cues fares better than with ACE, resulting in improved recognition of happy prosody in the present study. The P200 component reflects the initial encoding of emotion (Balconi and Pozzoli, 2003). Previous studies have reported that emotional stimuli elicit larger ERP waveforms than neutral stimuli, most frequently as early as 200 ms after stimulus onset (Vanderploeg et al., 1987). This initial emotional encoding seems to be particularly influenced by pitch and intensity variations (Pantev et al., 1996).
The current results concur with the available literature, especially for happy prosody, where pitch variations are largest (Table 2, acoustic parameters).

Time-frequency results

To our knowledge, this study is the first attempt to show that the processing of emotional prosody in CI users can be distinguished in both early and late stages of brain oscillations during prosody recognition. Our results support the view that frequency-specific EEG responses differ from each other systematically as a function of stimulus type (i.e. emotional vs. neutral). The theta and gamma bands showed a significant power increase with emotional stimuli, whereas alpha and beta frequencies were not modulated by prosody, an observation in line with previously reported effects (Aftanas et al., 2001a; Aftanas et al., 2004; Knyazev et al., 2009). Overall spectral power was larger for NH controls than for CI users, consistent with the ERP results and with the idea that NH subjects have better auditory resolution than CI users, so that the stimuli are well received. On the other hand, larger gamma band power in CI users could be indicative of additional neural activation compensating for less efficient processing during the extraction of stimulus features and the integration of the perceived sensory input. Imaging studies support this assumption by demonstrating that task-related compensatory activations are higher in response to developmental deviations than in normal maturational processes (Durston et al., 2003; Sheridan and Hausdorff, 2007). Regarding the comparison of CI strategies in the present experiment, MP3000 users showed significantly larger power at theta frequencies than ACE users. These findings are in line with the literature, where a general tendency for higher theta power with increased motivation and emotional significance of the stimuli has been observed (Aftanas et al., 2001a; Aftanas et al., 2001b). This explains the higher theta power for emotional stimuli, particularly happy emotions, in MP3000 compared with ACE users, and suggests that slow cerebral oscillations are suitable for studying processes related to emotion recognition. The observed differences can be attributed to the psychoacoustic model in the MP3000 strategy: this algorithm increases the dynamic range, thereby improving pitch and finer feature recognition. Two typical peaks in gamma activity were observed for prosody recognition, in both the early and the late time windows.
The early induced gamma activity occurred at about 180 to 250 ms after stimulus onset, while the late induced peak commenced after 600 ms post-stimulus. In the present study, MP3000 users showed significantly higher gamma-band power for emotion recognition than ACE users, with the highest power for happy emotion recognition and the lowest for neutral stimuli. In

contrast, ACE users displayed higher power for neutral than for happy prosody. There is strong evidence that gamma power in the initial 250 ms is a correlate of sensory processing (Karakas and Basar, 1998; Busch et al., 2004), reflecting bottom-up processes driven by stimulus features such as loudness (Schadow et al., 2007) and pitch (Sedley et al., 2012). There is also evidence that early gamma activity represents an interface between bottom-up and top-down processes (Busch et al., 2006). This implies that in the current study the MP3000 strategy encoded the physical characteristics of the stimuli better than ACE, which in turn improved prosody recognition, especially for happy prosody, which has the greatest pitch modulation. Increased induced gamma band power was also observed in the late time window of 600 to 1200 ms, peaking at 1050 ms. It seems likely that this second burst of long-latency gamma synchrony is related to the maintenance or refinement of the attention network established by the first burst: the selection of certain neuronal groups for integration into a large-scale synchronous gamma network accounts for the increased integration of attended information. In the late time window the variation of acoustic features was also at its peak, suggesting that these features were well received and coded by the brain to differentiate emotions effectively. MP3000 users had the highest gamma power during happy prosody recognition, whereas ACE users had the lowest. Since previous studies have shown a similar gamma-band power increase locked to visual stimuli conditioned to emotional pictures (Stolarova et al., 2006), the current results support a specific role of gamma activity (35-60 Hz) in emotionally triggered functional activation states, which seems to be independent of sensory modality.
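The evoked/induced distinction underlying these analyses can be sketched as follows: evoked power is computed from the across-trial average (phase-locked activity survives averaging of the signal), whereas induced power is the residual single-trial band power that survives averaging of power but not of the signal. This is a minimal illustration on synthetic data, not the thesis's actual time-frequency pipeline; the filter settings, band limits and analysis window are assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def band_power(trials, fs, band, t_win):
    """Mean evoked and induced power of `band` (Hz) within `t_win` (s)."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    i0, i1 = int(t_win[0] * fs), int(t_win[1] * fs)
    # Evoked (phase-locked): band-limit the across-trial average, square its envelope.
    erp_env = np.abs(hilbert(filtfilt(b, a, trials.mean(axis=0))))
    evoked = np.mean(erp_env[i0:i1] ** 2)
    # Total: square single-trial envelopes first, then average across trials.
    env = np.abs(hilbert(filtfilt(b, a, trials, axis=-1), axis=-1))
    total = np.mean(env[:, i0:i1] ** 2)
    return evoked, total - evoked   # induced = total - evoked

fs, n_trials, n_samp = 500, 60, 500
t = np.arange(n_samp) / fs
rng = np.random.default_rng(0)
noise = 0.1 * rng.standard_normal((n_trials, n_samp))
# Phase-locked 40 Hz activity: identical phase on every trial -> mostly evoked power.
locked = np.sin(2 * np.pi * 40 * t) + noise
# Non-phase-locked 40 Hz activity: random phase per trial -> mostly induced power.
phases = rng.uniform(0, 2 * np.pi, (n_trials, 1))
jittered = np.sin(2 * np.pi * 40 * t + phases) + noise
ev_l, ind_l = band_power(locked, fs, (35, 60), (0.2, 0.8))
ev_j, ind_j = band_power(jittered, fs, (35, 60), (0.2, 0.8))
```

With phase-locked trials the evoked term dominates; with phase-jittered trials the same oscillatory energy appears almost entirely as induced power, which is why induced gamma must be measured trial by trial rather than from the ERP.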
High-frequency components such as gamma-band activity have been associated with local computations and the binding of fine-structured information (Tallon-Baudry et al., 1999), whereas low-frequency components such as theta involve more global computations and are possibly amodal (Yordanova et al., 2002). Our results suggest that a more analytic ('local') processing mode was engaged in emotion identification with MP3000 than with ACE. The burst of gamma observed in perceptual testing originates from a transient synchrony of neural populations involved in

the processing of a sensory event, i.e. the stimulus enters the realm of conscious recognition (Hopfield and Brody, 2000; Mukamel et al., 2005). The positive correlation between gamma activity and neuronal spiking is a robust finding in the aforementioned studies. Thus, the MP3000 strategy may lead to a higher neuronal firing rate than ACE, which might have improved prosody identification with this strategy. Gamma band responses also show a direct relation to task demands (Yordanova et al., 1997; Senkowski and Herrmann, 2002). Hence, it seems reasonable to assume higher activation of processing resources in MP3000 users for task-relevant stimuli, reflected by enhanced activity in the gamma range. Furthermore, the high positive correlation between accuracy rate and gamma band power reflects an advantage of MP3000 over ACE for prosody recognition. These findings are in line with previous reports in which the association between gamma-band power and response accuracy was more pronounced in good performers than in poor performers, particularly around the onset of the test stimuli (Kaiser et al., 2008). The results thus suggest the relevance of gamma power for optimal differentiation between stimulus characteristics. Some limitations should be taken into account when drawing these conclusions. First, only prosodies recorded by a single speaker were used, which constrains the generalisation of the findings. Second, only three emotions were intoned in the sentences. Despite these limitations, the results are promising and reflect the advantage of using ERP and TF analyses to evaluate abilities of CI users that may not be evident in behavioural measures. Taken together, we have shown that subjects using the ACE strategy had difficulty perceiving emotional stimuli compared with MP3000 users. These difficulties were reflected in behavioural scores, ERPs, and TF measures.
It was also observed that subjects with ACE had to concentrate more to distinguish the emotions than NH controls and MP3000 users. High-frequency oscillations typically have lower amplitudes than low-frequency oscillations; they are therefore not visible in ERPs and would not have contributed to the peak amplitudes (Edwards et al., 2005). Thus, these findings lend strong support to the hypothesis that macroscopically visible gamma band activity is functionally relevant for prosody recognition,

even in CI users.

Acknowledgements

We thank all participants for their support and their willingness to be a part of this study.

Funding information

This research was supported by grants from the Georg Christoph Lichtenberg Stipendium of Lower Saxony, Germany, and partially supported by the Fundação para a Ciência e a Tecnologia, Lisbon, Portugal (SFRH/BD/37662/2007) to F.C.V.

3.6 References

Aftanas L, Varlamov A, Pavlov S, Makhnev V, Reva N. Event-related synchronization and desynchronization during affective processing: emergence of valence-related time-dependent hemispheric asymmetries in theta and upper alpha band. Int J Neurosci. 2001a;110(3-4):

Aftanas LI, Reva NV, Savotina LN, Makhnev VP. [Neurophysiological correlates of induced discrete emotions in humans: an individual analysis]. Ross Fiziol Zh Im I M Sechenova. 2004;90(12):

Aftanas LI, Varlamov AA, Pavlov SV, Makhnev VP, Reva NV. Affective picture processing: event-related synchronization within individually defined human theta band is modulated by valence dimension. Neurosci Lett. 2001b;303(2):

Agrawal D, Timm L, Viola FC, Debener S, Büchner A, Dengler R, et al. ERP evidence for the recognition of emotional prosody through simulated cochlear implant strategies. BMC Neuroscience. 2012; in press.

Balconi M, Pozzoli U. Face-selective processing and the effect of pleasant and unpleasant emotional expressions on ERP correlates. Int J Psychophysiol. 2003;49(1):

Beck AT, Steer RA, Ball R, Ranieri W. Comparison of Beck Depression Inventories -IA and -II in psychiatric outpatients. J Pers Assess. 1996;67(3):

Boersma P, Weenink D. Praat: doing phonetics by computer.

Buechner A, Brendel M, Krueger B, Frohne-Buchner C, Nogueira W, Edler B, et al. Current steering and results from novel speech coding strategies. Otol Neurotol. 2008;29(2):

Busch NA, Debener S, Kranczioch C, Engel AK, Herrmann CS. Size matters: effects of stimulus size, duration and eccentricity on the visual gamma-band response. Clin Neurophysiol. 2004;115(8):

Busch NA, Schadow J, Frund I, Herrmann CS. Time-frequency analysis of target detection reveals an early interface between bottom-up and top-down processes in the gamma-band. Neuroimage. 2006;29(4):

Debener S, Thorne J, Schneider TR, Viola FC. Using ICA for the analysis of multi-channel EEG data. In: Ullsperger M, Debener S, editors. Simultaneous EEG and fMRI. New York, NY: Oxford University Press; 2010.

Delorme A, Makeig S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods. 2004;134(1):9-21.

Durston S, Davidson MC, Thomas KM, Worden MS, Tottenham N, Martinez A, et al. Parametric manipulation of conflict and response competition using rapid mixed-trial event-related fMRI. Neuroimage. 2003;20(4):

Edwards E, Soltani M, Deouell LY, Berger MS, Knight RT. High gamma activity in response to deviant auditory stimuli recorded directly from human cortex. J Neurophysiol. 2005;94(6):

Fu QJ, Chinchilla S, Nogaki G, Galvin JJ 3rd. Voice gender identification by cochlear implant users: the role of spectral and temporal resolution. J Acoust Soc Am. 2005;118(3 Pt 1):

Fuentemilla L, Marco-Pallares J, Grau C. Modulation of spectral power and of phase resetting of EEG contributes differentially to the generation of auditory event-related potentials. Neuroimage. 2006;30(3):

Grandjean D, Sander D, Scherer KR. Conscious emotional experience emerges as a function of multilevel, appraisal-driven response synchronization. Conscious Cogn. 2008;17(2):

Groenen PA, Beynon AJ, Snik AF, van den Broek P. Speech-evoked cortical potentials and speech recognition in cochlear implant users. Scand Audiol. 2001;30(1):

Hannemann R, Obleser J, Eulitz C. Top-down knowledge supports the retrieval of lexical information from degraded speech. Brain Res. 2007;1153:

Hopfield JJ, Brody CD. What is a moment? "Cortical" sensory integration over a brief interval. Proc Natl Acad Sci U S A. 2000;97(25):

House WF. Cochlear implants: it's time to rethink. Am J Otol. 1994;15(5):

Jasper HH, Rasmussen T. Studies of clinical and electrical responses to deep temporal stimulation in men with some considerations of functional anatomy. Res Publ Assoc Res Nerv Ment Dis. 1958;36:

Johnstone T, van Reekum CM, Oakes TR, Davidson RJ. The voice of emotion: an fMRI study of neural responses to angry and happy vocal expressions. Soc Cogn Affect Neurosci. 2006;1(3):242-9.

Kaiser J, Heidegger T, Lutzenberger W. Behavioral relevance of gamma-band activity for short-term memory-based auditory decision-making. Eur J Neurosci. 2008;27(12):

Karakas S, Basar E. Early gamma response is sensory in origin: a conclusion based on cross-comparison of results from multiple experimental paradigms. Int J Psychophysiol. 1998;31(1):

Knyazev GG, Slobodskoj-Plusnin JY, Bocharov AV. Event-related delta and theta synchronization during explicit and implicit emotion processing. Neuroscience. 2009;164(4):

Koelsch S, Kasper E, Sammler D, Schulze K, Gunter T, Friederici AD. Music, language and meaning: brain signatures of semantic processing. Nat Neurosci. 2004a;7(3):

Koelsch S, Wittfoth M, Wolf A, Muller J, Hahne A. Music perception in cochlear implant users: an event-related potential study. Clin Neurophysiol. 2004b;115(4):

Kotz SA, Meyer M, Alter K, Besson M, von Cramon DY, Friederici AD. On the lateralization of emotional prosody: an event-related functional MR investigation. Brain Lang. 2003;86(3):

Kotz SA, Meyer M, Paulmann S. Lateralization of emotional prosody in the brain: an overview and synopsis on the impact of study design. Prog Brain Res. 2006;156:

Kotz SA, Paulmann S. When emotional prosody and semantics dance cheek to cheek: ERP evidence. Brain Res. 2007;1151:

Lang PJ, Bradley MM. Emotion and the motivational brain. Biol Psychol. 2009;84(3):

Lenz D, Schadow J, Thaerig S, Busch NA, Herrmann CS. What's that sound? Matches with auditory long-term memory induce gamma activity in human EEG. Int J Psychophysiol. 2007;64(1):31-8.

Loizou PC. Signal-processing techniques for cochlear implants. IEEE Eng Med Biol Mag. 1999;18(3):

Luo X, Fu QJ. Frequency modulation detection with simultaneous amplitude modulation by cochlear implant users. J Acoust Soc Am. 2007;122(2):

Makeig S, Debener S, Onton J, Delorme A. Mining event-related brain dynamics. Trends Cogn Sci. 2004;8(5):

Mukamel R, Gelbard H, Arieli A, Hasson U, Fried I, Malach R. Coupling between neuronal firing, field potentials, and fMRI in human auditory cortex. Science. 2005;309(5736):

Muller V, Gruber W, Klimesch W, Lindenberger U. Lifespan differences in cortical dynamics of auditory perception. Dev Sci. 2009;12(6):

Nogueira W, Buechner A, Lenarz T, Edler B. A psychoacoustic "NofM"-type speech coding strategy for cochlear implants. EURASIP Journal on Applied Signal Processing, Special Issue on DSP in Hearing Aids and Cochlear Implants. 2005;127(18):

Pantev C, Elbert T, Ross B, Eulitz C, Terhardt E. Binaural fusion and the representation of virtual pitch in the human auditory cortex. Hear Res. 1996;100(1-2):

Paulmann S, Ott DV, Kotz SA. Emotional speech perception unfolding in time: the role of the basal ganglia. PLoS One. 2011;6(3):e

Pinheiro AP, Galdo-Alvarez S, Rauber A, Sampaio A, Niznikiewicz M, Goncalves OF. Abnormal processing of emotional prosody in Williams syndrome: an event-related potentials study. Res Dev Disabil. 2011;32(1):

Sandmann P, Eichele T, Buechler M, Debener S, Jancke L, Dillier N, et al. Evaluation of evoked potentials to dyadic tones after cochlear implantation. Brain. 2009;132(Pt 7):

Schadow J, Lenz D, Thaerig S, Busch NA, Frund I, Rieger JW, et al. Stimulus intensity affects early sensory processing: visual contrast modulates evoked gamma-band activity in human EEG. Int J Psychophysiol. 2007;66(1):28-36.

Schapkin SA, Gusev AN, Kuhl J. Categorization of unilaterally presented emotional words: an ERP analysis. Acta Neurobiol Exp (Wars). 2000;60(1):

Scherer KR. Vocal communication of emotion: a review of research paradigms. Speech Communication. 2003;40:

Schurmann M, Basar E. Topography of alpha and theta oscillatory responses upon auditory and visual stimuli in humans. Biol Cybern. 1994;72(2):

Sedley W, Teki S, Kumar S, Overath T, Barnes GR, Griffiths TD. Gamma band pitch responses in human auditory cortex measured with magnetoencephalography. Neuroimage. 2012;59(2):

Senkowski D, Herrmann CS. Effects of task difficulty on evoked gamma activity and ERPs in a visual discrimination task. Clin Neurophysiol. 2002;113(11):

Sheridan MA, Hinshaw S, D'Esposito M. Efficiency of the prefrontal cortex during working memory in attention-deficit/hyperactivity disorder. J Am Acad Child Adolesc Psychiatry. 2007;46(10):

Sheridan PL, Hausdorff JM. The role of higher-level cognitive function in gait: executive dysfunction contributes to fall risk in Alzheimer's disease. Dement Geriatr Cogn Disord. 2007;24(2):

Stolarova M, Keil A, Moratti S. Modulation of the C1 visual event-related component by conditioned stimuli: evidence for sensory plasticity in early affective perception. Cereb Cortex. 2006;16(6):

Tallon-Baudry C, Bertrand O, Delpuech C, Pernier J. Stimulus specificity of phase-locked and non-phase-locked 40 Hz visual responses in human. J Neurosci. 1996;16(13):

Tallon-Baudry C, Bertrand O, Pernier J. A ring-shaped distribution of dipoles as a source model of induced gamma-band activity. Clin Neurophysiol. 1999;110(4):

Vanderploeg RD, Brown WS, Marsh JT. Judgments of emotion in words and faces: ERP correlates. Int J Psychophysiol. 1987;5(3):

Viola FC, Thorne J, Edmonds B, Schneider T, Eichele T, Debener S. Semi-automatic identification of independent components representing EEG artifact. Clin Neurophysiol. 2009;120(5):

Wagener K, Brand T, Kollmeier B. Entwicklung und Evaluation eines Satztests in deutscher Sprache III: Evaluation des Oldenburger Satztests. Z Audiol. 1999;38(3):86-95.

Wittfoth M, Schroder C, Schardt DM, Dengler R, Heinze HJ, Kotz SA. On emotional conflict: interference resolution of happy and angry prosody reveals valence-specific effects. Cereb Cortex. 2010;20(2):

Yordanova J, Kolev V, Demiralp T. Effects of task variables on the amplitude and phase-locking of auditory gamma band responses in human. Int J Neurosci. 1997a;92(3-4):

Yordanova J, Kolev V, Demiralp T. The phase-locking of auditory gamma band responses in humans is sensitive to task processing. Neuroreport. 1997b;8(18):

Yordanova J, Kolev V, Rosso OA, Schurmann M, Sakowitz OW, Ozgoren M, et al. Wavelet entropy analysis of event-related potentials indicates modality-independent theta dominance. J Neurosci Methods. 2002;117(1):

Chapter 4

Overall Discussions

The work presented in this thesis has investigated the prosody (neutral, happy, angry) recognition abilities of NH and CI users. In addition to examining the processing of prosody across subject groups, this work sought to: (i) specify the role of different acoustic cues, and (ii) evaluate the interaction between prosodic processing abilities and the CI strategies.

I. Appropriateness of CI simulations to investigate strategy differences

It was of particular interest to clarify the extent to which NH subjects were able to differentiate between the two types of simulation and make a comparative judgment. The first study (manuscript I) therefore investigated the ability of NH individuals to recognise prosodic stimuli through ACE and MP3000 simulations. Simulations were created using vocoders, which preserved amplitude (envelope) cues while removing spectral fine structure, enabling a parametric investigation of the role of spectral cues in the processing of prosody. The ACE and MP3000 simulations differed from each other in several respects, in particular in dynamic range. The results show that, although the vocoders degraded the speech signals, participants could differentiate between the two strategies: MP3000 simulations proved superior to ACE simulations on behavioural as well as ERP measures. This implies that simulations are useful for comparing strategies, since they mimic the limited spectral resolution and unresolved harmonics of speech processing strategies. Because these simulations are analogous to perception through a CI, they are the best available alternative for comparing strategies.
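The vocoder principle described above (retain per-band amplitude envelopes, discard spectral fine structure) can be sketched as a generic n-channel noise vocoder. This is a minimal illustration of the concept, not the actual ACE/MP3000 simulation code; the channel count, band edges and filter orders are assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0):
    """Replace spectral fine structure with band-limited noise, keeping envelopes."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced analysis bands
    rng = np.random.default_rng(0)
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(2, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        env = np.abs(hilbert(filtfilt(b, a, x)))               # band amplitude envelope
        carrier = filtfilt(b, a, rng.standard_normal(x.size))  # noise carrier, same band
        out += env * carrier                                   # envelope-modulated noise
    return out

fs = 16000
t = np.arange(fs) / fs                 # 1 s test signal
tone = np.sin(2 * np.pi * 1000 * t)    # a 1 kHz tone stands in for a speech stimulus
vocoded = noise_vocode(tone, fs)
```

Varying the number of channels in such a scheme is what allows the parametric manipulation of spectral resolution mentioned above.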

II. Ability of CI users to perceive emotional prosody

After the success of the first study, it was necessary to investigate prosody perception in CI users with the ACE and MP3000 strategies. The second study (manuscript II) therefore focused on the ability of CI users to differentiate between prosodies using EEG. Behavioural data revealed significant differences in prosody recognition between NH controls and CI users. Within the CI users, systematic variations in accuracy rate were observed for each emotion. Similarly, a significant P200 peak was observed in CI users, with increased latency and decreased amplitude but with morphology similar to that observed in healthy controls for the different prosodies. In line with reports in the literature (JOHNSON et al. 2009), the findings described in this thesis imply that the central auditory system can consistently process certain aspects of verbal information, independent of whether the stimuli are processed through a healthy cochlea or mediated by a cochlear prosthesis.

III. Reliability of ERP measures to explore prosody recognition

In contrast to behavioural studies, ERPs allow the investigation of ongoing stimulus processing at different levels, from perceptual to cognitive stages. Using ERPs makes it possible to determine whether the responses to prosodic changes in NH participants and CI users are due to lower-level processing of acoustic stimulus features reflected in obligatory components (such as the adult N1-P2 complex (NAATANEN 1988)). The present work revealed that the N100 and P200 peaks are relevant components reflecting emotional prosody recognition in NH individuals as well as in individuals hearing through a CI. Thus, from the literature (PAULMANN et al. 2007; PINHEIRO et al.
2011) and the two studies presented in this thesis, it can be concluded that ERPs are indeed a reliable tool for investigating prosody perception, and that the initial peaks clearly reflect the processing of acoustic parameters.

IV. Acoustic modulation of the gamma band response to prosodic stimuli: temporal and topographic characteristics

Since ERPs are averaged signals reflecting synchronous neuronal activity, information

regarding inter-trial variability is unobtainable, making it impossible to draw reliable conclusions regarding experimental effects and sources of variance (ZHAO et al. 2005). Hence, the second study (manuscript II) applied spectral analysis to explore the inter-trial variability further. It was demonstrated that induced gamma activity in response to affective auditory stimuli is modulated as a function of emotional stimulus properties, both in amplitude and in topographical distribution. Gamma power increased, particularly in MP3000 users, in both the early and late windows during the presentation of happy prosodic stimuli, supporting the view that gamma activity reflects both bottom-up and top-down mechanisms of emotion recognition. Gamma activity was maximal for the recognition of happy prosody, indicating that this phenomenon might be related to fast, rough categorisation of stimuli as angry, happy or neutral. Similarly, increased induced gamma band power was observed in the late time window; it seems likely that this second burst of long-range gamma synchrony is related to the maintenance or refinement of the attention network established by the first burst. According to POEPPEL (2003), perceptual unit formation evolves on a 200 ms time-scale (i.e. the theta range), a time constant characteristic of the syllabic unit of speech, while featural aspects are processed on a 30 ms time-scale (i.e. the gamma range). In the work of this thesis, theta power (4-7 Hz, about 200 ms period) was initially larger, while gamma power (35-55 Hz, about 25 ms period) was more pronounced for emotion identification with MP3000. The neural underpinnings of such differences evolve on two different time-scales: a coarse temporal resolution (theta) and a fine temporal resolution (gamma). This difference is reflected in the global power of the theta and gamma bands.
Specifically, emotion identification is associated with higher power in the gamma band, while stimulus differentiation is associated with higher power in the theta band. As suggested in the literature, induced gamma power is also directly related to vigilance (MAY et al. 1994) and to the voluntary allocation of attentional resources (LANDAU et al. 2007). Hence, it seems reasonable to assume that enhanced activity in the gamma range reflects a higher

activation of processing resources for task-relevant stimuli in MP3000 users. This is an important finding as it (i) shows the relevance of gamma activity for prosody perception performance, and (ii) reflects the role of gamma activity in understanding higher-order prosody recognition mechanisms in CI users. Overall, the findings suggest that the MP3000 strategy indeed has a perceptual advantage over ACE in binding the finer acoustic features important for emotional differentiation.

V. MP3000 vs ACE for prosody perception

The work presented in this thesis was designed as a systematic comparison of two speech coding strategies in post-lingually hearing-impaired adult CI users. The intention was to investigate possible differences in performance between strategies and to determine the optimal strategy for the majority of patients on the basis of objective EEG measures. The comparison of emotion recognition with the ACE and MP3000 strategies showed significant differences in performance. The best results were obtained with MP3000, as reflected by improved accuracy and larger EEG amplitudes. Thus, these results can be regarded as a reliable predictor of performance superiority with the MP3000 strategy in the selected participant group. Furthermore, the MP3000 strategy is advantageous over ACE in emotional prosody perception, and the larger gamma power suggests an advantage for a strategy with finer spectral resolution. This increased dynamic range results from the application of the psychoacoustic masking model, which discards redundant (masked) information and allows the relevant signal to pass through. Consequently, based on the evidence in this thesis, it seems reasonable to consider the MP3000 strategy as an initial choice in the clinical setting. In summary, the research described in this thesis confirms that the MP3000 strategy is superior to ACE.
Moreover, it confirms the functional roles of ERPs and gamma band activity, both for investigating auditory perception in CI users generally and for comparing CI speech coding strategies.
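The contrast drawn in this section between ACE's maxima selection and MP3000's masking-based selection can be sketched as follows. This is a deliberately simplified illustration, not the clinical algorithms: the triangular masking pattern and the `spread_db`/`slope_db` values are hypothetical stand-ins for the psychoacoustic model of NOGUEIRA et al. (2005).

```python
import numpy as np

def select_ace(env, n):
    """ACE-style N-of-M selection: keep the n channels with the largest envelopes."""
    return set(np.argsort(env)[-n:])

def select_masking(env, n, spread_db=10.0, slope_db=3.0):
    """Masking-based N-of-M selection (MP3000/PACE-like, illustrative only).
    Each selected channel 'masks' its neighbours: their salience is reduced
    by a triangular spread, so a loud channel's redundant neighbours are
    skipped in favour of perceptually distinct components."""
    level = 20 * np.log10(np.maximum(np.asarray(env, float), 1e-12))
    selected = set()
    for _ in range(n):
        k = int(np.argmax(level))
        selected.add(k)
        dist = np.abs(np.arange(level.size) - k)
        level -= np.maximum(spread_db - slope_db * dist, 0.0)  # spread of masking
        level[k] = -np.inf                                     # never reselect
    return selected

# Two strong adjacent channels plus one weaker distant channel (22 channels)
env = np.zeros(22)
env[5], env[6], env[15] = 1.0, 0.8, 0.5
```

With these envelopes, ACE keeps the two adjacent maxima (channels 5 and 6), while the masking rule judges channel 6 largely redundant (masked by channel 5) and keeps the distant channel 15 instead, mirroring the argument above that masking out redundant information lets more distinct spectral cues through.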

4.1. Conclusion and future directions

The aim of the work presented in this thesis was to investigate prosody perception in hearing-impaired individuals. With the help of EEG, the present work examined the neurophysiological processes underlying prosodic perception in NH listeners and CI users. The results provide evidence that CI users are able to differentiate between and recognise prosodic information reflecting basic emotions. In addition, the present work clearly highlights the importance of the MP3000 strategy for the recognition of emotional prosody in comparison with the ACE strategy. The results confirm that ERPs and gamma band activity are valid measures for investigating CI users in general, as well as for comparing the outcomes of CI speech coding strategies. It was also observed that gamma oscillations reflected the superiority of one strategy over the other in prosodic cue detection, especially when the differences were pitch-related. The results revealed that the EEG indeed reflected differences between emotional prosodies at an early stage of processing. Importantly, it was shown for the first time in CI users that specific oscillations do not merely correlate with functions, but have a profound influence on actual neuronal processing and subsequent behaviour. Since emotional prosody is considered a link between speech and music, the present thesis is of prime importance for research that seeks to investigate speech and/or music perception in CI users. The field of emotion recognition investigated with EEG is still in its infancy. Clearly, the work presented in this thesis is just an initial step towards fully investigating prosody perception in CI users with the help of EEG; a full investigation requires more than is possible in a single doctoral study. Nevertheless, the current findings show that it is possible and important to investigate the potential factors that influence emotional prosody processing.
In particular, it has been shown that specific speech coding strategies play an extremely important role in prosody recognition. Thus, future research should focus on different coding strategies in order to extract the most important factors irrespective of CI make and model.


Chapter 5 Summary

Prosody Perception in Cochlear Implant Users: EEG evidence

Deepashri Agrawal

Cochlear implant (CI) devices provide an opportunity for hearing-impaired individuals to perceive sound through electrical stimulation of the auditory nerve. One feature of oral communication is semantics, while another feature, emotional prosody, encodes the emotional state of the speaker. It is currently unclear whether CI users are able to identify verbal emotions effectively. The main objective of this thesis is to compare two CI speech coding strategies, ACE (advanced combination encoder) and MP3000, with respect to emotional prosody perception. This was achieved through the use of behavioural tasks and electroencephalography (EEG). Semantically neutral sentences spoken with three prosodic variations (neutral, angry and happy) served as stimuli. The aim of the first study was to investigate the performance of normal-hearing (NH) participants on the perception of emotional prosody with vocoded ACE and MP3000 stimuli. Perception of emotional prosody was analysed using these simulations: NH listeners achieved near-perfect performance with the original stimuli compared to the vocoded versions. Within the simulations, recognition was better for happy and angry prosody than for neutral prosody. A significantly larger P200 event-related potential after sentence onset was observed for happy prosody than for the other two emotions. Furthermore, the P200 amplitude was significantly more positive for the MP3000 strategy than for the ACE strategy. Thus, these results emphasise the importance of vocoded simulations for better understanding the prosodic cues which CI users may be utilising.
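For intuition, the vocoded stimuli used in the first study can be approximated with a generic noise-band vocoder. The sketch below is not the actual ACE/MP3000 simulation (which followed the clinical strategies' filter banks and channel-selection rules); the band count, cutoff frequencies and demo signal are all hypothetical:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocoder(x, fs, n_bands=8, f_lo=200.0, f_hi=7000.0):
    """Generic noise-band vocoder (illustrative only): split the input into
    log-spaced analysis bands, extract each band's envelope, and use it to
    modulate band-limited noise, discarding temporal fine structure."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    rng = np.random.default_rng(0)
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype='bandpass', fs=fs, output='sos')
        band = sosfiltfilt(sos, x)
        env = np.abs(hilbert(band))                      # band envelope
        carrier = sosfiltfilt(sos, rng.normal(0, 1, x.size))
        out += env * carrier                             # envelope-modulated noise
    return out

# Hypothetical demo input: a 0.2 s, 1 kHz tone sampled at 16 kHz
fs = 16000
x = np.sin(2 * np.pi * 1000 * np.arange(0, 0.2, 1 / fs))
y = noise_vocoder(x, fs)
```

Because only the band envelopes survive, pitch-carrying fine structure is degraded, which is precisely why vocoded listening makes prosodic cues (largely F0-based) hard to recover and motivates testing NH listeners on such simulations.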

The second study investigated the ability of CI users to recognise emotional prosody with the ACE and MP3000 strategies. In addition to behavioural tasks, EEG gamma band power was also calculated. The results were similar to those of the first study: CI users fitted with the MP3000 strategy showed improved recognition of prosodic information compared to ACE strategy users. The ERP results demonstrated that emotional prosody elicited significant N100 and P200 peaks across strategies. Furthermore, the P200 amplitude in response to happy prosodic information was significantly more positive for the MP3000 strategy than for the ACE strategy. In addition, significant gamma band activity was observed only with the MP3000 strategy, most likely reflecting better top-down cognitive control for prosody recognition. Taken together, the results presented in this thesis suggest that the MP3000 strategy is better than ACE with regard to emotional prosody perception, as confirmed by behavioural and electrophysiological responses. The P200 peak is an indicator of active differentiation and recognition of emotional prosody. It was shown that time-frequency analysis is a useful tool that can reveal differences between two CI processing strategies in the recognition of prosody-specific features of language. Furthermore, it provided several new insights, especially regarding the reflection of top-down processes in gamma band activity as a binding process, and as an effective tool for understanding the prognostic outcome of CI speech coding strategies.
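The P200 measure referred to throughout both studies can be illustrated with a minimal peak-picking sketch on synthetic data (the window bounds, single-channel handling, amplitudes and noise levels are all hypothetical; the thesis used its own ERP pipeline):

```python
import numpy as np

def p200_amplitude(epochs, times, window=(0.15, 0.25)):
    """P200 peak amplitude: maximum of the trial-averaged ERP inside a
    150-250 ms post-onset window (an illustrative window, not necessarily
    the thesis's exact analysis parameters)."""
    erp = epochs.mean(axis=0)                        # average across trials
    mask = (times >= window[0]) & (times <= window[1])
    return erp[mask].max()

# Synthetic single-channel demo: "happy" trials carry a larger positivity
# peaking near 200 ms after sentence onset.
fs = 250
times = np.arange(-0.1, 0.5, 1 / fs)
rng = np.random.default_rng(1)
component = np.exp(-0.5 * ((times - 0.2) / 0.03) ** 2)   # bump at 200 ms
happy = 6 * component + rng.normal(0, 2, (40, times.size))
neutral = 3 * component + rng.normal(0, 2, (40, times.size))
```

Comparing `p200_amplitude(happy, times)` against `p200_amplitude(neutral, times)` reproduces, on toy data, the kind of condition contrast (larger P200 for happy prosody) reported above.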

Chapter 6 Zusammenfassung

Prosody Perception in Cochlear Implant Users: EEG evidence

Deepashri Agrawal

Cochlear implants (CIs) offer hearing-impaired people the possibility of perceiving sounds through electrical stimulation of the auditory nerve. One feature of oral communication is semantics, while another feature conveys the emotional state of the speaker (emotional prosody). It is still unclear whether CI users can perceive verbal emotions effectively. The main subject of this thesis is the comparison of two CI speech coding strategies, ACE (Advanced Combination Encoder) and MP3000, with regard to the perception of emotional prosody. To achieve this goal, behavioural tests and EEG measurements were used. Semantically neutral sentences spoken with three prosodic variations (neutral, angry and happy) served as stimuli. The aim of the first experiment was to examine normal-hearing (NH) participants when emotional prosody was simulated through a vocoder according to ACE and MP3000. The analysis of emotional prosody perception for these modified stimuli showed that NH participants recognised the unmodified prosodies nearly perfectly. For the simulations, better recognition was found for happy and angry prosodies than for neutral prosody. A significantly larger P200 (event-related potential) after sentence onset was found for happy prosody compared to the two

other emotions. Compared with the ACE strategy, the P200 amplitude was significantly more positive when the MP3000 strategy was used. Our results therefore underline the importance of vocoded simulations for a better understanding of prosodic cues and of how CI users exploit them. A second experiment was conducted that examined the ability of CI users to recognise emotional prosodies under ACE and MP3000. In addition to the behavioural data, EEG gamma band activity was calculated. Consistent with our first experiment, we found that CI users fitted with the MP3000 strategy showed a higher rate of correctly identified prosodic information than ACE strategy users. Our ERP results demonstrate that emotional prosody elicits significant N100 and P200 peaks. The P200 amplitude in response to happy prosodic information was significantly more positive for the MP3000 strategy than for the ACE strategy. In the spectral power analysis, significant gamma band activity was found only for MP3000 users, presumably owing to better top-down cognitive control in these users. Taken together, our results suggest that the MP3000 strategy performs better than the ACE strategy with regard to emotional prosody perception, as demonstrated by behavioural and EEG data. The P200 is an indicator of active differentiation and recognition of emotional prosody. Furthermore, we were able to show that time-frequency analysis is a useful tool that can reveal differences between CI processing strategies with regard to prosody-specific features of speech. In addition, we present new findings regarding the reflection of top-down processes in gamma band activity, as well as its effective use in establishing prognoses for CI speech coding strategies.

Chapter 7 References

BASAR, E. and B. GUNTEKIN (2008): A review of brain oscillations in cognitive disorders and the role of neurotransmitters. Brain Res 1235
BOSTANOV, V. and B. KOTCHOUBEY (2004): Recognition of affective prosody: continuous wavelet measures of event-related brain potentials to emotional exclamations. Psychophysiology 41
BOZIKAS, V. P., M. H. KOSMIDIS, D. ANEZOULAKI, M. GIANNAKOU, C. ANDREOU and A. KARAVATOS (2006): Impaired perception of affective prosody in schizophrenia. J Neuropsychiatry Clin Neurosci 18
DAVID, O., J. M. KILNER and K. J. FRISTON (2006): Mechanisms of evoked and induced responses in MEG/EEG. Neuroimage 31
DE ZUBICARAY, G., K. MCMAHON, M. EASTBURN, A. PRINGLE and L. LORENZ (2006): Classic identity negative priming involves accessing semantic representations in the left anterior temporal cortex. Neuroimage 33
DESMEDT, J. E. and J. DEBECKER (1979): Wave form and neural mechanism of the decision P350 elicited without pre-stimulus CNV or readiness potential in random sequences of near-threshold auditory clicks and finger stimuli. Electroencephalogr Clin Neurophysiol 47
DONCHIN, E. and E. F. HEFFLEY (1979): The independence of the P300 and the CNV reviewed: a reply to Wastell. Biol Psychol 9
ERBER, N. P. (1972): Speech-envelope cues as an acoustic aid to lipreading for profoundly deaf children. J Acoust Soc Am 51
FJELL, A. M. and K. B. WALHOVD (2003): Effects of auditory stimulus intensity and hearing threshold on the relationship among P300, age, and cognitive function. Clin Neurophysiol 114
GALAMBOS, R. and S. MAKEIG (1992): Physiological studies of central masking in man. II: Tonepip SSRs and the masking level difference. J Acoust Soc Am 92
HAGOORT, P., C. M. BROWN and T. Y. SWAAB (1996): Lexical-semantic event-related potential effects in patients with left hemisphere lesions and aphasia, and patients with right hemisphere lesions without aphasia. Brain 119 (Pt 2)
HAJCAK, G., A. MACNAMARA and D. M. OLVET (2010): Event-related potentials, emotion, and emotion regulation: an integrative review. Dev Neuropsychol 35
HILLYARD, S. A., R. F. HINK, V. L. SCHWENT and T. W. PICTON (1973): Electrical signs of selective attention in the human brain. Science 182
HILLYARD, S. A. and T. F. MUNTE (1984): Selective attention to color and location: an analysis with event-related brain potentials. Percept Psychophys 36
HOUSE, W. F. (1994): Cochlear implants: it's time to rethink. Am J Otol 15
HUGDAHL, K., T. HELLAND, M. K. FAEREVAAG, E. T. LYSSAND and A. ASBJORNSEN (1995): Absence of ear advantage on the consonant-vowel dichotic listening test in adolescent and adult dyslexics: specific auditory-phonetic dysfunction. J Clin Exp Neuropsychol 17
JOHNSON, J. M. (2009): Late auditory event-related potentials in children with cochlear implants: a review. Dev Neuropsychol 34
KEIL, A., M. M. BRADLEY, O. HAUK, B. ROCKSTROH, T. ELBERT and P. J. LANG (2002): Large-scale neural correlates of affective picture processing. Psychophysiology 39
KOCHANSKI, G. and C. SHIH (2003): Prosody modeling with soft templates. Speech Communication 39
KOTZ, S. A., M. MEYER and S. PAULMANN (2006): Lateralization of emotional prosody in the brain: an overview and synopsis on the impact of study design. Prog Brain Res 156
KOTZ, S. A., B. OPITZ and A. D. FRIEDERICI (2007): ERP effects of meaningful and non-meaningful sound processing in anterior temporal patients. Restor Neurol Neurosci 25
KRISTEVA-FEIGE, R., B. FEIGE, S. MAKEIG, B. ROSS and T. ELBERT (1993): Oscillatory brain activity during a motor task. Neuroreport 4
KUTAS, M. and S. A. HILLYARD (1980): Event-related brain potentials to semantically inappropriate and surprisingly large words. Biol Psychol 11
LANDAU, A. N., M. ESTERMAN, L. C. ROBERTSON, S. BENTIN and W. PRINZMETAL (2007): Different effects of voluntary and involuntary attention on EEG activity in the gamma band. J Neurosci 27
LIU, T. Y., J. C. HSIEH, Y. S. CHEN, P. C. TU, T. P. SU and L. F. CHEN (2012): Different patterns of abnormal gamma oscillatory activity in unipolar and bipolar disorder patients during an implicit emotion task. Neuropsychologia 50
LUTZENBERGER, W., F. PULVERMULLER, T. ELBERT and N. BIRBAUMER (1995): Visual stimulation alters local 40-Hz responses in humans: an EEG-study. Neurosci Lett 183
MAKEIG, S. (1993): Auditory event-related dynamics of the EEG spectrum and effects of exposure to tones. Electroencephalogr Clin Neurophysiol 86
MAKEIG, S., S. DEBENER, J. ONTON and A. DELORME (2004): Mining event-related brain dynamics. Trends Cogn Sci 8
MEISTER, H., M. LANDWEHR, V. PYSCHNY, M. WALGER and H. VON WEDEL (2009): The perception of prosody and speaker gender in normal-hearing listeners and cochlear implant recipients. Int J Audiol 48
MOST, T. and M. SHURGI (1993): The effect of listeners' experience on the evaluation of intonation contours produced by hearing-impaired children. Ear Hear 14
NAATANEN, R. (1988): Implications of ERP data for psychological theories of attention. Biol Psychol 26
NOGUEIRA, W., A. BUECHNER, T. LENARZ and B. EDLER (2005): A psychoacoustic "NofM"-type speech coding strategy for cochlear implants. EURASIP Journal on Applied Signal Processing, Special Issue on DSP in Hearing Aids and Cochlear Implants 127
PANTEV, C., T. ELBERT, B. ROSS, C. EULITZ and E. TERHARDT (1996): Binaural fusion and the representation of virtual pitch in the human auditory cortex. Hear Res 100
PAUL, R., A. AUGUSTYN, A. KLIN and F. R. VOLKMAR (2005): Perception and production of prosody by speakers with autism spectrum disorders. J Autism Dev Disord 35
PAULMANN, S. and S. A. KOTZ (2008a): Early emotional prosody perception based on different speaker voices. Neuroreport 19
PAULMANN, S. and S. A. KOTZ (2008b): An ERP investigation on the temporal dynamics of emotional prosody and emotional semantics in pseudo- and lexical-sentence context. Brain Lang 105
PAULMANN, S., D. V. OTT and S. A. KOTZ (2011): Emotional speech perception unfolding in time: the role of the basal ganglia. PLoS One 6, e17694
PAULMANN, S., M. D. PELL and S. A. KOTZ (2008): Functional contributions of the basal ganglia to emotional prosody: evidence from ERPs. Brain Res 1217
PINHEIRO, A. P., S. GALDO-ALVAREZ, A. RAUBER, A. SAMPAIO, M. NIZNIKIEWICZ and O. F. GONCALVES (2011): Abnormal processing of emotional prosody in Williams syndrome: an event-related potentials study. Res Dev Disabil 32
ROTH, W. T., B. S. KOPELL, J. R. TINKLENBERG, G. E. HUNTSBERGER and H. C. KRAEMER (1975): Reliability of the contingent negative variation and the auditory evoked potential. Electroencephalogr Clin Neurophysiol 38
SANDMAN, C. A. and J. V. PATTERSON (2000): The auditory event-related potential is a stable and reliable measure in elderly subjects over a 3 year period. Clin Neurophysiol 111
SAUTER, D. A. and M. EIMER (2010): Rapid detection of emotion from human vocalizations. J Cogn Neurosci 22
SCHRODER, C., J. MOBES, M. SCHUTZE, F. SZYMANOWSKI, W. NAGER, M. BANGERT, T. F. MUNTE and R. DENGLER (2006): Perception of emotional speech in Parkinson's disease. Mov Disord 21
SHARMA, A. and M. F. DORMAN (1999): Cortical auditory evoked potential correlates of categorical perception of voice-onset time. J Acoust Soc Am 106
SHIBASAKI, H. and M. MIYAZAKI (1992): Event-related potential studies in infants and children. J Clin Neurophysiol 9
SPRECKELMEYER, K. N., M. KUTAS, T. URBACH, E. ALTENMULLER and T. F. MUNTE (2009): Neural processing of vocal emotion and identity.

Chapter 8 Acknowledgement

It would not have been possible to write this doctoral thesis without the help and support of the kind people around me, only some of whom it is possible to mention here. This thesis would not have been possible without the help, support and patience of my principal supervisor, Prof. Reinhard Dengler. For guidance and direction I am indebted to my supervisor Dr Matthias Wittfoth, who has been accessible and helpful with every aspect of my research, and who has given careful criticism of various drafts of the thesis. I am grateful to Professor Stefan Debener for constructive comments at different stages of this work and for his constant encouragement and support, especially in EEG analysis and related issues. I extend my thanks to Prof. Andrej Kral and Prof. Elke Zimmermann for invaluable suggestions throughout this project. I would like to thank Dr Filipa Viola, Dr Jeremy Thorne and Dr Pascal Sandmann for all their helpful suggestions and comments in making this research a success. My thanks are also due to Mr Armin Tagipor for helping us create the CI simulations, without which the first paper would not have been possible. I am thankful to Mr Andreas Niesel for technical assistance in arranging the lab, providing all the software and helping with countless small technical matters. I would also like to thank Dr Thorsten Schwizer for helping me keep track of all the deadlines and important paperwork, without which I never would have turned things in on time, and Dr Carolina Frömke for help with statistics and comments on a draft of the thesis. Sincere thanks to all the acousticians in the Hearing Centre MHH for mapping the different CI programs for the patients; without their help this entire project would not have been possible. I am very grateful to the participants and their families, who took part enthusiastically in this study. I would like to acknowledge the financial, academic and technical staff members of ZSN for all their help and support. I am also grateful to the Government of Lower Saxony, Germany, for awarding me the prestigious Georg Christoph Lichtenberg Scholarship for my PhD. Heartfelt thanks to my fellow postgraduate students in the ZSN and in the Department of Neurology MHH for their efforts in promoting a stimulating and welcoming academic and social environment. I would also like to thank my colleagues and friends from the PhD hearing program for all the discussions and fun we had in summer school, and Dr Olivier Commowick for his wonderful LaTeX template that made my life so much easier while writing the thesis. I sincerely thank the external examiner for the careful evaluation of my thesis. My special thanks to my husband, Dr Mahesh Kakde, for his personal support and great patience at all times. Without your encouragement and unconditional love I would not have been able to see this day. I love you. Finally, my parents, my in-laws, and my brother and sister have given me their unequivocal support throughout, as always, for which my mere expression of thanks likewise does not suffice. Thank you one and all...

Declaration

I herewith declare that I autonomously carried out the PhD thesis entitled "Prosody Perception in Cochlear Implant Users: EEG Evidence". No third-party assistance has been used. I did not receive any assistance in return for payment from consulting agencies or any other person. No one received any kind of payment for direct or indirect assistance in connection with the content of the submitted thesis. I conducted the project at the following institution: Department of Neurology, Hannover Medical School, Hannover, Germany. The thesis has not been submitted elsewhere for an exam, as a thesis or for evaluation in a similar context. I hereby affirm the above statements to be complete and true to the best of my knowledge.

Date, Signature


More information

Speech-Language Pathology Curriculum Foundation Course Linkages

Speech-Language Pathology Curriculum Foundation Course Linkages FACULTY OF HEALTH PROFESSIONS School of Human Communication Disorders Speech-Language Pathology Curriculum Foundation Course Linkages Phonetics (HUCD 5020) a. Vowels b. Consonants c. Suprasegmentals d.

More information

Hearing and Deafness 1. Anatomy & physiology

Hearing and Deafness 1. Anatomy & physiology Hearing and Deafness 1. Anatomy & physiology Chris Darwin Web site for lectures, lecture notes and filtering lab: http://www.lifesci.susx.ac.uk/home/chris_darwin/ safari 1 Outer, middle & inner ear Capture;

More information

ERP indices of lab-learned phonotactics

ERP indices of lab-learned phonotactics ERP indices of lab-learned phonotactics Claire Moore-Cantwell, Joe Pater, Robert Staubs, Benjamin Zobel and Lisa Sanders RUMMIT UMass Amherst, April 6th 2013 Introduction: learning phonology in the lab

More information

Figure1. Acoustic feedback in packet based video conferencing system

Figure1. Acoustic feedback in packet based video conferencing system Real-Time Howling Detection for Hands-Free Video Conferencing System Mi Suk Lee and Do Young Kim Future Internet Research Department ETRI, Daejeon, Korea {lms, dyk}@etri.re.kr Abstract: This paper presents

More information

Single trial analysis for linking electrophysiology and hemodynamic response. Christian-G. Bénar INSERM U751, Marseille christian.benar@univmed.

Single trial analysis for linking electrophysiology and hemodynamic response. Christian-G. Bénar INSERM U751, Marseille christian.benar@univmed. Single trial analysis for linking electrophysiology and hemodynamic response Christian-G. Bénar INSERM U751, Marseille [email protected] Neuromath meeting Leuven March 12-13, 29 La Timone MEG

More information

MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL

MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL MANAGING QUEUE STABILITY USING ART2 IN ACTIVE QUEUE MANAGEMENT FOR CONGESTION CONTROL G. Maria Priscilla 1 and C. P. Sumathi 2 1 S.N.R. Sons College (Autonomous), Coimbatore, India 2 SDNB Vaishnav College

More information

Functional neuroimaging. Imaging brain function in real time (not just the structure of the brain).

Functional neuroimaging. Imaging brain function in real time (not just the structure of the brain). Functional neuroimaging Imaging brain function in real time (not just the structure of the brain). The brain is bloody & electric Blood increase in neuronal activity increase in metabolic demand for glucose

More information

Proceedings of Meetings on Acoustics

Proceedings of Meetings on Acoustics Proceedings of Meetings on Acoustics Volume 19, 2013 http://acousticalsociety.org/ ICA 2013 Montreal Montreal, Canada 2-7 June 2013 Speech Communication Session 2aSC: Linking Perception and Production

More information

Video-Based Eye Tracking

Video-Based Eye Tracking Video-Based Eye Tracking Our Experience with Advanced Stimuli Design for Eye Tracking Software A. RUFA, a G.L. MARIOTTINI, b D. PRATTICHIZZO, b D. ALESSANDRINI, b A. VICINO, b AND A. FEDERICO a a Department

More information

THE VOICE OF LOVE. Trisha Belanger, Caroline Menezes, Claire Barboa, Mofida Helo, Kimia Shirazifard

THE VOICE OF LOVE. Trisha Belanger, Caroline Menezes, Claire Barboa, Mofida Helo, Kimia Shirazifard THE VOICE OF LOVE Trisha Belanger, Caroline Menezes, Claire Barboa, Mofida Helo, Kimia Shirazifard University of Toledo, United States [email protected], [email protected], [email protected],

More information

Audio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA

Audio Engineering Society. Convention Paper. Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA Audio Engineering Society Convention Paper Presented at the 129th Convention 2010 November 4 7 San Francisco, CA, USA The papers at this Convention have been selected on the basis of a submitted abstract

More information

Electrophysiology of language

Electrophysiology of language Electrophysiology of language Instructors: Ina Bornkessel (Independent Junior Research Group Neurotypology, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig); Matthias Schlesewsky (Department

More information

Formant Bandwidth and Resilience of Speech to Noise

Formant Bandwidth and Resilience of Speech to Noise Formant Bandwidth and Resilience of Speech to Noise Master Thesis Leny Vinceslas August 5, 211 Internship for the ATIAM Master s degree ENS - Laboratoire Psychologie de la Perception - Hearing Group Supervised

More information

Program curriculum for graduate studies in Speech and Music Communication

Program curriculum for graduate studies in Speech and Music Communication Program curriculum for graduate studies in Speech and Music Communication School of Computer Science and Communication, KTH (Translated version, November 2009) Common guidelines for graduate-level studies

More information

Functional Communication for Soft or Inaudible Voices: A New Paradigm

Functional Communication for Soft or Inaudible Voices: A New Paradigm The following technical paper has been accepted for presentation at the 2005 annual conference of the Rehabilitation Engineering and Assistive Technology Society of North America. RESNA is an interdisciplinary

More information

Hearing Tests And Your Child

Hearing Tests And Your Child HOW EARLY CAN A CHILD S HEARING BE TESTED? Most parents can remember the moment they first realized that their child could not hear. Louise Tracy has often told other parents of the time she went onto

More information

A Microphone Array for Hearing Aids

A Microphone Array for Hearing Aids A Microphone Array for Hearing Aids by Bernard Widrow 1531-636X/06/$10.00 2001IEEE 0.00 26 Abstract A directional acoustic receiving system is constructed in the form of a necklace including an array of

More information

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29. Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet

More information

ERPs in Cognitive Neuroscience

ERPs in Cognitive Neuroscience Center for Neuroscience UNIVERSITY OF CALIFORNIA AT DAVIS ERPs in Cognitive Neuroscience Charan Ranganath Center for Neuroscience and Dept of Psychology, UC Davis EEG and MEG Neuronal activity generates

More information

The Design and Implementation of Multimedia Software

The Design and Implementation of Multimedia Software Chapter 10 Auditory Content The Design and Implementation of Multimedia Software David Bernstein Jones and Bartlett Publishers www.jbpub.com David Bernstein (jbpub.com) Multimedia Software Jones and Bartlett

More information

Auditory evoked response, clicks, notch noise bandwidth, frequency

Auditory evoked response, clicks, notch noise bandwidth, frequency 1~14fYl~t~lilliill ~~~~l~lll ~ I i?~l~i i J Am Acad Audiol 3 : 269-274 (1992) Effects of Notch Noise Bandwidth on the Auditory Brainstem Response to Clicks Randall C. Beattie* Debra L. Franzone* Kristen

More information

Audiology and Hearing Science

Audiology and Hearing Science Audiology and Hearing Science Thursday, December 11, 2014 Session 1: Music Perception, Appreciation, and Therapy: Pediatric Applications 14 th Symposium on Cochlear Implants in Children December 11-13,

More information

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance

Automatic Evaluation Software for Contact Centre Agents voice Handling Performance International Journal of Scientific and Research Publications, Volume 5, Issue 1, January 2015 1 Automatic Evaluation Software for Contact Centre Agents voice Handling Performance K.K.A. Nipuni N. Perera,

More information

PURE TONE AUDIOMETRY Andrew P. McGrath, AuD

PURE TONE AUDIOMETRY Andrew P. McGrath, AuD PURE TONE AUDIOMETRY Andrew P. McGrath, AuD Pure tone audiometry is the standard behavioral assessment of an individual s hearing. The results of pure tone audiometry are recorded on a chart or form called

More information

Cochlear Implant, Bone Anchored Hearing Aids, and Auditory Brainstem Implant

Cochlear Implant, Bone Anchored Hearing Aids, and Auditory Brainstem Implant Origination: 06/23/08 Revised: 10/13/14 Annual Review: 11/12/15 Purpose: To provide cochlear implant, bone anchored hearing aids, and auditory brainstem implant guidelines for the Medical Department staff

More information

Categories of Exceptionality and Definitions

Categories of Exceptionality and Definitions 7. CATEGORIES and DEFINITIONS of EXCEPTIONALITIES Purpose of the standard To provide the ministry with details of the categories and definitions of exceptionalities available to the public, including parents

More information

Data Analysis Methods: Net Station 4.1 By Peter Molfese

Data Analysis Methods: Net Station 4.1 By Peter Molfese Data Analysis Methods: Net Station 4.1 By Peter Molfese Preparing Data for Statistics (preprocessing): 1. Rename your files to correct any typos or formatting issues. a. The General format for naming files

More information

Dr V. J. Brown. Neuroscience (see Biomedical Sciences) History, Philosophy, Social Anthropology, Theological Studies.

Dr V. J. Brown. Neuroscience (see Biomedical Sciences) History, Philosophy, Social Anthropology, Theological Studies. Psychology - pathways & 1000 Level modules School of Psychology Head of School Degree Programmes Single Honours Degree: Joint Honours Degrees: Dr V. J. Brown Psychology Neuroscience (see Biomedical Sciences)

More information

Case Study THE IMPORTANCE OF ACCURATE BEHAVIOURAL TESTING IN INFANT HEARING AID FITTINGS

Case Study THE IMPORTANCE OF ACCURATE BEHAVIOURAL TESTING IN INFANT HEARING AID FITTINGS Case Study THE IMPORTANCE OF ACCURATE BEHAVIOURAL TESTING IN INFANT HEARING AID FITTINGS Andrea Kelly, PhD, MNZAS Auckland District Health Board Suzanne Purdy, PhD, MNZAS University of Auckland Asymmetrical

More information

Tinnitus: a brief overview

Tinnitus: a brief overview : a brief overview sufferers experience sound in the absence of an external source. Sounds heard in tinnitus tend to be buzzing, hissing or ringing rather than fully-formed sounds such as speech or music.

More information

How To Know If A Cochlear Implant Is Right For You

How To Know If A Cochlear Implant Is Right For You MEDICAL POLICY SUBJECT: COCHLEAR IMPLANTS AND PAGE: 1 OF: 6 If the member's subscriber contract excludes coverage for a specific service it is not covered under that contract. In such cases, medical policy

More information

C HAPTER NINE. Signal Processing for Severe-to-Profound Hearing Loss. Stefan Launer and Volker Kühnel. Introduction

C HAPTER NINE. Signal Processing for Severe-to-Profound Hearing Loss. Stefan Launer and Volker Kühnel. Introduction C HAPTER NINE Signal Processing for Severe-to-Profound Hearing Loss Stefan Launer and Volker Kühnel Introduction Wide dynamic range compression has become the mostly used signal processing strategy for

More information

Cochlear implants for children and adults with severe to profound deafness

Cochlear implants for children and adults with severe to profound deafness Issue date: January 2009 Review date: February 2011 Cochlear implants for children and adults with severe to profound deafness National Institute for Health and Clinical Excellence Page 1 of 41 Final appraisal

More information

Control of affective content in music production

Control of affective content in music production International Symposium on Performance Science ISBN 978-90-9022484-8 The Author 2007, Published by the AEC All rights reserved Control of affective content in music production António Pedro Oliveira and

More information

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song , pp.347-354 http://dx.doi.org/10.14257/ijmue.2014.9.8.32 A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song Myeongsu Kang and Jong-Myon Kim School of Electrical Engineering,

More information

Documentation Wadsworth BCI Dataset (P300 Evoked Potentials) Data Acquired Using BCI2000's P3 Speller Paradigm (http://www.bci2000.

Documentation Wadsworth BCI Dataset (P300 Evoked Potentials) Data Acquired Using BCI2000's P3 Speller Paradigm (http://www.bci2000. Documentation Wadsworth BCI Dataset (P300 Evoked Potentials) Data Acquired Using BCI2000's P3 Speller Paradigm (http://www.bci2000.org) BCI Competition III Challenge 2004 Organizer: Benjamin Blankertz

More information

Visual Attention and Emotional Perception

Visual Attention and Emotional Perception Visual Attention and Emotional Perception Luiz Pessoa 1 and Leslie G. Ungerleider 2 (1) Department of Psychology, Brown University, Providence, RI (2) Laboratory of Brain & Cognition, National Institute

More information

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN AUDIOLOGY (MSc[Audiology])

REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN AUDIOLOGY (MSc[Audiology]) 224 REGULATIONS FOR THE DEGREE OF MASTER OF SCIENCE IN AUDIOLOGY (MSc[Audiology]) (See also General Regulations) Any publication based on work approved for a higher degree should contain a reference to

More information

COCHLEAR NERVE APLASIA : THE AUDIOLOGIC PERSPECTIVE A CASE REPORT. Eva Orzan, MD Pediatric Audiology University Hospital of Padova, Italy

COCHLEAR NERVE APLASIA : THE AUDIOLOGIC PERSPECTIVE A CASE REPORT. Eva Orzan, MD Pediatric Audiology University Hospital of Padova, Italy COCHLEAR NERVE APLASIA : THE AUDIOLOGIC PERSPECTIVE A CASE REPORT Eva Orzan, MD Pediatric Audiology University Hospital of Padova, Italy Congenital absence or underdevelopment of the cochlear nerve has

More information

Tutorial about the VQR (Voice Quality Restoration) technology

Tutorial about the VQR (Voice Quality Restoration) technology Tutorial about the VQR (Voice Quality Restoration) technology Ing Oscar Bonello, Solidyne Fellow Audio Engineering Society, USA INTRODUCTION Telephone communications are the most widespread form of transport

More information

SPEECH OR LANGUAGE IMPAIRMENT EARLY CHILDHOOD SPECIAL EDUCATION

SPEECH OR LANGUAGE IMPAIRMENT EARLY CHILDHOOD SPECIAL EDUCATION I. DEFINITION Speech or language impairment means a communication disorder, such as stuttering, impaired articulation, a language impairment (comprehension and/or expression), or a voice impairment, that

More information

A Study of Brainwave Entrainment Based on EEG Brain Dynamics

A Study of Brainwave Entrainment Based on EEG Brain Dynamics A Study of Brainwave Entrainment Based on EEG Brain Dynamics Tianbao Zhuang School of Educational Technology, Shenyang Normal University Shenyang 110034, China E-mail: [email protected] Hong Zhao Graduate

More information

SPEECH AUDIOMETRY. @ Biswajeet Sarangi, B.Sc.(Audiology & speech Language pathology)

SPEECH AUDIOMETRY. @ Biswajeet Sarangi, B.Sc.(Audiology & speech Language pathology) 1 SPEECH AUDIOMETRY Pure tone Audiometry provides only a partial picture of the patient s auditory sensitivity. Because it doesn t give any information about it s ability to hear and understand speech.

More information

IT-MAIS. Infant-Toddler Meaningful Auditory Integration Scale. Instructions, Questionnaire and Score Sheet

IT-MAIS. Infant-Toddler Meaningful Auditory Integration Scale. Instructions, Questionnaire and Score Sheet Instructions, Questionnaire and Score Sheet Authors: Susan Zimmerman-Phillips, MS (Advanced Bionics) Mary Joe Osberger, PhD (Advanced Bionics) Amy McConkey Robbins, MS (Communication Consulting Services)

More information

Predicting Speech Intelligibility With a Multiple Speech Subsystems Approach in Children With Cerebral Palsy

Predicting Speech Intelligibility With a Multiple Speech Subsystems Approach in Children With Cerebral Palsy Predicting Speech Intelligibility With a Multiple Speech Subsystems Approach in Children With Cerebral Palsy Jimin Lee, Katherine C. Hustad, & Gary Weismer Journal of Speech, Language and Hearing Research

More information

PhD student at the Catholic University of the Sacred Heart, Department of Psychology,

PhD student at the Catholic University of the Sacred Heart, Department of Psychology, March 2015 Piercarlo Mauri Curriculum Vitae Personal Details: Date of birth: 23/12/1983 Place of birth: Tradate (VA) - Italy Nationality: Italian Phone: +39 0303501595 E-mail: [email protected]

More information

A TOOL FOR TEACHING LINEAR PREDICTIVE CODING

A TOOL FOR TEACHING LINEAR PREDICTIVE CODING A TOOL FOR TEACHING LINEAR PREDICTIVE CODING Branislav Gerazov 1, Venceslav Kafedziski 2, Goce Shutinoski 1 1) Department of Electronics, 2) Department of Telecommunications Faculty of Electrical Engineering

More information

Ten Simple Rules for Designing and Interpreting ERP Experiments Steven J. Luck University of Iowa

Ten Simple Rules for Designing and Interpreting ERP Experiments Steven J. Luck University of Iowa To appear in Handy, T.C. (Ed.), Event-Related Potentials: A Methods Handbook Ten Simple Rules for Designing and Interpreting ERP Experiments Steven J. Luck University of Iowa Overview This chapter discusses

More information

Curriculum Vitae EDUCATION WORK AND EXPERIENCE

Curriculum Vitae EDUCATION WORK AND EXPERIENCE Curriculum Vitae Name Surname : Mehmet Ergen Birth place/date : İstanbul / 04.13.1977 Marital status : Married Tel : +90 216 5004071 (office) E-mail : [email protected] Foreign language : English

More information

The Effect of Long-Term Use of Drugs on Speaker s Fundamental Frequency

The Effect of Long-Term Use of Drugs on Speaker s Fundamental Frequency The Effect of Long-Term Use of Drugs on Speaker s Fundamental Frequency Andrey Raev 1, Yuri Matveev 1, Tatiana Goloshchapova 2 1 Speech Technology Center, St. Petersburg, RUSSIA {raev, matveev}@speechpro.com

More information

Independence of Visual Awareness from the Scope of Attention: an Electrophysiological Study

Independence of Visual Awareness from the Scope of Attention: an Electrophysiological Study Cerebral Cortex March 2006;16:415-424 doi:10.1093/cercor/bhi121 Advance Access publication June 15, 2005 Independence of Visual Awareness from the Scope of Attention: an Electrophysiological Study Mika

More information

An Introduction to ERP Studies of Attention

An Introduction to ERP Studies of Attention An Introduction to ERP Studies of Attention Logan Trujillo, Ph.D. Post-Doctoral Fellow University of Texas at Austin Cognitive Science Course, Fall 2008 What is Attention? Everyone knows what attention

More information

Lecture 1-10: Spectrograms

Lecture 1-10: Spectrograms Lecture 1-10: Spectrograms Overview 1. Spectra of dynamic signals: like many real world signals, speech changes in quality with time. But so far the only spectral analysis we have performed has assumed

More information

ON SELECTIVE ATTENTION: PERCEPTION OR RESPONSE?

ON SELECTIVE ATTENTION: PERCEPTION OR RESPONSE? 362 QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY COMMENTS ON SELECTIVE ATTENTION: PERCEPTION OR RESPONSE? Downloaded by [University of California, San Diego] at 11:49 12 February 2013 BY J. A. DEUTSCH

More information

Vision: Receptors. Modes of Perception. Vision: Summary 9/28/2012. How do we perceive our environment? Sensation and Perception Terminology

Vision: Receptors. Modes of Perception. Vision: Summary 9/28/2012. How do we perceive our environment? Sensation and Perception Terminology How do we perceive our environment? Complex stimuli are broken into individual features, relayed to the CNS, then reassembled as our perception Sensation and Perception Terminology Stimulus: physical agent

More information

Hearing Tests And Your Child

Hearing Tests And Your Child How Early Can A Child s Hearing Be Tested? Most parents can remember the moment they first realized that their child could not hear. Louise Tracy has often told other parents of the time she went onto

More information

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals Modified from the lecture slides of Lami Kaya ([email protected]) for use CECS 474, Fall 2008. 2009 Pearson Education Inc., Upper

More information

Hearing Tests for Children with Multiple or Developmental Disabilities by Susan Agrawal

Hearing Tests for Children with Multiple or Developmental Disabilities by Susan Agrawal www.complexchild.com Hearing Tests for Children with Multiple or Developmental Disabilities by Susan Agrawal Hearing impairment is a common problem in children with developmental disabilities or who have

More information

TECHNICAL LISTENING TRAINING: IMPROVEMENT OF SOUND SENSITIVITY FOR ACOUSTIC ENGINEERS AND SOUND DESIGNERS

TECHNICAL LISTENING TRAINING: IMPROVEMENT OF SOUND SENSITIVITY FOR ACOUSTIC ENGINEERS AND SOUND DESIGNERS TECHNICAL LISTENING TRAINING: IMPROVEMENT OF SOUND SENSITIVITY FOR ACOUSTIC ENGINEERS AND SOUND DESIGNERS PACS: 43.10.Sv Shin-ichiro Iwamiya, Yoshitaka Nakajima, Kazuo Ueda, Kazuhiko Kawahara and Masayuki

More information

Cognitive Neuroscience. Questions. Multiple Methods. Electrophysiology. Multiple Methods. Approaches to Thinking about the Mind

Cognitive Neuroscience. Questions. Multiple Methods. Electrophysiology. Multiple Methods. Approaches to Thinking about the Mind Cognitive Neuroscience Approaches to Thinking about the Mind Cognitive Neuroscience Evolutionary Approach Sept 20-22, 2004 Interdisciplinary approach Rapidly changing How does the brain enable cognition?

More information