Prosody Perception in Cochlear Implant Users: EEG evidence



Department of Neurology, Hannover Medical School
Center for Systemic Neuroscience, University of Veterinary Medicine Hannover

Prosody Perception in Cochlear Implant Users: EEG evidence

Thesis submitted in partial fulfilment of the requirements for the degree DOCTOR OF PHILOSOPHY (PhD) awarded by the University of Veterinary Medicine Hannover

by Deepashri Agrawal
Akola, India

Hannover, 2012

Supervisor: Prof. Dr. Reinhard Dengler

Supervision Group: Prof. Dr. Reinhard Dengler, Prof. Dr. Andrej Kral, Prof. Dr. Elke Zimmermann, Prof. Dr. Stefan Debener

1st Evaluation:
Prof. Dr. Reinhard Dengler, Department of Neurology, Hannover Medical School, Hannover
Prof. Dr. Andrej Kral, Institute for AudioNeurotechnology, Hannover Medical School, Hannover
Prof. Dr. Elke Zimmermann, Department of Zoology, University of Veterinary Medicine, Hannover
Prof. Dr. Stefan Debener, Department of Psychology, Carl von Ossietzky University, Oldenburg

2nd Evaluation: Prof. Dr. Thomas F. Münte, Universitätsklinikum Schleswig-Holstein, Klinik für Neurologie, Lübeck

Date of final exam: 5/10/2012

Parts of the thesis have been published or submitted for publication previously in:

Deepashri Agrawal, Lydia Timm, Filipa Campos Viola, Stefan Debener, Andreas Büchner, Reinhard Dengler, Matthias Wittfoth. ERP evidence for the recognition of emotional prosody through simulated cochlear implant strategies. BMC Neuroscience (in press)

Agrawal D., Thorne J.D., Viola F.C., Debener S., Timm L., Büchner A., Dengler R., and Wittfoth M. Electrophysiological responses to emotional prosody perception: a comparison of cochlear implant users. NeuroImage Clinical (under review)

Results of this thesis were presented in the form of posters or presentations at the following conferences:

Agrawal D., Timm L., Dengler R. and Wittfoth M. Emotions transferred by cochlear implants: an ERP study exploring the possibilities. International Conference of Auditory Implantable Devices, Baltimore, USA (2012)

Agrawal D., Timm L., Dengler R. and Wittfoth M. Emotions transferred by cochlear implants: an EEG study. DGKN, Köln, Germany (2012)

Agrawal D., Timm L., Dengler R. and Wittfoth M. Emotion perception in CI: an ERP study. Human Brain Mapping, Quebec City, Canada (2011)

Contents

1 Introduction... 1
1.1 Emotions... 1
1.2 Cochlear Implants... 2
1.2.1 Advanced Combination Encoder (ACE)... 3
1.2.2 MP3000... 5
1.3 Electroencephalography (EEG)... 6
1.3.1 N100 component... 8
1.3.2 P200 component... 8
1.3.3 Late positive component... 9
1.3.4 Oscillatory activity... 9
1.4 EEG in the neuroscience of emotion... 11
1.5 Research questions... 13
2 Manuscript I... 15
2.1 Abstract... 15
2.2 References... 17
3 Manuscript II... 27
3.1 Summary... 27
3.2 Introduction... 28
3.3 Materials and methods... 30
3.3.1 Participants... 30
3.3.2 Stimuli... 31
3.3.3 Procedure... 32
3.3.4 ERP procedure and analysis... 33

3.3.5 Data processing... 34
3.3.6 Statistical analysis... 36
3.4 Results... 37
3.4.1 Accuracy data... 37
3.4.2 Reaction times... 37
3.4.3 Event-related potentials... 39
3.4.4 Time-frequency analysis... 42
3.4.5 Correlation between accuracy rate and gamma band activity... 43
3.5 Discussion... 43
3.5.1 Behavioural findings... 45
3.5.2 ERPs... 45
3.5.3 Time-frequency results... 46
3.6 References... 51
4 Overall Discussions... 61
4.1 Conclusion and future directions... 65
5 Summary... 67
6 Zusammenfassung... 69
7 References... 71
8 Acknowledgement... 77

List of Figures

1.1 Block diagram illustrating ACE... 4
1.2 Schematic illustration of MP3000... 6
3.1 Pitch contours of the three prosodies (Manuscript II)... 33
3.2 Accuracy rate (Manuscript II)... 38
3.3 Reaction time (Manuscript II)... 39
3.4 ERP waveforms for three emotional prosodies for NH controls, ACE users and MP3000 users (Manuscript II)... 41
3.5 Induced gamma power plots for ACE users and MP3000 users (Manuscript II)... 44

List of Tables

3.1 Demographic data of CI users (Manuscript II)... 32
3.2 Acoustic parameters of emotional sentences (Manuscript II)... 34
3.3 Peak N100 and P200 mean latencies and amplitudes, with standard deviations in parentheses (Manuscript II)... 40

Chapter 1
Introduction

1.1 Emotions

According to Bharata (NS VI. 31), in Sanskrit: "NAHI RASADATE KASCIDAPYARTHAH PRAVARTATE" (No meaning can proceed from speech in the absence of emotions).

Emotions are needed for the expression of feelings, and they can be conveyed in many ways, including words, facial expressions, gestures and body language. Such actions can provide important information about an individual's emotional state. Speech features fall into two categories: segmental and suprasegmental. In brief, segmental features concern the structure of speech, while suprasegmental features concern its melody, intonation, or prosody. Kochanski and Shih (2002) listed four functions of prosody: (i) to convey lexical meaning (e.g. in tonal languages such as Mandarin Chinese); (ii) to convey non-lexical information through intonation (questions versus declarative sentences); (iii) to relay discourse functions (new information in a discourse is often accented while old information is de-accented); and, most importantly, (iv) to express emotions (e.g. excitement is expressed by means of high pitch and fast speed) (KOCHANSKI u. SHIH 2002).

Impairment in the identification of emotional prosody has a negative influence on social and emotional functioning and leads to poor interpretation of the emotional states of others. The importance of emotional prosody has been evidenced in studies

of populations in which prosody is underdeveloped or disturbed, such as in individuals with Parkinson's disease (SCHRODER et al. 2006; PAULMANN et al. 2008), schizophrenia (BOZIKAS et al. 2006), autism (PAUL et al. 2005), or basal ganglia damage (VAN LANCKER SIDTIS et al. 2006). Due to their inability to perceive subtle changes in acoustic features, hearing-impaired individuals are disadvantaged when trying to understand the affective states of others. Some profoundly hearing-impaired individuals are able to perceive acoustic changes that occur in the frequency, time and intensity components of speech signals, whereas others are only able to perceive changes in the time and intensity components (ERBER 1972; MOST u. SHURGI 1993). For this latter group of individuals, a cochlear implant has the potential to restore some degree of hearing, as well as to aid speech perception.

1.2 Cochlear Implants

Clark (2003) describes the modern cochlear implant (CI) as: "A bionic ear, which restores useful hearing in severely to profoundly deaf people when the organ of hearing situated in the inner ear (cochlea) has not developed or is destroyed by disease or injury. This device bypasses the inner ear and provides information to the hearing centres through direct stimulation of the hearing nerve."

The benefits for hearing-impaired individuals receiving CIs are quite remarkable; most are able to hear speech and even conduct ordinary conversations over the telephone. The implant system consists of a microphone, a speech processor, a transmitter, a receiver and an electrode array, which is located inside the cochlea. The speech processor is responsible for decomposing the input audio signal into different frequency

bands or channels while at the same time delivering the most appropriate stimulation pattern to the electrodes. CIs have been developed to compensate for the loss of speech perception, and they use algorithms that map sounds to electrode stimulation patterns in a speech coding strategy. Such algorithms are designed to support the segmental aspects of speech in order to maximise intelligibility, and as a result CIs are less supportive of the suprasegmental features of speech, which are important for prosodic perception. Thus, current speech processing strategies are inadequate in their delivery of prosodic information.

Several speech coding strategies have been developed based on a speech model that extracts fundamental frequencies and formants from speech (e.g. F0/F1, F0/F1/F2 and MPEAK). In contrast, more recent strategies are based on hearing models in which information is passed through digital filter banks in order to generate stimulating pulsatile sequences. Examples of such speech encoding strategies include spectral peak (SPEAK), continuous interleaved sampling (CIS) and the advanced combination encoder (ACE). The fundamental aim of these strategies is to increase temporal resolution by concentrating on the most perceptually relevant spectral components and neglecting the least significant ones. Although remarkable developments have been made in speech coding algorithms, they do not focus on prosody perception, and therefore much improvement is possible. The work presented in this thesis concentrates on ACE and its new variant, the MP3000 speech coding strategy.

1.2.1 Advanced Combination Encoder (ACE)

ACE operates by mapping the signal power spectrum to electrodes, where only the N out of M (N < M) electrodes with the largest amplitudes are activated. Figure 1.1 gives a schematic representation of the processing stages in the ACE speech coding strategy. Signals from the microphone (audio) are pre-emphasised to amplify the high

frequency components with the help of digital filters. After passing through adaptive gain control (AGC), which limits the distortion of loud sounds by selectively reducing the amplification, the output is digitised and sent through a filter bank, where the spectral envelopes are estimated. Each filter bank channel represents one electrode implanted inside the cochlea. Most systems have 22 electrodes, with the basal electrode corresponding to band 22 and the apical electrode corresponding to band 1. Finally, signal amplitudes are mapped to the corresponding electrodes, and acoustic amplitudes are compressed into the dynamic range of the CI recipient, which is determined by the threshold and maximum comfortable loudness levels for electrical stimulation. More specifically, out of the 22 bands, typically the 8 to 12 bands with the largest envelope amplitudes are selected for stimulation in each cycle; this type of selection works well because it captures the perceptually relevant features of speech, such as the formant peaks. In most cases, the maximum selection criterion performs spectral peak selection.

Figure 1.1: Block diagram illustrating ACE (courtesy of Nogueira et al. 2005)
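The N-of-M maxima selection at the core of ACE can be sketched in a few lines of code. This is an illustrative simplification of the selection step only (no filter bank, AGC or loudness mapping); the function name and the toy envelope values are hypothetical, not taken from a clinical implementation:

```python
import numpy as np

def ace_select(envelopes, n_maxima=8):
    """N-of-M selection: keep the n_maxima bands with the largest envelopes."""
    envelopes = np.asarray(envelopes, dtype=float)
    selected = np.sort(np.argsort(envelopes)[-n_maxima:])  # indices of the maxima
    mask = np.zeros(envelopes.size, dtype=bool)
    mask[selected] = True                                  # which electrodes fire this cycle
    return selected, mask

# Toy 22-band spectrum with three formant-like peaks
rng = np.random.default_rng(0)
env = rng.random(22) * 0.1
env[[4, 5, 12]] = [0.9, 0.8, 0.7]     # dominant spectral peaks
sel, mask = ace_select(env, n_maxima=8)
```

Because the selection is purely amplitude-driven, the formant-like peaks always survive it, which is exactly the spectral peak behaviour described above.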

1.2.2 MP3000

MP3000 (also known as PACE, psychoacoustic ACE) is an ACE variant incorporating a psychoacoustic model (NOGUEIRA et al. 2005). The psychoacoustic masking model is used for the selection of the N bands out of the total M bands, as presented schematically in Figure 1.2. In this strategy, a digitised signal sampled at 16 kHz is sent through the filter bank; unlike in ACE, there is no AGC stage, and the filter bank is implemented using a fast Fourier transform (FFT). Subsequently, the envelope is estimated for each spectral band of the audio signal, and bands are selected based on the psychoacoustic masking model, in contrast to the peak-picking algorithm used in ACE. Thus, the strategy selects not the bands with the largest envelope amplitudes, but those that are most important for auditory perception. Finally, the selected bands are mapped onto the electrode array. In MP3000, the signal amplitudes that deviate most from the estimated masking thresholds are retained, on the basis that signal amplitudes smaller than the masking threshold are not audible and can therefore be discarded. As this strategy has the advantage of utilising only the useful information, this thesis hypothesises that it is well suited for speech perception as well as prosody perception.

Several recent studies have investigated prosody perception in CI users. A study on mood perception in CI users reported that intensity is the primary cue for prosody recognition, followed by fundamental frequency and then by spectral and voice characteristics (HOUSE 1994). Another study examined the effects of altering the fundamental frequency (F0) on the perception of prosody and speaker gender in both normal-hearing individuals and CI users (MEISTER et al. 2009).
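The difference between the two selection rules can be sketched as follows. This is a deliberate simplification: the real MP3000 derives its masking thresholds iteratively from a psychoacoustic spreading function, whereas here the threshold vector is simply given, and the function names and toy numbers are hypothetical:

```python
import numpy as np

def maxima_select(env, n):
    """ACE-style: keep the n bands with the largest envelope amplitudes."""
    return np.sort(np.argsort(np.asarray(env, float))[-n:])

def masking_select(env, threshold, n):
    """MP3000-style: keep the n bands deviating most from the masking threshold."""
    excess = np.asarray(env, float) - np.asarray(threshold, float)
    return np.sort(np.argsort(excess)[-n:])

# Toy 6-band example: band 1 is loud but lies almost entirely under its mask
env = np.array([0.90, 0.80, 0.30, 0.20, 0.10, 0.05])
thr = np.array([0.10, 0.85, 0.05, 0.02, 0.01, 0.01])

sel_ace = maxima_select(env, 2)        # keeps the two loudest bands, 0 and 1
sel_mp = masking_select(env, thr, 2)   # drops masked band 1, keeps audible band 2
```

In the toy example, band 1 is the second-loudest band but sits below its masking threshold, so the masking-based rule discards it in favour of the quieter but audible band 2, mirroring the rationale given above.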
The authors reported that CI users showed poorer overall hearing performance, but performed similarly to normal-hearing controls when asked to differentiate between a statement and a question, and when asked to identify whether they heard a male or a female voice. Although CI users show a preference for using different acoustic cues in differentiating emotions, they find recognition confusing, especially when the acoustic cues representing different emotions are similar.

Figure 1.2: Block diagram illustrating MP3000 (courtesy of Nogueira et al. 2005)

Studies comparing the two speech coding strategies on prosody perception with objective measures such as EEG are scarce. Hence, the goal of the work presented here is to explore prosody perception in CI users using objective measures.

1.3 Electroencephalography (EEG)

A good understanding of the underlying processes of prosody perception is hindered by a lack of adequate techniques for the on-line measurement of psychological processes, and researchers often find it difficult to determine the cortical events underpinning a given task on the basis of behavioural results alone. Electroencephalography (EEG), an objective means of measuring brain activity, can be used to better understand the basic neural mechanisms involved in the processing of affective prosody. EEG measures the bioelectric activity of the brain non-invasively via electrodes placed on the surface of the scalp, with a temporal resolution better than 1 ms and a spatial resolution of about 2.5 cm at the cortical surface. The most useful application of EEG is the event-related potential (ERP) technique, where ERPs represent

transient changes in EEG voltage, reflecting systematic brain activity which, in turn, is triggered by an internal or external sensory stimulus or motor response (DE ZUBICARAY et al. 2006). Thus, ERPs are small voltage variations resulting from the brain's response to a presented stimulus, and they can therefore be regarded as manifestations of specific psychological processes. ERP signals are small in amplitude in comparison to the ongoing EEG in which they are embedded, and thus must be discriminated from this background noise. This is best achieved by averaging, where samples time-locked to the repeated occurrence of a particular event are averaged together; the number of trials required depends on the signal-to-noise ratio. In this way non-time-locked potentials are greatly reduced, leaving only the ERPs. Various stimuli can be used to evoke ERPs, such as visual, auditory, motor, pain and electric pulse stimuli. ERPs evoked by external auditory stimuli are known as auditory evoked potentials (AEPs) and are recognised as positive and negative waves or peaks in the EEG signal following stimulus onset. These peaks are generally described in terms of their characteristic distribution over the scalp, their polarity and their latency (e.g. the P200 is a positive peak occurring 200 ms after the onset of the stimulus).

ERPs comprise both exogenous and endogenous components. Exogenous components are influenced by the physical features of a stimulus and are almost unaffected by changes in cognitive state (HILLYARD u. MUNTE 1984). In contrast, endogenous components are thought to reflect the cognitive state of the participants (DONCHIN u. HEFFLEY 1979; DESMEDT u. DEBECKER 1979). However, there are reports in the literature indicating that some components, e.g. the N100 and P200, share characteristics of both groups, depending on the stimulus properties (SHIBASAKI u. MIYAZAKI 1992).
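The averaging logic described above, together with the peak latency and amplitude measures conventionally used to characterise components such as the N100 and P200, can be sketched as follows. The simulated waveform and all function names are illustrative only, not the analysis pipeline used in the manuscripts:

```python
import numpy as np

def average_erp(epochs):
    """Average time-locked epochs; activity not phase-locked to the event cancels."""
    return np.asarray(epochs, dtype=float).mean(axis=0)

def peak_measure(erp, times, window, polarity):
    """Return (latency, amplitude) of the most extreme point inside `window`."""
    lo, hi = window
    idx = np.where((times >= lo) & (times <= hi))[0]
    i = idx[np.argmin(erp[idx])] if polarity == 'neg' else idx[np.argmax(erp[idx])]
    return times[i], erp[i]

# Simulated single-channel data: an N100/P200 complex buried in background EEG
fs = 1000.0
times = np.arange(-0.1, 0.5, 1 / fs)                       # seconds re: stimulus onset
signal = (-3e-6 * np.exp(-((times - 0.10) / 0.02) ** 2)    # N100-like trough at 100 ms
          + 4e-6 * np.exp(-((times - 0.20) / 0.03) ** 2))  # P200-like peak at 200 ms
rng = np.random.default_rng(1)
epochs = signal + rng.normal(0.0, 10e-6, (200, times.size))  # 200 noisy trials

erp = average_erp(epochs)                                  # noise shrinks ~ 1/sqrt(200)
lat_n1, amp_n1 = peak_measure(erp, times, (0.08, 0.12), 'neg')
lat_p2, amp_p2 = peak_measure(erp, times, (0.15, 0.25), 'pos')
```

With 200 trials the background noise in the average shrinks by roughly a factor of 14, which is why the two peaks become measurable even though single trials are dominated by noise.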
There are several components that are important in neuropsychological research. The work of this thesis concentrates on the

early components, namely the N100 and the P200, and on the late positivity (a positive component occurring between 500 ms and 1200 ms).

1.3.1 N100 component

The N100 component was originally investigated in a dichotic listening paradigm and is one of the most easily identified components, regardless of the specific analysis approach employed. In adults the N100 peaks between 80 and 120 ms after the onset of a stimulus and is distributed mostly over the fronto-central region of the scalp (HILLYARD et al. 1973). Generally, this component is assumed to reflect selective attention to basic stimulus characteristics and intentional discrimination processing (e.g. VOGEL u. LUCK 2000). The latency and amplitude of the peak depend upon the stimulus modality: auditory stimuli elicit a larger N100 with a shorter latency than visual stimuli (HUGDAHL et al. 1995). In combination with the subsequent P200 evoked potential, it is often described as the N100-P200 or N1-P2 complex.

1.3.2 P200 component

The P200, like the N100, has long been considered to be an obligatory cortical potential because it has low inter-individual variability and high reliability (ROTH et al. 1975; SANDMAN u. PATTERSON 2000). Functionally speaking, it has been shown that the P200 component increases whenever participants are asked to attend to a particular stimulus characteristic, for example frequency, time or colour (HILLYARD u. MUNTE 1984), and it is therefore often assumed to reflect selective attention processes. In auditory ERPs, the P200 is known to be influenced by stimulus pitch (PANTEV et al. 1996) and intensity (FJELL u. WALHOVD 2003). Several researchers have reported that the P200 is an index of the extraction of emotional salience from acoustic cues, whether or not they contain linguistic information (SAUTER u. EIMER 2010;

PINHEIRO et al. 2011; LIU et al. 2012). The above-mentioned characteristics make this component optimal for the investigation of prosody recognition in CI users.

1.3.3 Late positive component

The late positive component or late positive complex (LPC) is a positive-going ERP occurring around 600 ms after stimulus onset. This component has two functionally distinct peaks: one is associated with memory processes and the other is related to language. Although both peaks have roughly similar topographies, they appear to originate from different sources in the brain. In the past decade, researchers have shown that the LPC is strongly modulated by the emotional intensity of a stimulus: emotional stimuli of either positive or negative valence elicit a larger (i.e., more positive) LPC than neutral stimuli (KEIL et al. 2002; HAJCAK et al. 2010). Because it reflects memory as well as emotional language processes, this component is well suited to the aim of this thesis, namely examining prosody recognition abilities.

1.3.4 Oscillatory activity

Oscillatory cortical activities in the human brain do not form a homogeneous class of responses; instead they arise from a diverse range of mechanisms with correspondingly diverse functional significance. While the examination of ERPs has provided useful insights into the nature and timing of the neuronal events that sub-serve perceptual and cognitive processes, little attention has been paid to the raw EEG data from which ERPs are derived. The EEG signal reflects neural oscillations and synchronisations; these oscillations represent a mechanism of inter-neuronal communication and of binding the information processed in distributed brain regions. They can be studied using time-frequency (spectral) analysis, whereby the EEG signal is decomposed into amplitude and phase components at each frequency, thus characterising temporal changes (on a millisecond time scale) with respect to task events. Such analysis reveals that the EEG does not simply reflect random background noise; rather, there are event-related changes in the magnitude and phase of EEG oscillations at specific frequencies that support their role in event processing (MAKEIG et al. 2004). Oscillatory responses to a sensory or cognitive event are usually classified according to the natural frequencies of the brain: delta, 0.5 to 3 Hz; theta, 3.5 to 7 Hz; alpha, 8 to 13 Hz; beta, 14 to 35 Hz; and gamma, 35 to 70 Hz. Each frequency band can be associated with specific cortical activity.

1.3.4.1 Gamma band activity

Gamma-band activity refers to those oscillations that correspond to the higher frequency range of the temporal spectrum, typically above 35 Hz. This activity is distributed diffusely over the entire brain (reflecting parallel processing) and is thought to be crucial for mutual information transfer between networks in the brain (BASAR u. GUNTEKIN 2008). It may therefore also be significant for emotional processing, because sub-cortical limbic systems and networks must be connected to neo-cortical modules. A useful approach to the classification of this activity is the frequently cited nomenclature introduced by Galambos, who distinguished between (i) spontaneous gamma rhythms, which are not related to any stimulus; (ii) evoked gamma band responses, which are elicited by and precisely time-locked to the onset of an external stimulus; (iii) emitted gamma band oscillations, which are time-locked to a stimulus that has been omitted; and (iv) induced gamma band rhythms, which are initiated by, but not time- or phase-locked to, a stimulus (GALAMBOS u. MAKEIG 1992). The work presented in this thesis focuses on the last of these phenomena.

To estimate induced oscillations, a time-frequency decomposition is applied to each trial, and the resultant power is averaged across trials.
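This trial-wise decomposition can be sketched numerically. In the hedged illustration below, a single FFT bin stands in for a full time-frequency decomposition, and the simulated 40 Hz trials (one set phase-locked, one set with random phase) are purely illustrative:

```python
import numpy as np

def decompose_power(epochs, fs, freq):
    """Split power at `freq` into evoked (phase-locked) and induced parts.

    evoked  = power of the across-trial average
    induced = mean single-trial power minus the evoked part
    """
    epochs = np.asarray(epochs, dtype=float)
    n = epochs.shape[1]
    k = int(round(freq * n / fs))                        # FFT bin closest to freq
    trial_power = np.abs(np.fft.rfft(epochs, axis=1)[:, k]) ** 2
    evoked = np.abs(np.fft.rfft(epochs.mean(axis=0))[k]) ** 2
    induced = trial_power.mean() - evoked
    return evoked, induced

fs, n_trials = 500.0, 100
t = np.arange(500) / fs                                  # 1-s epochs
rng = np.random.default_rng(2)

# 40 Hz activity with identical phase on every trial: survives averaging (evoked)
locked = np.tile(np.sin(2 * np.pi * 40 * t), (n_trials, 1))
# 40 Hz activity with a random phase per trial: cancels in the average (induced)
jittered = np.sin(2 * np.pi * 40 * t + rng.uniform(0, 2 * np.pi, (n_trials, 1)))

ev_locked, in_locked = decompose_power(locked, fs, 40)
ev_jit, in_jit = decompose_power(jittered, fs, 40)
```

The phase-locked trials put essentially all of their 40 Hz power into the evoked part, while the phase-jittered trials put it into the induced part, which is the distinction drawn in the text.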
The power of the average constitutes the evoked response, while the components of the trial-wise power that are unexplained by the power of the average are

referred to as induced responses. The induced activity reflects the activity of large synchronous neuronal assemblies that go unnoticed in evoked responses. Oscillations are termed induced because their self-organised emergence is not evoked directly by the stimulus but induced vicariously through nonlinear and possibly autonomous mechanisms (DAVID et al. 2006). Induced gamma band responses may be related to the computational operation by which the cerebral cortex links consistent relations among incoming signals. In humans, induced gamma band responses have been reported in various processing modalities, including the visual cortex (LUTZENBERGER et al. 1995; TALLON-BAUDRY et al. 1996), the auditory cortex (MAKEIG 1993) and the sensorimotor cortex (KRISTEVA-FEIGE et al. 1993), by means of non-invasive EEG or magnetoencephalography (MEG) measurements. Although gamma oscillations are well documented in visual and auditory perception research, there is a dearth of literature on their role in prosody perception. Thus, there is a need to investigate these oscillations in prosody perception, especially in populations with altered hearing abilities.

1.4 EEG in the neuroscience of emotion

Various studies have reported that ERPs are extremely useful for studying normal (KUTAS u. HILLYARD 1980) and impaired (HAGOORT et al. 1996) semantic speech comprehension, as well as the processing of linguistic prosody (STEINHAUER et al. 1999). Only a few studies have employed ERPs to investigate the recognition of emotional voice quality in particular, and affective prosody in general. Twist et al. (1991) investigated ERPs to emotional prosodic stimuli in an oddball task presented to patients with right- and left-sided brain damage. Neutral single-syllable words served as frequent stimuli and words with unexpected intonation as rare (target) stimuli; participants were instructed to press a button on the occurrence of rare stimuli.
The study found that, in response to the target stimuli, the P300 exhibited both a diminished amplitude and a delayed latency in patients with right-brain damage, compared to either patients with left-brain damage or healthy controls (TWIST et al. 1991). Bostanov and Kotchoubey (2004) investigated the recognition of affective prosody using emotional exclamations (e.g. "Wow", "Oooh") in a passive oddball paradigm. They found an N300 to contextually incongruous exclamations (BOSTANOV u. KOTCHOUBEY 2004) and took the N300 to be an indicator of semantically inappropriate words, similar to the well-known N400. Kotz et al. (2004) revealed differences in the P200 component across the valences tested: the P200 amplitude was largest in response to positive stimuli. In addition to this early component, they also found a difference between valences at a later stage, with the largest negativity for neutral stimuli 400 ms after stimulus onset (KOTZ et al. 2004). Wambacq et al. (2004) investigated the voluntary and non-voluntary processing of emotional prosody, revealing a timing difference between the two conditions: emotional prosody was processed 360 ms post-stimulus onset in the voluntary processing condition (revealed by a P360), but 200 ms earlier (revealed by a P160) in the non-voluntary condition (WAMBACQ u. JERGER 2004). Paulmann and colleagues performed a series of studies (PAULMANN u. KOTZ 2008a; PAULMANN et al. 2011) focusing on the importance of ERPs in prosodic differentiation, conflict evaluation and lateralisation. Thus, it is apparent that ERPs are a very reliable objective technique for prosody evaluation.

ERPs are gaining importance in CI research as the demand for objective measures increases. Several studies have investigated speech perception in children and adults with CIs (see "Late Auditory Event-Related Potentials in Children with Cochlear Implants: A Review" (JOHNSON 2009) for more information).
Sharma and Dorman (1999) found a double-peaked N100 in response to voice onset time; the inter-peak latencies approximated the voice onset time between a voiceless consonant and the onset of vowel vocalisation in children with CIs (Sharma and Dorman, 1999). Other studies in adults have examined how the N100, P200 and N200 waveforms reflect speech discrimination skills (Taylor and Baldeweg, 2002), but there are no studies combining CI, prosody and EEG. This introduction has summarised current research investigating a number of aspects of prosody perception, revealing certain aspects that are, to date, unaddressed. As previously noted, many previous studies have focused on visually presented emotions, but there is a lack of work on auditory emotion perception (prosody perception). Furthermore, prosody perception in individuals with CIs is poorly understood. Thus, there is a need for a systematic investigation of the neural correlates of emotional prosody perception in CI users.

1.5 Research questions

The work presented in this thesis attempts to answer the following research questions:

i. Are CI simulations an appropriate tool for the investigation of differences between strategies on prosody perception?
ii. Can CI users perceive emotional prosody?
iii. Are ERPs a reliable measure to explore prosody recognition?
iv. Is the gamma band activity in the response to prosodic stimuli modulated according to the acoustic properties? If so, is there a characteristic time-course and topography of this modulation?
v. Most importantly, is the MP3000 better than the ACE strategy for prosody perception?

In order to examine these questions, two studies were conducted. The first study, described in manuscript I, focuses on the abilities of normal-hearing (NH) subjects presented with original stimuli as well as CI simulations, in order to investigate the effects of two speech coding strategies (ACE and MP3000) on prosodic features. The second study, presented in manuscript II, focuses on the ability of CI users to identify prosodic stimuli and compares the ACE and MP3000 strategies using EEG.

Chapter 2

Manuscript I

ERP evidence for the recognition of emotional prosody through simulated cochlear implant strategies

Deepashri Agrawal 1, Lydia Timm 1, Filipa C. Viola 2, Stefan Debener 2, Andreas Büchner 3, Reinhard Dengler 1 & Matthias Wittfoth 1

1 Department of Neurology, Hannover Medical School, Hannover, Germany
2 Department of Psychology, Carl von Ossietzky University, Oldenburg, Germany
3 Department of Otolaryngology, Hannover Medical School, Hannover, Germany

This manuscript has been accepted for publication in BMC Neuroscience and can be accessed as: BMC Neuroscience 2012, 13:113, http://www.biomedcentral.com/1471-2202/13/113

Note: In this chapter, the MP3000 strategy is referred to as PACE (PACE is the technical name; MP3000 is the commercial name for the same strategy).

2.1 Abstract

Background

Emotionally salient information in spoken language can be provided by variations in speech melody (prosody) or by emotional semantics. Emotional prosody is essential to convey feelings through speech. In sensorineural hearing loss, impaired speech perception can be improved by cochlear implants (CIs). The aim of this study was to investigate the performance of normal-hearing (NH) participants on the perception of emotional prosody with

vocoded stimuli. Semantically neutral sentences with emotional (happy, angry and neutral) prosody were used. Sentences were manipulated to simulate two CI speech-coding strategies: the advanced combination encoder (ACE) and the newly developed psychoacoustic advanced combination encoder (PACE). Twenty NH adults were asked to recognise emotional prosody from ACE and PACE simulations. Performance was assessed using behavioural tests and event-related potentials (ERPs).

Results

Behavioural data revealed superior performance with original stimuli compared to the simulations. For simulations, better recognition was observed for happy and angry prosody than for neutral prosody. Irrespective of simulated or unsimulated stimulus type, a significantly larger P200 event-related potential was observed after sentence onset for happy prosody than for the other two emotions. Furthermore, the P200 amplitude was significantly more positive for the PACE strategy than for the ACE strategy.

Conclusions

The results suggest the P200 peak as an indicator of active differentiation and recognition of emotional prosody. A larger P200 peak amplitude for happy prosody indicated the importance of fundamental frequency (F0) cues in prosody processing. The advantage of PACE over ACE highlights a privileged role of the psychoacoustic masking model in improving prosody perception. Taken together, the study emphasises the importance of vocoded simulations for better understanding the prosodic cues which CI users may be utilising.

Keywords: Emotional prosody, Cochlear implants, Simulations, Event-related potentials.

2.2 References

1. Ross ED: The aprosodias. Functional-anatomic organization of the affective components of language in the right hemisphere. Arch Neurol 1981, 38(9):561-569.
2. Murray IR, Arnott JL: Toward the simulation of emotion in synthetic speech: a review of the literature on human vocal emotion. J Acoust Soc Am 1993, 93(2):1097-1108.
3. Schroder C, Mobes J, Schutze M, Szymanowski F, Nager W, Bangert M, Munte TF, Dengler R: Perception of emotional speech in Parkinson's disease. Mov Disord 2006, 21(10):1774-1778.
4. Nikolova ZT, Fellbrich A, Born J, Dengler R, Schroder C: Deficient recognition of emotional prosody in primary focal dystonia. Eur J Neurol 2011, 18(2):329-336.
5. Chee GH, Goldring JE, Shipp DB, Ng AH, Chen JM, Nedzelski JM: Benefits of cochlear implantation in early-deafened adults: the Toronto experience. J Otolaryngol 2004, 33(1):26-31.

6. Kaplan DM, Shipp DB, Chen JM, Ng AH, Nedzelski JM: Early-deafened adult cochlear implant users: assessment of outcomes. J Otolaryngol 2003, 32(4):245-249.
7. Donaldson GS, Nelson DA: Place-pitch sensitivity and its relation to consonant recognition by cochlear implant listeners using the MPEAK and SPEAK speech processing strategies. J Acoust Soc Am 2000, 107(3):1645-1658.
8. Sandmann P, Dillier N, Eichele T, Meyer M, Kegel A, Pascual-Marqui RD, Marcar VL, Jancke L, Debener S: Visual activation of auditory cortex reflects maladaptive plasticity in cochlear implant users. Brain 2012, 135(Pt 2):555-568.
9. Mohr PE, Feldman JJ, Dunbar JL, McConkey-Robbins A, Niparko JK, Rittenhouse RK, Skinner MW: The societal costs of severe to profound hearing loss in the United States. Int J Technol Assess Health Care 2000, 16(4):1120-1135.
10. Shannon RV, Zeng FG, Kamath V, Wygonski J, Ekelid M: Speech recognition with primarily temporal cues. Science 1995, 270(5234):303-304.

11. Buechner A, Brendel M, Krueger B, Frohne-Buchner C, Nogueira W, Edler B, Lenarz T: Current steering and results from novel speech coding strategies. Otol Neurotol 2008, 29(2):203-207.
12. Nogueira W, Vanpoucke F, Dykmans P, De Raeve L, Van Hamme H, Roelens J: Speech recognition technology in CI rehabilitation. Cochlear Implants Int 2010, 11 Suppl 1:449-453.
13. Loizou PC: Signal-processing techniques for cochlear implants. IEEE Eng Med Biol Mag 1999, 18(3):34-46.
14. Nogueira W, Buechner A, Lenarz T, Edler B: A psychoacoustic "NofM"-type speech coding strategy for cochlear implants. EURASIP Journal on Applied Signal Processing, Special Issue on DSP in Hearing Aids and Cochlear Implants, 2005, 127(18):3044-3059.
15. Lai WK, Dillier N: Investigating the MP3000 coding strategy for music perception. In 11. Jahrestagung der Deutschen Gesellschaft für Audiologie; Kiel, Germany; 2008:1-4.
16. Weber J, Ruehl S, Buechner A: Evaluation der Sprachverarbeitungsstrategie MP3000 bei Erstanpassung [Evaluation of the MP3000 speech processing strategy at initial fitting]. In 81st Annual Meeting of the German Society of Oto-Rhino-Laryngology, Head and

Neck Surgery. Wiesbaden: German Medical Science GMS Publishing House; 2010.
17. Kutas M, Hillyard SA: Event-related brain potentials to semantically inappropriate and surprisingly large words. Biol Psychol 1980, 11(2):99-116.
18. Steinhauer K, Alter K, Friederici AD: Brain potentials indicate immediate use of prosodic cues in natural speech processing. Nat Neurosci 1999, 2(2):191-196.
19. Schapkin SA, Gusev AN, Kuhl J: Categorization of unilaterally presented emotional words: an ERP analysis. Acta Neurobiol Exp (Wars) 2000, 60(1):17-28.
20. Kotz SA, Meyer M, Alter K, Besson M, von Cramon DY, Friederici AD: On the lateralization of emotional prosody: an event-related functional MR investigation. Brain Lang 2003, 86(3):366-376.
21. Pihan H, Altenmuller E, Ackermann H: The cortical processing of perceived emotion: a DC-potential study on affective speech prosody. Neuroreport 1997, 8(3):623-627.

22. Kotz SA, Paulmann S: When emotional prosody and semantics dance cheek to cheek: ERP evidence. Brain Res 2007, 1151:107-118.
23. Hillyard SA, Picton TW: On and off components in the auditory evoked potential. Percept Psychophys 1978, 24(5):391-398.
24. Rosburg T, Boutros NN, Ford JM: Reduced auditory evoked potential component N100 in schizophrenia--a critical review. Psychiatry Res 2008, 161(3):259-274.
25. Anderson L, Shimamura AP: Influences of emotion on context memory while viewing film clips. Am J Psychol 2005, 118(3):323-337.
26. Zeelenberg R, Wagenmakers EJ, Rotteveel M: The impact of emotion on perception: bias or enhanced processing? Psychol Sci 2006, 17(4):287-291.
27. Grandjean D, Sander D, Pourtois G, Schwartz S, Seghier ML, Scherer KR, Vuilleumier P: The voices of wrath: brain responses to angry prosody in meaningless speech. Nat Neurosci 2005, 8(2):145-146.
28. Grandjean D, Sander D, Lucas N, Scherer KR, Vuilleumier P: Effects of emotional prosody on auditory extinction for voices in patients with spatial neglect. Neuropsychologia 2008, 46(2):487-496.

29. Scherer KR: Vocal communication of emotion: A review of research paradigms. Speech Communication 2003, 40:227-256.
30. Luo X, Fu QJ: Frequency modulation detection with simultaneous amplitude modulation by cochlear implant users. J Acoust Soc Am 2007, 122(2):1046-1054.
31. Seither-Preisler A, Patterson R, Krumbholz K, Seither S, Lutkenhoner B: Evidence of pitch processing in the N100m component of the auditory evoked field. Hear Res 2006, 213(1-2):88-98.
32. Schirmer A, Kotz SA: Beyond the right hemisphere: brain mechanisms mediating vocal emotional processing. Trends Cogn Sci 2006, 10(1):24-30.
33. Pinheiro AP, Galdo-Alvarez S, Rauber A, Sampaio A, Niznikiewicz M, Goncalves OF: Abnormal processing of emotional prosody in Williams syndrome: an event-related potentials study. Res Dev Disabil 2011, 32(1):133-147.
34. Garcia-Larrea L, Lukaszevicz AC, Mauguiere F: Revisiting the oddball paradigm. Non-target vs. neutral stimuli and the evaluation of ERP attentional effects. Neuropsychologia 1992, 30:723-741.

35. Alain C, Woods DL, Covarrubias D: Activation of duration-sensitive auditory cortical fields in humans. Electroencephalogr Clin Neurophysiol 1997, 104(6):531-539.
36. Picton TW, Goodman WS, Bryce DP: Amplitude of evoked responses to tones of high intensity. Acta Otolaryngol 1970, 70(2):77-82.
37. Meyer M, Baumann S, Jancke L: Electrical brain imaging reveals spatiotemporal dynamics of timbre perception in humans. Neuroimage 2006, 32(4):1510-1523.
38. Shahin A, Bosnyak DJ, Trainor LJ, Roberts LE: Enhancement of neuroplastic P2 and N1c auditory evoked potentials in musicians. J Neurosci 2003, 23(13):5545-5552.
39. Paulmann S, Pell MD, Kotz SA: How aging affects the recognition of emotional speech. Brain Lang 2008, 104(3):262-269.
40. Kotz SA, Meyer M, Paulmann S: Lateralization of emotional prosody in the brain: an overview and synopsis on the impact of study design. Prog Brain Res 2006, 156:285-294.

41. Alter K, Rank E, Kotz SA, Toepel U, Besson M, Schirmer A, Friederici AD: Affective encoding in the speech signal and in event-related brain potentials. Speech Communication 2003, 40:61-70.
42. Johnstone T, van Reekum CM, Oakes TR, Davidson RJ: The voice of emotion: an FMRI study of neural responses to angry and happy vocal expressions. Soc Cogn Affect Neurosci 2006, 1(3):242-249.
43. Spreckelmeyer KN, Kutas M, Urbach T, Altenmuller E, Munte TF: Neural processing of vocal emotion and identity. Brain Cogn 2009, 69(1):121-126.
44. Lang SF, Nelson CA, Collins PF: Event-related potentials to emotional and neutral stimuli. J Clin Exp Neuropsychol 1990, 12(6):946-958.
45. Qin MK, Oxenham AJ: Effects of simulated cochlear-implant processing on speech reception in fluctuating maskers. J Acoust Soc Am 2003, 114(1):446-454.
46. Laneau J, Wouters J, Moonen M: Relative contributions of temporal and place pitch cues to fundamental frequency discrimination in cochlear implantees. J Acoust Soc Am 2004, 116(6):3606-3619.

47. Drennan WR, Rubinstein JT: Music perception in cochlear implant users and its relationship with psychophysical capabilities. J Rehabil Res Dev 2008, 45(5):779-789.
48. Wittfoth M, Schroder C, Schardt DM, Dengler R, Heinze HJ, Kotz SA: On emotional conflict: interference resolution of happy and angry prosody reveals valence-specific effects. Cereb Cortex 2010, 20(2):383-392.
49. Swanson B, Mauch H: Nucleus MATLAB Toolbox Software User Manual. 2006.
50. Boersma P, Weenink D: Praat: doing phonetics by computer. 2005.
51. Jasper H: Progress and problems in brain research. J Mt Sinai Hosp N Y 1958, 25(3):244-253.
52. Delorme A, Makeig S: EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods 2004, 134(1):9-21.
53. Debener S, Thorne J, Schneider TR, Viola FC: Using ICA for the analysis of multi-channel EEG data. In Simultaneous EEG and fMRI. Edited by Ullsperger M, Debener S. New York, NY: Oxford University Press; 2010:121-135.
54. Viola FC, Thorne J, Edmonds B, Schneider T, Eichele T, Debener S: Semi-automatic identification of independent components representing EEG artifact. Clin Neurophysiol 2009, 120(5):868-877.

Chapter 3

Manuscript II

Electrophysiological responses to emotional prosody perception in cochlear implant users

Agrawal D. 1, Thorne J.D. 2, Viola F.C. 2, Debener S. 2, Timm L. 1, Büchner A. 3, Dengler R. 1 & Wittfoth M. 1,4

1 Department of Neurology, Hannover Medical School, Hannover, Germany
2 Department of Psychology, Carl von Ossietzky University, Oldenburg, Germany
3 Department of Otolaryngology, Hannover Medical School, Hannover, Germany
4 NICA (NeuroImaging and Clinical Applications), Hannover, Germany

3.1 Summary

This EEG study investigated the ability of cochlear implant (CI) users to recognise emotional prosody. Two CI speech processing strategies were compared: the ACE (advanced combination encoder) and the newly developed MP3000. Semantically neutral sentences spoken with three different emotional prosodies (neutral, angry, happy) were presented to 20 post-lingually deafened CI users and age-matched normal-hearing controls. Event-related potentials (ERPs) were recorded to study the N100 and P200 responses as well as the late positive potential (LPP). Event-related spectral power modulations were also calculated. CI users fitted with the MP3000 strategy showed a higher proportion of correctly recognised prosodic information than ACE strategy users. Our ERP results demonstrated that emotional prosody elicited significant N100 and P200 peaks, whereas the LPP did not differ across strategies. Furthermore, the P200 amplitude in response to happy prosodic information was

significantly more positive for the MP3000 strategy than for the ACE strategy. In the spectral power analysis, two typical gamma activities were observed in MP3000 users only: (i) an early gamma activity in the 100–250 ms time window, reflecting bottom-up attention regulation; and (ii) a late gamma activity between 900–1100 ms post-stimulus onset, probably reflecting top-down cognitive control. Our study suggests that the MP3000 strategy is better than ACE with regard to emotional prosody perception, as confirmed by behavioural and electrophysiological responses. It could be shown that spectral analysis is a useful tool that can reveal differences between two CI processing strategies in the recognition of prosody-specific features of language.

Keywords: Emotional prosody, Cochlear implants, ERP, P200 peak, Gamma band power

3.2 Introduction

In spoken language, emotionally salient information can be communicated by variations in speech melody (emotional prosody) or by emotional semantics (verbal emotional content). Emotional prosody conveys emotions through variations of pitch, intensity and duration (Scherer, 2003). Individuals with severe to profound hearing loss have a limited dynamic range in these parameters, and thus their prosody recognition is affected. Cochlear implants (CIs) are thought to improve not only language perception per se, but also specific aspects of language. However, it is still an open question whether such improvements include the recognition of emotional prosodic information. CIs encode sounds electronically, bypassing the damaged cochlea and stimulating the auditory nerve electrically. Speech coding strategies are extremely important in CI processing as they decompose audio signals into different frequency bands and deliver the stimulation pattern to the electrodes, maximising the user's communicative potential.
A number of speech processing strategies mimicking firing patterns inside the normal cochlea have been developed over the past two decades (Loizou, 1999). The advanced combination encoder (ACE) was developed in the 1990s. This strategy separates speech signals into a number of sub-bands

(M) and derives the envelope information from each band signal. A subset (N) of these with the largest amplitudes is then selected for stimulation ("N out of M"). In 2005 a new strategy, the psychoacoustic advanced combination encoder (PACE), commercially known as MP3000 (this term will be used in this manuscript), was developed. This strategy is based on a psychoacoustic masking model which neglects redundant signal components, thus saving valuable bandwidth for those components that are usually perceived by normal-hearing (NH) individuals. The strategy is similar to the MP3 compression algorithm (Nogueira et al., 2005). There are reports highlighting the performance of CI users in understanding speech, from phonemes to sentences, in quiet as well as in noisy environments. For example, Fu et al. (2005) reported that CI users' voice gender identification was nearly perfect (94% correct) when large differences in the fundamental frequency (F0) existed between male and female talkers. House and colleagues used semantically neutral Swedish utterances with four target emotions (angry, happy, sad and neutral), and found that CI users' mean vocal emotion recognition increased from 44% to 51% one year after processor activation (House, 1994b). Comparably, another study (Luo and Fu, 2007) investigated the ability of NH listeners and CI users to recognise vocal emotions. CI users performed poorly with their own conventional processors, but their performance improved significantly as the number of channels was increased. Although other studies have focused on prosodic features, the dependent measures in these were speech recognition tests. However, it is difficult to interpret the outcome when there is no behavioural score change, creating the need for objective measurements. In the last decade, studies have used event-related potentials (ERPs) to study emotion recognition.
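Returning to the coding strategies described above, the ACE-style "N out of M" selection can be sketched as follows. This is a minimal illustration, not the clinical implementation: the band envelope values and the choice of N and M are invented for the example.

```python
# Illustrative sketch of N-of-M channel selection as used by ACE-type
# strategies: from M band envelopes, keep only the N largest for stimulation.
# The envelope values and N/M below are made-up examples, not clinical data.

def select_n_of_m(envelopes, n):
    """Return (band_index, amplitude) pairs for the n largest envelopes."""
    ranked = sorted(enumerate(envelopes), key=lambda kv: kv[1], reverse=True)
    return sorted(ranked[:n])  # re-sort by band index for stimulation order

# M = 8 analysis bands, stimulate the N = 4 strongest
envelopes = [0.10, 0.62, 0.05, 0.81, 0.33, 0.48, 0.02, 0.27]
print(select_n_of_m(envelopes, 4))
# prints [(1, 0.62), (3, 0.81), (4, 0.33), (5, 0.48)]
```

In this picture, PACE/MP3000 differs only in the selection rule: instead of picking the raw maxima, it drops components predicted to be inaudible by a psychoacoustic masking model.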
There is evidence that ERPs provide an important objective measure of auditory emotional prosody differentiation and recognition (Pinheiro et al., 2011). ERP differences across emotions can be found as early as 200 ms after stimulus onset during both visual and auditory emotional processing (e.g. Schapkin et al., 2000; Agrawal et al., 2012; Kotz et al., 2006). The traditional ERP methodology reveals the phase-locked (evoked) neural activity elicited by a particular cognitive process. Time-frequency (TF) analysis, on the other hand,

can reveal the non-phase-locked (induced) neural activity that is hidden in standard averaged ERPs. In particular, we hypothesised that non-phase-locked brain activity should yield complementary effects not observable in the classic ERPs reported above (Makeig et al., 2004; Tallon-Baudry et al., 1999). Additionally, non-phase-locked brain activity in the gamma-band range should indicate facilitated prosody recognition modulated by strategy (Hannemann et al., 2007; Lenz et al., 2007). However, although frequency-domain (spectral) analysis has been applied to auditory ERPs by various researchers in memory and speech perception paradigms (Fuentemilla et al., 2006; Muller et al., 2009), it has yet to be used for emotional prosody analysis in CI users. The present study was designed to clarify the differences between CI users and NH participants, particularly in recognising emotional prosody. We hypothesised that prosody recognition in the auditory domain would be superior in NH individuals compared to participants with CIs. Based on the differences between the two speech coding strategies and the advantage of the MP3000 strategy in improving spatial recognition cues, we further hypothesised that MP3000 might perform better than the ACE strategy in identifying prosody. In the current study, prosodic information was presented using neutral, angry and happy tones of voice. Differences were assessed in the behavioural responses, the ERPs and the TF analysis. The present study is especially important given the dearth of empirical studies on emotional prosody recognition with CI devices.

3.3 Materials and methods

3.3.1 Participants

Forty right-handed native German speakers (22 females, 18 males), aged 25–60 years (mean = 41.5 years, SD = 7), participated in the experiment. The first group of participants (Group I) consisted of 20 CI users (mean age = 42.1 years, SD = 7.01) wearing a Nucleus CI system, as detailed in Table 3.1.
Subjects had used their implants continuously for at least 12 months and had speech-in-noise perception scores of at least 20% on the Oldenburg sentence test (Wagener

et al., 1999) prior to the study. Furthermore, subjects were divided into two subgroups with the aim of comparing the two speech-coding strategies (ACE vs. MP3000). The first subgroup (Group IA) consisted of ten individuals (mean age = 42.1 years, SD = 8.2) with the ACE strategy as their default, whose speech processors were programmed with the MP3000 strategy for the experiment. Similarly, the remaining ten participants (Group IB) were CI users (mean age = 41.1 years, SD = 7.3) with MP3000 as their default and ACE as the experimental strategy. A control group (Group II) comprised age- and gender-matched NH participants (age range: 25–55 years; mean = 41 years, SD = 7.1). All participants had normal intelligence and reported no history of psychological or neurological problems. In order to screen for depression, Beck's Depression Inventory (BDI) was used (Beck et al., 1996). None of the subjects had clinically relevant symptoms of a depressive episode. The study was carried out in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the Hannover Medical School. All participants gave written informed consent prior to the recording and received monetary compensation for their participation.

3.3.2 Stimuli

The stimulus material consisted of 150 semantically neutral German sentences with neutral, happy and angry prosody (50 each), spoken by a trained female German speaker. Stimuli were recorded at a sampling rate of 44.1 kHz with a 16-bit digitiser (Kotz et al., 2003; Kotz and Paulmann, 2007; Wittfoth et al., 2010). All sentences started with personal pronouns (for example, "Sie hat die Zeitung gelesen"; "She has read the newspaper"). The stimulus material was analysed prosodically using Praat 5.1.19 (Boersma, 2005). Table 3.2 details the differences in the fundamental frequency (F0), intensity and duration of the sentences. Differences in the pitch contours are depicted in Figure 3.1.
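The F0 values reported here were extracted with Praat. As a rough illustration of the principle behind such pitch extraction, a toy autocorrelation-based estimator can be sketched as follows; the window length, search range and test tone are illustrative assumptions, not Praat's actual algorithm.

```python
import math

# Toy autocorrelation F0 estimator: the lag that maximises the signal's
# autocorrelation corresponds to one pitch period. A stand-in illustration
# for the kind of analysis Praat performs, not its real implementation.

def estimate_f0(samples, fs, fmin=75.0, fmax=500.0):
    """Return the F0 (Hz) whose lag maximises the autocorrelation."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    best_lag, best_r = lo, float("-inf")
    for lag in range(lo, min(hi, len(samples) - 1) + 1):
        r = sum(samples[i] * samples[i - lag] for i in range(lag, len(samples)))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

fs = 8000
tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(800)]  # 200 Hz tone
print(round(estimate_f0(tone, fs)))  # prints 200
```

Applied to voiced speech frames, the mean of such frame-wise estimates yields values comparable to the mean F0 column of Table 3.2.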

Table 3.1: Demographic data of CI users

Patient  Age  Gender  Duration of CI use  Strategy
1        39   M       3 yrs               ACE
2        51   F       3 yrs               ACE
3        35   F       3 yrs               ACE
4        48   M       4 yrs               ACE
5        42   F       3 yrs               ACE
6        49   M       2 yrs               ACE
7        47   F       2 yrs               ACE
8        29   F       3 yrs               ACE
9        52   M       4 yrs               ACE
10       56   M       5 yrs               ACE
11       35   F       3 yrs               MP3000
12       43   F       4 yrs               MP3000
13       32   M       2 yrs               MP3000
14       42   M       2 yrs               MP3000
15       39   F       3 yrs               MP3000
16       46   F       4 yrs               MP3000
17       42   F       3 yrs               MP3000
18       48   M       3 yrs               MP3000
19       40   M       4 yrs               MP3000
20       41   F       3 yrs               MP3000

3.3.3 Procedure

Testing was carried out in a sound-treated chamber. Subjects were seated in a comfortable armchair facing a computer monitor placed at a distance of one metre. The stimuli were presented with Presentation Version 14.1 (Neurobehavioural Systems) in a random order via loudspeakers positioned to the left and right of the monitor, at a listening level which participants indicated to be sufficiently audible. The task was a three-choice measure, with each of the three emotions corresponding to one of the response keys on the response box. Stimuli were presented at a fixed presentation rate with an inter-trial interval of 2500 ms. Participants were asked to decide as accurately as possible whether the presented sentence was spoken with neutral, happy or angry prosody. The matching of buttons to responses was counterbalanced across subjects within each response group. The

experiment included two randomised runs of approximately 13 minutes each. CI users had their speech processors programmed for the ACE and MP3000 strategies. The MP3000 map was optimised to ensure that the overall loudness was similar to that of the ACE map. The study used a cross-over design: for CI users of Group IA, run A involved the use of their conventional speech coding strategy (ACE), whereas in the second run, run B, they used the experimental strategy (MP3000). Similarly, for CI users of Group IB, run A entailed the use of their conventional strategy and run B the experimental strategy. To control for familiarity effects, the runs were counterbalanced across subjects. Only responses given after the completion of a sentence were included in later analyses.

Figure 3.1: Pitch contours of the three prosodies. Praat-generated pitch contours of neutral (dotted line), angry (solid line) and happy prosody (dashed line) for the original (unsimulated) sentence "Sie hat die Zeitung gelesen". The maximum difference in pitch across emotions can be seen between 200 and 1200 ms from the start of the sentence.

3.3.4 ERP procedure and analysis

The EEG signals were acquired using a Brain Vision amplifier system (Brain Products, Germany, www.brainproducts.de). Thirty electrodes were attached to an elastic cap (Brain Products, Germany) at 10-20 positions (Jasper, 1958) and were referenced to the tip of the nose. In order to control for horizontal and vertical eye movements, a unipolar electro-oculogram was recorded using two electrodes: one placed at the outer canthus and one below the right eye. The impedances of the active electrodes were kept below 10 kΩ. EEG and electro-oculograms were analogue filtered (0.1–100 Hz) and recorded at a sampling rate of 250 Hz. The EEG was recorded continuously on-line and stored for off-line processing.

Table 3.2: Acoustic parameters of the emotional sentences (standard deviations in parentheses)

Strategy   Stimulus   Mean duration (sec)   Mean F0 (Hz)   Mean intensity (dB)
Original   Neutral    1.60 (0.3)            157.0 (23.0)   68.6 (1.0)
           Angry      1.70 (0.3)            191.5 (25.0)   70.0 (0.9)
           Happy      1.80 (0.4)            226.6 (24.6)   67.3 (0.9)

3.3.5 Data processing

3.3.5.1 Preprocessing and artefact rejection

The recorded brain activity was analysed offline using the open-source EEGLAB software (version 9.0.4.5s; Delorme and Makeig, 2004) running under the MATLAB environment. The data were band-pass filtered from 1 to 30 Hz for ERP peak amplitude computations and from 1 to 100 Hz for TF analysis. Trials with non-stereotypical artefacts exceeding three standard deviations of an inbuilt probability function (jointprob.m) were removed. ICA was performed with the Infomax ICA algorithm on the continuous data (Debener et al., 2010), under the assumption that the recorded activity is a linear sum of independent components (ICs) arising from brain sources and non-brain artefact sources. For the systematic removal of components representing ocular and cardiac artefacts, the EEGLAB plug-in CORRMAP (Viola et al., 2009), which enables semi-automatic component identification, was used. After artefact attenuation by back-projection of all but the artifactual ICs, the cleaned data were inspected for CI-related artefacts. Furthermore, ICA topographies representing CI artefacts were identified by the

centroid on the side of the implanted device and by the CI pedestal in the time course of the respective component, and were removed. For the CI users, two to three electrodes were removed to avoid direct contact with the CI electromagnetic coil attached to the mastoid. These missing channels were spherically interpolated. After preprocessing, the mean number of artefact-free data epochs available across subjects did not differ significantly between emotions.

3.3.5.2 Peak analysis

The cleaned data were selectively averaged for each emotion condition from the onset of the stimulus, including a 200 ms pre-stimulus baseline, within an 1800 ms ERP time window. The auditory N1 was identified as the negative peak between 100 and 200 ms, the P200 component as the positive peak in the 200 to 300 ms time window, and the late positive peak (LPP) as a broad positive peak in the 500 to 1200 ms time window. In all cases, the baseline-to-peak value was taken as the magnitude of the response. Visual inspection of grand-average waveforms showed that the distribution of ERP effects was predominantly fronto-central; therefore, only Cz was selected for further analysis. Grand averages for each condition were computed by averaging the single-subject ERP averages for each emotion.

3.3.5.3 Time-frequency analysis

TF analysis of single-trial data was performed using EEGLAB with the inbuilt function newtimef (Makeig et al., 2004). Wavelet analysis was used to decompose the signal in the time and frequency domains. Epoched data were transformed into two-dimensional TF planes by convolving the waveforms with a Morlet wavelet at a width of three cycles for low frequencies, increasing to fifteen cycles for high frequencies. The percentage of power change in each window relative to the power in the baseline (from -200 to 0 ms before stimulus onset) was calculated.
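The baseline-relative power computation described above can be sketched in Python. This is not the MATLAB newtimef implementation: the wavelet construction, parameters and the synthetic "EEG" trace below are illustrative assumptions only.

```python
import cmath
import math

# Sketch of baseline-relative TF power: convolve the signal with a complex
# Morlet-style wavelet, take |.|^2 as power, and express it as percent change
# from the pre-stimulus baseline. Parameters and signal are illustrative.

def morlet_power(signal, fs, freq, n_cycles=3):
    sigma = n_cycles / (2 * math.pi * freq)        # Gaussian width (s)
    half = int(3 * sigma * fs)                     # wavelet half-length (samples)
    taps = [cmath.exp(2j * math.pi * freq * (k / fs)) *
            math.exp(-((k / fs) ** 2) / (2 * sigma ** 2))
            for k in range(-half, half + 1)]
    norm = sum(abs(t) for t in taps)
    taps = [t / norm for t in taps]
    power = []
    for i in range(len(signal)):                   # 'same'-length convolution
        acc = 0j
        for j, tap in enumerate(taps):
            k = i + j - half
            if 0 <= k < len(signal):
                acc += tap * signal[k]
        power.append(abs(acc) ** 2)
    return power

fs = 250                                           # sampling rate used here
times = [i / fs - 0.2 for i in range(300)]         # -200 ms .. ~1000 ms epoch
# weak 40 Hz "gamma" in baseline, strong after stimulus onset (t = 0)
sig = [math.sin(2 * math.pi * 40 * t) * (1.0 if t > 0 else 0.1) for t in times]
power = morlet_power(sig, fs, 40)
base = sum(p for p, t in zip(power, times) if t < 0) / sum(t < 0 for t in times)
pct = [100 * (p - base) / base for p in power]     # % change vs. baseline
```

With these toy settings, the percent-change trace shows the expected post-stimulus gamma power increase relative to the pre-stimulus baseline.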
Difference plots allowed TF representations of two conditions to be compared. Induced power was derived by averaging single-trial power across trials (Tallon-Baudry et al., 1996). Evoked power was derived from the TF representation of the averaged

signal. The TF analysis spanned the frequency bands theta (4–8 Hz), alpha (9–15 Hz), beta (16–34 Hz) and gamma (35–60 Hz). In order to examine the spatial distribution in each frequency band, values were averaged between 0 and 1200 ms in selected time windows. Based on both visual inspection and analyses of power changes, windows that showed similar patterns of effects were clustered. This resulted in an early (0–400 ms) and a late (600–1200 ms) time window. On the basis of the acoustic analysis of the prosodies, differences were found predominantly in the late window. In order to increase statistical power, and in agreement with previous work (Paulmann et al., 2011), the statistical analysis was limited to the fronto-central scalp region, where the early evoked gamma response could easily be identified. The obtained gamma power values were subjected to statistical analysis.

3.3.6 Statistical analysis

The statistical analysis of this crossover design is based on the intention-to-treat population, including all randomised patients. The statistical evaluation of the accuracy scores, reaction times and ERPs was performed (SPSS 14.0, SPSS Inc., Chicago, Illinois, USA) using a repeated-measures ANOVA with the independent variables Prosody (neutral, angry, happy) and Subject group (NH, CI). For comparisons of strategy, a repeated-measures ANOVA was performed with the factors Prosody (neutral, angry, happy), Strategy (ACE, MP3000), Shift (default, experimental) and Run (run A, run B), to rule out any adaptation effect. To assess the significance of the TF results, a repeated-measures ANOVA with the factors Prosody (neutral, angry, happy) and Strategy (ACE, MP3000) was performed for the early as well as the late time window. Finally, to assess the relationship between oscillatory power and behavioural scores, a Pearson correlation analysis was performed comparing gamma band power with accuracy rate. All hypotheses were tested at a significance level of 5% (two-sided).
To correct for violations of sphericity, the Greenhouse-Geisser correction was applied where relevant.
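The distinction drawn above between induced and evoked power can be sketched in a few lines of NumPy; the Morlet-wavelet implementation and the simulated single-trial data below are illustrative only, not the analysis code used in the thesis.

```python
import numpy as np

def morlet(f, sfreq, n_cycles=7):
    """Complex Morlet wavelet at frequency f (Hz)."""
    sigma_t = n_cycles / (2 * np.pi * f)                  # temporal std dev (s)
    t = np.arange(-3.5 * sigma_t, 3.5 * sigma_t, 1 / sfreq)
    return np.exp(2j * np.pi * f * t - t**2 / (2 * sigma_t**2))

def tf_power(x, sfreq, freqs):
    """Time-frequency power of one signal via wavelet convolution."""
    return np.array([np.abs(np.convolve(x, morlet(f, sfreq), mode="same"))**2
                     for f in freqs])

sfreq = 500.0
freqs = np.arange(35, 61, 5)                              # gamma band (35-60 Hz)
trials = np.random.default_rng(0).standard_normal((40, 600))  # simulated trials

# Induced power: average the single-trial power (keeps non-phase-locked activity)
induced = np.mean([tf_power(tr, sfreq, freqs) for tr in trials], axis=0)
# Evoked power: power of the trial-averaged signal (phase-locked activity only)
evoked = tf_power(trials.mean(axis=0), sfreq, freqs)
```

Because convolution is linear, evoked power can never exceed induced power at any time-frequency point; their difference isolates the non-phase-locked (induced) component.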

3.4 Results

3.4.1 Accuracy data

The group means of the performance accuracy scores (% correct) are depicted in Figure 3.2. An ANOVA revealed a significant main effect of group [F(1, 38) = 53.32, p = 0.001], showing that the performance of NH controls (97 ± 1%) was better than that of CI users (67 ± 18%) by an average of 30% (collapsed across prosody). However, no differences in accuracy rates between NH controls and CI users were observed for the individual emotions. The comparison of performance across strategies revealed a significant main effect of strategy [F(1, 38) = 5.156, p = 0.029], demonstrating that accuracy was higher with MP3000 (72 ± 17%) than with ACE (61 ± 15%) by an average of 9% (collapsed across prosody). A significant interaction between strategy and prosody was also observed [F(1, 38) = 18.659, p = 0.001]. Follow-up paired t-tests revealed higher accuracy for happy prosody recognition with MP3000 [t(19) = 3.164, p = 0.005] compared to ACE, whereas the accuracy scores of the two strategies were comparable for angry and neutral prosody. No other significant differences were observed. Note that half of the subjects used ACE as their default strategy and MP3000 as the experimental one, while the other half used MP3000 as default and ACE as experimental. Our results revealed no significant difference in accuracy rates [p > 0.05] between these groups, indicating that subjects with default ACE were comparable to subjects with experimental ACE. Similar results were obtained for the MP3000 strategy, suggesting that the effects were not biased by duration of use.

3.4.2 Reaction times

The group means of the reaction times are depicted in Figure 3.3.
Figure 3.2: Accuracy scores (% correct) for NH controls vs. CI users, and ACE vs. MP3000, for neutral, angry and happy emotional prosody recognition.

Analysis of the reaction time data revealed a significant main effect of group [F(2, 18) = 9.090, p = 0.001], showing that the reaction time of CI users (840 ms) was longer than that of NH controls (520 ms) by an average of 320 ms (collapsed across prosody). Furthermore, a significant main effect of prosody [F(2, 38) = 14.337, p = 0.001] was found, although there was no interaction. Breakdown analysis of the main effects revealed that reaction times for happy (480 ms) [t(39) = 3.418, p = 0.020] and angry (500 ms) [t(39) = 3.536, p = 0.022] prosody were significantly shorter than for neutral prosody (600 ms). No significant difference was found between happy and angry prosody. Similarly, for prosody recognition in CI users, a significant main effect of prosody was observed [F(2, 38) = 16.315, p = 0.001]. Breakdown analysis showed that subjects took less time to recognise happy (820 ms) [t(39) = 5.081, p = 0.001] and angry (810 ms) [t(39) = 2.672, p = 0.011] prosody than neutral prosody (910 ms), collapsed across strategies. The data yielded no significant difference in reaction times between happy and angry prosody. There was no significant interaction between strategy and prosody. The same procedure as described above was used to check for an effect of default vs. experimental strategy use; the analysis revealed no significant differences between the two [p > 0.05], either for ACE or for MP3000.
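The follow-up comparisons reported above, paired t-tests on per-subject scores obtained under both strategies of the crossover, can be sketched with SciPy; the simulated accuracy values below are illustrative, not the study's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 20                                        # CI users, each tested with both strategies

# Simulated per-subject accuracy (%) for happy prosody under each strategy
acc_ace = rng.normal(60, 10, n)
acc_mp3000 = acc_ace + rng.normal(8, 6, n)    # built-in MP3000 advantage

# Paired t-test: each subject contributes a score under both strategies,
# so the within-subject difference is the unit of analysis
t, p = stats.ttest_rel(acc_mp3000, acc_ace)
```

A paired test is appropriate here precisely because of the crossover design: the between-subject variance in overall accuracy cancels out in the per-subject difference.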

Figure 3.3: Reaction times (post-offset) for NH controls vs. CI users, and ACE vs. MP3000, for neutral, angry and happy emotional prosody recognition, in seconds.

3.4.3 Event-related potentials

Figure 3.4 depicts the ERP waveforms for the different subject groups across the three emotional prosodies. Mean latencies and amplitudes of the N100 and P200 peaks are presented in Table 3.3.

3.4.3.1 N100

An ANOVA on N100 latency revealed significant differences between NH controls and CI users [F(1, 38) = 6.080, p = 0.002]. We found no significant main effect of prosody and no interaction between prosody and subject group. The comparison of strategies revealed no significant effect of prosody and no significant interaction between strategy and prosody. For the amplitude analysis, the ANOVA revealed a significantly greater N100 amplitude for NH controls compared with CI users [F(1, 38) = 6.378, p = 0.003]. However, the interaction between prosody and group did not reach significance. The comparison of strategies showed no significant main effect of prosody or strategy, and no interaction.
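Peak measures like the N100 and P200 latencies and amplitudes reported here are typically extracted by searching the averaged waveform within a component-specific window. A minimal sketch on a synthetic ERP (the window bounds and waveform below are illustrative, not the thesis's actual parameters):

```python
import numpy as np

def peak_in_window(erp, times, t_min, t_max, polarity):
    """Peak latency (ms) and amplitude within a search window.
    polarity=-1 finds a negative peak (N100), +1 a positive peak (P200)."""
    mask = (times >= t_min) & (times <= t_max)
    seg = erp[mask] * polarity
    i = np.argmax(seg)
    return times[mask][i], erp[mask][i]

# Synthetic averaged ERP at Cz: N100 trough near 140 ms, P200 peak near 240 ms
times = np.arange(-100, 600)                  # 1 ms sampling
erp = (-4.0 * np.exp(-((times - 140) / 30) ** 2)
       + 6.0 * np.exp(-((times - 240) / 40) ** 2))

n100_lat, n100_amp = peak_in_window(erp, times, 80, 180, polarity=-1)
p200_lat, p200_amp = peak_in_window(erp, times, 180, 300, polarity=+1)
```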

3.4.3.2 P200

A significant main effect of subject group [F(1, 38) = 20.907, p = 0.001] was observed. Pairwise comparisons revealed that P200 latency was shorter in NH controls than in CI users by an average of 35 ms (collapsed across prosodies). No significant main effect of prosody was observed, and there was no group-by-prosody interaction. The comparison of strategies revealed no significant effects on P200 latency. A significant group effect [F(1, 38) = 28.245, p = 0.001] indicated a reduced P200 peak amplitude in CI users compared to NH controls (by an average of 3.8 μV collapsed across prosodies). On comparison of strategies, a significant effect of strategy use [F(1, 18) = 12.395, p = 0.006] was observed, indicating that the P200 peak amplitude was significantly larger in MP3000 users than in ACE users. Follow-up t-tests revealed that MP3000 use yielded larger amplitudes for happy [t(19) = 4.240, p = 0.001] and neutral [t(19) = 2.240, p = 0.037] prosody compared with ACE, but not for angry prosody.

Table 3.3: Mean N100 and P200 peak latencies and amplitudes (standard deviations in parentheses)

N100 peak latency (ms)
Subjects        Neutral       Angry         Happy
Control (NH)    137 (11.5)    138 (13.5)    140 (10.1)
ACE             145 (22.0)    156 (23.4)    151 (22.0)
MP3000          154 (22.0)    155 (22.2)    154 (21.8)

N100 peak amplitude (μV)
Control (NH)    3.90 (1.8)    3.90 (1.5)    4.00 (1.9)
ACE             2.31 (1.3)    2.82 (1.3)    2.60 (1.2)
MP3000          2.81 (1.8)    2.50 (1.3)    2.55 (1.4)

P200 peak latency (ms)
Control (NH)    240 (16.6)    240 (20.2)    234 (10.0)
ACE             259 (25.0)    270 (26.1)    270 (24.1)
MP3000          262 (25.0)    271 (25.6)    271 (25.2)

P200 peak amplitude (μV)
Control (NH)    5.90 (1.5)    6.00 (1.5)    6.20 (1.8)
ACE             2.23 (1.2)    2.38 (0.9)    2.34 (0.9)
MP3000          2.21 (1.1)    2.31 (0.8)    2.81 (0.9)

Figure 3.4: ERP waveforms for the three emotional prosodies for NH controls, ACE users and MP3000 users. Average ERP waveforms recorded at the Cz electrode for NH controls, ACE users and MP3000 users for the three emotional stimuli [neutral (black), angry (red) and happy (blue)], from 100 ms before to 600 ms after sentence onset, with the respective scalp topographies at the N100 and P200 peaks (x-axis: latency in milliseconds; y-axis: amplitude in μV). Top left: N100-P200 waveform for NH controls; middle: ACE users; right: MP3000 users. Bottom left: scalp topography of NH controls; middle: ACE users; right: MP3000 users, for N100 and P200 respectively.

3.4.4 Time-frequency analysis

Mean evoked and induced power was calculated in two time windows (0–400 ms and 600–1200 ms) for four frequency bands: theta (4–8 Hz), alpha (9–15 Hz), beta (16–34 Hz), and gamma (35–60 Hz). There were no significant differences in either alpha or beta activity.

3.4.4.1 Theta

Baseline-corrected spectral theta power evoked by prosody showed a clear peak in both the early and the late time window. Statistical comparison across groups revealed a significant main effect of group in the early window [F(1, 19) = 9.779, p = 0.003]: theta activity was larger in NH controls than in CI users. However, the main effect of prosody did not reach significance. The comparison of theta power across strategies revealed no significant differences in the early time window. In the late window, a significant main effect of strategy was observed [F(1, 18) = 10.02, p = 0.005], reflecting higher theta power when participants used the MP3000 strategy. Again, the main effect of prosody did not reach significance.

3.4.4.2 Gamma

Figure 3.5 depicts the baseline-corrected TF plots of induced gamma activity at Cz for CI users. Recurring regions of gamma enhancement were observable in the TF plots, with bursts of induced gamma activity after stimulus onset. The ANOVA revealed that induced gamma activity was significantly larger in NH controls than in CI users, irrespective of prosody, in the early [F(1, 18) = 6.20, p = 0.005] as well as the late time window [F(1, 18) = 5.312, p = 0.010]. No interaction was observed between prosody and group. On comparison across strategies, a significant main effect of strategy on induced gamma activity in the early time window was observed [F(1, 38) = 8.172, p = 0.020]. In this window, induced gamma power showed a prominent peak from 180 to 250 ms after stimulus onset in MP3000 users, whereas ACE users showed almost no gamma-band change,

reflecting a lack of prosody recognition. The interaction between prosody and strategy was also significant [F(1, 38) = 3.779, p = 0.042]. Follow-up comparisons indicated that gamma activity induced by happy prosody was significantly stronger for MP3000 users [t(19) = 2.789, p = 0.021] than for ACE users, while neutral and angry prosody showed no significant difference. Similarly, in the later window, induced gamma activity increased along with prosody recognition ability: MP3000 users displayed higher induced gamma power than ACE users [F(1, 38) = 8.881, p = 0.020]. This activity peaked at 1050 ms after stimulus onset. A significant interaction of strategy and prosody was observed [F(1, 38) = 4.241, p = 0.033]. Follow-up t-tests revealed that gamma activity induced by happy prosody was stronger for MP3000 than for ACE [t(19) = 2.430, p = 0.025], but not for neutral or angry prosody. No other comparisons yielded significant results. Taken together, the results in both the early and late time windows revealed that ACE and MP3000 users differed significantly, in particular in that induced gamma power was more pronounced for MP3000 users than for ACE users in response to happy prosody.

3.4.5 Correlation between accuracy rate and gamma-band activity

NH individuals showed a significant positive correlation between gamma power and accuracy rate [r = 0.636, p = 0.019]; thus, high induced gamma power was associated with better prosody recognition. Similar observations were made for the CI group [r = 0.642, p = 0.036].

3.5 Discussion

In this crossover study, significant differences in emotion recognition were found across subjects, reflected in both behavioural and electrophysiological measures. Comparisons between CI and NH listeners indicated that CI users had difficulty recognising prosody. This could

Figure 3.5: Induced gamma power plots for ACE users and MP3000 users. Average gamma power computed at the Cz electrode for ACE users and MP3000 users for all three emotional prosodies (neutral, angry and happy), in the −200 to 1500 ms time window, in the range of 25 to 70 Hz. Highlighted areas and scalp plots represent induced gamma power in the early (0–400 ms) and late (600–1200 ms) time windows at bootstrap significance (p < 0.01).
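The bootstrap significance mentioned in the caption can be illustrated with a generic trial-resampling scheme; the window-averaged power values below are simulated, and this is not a reconstruction of the exact procedure used for the figure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated single-trial gamma power, averaged within a window: baseline vs post-stimulus
baseline = rng.normal(1.0, 0.3, 40)
window = rng.normal(1.4, 0.3, 40)          # built-in power increase

# Bootstrap the mean difference: resample trials with replacement,
# using the same trial indices for both windows (paired resampling)
n_boot = 5000
diffs = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, 40, 40)
    diffs[b] = window[idx].mean() - baseline[idx].mean()

# Two-sided bootstrap p-value: how often the resampled difference crosses zero
p = 2 * min((diffs <= 0).mean(), (diffs >= 0).mean())
```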

be attributed to the limited dynamic range of an implant, through which subjects have limited access to pitch, voice quality, vowel articulation and spectral envelope cues: features that are thought to be essential for emotional voice recognition. Although CI users were poor at recognising emotional prosody, they performed above chance, suggesting that they are able to perceive some essential cues.

3.5.1 Behavioural findings

In general, all participants took longer to recognise neutral prosodic sentences than sentences spoken with angry or happy prosody. A possible explanation might be that in the emotional judgment of prosody, non-ambiguous emotional associations are readily available, resulting in faster recognition, whereas neutral stimuli may elicit positive or negative associations that otherwise would not exist (Grandjean et al., 2008). Here, the reaction times simply reflect a longer decision time for neutral prosody. Happy prosody was recognised with the highest accuracy of the three prosodies. There is evidence to suggest that negative stimuli are less expected and take more effort to process than positive stimuli, and that happy emotions are more socially salient (Johnstone et al., 2006; Lang and Bradley, 2009). Thus, the high accuracy might be due to the social importance of happy emotions and the additional pitch cues. Moreover, NH subjects listening to CI simulations were significantly less accurate and took more time to recognise emotional prosodic information compared with the original stimuli, as observed in our recent simulation study (Agrawal et al., 2012).

3.5.2 ERPs

In agreement with the behavioural results, CI users exhibited prolonged ERP latencies and reduced amplitudes compared with NH controls.
This finding is supported by previous research (Luo and Fu, 2007) reporting that less sensory information is transmitted through a CI than through an intact cochlea, resulting in reduced synchronisation of the neuronal activity required to generate auditory evoked potentials (Groenen et al., 2001;

Sandmann et al., 2009). Although CI users had prolonged latencies and reduced ERP amplitudes, the structure of these potentials was similar to that recorded from NH controls. This implies that, despite the limited input provided by CIs, the central auditory system processes prosodic stimuli in a relatively normal fashion (Koelsch et al., 2004). Furthermore, the amplitude of the P200 component was larger in MP3000 users than in ACE users. This could be because the MP3000 strategy avoids repetitive stimulation of neuronal ensembles (Buechner et al., 2008) by selecting components that are more widely dispersed across the spectrum, avoiding clustering of stimulated channels (Nogueira et al., 2005). The bands selected by the psychoacoustic model in MP3000 thus capture the most meaningful components of the audio signal based on normal cochlear physiology: only the relevant components are transferred, and the redundant (masked) components are discarded. Since this strategy is based on a normal-hearing principle, the extraction of fundamental frequency (F0) cues fares better than with ACE, resulting in the improved recognition of happy prosody in the present study. The P200 component reflects the initial encoding of the emotion (Balconi and Pozzoli, 2003). Previous studies have reported that emotional stimuli elicit larger ERP waveforms than neutral stimuli, most frequently as early as 200 ms after stimulus onset (Vanderploeg et al., 1987). This initial emotional encoding seems to be particularly influenced by pitch and intensity variations (Pantev et al., 1996). The current results concur with the available literature, especially for happy prosody, where the pitch variations are largest (Table 2, acoustic parameters).
3.5.3 Time-frequency results

To our knowledge, this study is the first to show that the processing of emotional prosody in CI users can be distinguished at both early and late stages of brain oscillations during prosody recognition. Our results support the view that frequency-specific EEG responses differ systematically as a function of stimulus type (i.e. emotional vs. neutral). The theta and gamma bands showed a significant power increase with the emotional stimuli, whereas the alpha and beta frequencies were not modulated by prosody, an observation in line with previously reported effects (Aftanas et al., 2001a; Aftanas et al., 2004; Knyazev et al., 2009). Overall spectral power was larger for NH controls than for CI users, as also evidenced by the ERP results. This is consistent with the idea that NH subjects have better hearing resolution than CI users and that the stimuli are well received. On the other hand, the larger gamma-band power in CI users could be indicative of additional neural activation to compensate for less efficient processing during the extraction of stimulus features and the integration of the perceived sensory input. Imaging studies support this assumption by demonstrating that task-related compensatory activations are higher in response to developmental deviations than in normal maturational processes (Durston et al., 2003; Sheridan and Hausdorff, 2007). With regard to the comparison of the CI strategies used in the present experiment, MP3000 users showed significantly larger power at theta frequencies than ACE users. These findings are in line with the literature, where a general tendency for higher theta power with increased motivation and emotional significance of the stimuli has been observed (Aftanas et al., 2001a; Aftanas et al., 2001b). This explains the higher theta power for emotional stimuli, particularly happy emotions, in MP3000 compared to ACE users, and suggests that slow cerebral oscillations are suitable for studying processes related to emotion recognition. The observed differences can be attributed to the psychoacoustic model of the MP3000 strategy: this algorithm increases the dynamic range, thereby improving pitch as well as finer feature recognition. Two typical peaks in gamma activity were observed for prosody recognition, in the early and the late time window.
The early gamma activity occurred at about 100–250 ms after stimulus onset (induced), while the late peak commenced after 600 ms post-stimulus (induced). In the present study, MP3000 users showed significantly higher gamma-band power for the recognition of emotions than ACE users, with the highest power for happy emotion recognition and the lowest for neutral stimuli. In

contrast, ACE users displayed higher power for neutral than for happy prosody. There is strong evidence that gamma power in the initial 250 ms is a correlate of sensory processing (Karakas and Basar, 1998; Busch et al., 2004), reflecting bottom-up processes driven by stimulus features such as loudness (Schadow et al., 2007) and pitch (Sedley et al., 2012). There is also evidence that early gamma activity represents an interface between bottom-up and top-down processes (Busch et al., 2006). This implies that in the current study the MP3000 strategy encoded the physical characteristics better than ACE, which in turn improved prosody recognition, especially for happy prosody, which has the largest pitch modulation. Similarly, increased induced gamma-band power was observed in the late time window of 600 to 1200 ms, peaking at 1050 ms. It seems likely that this second burst of long-latency gamma synchrony is related to the maintenance or refinement of the attention network established by the first burst: the selection of certain neuronal groups for integration into a large-scale synchronous gamma network accounts for the increased integration of attended information. Likewise, in the late time window the variation of acoustic features was at its peak, suggesting that these features were well received and coded by the brain to differentiate the emotions effectively. MP3000 users had the highest gamma power during happy prosody recognition, whereas ACE users had the lowest. Since previous studies have shown a similar gamma-band power increase locked to visual stimuli conditioned to emotional pictures (Stolarova et al., 2006), the current results support a specific role of gamma activity (35–60 Hz) in emotionally triggered functional activation states, which seems to be independent of the sensory modality.
High-frequency components, such as gamma-band activity, have been associated with local computations and the binding of fine-structured information (Tallon-Baudry et al., 1999), whereas low-frequency components, such as theta, involve more global computations and are possibly amodal (Yordanova et al., 2002). Our results suggest that a more analytic ("local") processing mode was engaged during emotion identification with MP3000 than with ACE. The burst of gamma observed in perceptual testing originates from a transient synchrony of neural populations involved in

the processing of a sensory event, i.e. the stimulus enters the realm of conscious recognition (Hopfield and Brody, 2000; Mukamel et al., 2005). The positive correlation between gamma activity and neuronal spiking is a robust finding in the aforementioned studies. Thus, the MP3000 strategy leads to a higher neuronal firing rate than ACE, which might have improved the identification of prosody with this strategy. Gamma-band responses also show a direct relation to task demands (Yordanova et al., 1997; Senkowski and Herrmann, 2002). Hence, it seems reasonable to assume higher activation of processing resources in MP3000 users for task-relevant stimuli, reflected in enhanced gamma-range activity. Furthermore, the high positive correlation between accuracy rate and gamma-band power reflects an advantage of MP3000 over ACE for prosody recognition. These findings are in line with previous reports in which the association between gamma power and response accuracy was more pronounced in good performers than in bad performers, a difference particularly manifest around the onset of the test stimuli (Kaiser et al., 2008). Thus, the results suggest the relevance of gamma power for optimal differentiation between stimulus characteristics. Some limitations should be taken into account when drawing these conclusions. First, only prosodies recorded by one speaker were used, which constrains the generalisation of the findings. Second, only three emotions were intoned in the sentences. Despite these limitations, the results are very promising and reflect the advantage of using ERP and TF analyses to evaluate abilities of CI users that may not be evident in behavioural measures. Taken together, we have shown that subjects using the ACE strategy had more difficulty perceiving emotional stimuli than MP3000 users, difficulties that were reflected in behavioural scores, ERPs, and TF measures.
It was observed that subjects with ACE had to concentrate more to distinguish the emotions than NH and MP3000 users. High-frequency oscillations typically have lower amplitudes than low-frequency oscillations; they are therefore not visible in ERPs and would not have contributed to the peak amplitudes (Edwards et al., 2005). Thus, these findings lend strong support to the hypothesis that macroscopically visible gamma-band activity is functionally relevant for prosody recognition,

even in CI users.

Acknowledgements

We thank all participants for their support and their willingness to be a part of this study.

Funding information

This research was supported by grants from the Georg Christoph Lichtenberg Stipendium of Lower Saxony, Germany, and partially supported by the Fundação para a Ciência e a Tecnologia, Lisbon, Portugal (SFRH/BD/37662/2007) to F.C.V.

3.6 References

Aftanas L, Varlamov A, Pavlov S, Makhnev V, Reva N. Event-related synchronization and desynchronization during affective processing: emergence of valence-related time-dependent hemispheric asymmetries in theta and upper alpha band. Int J Neurosci. 2001a;110(3-4):197-219.

Aftanas LI, Reva NV, Savotina LN, Makhnev VP. [Neurophysiological correlates of induced discrete emotions in humans: an individual analysis]. Ross Fiziol Zh Im I M Sechenova. 2004;90(12):1457-71.

Aftanas LI, Varlamov AA, Pavlov SV, Makhnev VP, Reva NV. Affective picture processing: event-related synchronization within individually defined human theta band is modulated by valence dimension. Neurosci Lett. 2001b;303(2):115-8.

Agrawal D, Timm L, Viola FC, Debener S, Büchner A, Dengler R, et al. ERP evidence for the recognition of emotional prosody through simulated cochlear implant strategies. BMC Neuroscience. 2012; in press.

Balconi M, Pozzoli U. Face-selective processing and the effect of pleasant and unpleasant emotional expressions on ERP correlates. Int J Psychophysiol. 2003;49(1):67-74.

Beck AT, Steer RA, Ball R, Ranieri W. Comparison of Beck Depression Inventories -IA and -II in psychiatric outpatients. J Pers Assess. 1996;67(3):588-97.

Boersma P, Weenink D. Praat: doing phonetics by computer. 2005.

Buechner A, Brendel M, Krueger B, Frohne-Buchner C, Nogueira W, Edler B, et al. Current steering and results from novel speech coding strategies. Otol Neurotol. 2008;29(2):203-7.

Busch NA, Debener S, Kranczioch C, Engel AK, Herrmann CS. Size matters: effects of stimulus size, duration and eccentricity on the visual gamma-band response. Clin Neurophysiol. 2004;115(8):1810-20.

Busch NA, Schadow J, Frund I, Herrmann CS. Time-frequency analysis of target detection reveals an early interface between bottom-up and top-down processes in the gamma-band. Neuroimage. 2006;29(4):1106-16.

Debener S, Thorne J, Schneider TR, Viola FC. Using ICA for the analysis of multi-channel EEG data. In: Ullsperger M, Debener S, editors. Simultaneous EEG and fMRI. New York, NY: Oxford University Press; 2010. p. 121-35.

Delorme A, Makeig S. EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods. 2004;134(1):9-21.

Durston S, Davidson MC, Thomas KM, Worden MS, Tottenham N, Martinez A, et al. Parametric manipulation of conflict and response competition using rapid mixed-trial event-related fMRI. Neuroimage. 2003;20(4):2135-41.

Edwards E, Soltani M, Deouell LY, Berger MS, Knight RT. High gamma activity in response to deviant auditory stimuli recorded directly from human cortex. J Neurophysiol. 2005;94(6):4269-80.

Fu QJ, Chinchilla S, Nogaki G, Galvin JJ 3rd. Voice gender identification by cochlear implant users: the role of spectral and temporal resolution. J Acoust Soc Am. 2005;118(3 Pt 1):1711-8.

Fuentemilla L, Marco-Pallares J, Grau C. Modulation of spectral power and of phase resetting of EEG contributes differentially to the generation of auditory event-related potentials. Neuroimage. 2006;30(3):909-16.

Grandjean D, Sander D, Scherer KR. Conscious emotional experience emerges as a function of multilevel, appraisal-driven response synchronization. Conscious Cogn. 2008;17(2):484-95.

Groenen PA, Beynon AJ, Snik AF, van den Broek P. Speech-evoked cortical potentials and speech recognition in cochlear implant users. Scand Audiol. 2001;30(1):31-40.

Hannemann R, Obleser J, Eulitz C. Top-down knowledge supports the retrieval of lexical information from degraded speech. Brain Res. 2007;1153:134-43.

Hopfield JJ, Brody CD. What is a moment? "Cortical" sensory integration over a brief interval. Proc Natl Acad Sci U S A. 2000;97(25):13919-24.

House WF. Cochlear implants: it's time to rethink. Am J Otol. 1994;15(5):573-87.

Jasper HH, Radmussen T. Studies of clinical and electrical responses to deep temporal stimulation in men with some considerations of functional anatomy. Res Publ Assoc Res Nerv Ment Dis. 1958;36:316-34.

Johnstone T, van Reekum CM, Oakes TR, Davidson RJ. The voice of emotion: an fMRI study of neural responses to angry and happy vocal expressions. Soc Cogn Affect Neurosci. 2006;1(3):242-9.

Kaiser J, Heidegger T, Lutzenberger W. Behavioral relevance of gamma-band activity for short-term memory-based auditory decision-making. Eur J Neurosci. 2008;27(12):3322-8.

Karakas S, Basar E. Early gamma response is sensory in origin: a conclusion based on cross-comparison of results from multiple experimental paradigms. Int J Psychophysiol. 1998;31(1):13-31.

Knyazev GG, Slobodskoj-Plusnin JY, Bocharov AV. Event-related delta and theta synchronization during explicit and implicit emotion processing. Neuroscience. 2009;164(4):1588-600.

Koelsch S, Kasper E, Sammler D, Schulze K, Gunter T, Friederici AD. Music, language and meaning: brain signatures of semantic processing. Nat Neurosci. 2004a;7(3):302-7.

Koelsch S, Wittfoth M, Wolf A, Muller J, Hahne A. Music perception in cochlear implant users: an event-related potential study. Clin Neurophysiol. 2004b;115(4):966-72.

Kotz SA, Meyer M, Alter K, Besson M, von Cramon DY, Friederici AD. On the lateralization of emotional prosody: an event-related functional MR investigation. Brain Lang. 2003;86(3):366-76.

Kotz SA, Meyer M, Paulmann S. Lateralization of emotional prosody in the brain: an overview and synopsis on the impact of study design. Prog Brain Res. 2006;156:285-94.

Kotz SA, Paulmann S. When emotional prosody and semantics dance cheek to cheek: ERP evidence. Brain Res. 2007;1151:107-18.

Lang PJ, Bradley MM. Emotion and the motivational brain. Biol Psychol. 2009;84(3):437-50.

Lenz D, Schadow J, Thaerig S, Busch NA, Herrmann CS. What's that sound? Matches with auditory long-term memory induce gamma activity in human EEG. Int J Psychophysiol. 2007;64(1):31-8.

Loizou PC. Signal-processing techniques for cochlear implants. IEEE Eng Med Biol Mag. 1999;18(3):34-46.

Luo X, Fu QJ. Frequency modulation detection with simultaneous amplitude modulation by cochlear implant users. J Acoust Soc Am. 2007;122(2):1046-54.

Makeig S, Debener S, Onton J, Delorme A. Mining event-related brain dynamics. Trends Cogn Sci. 2004;8(5):204-10.

Mukamel R, Gelbard H, Arieli A, Hasson U, Fried I, Malach R. Coupling between neuronal firing, field potentials, and fMRI in human auditory cortex. Science. 2005;309(5736):951-4.

Muller V, Gruber W, Klimesch W, Lindenberger U. Lifespan differences in cortical dynamics of auditory perception. Dev Sci. 2009;12(6):839-53.

Nogueira W, Buechner A, Lenarz T, Edler B. A psychoacoustic "NofM"-type speech coding strategy for cochlear implants. EURASIP Journal on Applied Signal Processing, Special Issue on DSP in Hearing Aids and Cochlear Implants. 2005;127(18):3044-59.

Pantev C, Elbert T, Ross B, Eulitz C, Terhardt E. Binaural fusion and the representation of virtual pitch in the human auditory cortex. Hear Res. 1996;100(1-2):164-70.

Paulmann S, Ott DV, Kotz SA. Emotional speech perception unfolding in time: the role of the basal ganglia. PLoS One. 2011;6(3):e17694.

Pinheiro AP, Galdo-Alvarez S, Rauber A, Sampaio A, Niznikiewicz M, Goncalves OF. Abnormal processing of emotional prosody in Williams syndrome: an event-related potentials study. Res Dev Disabil. 2011;32(1):133-47.

Sandmann P, Eichele T, Buechler M, Debener S, Jancke L, Dillier N, et al. Evaluation of evoked potentials to dyadic tones after cochlear implantation. Brain. 2009;132(Pt 7):1967-79.

Schadow J, Lenz D, Thaerig S, Busch NA, Frund I, Rieger JW, et al. Stimulus intensity affects early sensory processing: visual contrast modulates evoked gamma-band activity in human EEG. Int J Psychophysiol. 2007;66(1):28-36.

Schapkin SA, Gusev AN, Kuhl J. Categorization of unilaterally presented emotional words: an ERP analysis. Acta Neurobiol Exp (Wars). 2000;60(1):17-28.

Scherer KR. Vocal communication of emotion: a review of research paradigms. Speech Communication. 2003;40:227-56.

Schurmann M, Basar E. Topography of alpha and theta oscillatory responses upon auditory and visual stimuli in humans. Biol Cybern. 1994;72(2):161-74.

Sedley W, Teki S, Kumar S, Overath T, Barnes GR, Griffiths TD. Gamma band pitch responses in human auditory cortex measured with magnetoencephalography. Neuroimage. 2012;59(2):1904-11.

Senkowski D, Herrmann CS. Effects of task difficulty on evoked gamma activity and ERPs in a visual discrimination task. Clin Neurophysiol. 2002;113(11):1742-53.

Sheridan MA, Hinshaw S, D'Esposito M. Efficiency of the prefrontal cortex during working memory in attention-deficit/hyperactivity disorder. J Am Acad Child Adolesc Psychiatry. 2007;46(10):1357-66.

Sheridan PL, Hausdorff JM. The role of higher-level cognitive function in gait: executive dysfunction contributes to fall risk in Alzheimer's disease. Dement Geriatr Cogn Disord. 2007;24(2):125-37.

Stolarova M, Keil A, Moratti S. Modulation of the C1 visual event-related component by conditioned stimuli: evidence for sensory plasticity in early affective perception. Cereb Cortex. 2006;16(6):876-87.
Tallon-Baudry C, Bertrand O, Delpuech C, Pernier J. Stimulus specificity of phase-locked and non-phase-locked 40 Hz visual responses in human. J Neurosci. 1996;16(13):4240-9.
Tallon-Baudry C, Bertrand O, Pernier J. A ring-shaped distribution of dipoles as a source model of induced gamma-band activity. Clin Neurophysiol. 1999;110(4):660-5.
Vanderploeg RD, Brown WS, Marsh JT. Judgments of emotion in words and faces: ERP correlates. Int J Psychophysiol. 1987;5(3):193-205.
Viola FC, Thorne J, Edmonds B, Schneider T, Eichele T, Debener S. Semi-automatic identification of independent components representing EEG artifact. Clin Neurophysiol. 2009;120(5):868-77.
Wagener K, Brand T, Kollmeier B. Entwicklung und Evaluation eines Satztests in deutscher Sprache III: Evaluation des Oldenburger Satztests. Z Audiol. 1999;38(3):86-95.

Wittfoth M, Schroder C, Schardt DM, Dengler R, Heinze HJ, Kotz SA. On emotional conflict: interference resolution of happy and angry prosody reveals valence-specific effects. Cereb Cortex. 2010;20(2):383-92.
Yordanova J, Kolev V, Demiralp T. Effects of task variables on the amplitude and phase-locking of auditory gamma band responses in human. Int J Neurosci. 1997a;92(3-4):241-58.
Yordanova J, Kolev V, Demiralp T. The phase-locking of auditory gamma band responses in humans is sensitive to task processing. Neuroreport. 1997b;8(18):3999-4004.
Yordanova J, Kolev V, Rosso OA, Schurmann M, Sakowitz OW, Ozgoren M, et al. Wavelet entropy analysis of event-related potentials indicates modality-independent theta dominance. J Neurosci Methods. 2002;117(1):99-109.

Chapter 4 Overall Discussion

The work presented in this thesis investigated the recognition of emotional prosody (neutral, happy, angry) by NH listeners and CI users. In addition to examining the processing of prosody across subject groups, this work sought to: (i) specify the role of different acoustic cues and (ii) evaluate the interaction between prosodic processing abilities and CI strategies.

I. Appropriateness of CI simulations to investigate strategy differences

It was of particular interest to clarify the extent to which NH subjects were able to differentiate between the two types of simulation and make a comparative judgment. Thus, the first study (manuscript I) investigated the ability of NH individuals to recognise prosodic stimuli with ACE and MP3000 simulations. To achieve this, simulations were created using vocoders, whereby amplitude cues were preserved while spectral information was removed, enabling the parametric investigation of the role of spectral cues in the processing of prosody. Furthermore, the simulations of the ACE and MP3000 algorithms differed from each other in several respects, in particular the dynamic range. The results show that, although the vocoders degraded the speech signals, participants were able to differentiate between the two strategies. Behavioural as well as ERP measures indicated that MP3000 simulations were superior to ACE simulations. This implies that simulations are useful for comparing strategies, since they mimic the limited spectral resolution and unresolved harmonics of speech processing strategies. Because these simulations approximate perception through a CI, they are a practical alternative for comparing strategies.
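The envelope-extraction step at the heart of such vocoders can be sketched in a few lines. The sketch below is purely illustrative; the sampling rate, window length, carrier and noise source are invented for this example and are not the parameters of the simulation software actually used. The waveform is full-wave rectified and smoothed to recover its slow amplitude envelope, which can then modulate a noise carrier, so that temporal cues survive while spectral fine structure is discarded.

```python
import math
import random

def extract_envelope(signal, win):
    """Full-wave rectify, then smooth with a moving average (a crude low-pass)."""
    rect = [abs(s) for s in signal]
    half = win // 2
    env = []
    for i in range(len(rect)):
        lo, hi = max(0, i - half), min(len(rect), i + half + 1)
        env.append(sum(rect[lo:hi]) / (hi - lo))
    return env

def vocode(signal, win, seed=0):
    """Replace the carrier with noise modulated by the extracted envelope."""
    rng = random.Random(seed)
    return [e * rng.uniform(-1.0, 1.0) for e in extract_envelope(signal, win)]

fs = 4000                                  # illustrative sampling rate (Hz)
t = [i / fs for i in range(fs)]            # 1 s of signal
true_env = [0.5 * (1 + math.sin(2 * math.pi * 3 * x)) for x in t]  # slow 3 Hz envelope
signal = [e * math.sin(2 * math.pi * 500 * x) for e, x in zip(true_env, t)]

recovered = extract_envelope(signal, win=200)  # 50 ms smoothing window

def corr(a, b):
    """Pearson correlation, to check the envelope survives the processing."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

r = corr(recovered, true_env)  # high: the temporal cue is preserved
```

The high correlation between the recovered and the original envelope illustrates why vocoded speech remains intelligible to some degree: the slow amplitude modulations carrying prosodic rhythm pass through, while pitch-bearing fine structure does not.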

II. Ability of CI users to perceive emotional prosody

Following the first study, it was necessary to investigate prosody perception in CI users with the ACE and MP3000 strategies. Thus, the second study (manuscript II) focused on the ability of CI users to differentiate between prosodies using EEG. Behavioural data revealed significant differences in prosody recognition between NH controls and CI users. Within the CI users, systematic variations in accuracy were observed for each emotion. Similarly, a significant P200 peak was observed in CI users, with increased latency and decreased amplitude but with a morphology similar to that observed in healthy controls for the different prosodies. In line with reports in the literature (JOHNSON 2009), the findings described in this thesis imply that the central auditory system can consistently process certain aspects of verbal information, regardless of whether the stimuli are processed through a healthy cochlea or mediated by a cochlear prosthesis.

III. Reliability of ERP measures to explore prosody recognition

In contrast to behavioural studies, ERPs allow the investigation of ongoing stimulus processing at different levels, from perceptual to cognitive stages. Using ERPs makes it possible to determine whether the responses to prosodic changes in NH participants and CI users are due to lower-level processing of acoustic stimulus features reflected in obligatory components (such as the adult N1-P2 complex; NAATANEN 1988). The present work revealed that the N100 and P200 peaks are robust components reflecting emotional prosody recognition both in NH individuals and in individuals hearing through a CI. Thus, from the literature (PAULMANN et al. 2007; PINHEIRO et al. 2011) and the two studies presented in this thesis, it can be concluded that ERPs are a reliable tool for investigating prosody perception, and that the early peaks clearly reflect the processing of acoustic parameters.
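As an illustration of how such component measures are quantified, the sketch below picks N100 and P200 latency and amplitude from an averaged waveform by searching for the extremum inside a fixed window. The window bounds, sampling rate and synthetic waveform are assumptions made for this example, not the analysis parameters used in the studies.

```python
import math

def peak_in_window(erp, times, t_min, t_max, polarity):
    """Return (latency, amplitude) of the extremum inside [t_min, t_max] seconds.

    polarity = +1 finds a positive peak (e.g. P200), -1 a negative one (e.g. N100)."""
    idx = [i for i, t in enumerate(times) if t_min <= t <= t_max]
    best = max(idx, key=lambda i: polarity * erp[i])
    return times[best], erp[best]

def gauss(t, mu, sd, amp):
    return amp * math.exp(-((t - mu) ** 2) / (2 * sd ** 2))

fs = 500.0                                  # assumed sampling rate (Hz)
times = [i / fs - 0.1 for i in range(300)]  # epoch from -100 ms to ~500 ms
# Synthetic averaged ERP: a negative deflection near 100 ms ("N100")
# followed by a positive one near 200 ms ("P200").
erp = [gauss(t, 0.10, 0.02, -4.0) + gauss(t, 0.20, 0.03, 6.0) for t in times]

p200_lat, p200_amp = peak_in_window(erp, times, 0.15, 0.28, polarity=+1)
n100_lat, n100_amp = peak_in_window(erp, times, 0.06, 0.14, polarity=-1)
```

In real data the same window-based search is applied per condition and per subject, so that latency shifts and amplitude reductions, such as those seen in the CI users, can be tested statistically.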
IV. Acoustic modulation of gamma band responses to prosodic stimuli: temporal and topographic characteristics

Since ERPs are averaged signals reflecting synchronous neuronal activity, information regarding inter-trial variability is unobtainable, making it impossible to draw reliable conclusions regarding experimental effects and sources of variance (ZHAO et al. 2005). Hence, the second study (manuscript II) applied spectral analysis to explore inter-trial variability further. It was demonstrated that induced gamma activity in response to affective auditory stimuli is modulated as a function of emotional stimulus properties, both in amplitude and in topographical distribution. Gamma power increased, particularly in MP3000 users, in both the early and the late window during the presentation of happy prosodic stimuli, supporting the view that gamma activity reflects both bottom-up and top-down mechanisms of emotion recognition. Gamma activity was maximal for the recognition of happy prosody, indicating that this phenomenon might be related to a fast, rough categorisation of stimuli as angry, happy or neutral. Similarly, increased induced gamma band power was observed in the late time window. It seems likely that this second burst of long-range gamma synchrony is related to the maintenance or refinement of the attention network established by the first burst. According to POEPPEL (2003), perceptual unit formation evolves on a 200 ms time-scale (i.e. the theta range), a time constant characteristic of the syllabic unit of speech, while featural aspects are processed on a 30 ms time-scale (i.e. the gamma range). In the present work, theta power (4-7 Hz, roughly 200 ms period) was initially larger, while gamma power (35-55 Hz, roughly 25 ms period) was more pronounced for emotion identification with MP3000. The neural underpinnings of these differences evolve on two different time-scales: a coarse temporal resolution (theta) and a fine temporal resolution (gamma). This difference is reflected in the global power of the theta and gamma bands.
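The distinction between evoked (phase-locked) and induced (non-phase-locked) activity that motivates this single-trial analysis can be demonstrated with a toy computation. In this hypothetical example, every trial contains a 40 Hz oscillation whose phase varies from trial to trial: averaging the power of single trials retains it (total power), whereas the power of the averaged signal loses it (evoked power), so the difference isolates the induced part.

```python
import cmath
import math
import random

def band_power(x, fs, freq):
    """Spectral power at one frequency via a single DFT projection."""
    n = len(x)
    coef = sum(v * cmath.exp(-2j * math.pi * freq * i / fs)
               for i, v in enumerate(x)) / n
    return abs(coef) ** 2

fs, n_samples, n_trials, f_gamma = 500, 250, 40, 40.0  # 0.5 s epochs, 40 Hz target
rng = random.Random(1)
trials = []
for _ in range(n_trials):
    phase = rng.uniform(0.0, 2.0 * math.pi)  # phase differs across trials
    trials.append([math.sin(2 * math.pi * f_gamma * i / fs + phase)
                   for i in range(n_samples)])

# Total power: average the power of each single trial (keeps induced activity).
total = sum(band_power(tr, fs, f_gamma) for tr in trials) / n_trials
# Evoked power: power of the across-trial average (random phases cancel out).
erp = [sum(tr[i] for tr in trials) / n_trials for i in range(n_samples)]
evoked = band_power(erp, fs, f_gamma)
# Induced power: what survives in single trials but not in the average.
induced = total - evoked
```

Because the random phases cancel in the average, the evoked estimate is close to zero while the single-trial estimate retains the full oscillation, which is exactly why induced gamma activity is invisible in conventional ERP averages.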
Specifically, emotion identification is associated with higher power in the gamma band, while stimulus differentiation is associated with higher power in the theta band. As suggested in the literature, induced gamma power is also directly related to vigilance (MAY et al. 1994) as well as to the voluntary allocation of attentional resources (LANDAU et al. 2007). Hence, it seems reasonable to assume that enhanced activity in the gamma range reflects a higher activation of processing resources for task-relevant stimuli in MP3000 users. This is an important finding, as it (i) shows the relevance of gamma activity for prosody perception performance, and (ii) reflects the role of gamma activity in understanding higher-order prosody recognition mechanisms in CI users. Overall, the findings suggest that the MP3000 strategy indeed has a perceptual advantage over ACE in binding the finer acoustic features important for emotional differentiation.

V. MP3000 vs ACE for prosody perception

The work presented in this thesis was designed as a systematic comparison of two speech coding strategies in post-lingually hearing impaired adult CI users. The intention was to investigate possible differences in performance between the strategies and to determine the optimal strategy for the majority of patients based on objective EEG measures. The comparison of emotion recognition with the ACE and MP3000 strategies showed significant differences in performance. The best results were obtained with MP3000, as reflected by improved accuracy and larger EEG amplitudes. These results therefore indicate superior performance with the MP3000 strategy in the selected participant group. Furthermore, the MP3000 strategy is advantageous compared with ACE for emotional prosody perception, suggesting that the larger gamma power may favour a strategy with finer effective spectral resolution. The increased dynamic range results from the application of the psychoacoustic masking model, which discards perceptually redundant information and allows the relevant signal components to pass through. Consequently, based on the evidence in this thesis, it seems reasonable to consider the MP3000 strategy as an initial choice in the clinical setting. In summary, the research described in this thesis confirms that the MP3000 strategy is superior to ACE.
Moreover, it supports the use of ERPs and gamma band activity both for investigating auditory perception in CI users in general and for comparing CI speech coding strategies.
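The contrast between the two selection principles can be illustrated with a toy frame of channel envelopes. Nothing here reproduces the actual MP3000 implementation: the exponential spreading function, its slope and the channel values are invented purely to show the idea that a masking-based selection can reject a channel that a pure maxima selection would keep.

```python
def select_ace(env, n):
    """ACE-like NofM: simply keep the n channels with the largest envelopes."""
    order = sorted(range(len(env)), key=lambda i: env[i], reverse=True)
    return sorted(order[:n])

def select_psychoacoustic(env, n, spread=0.5):
    """MP3000-like NofM: discard channels lying under a masking threshold
    spread exponentially from louder channels, then keep up to n of the rest.
    (Toy spreading model, not the real MP3000 masking curves.)"""
    audible = []
    for i, e in enumerate(env):
        thresh = max((env[j] * spread ** abs(i - j)
                      for j in range(len(env)) if j != i), default=0.0)
        if e > thresh:
            audible.append(i)
    audible.sort(key=lambda i: env[i], reverse=True)
    return sorted(audible[:n])

# Toy 8-channel frame: channel 3 is strong, channel 4 is weaker and adjacent
# (largely masked by channel 3), channel 6 is moderate but isolated.
env = [0.1, 0.1, 0.2, 1.0, 0.45, 0.1, 0.35, 0.1]
ace = select_ace(env, 3)             # maxima selection keeps channel 4
psy = select_psychoacoustic(env, 3)  # masking model rejects channel 4
```

In this toy frame the ACE-like rule spends one of its slots on a channel that would be perceptually masked anyway, whereas the masking-based rule drops it; this is the intuition behind the claim that the psychoacoustic model removes redundant information.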

4.1 Conclusion and future directions

The aim of the work presented in this thesis was to investigate prosody perception in hearing impaired individuals. With the help of EEG, the present work examined the neurophysiological processes underlying prosodic perception in NH listeners and CI users. The results provide evidence that CI users are able to differentiate between and recognise prosodic information that reflects basic emotions. In addition, the present work clearly highlights the advantage of the MP3000 strategy over the ACE strategy in the recognition of emotional prosody. The results confirm that ERPs and gamma band activity are valid measures for investigating CI users in general, as well as for comparing the outcomes of CI speech coding strategies. It was also observed that gamma oscillations reflected the superiority of one strategy over the other in prosodic cue detection, especially when the differences were pitch-related. The results revealed that the EEG indeed reflected differences between emotional prosodies at an early stage of processing. Importantly, it was shown for the first time in CI users that specific oscillations do not merely correlate with functions but appear to influence actual neuronal processing and subsequent behaviour. Since emotional prosody is considered a link between speech and music, the present thesis is of particular relevance for research investigating speech and/or music perception in CI users. The field of emotion recognition investigated with EEG is still in its infancy. The work presented in this thesis is only an initial step towards fully investigating prosody perception in CI users with the help of EEG; a full investigation requires more than is possible in a single doctoral study.
Nevertheless, the current findings show that investigating the potential factors that might influence emotional prosody processing is both possible and important. In particular, it has been shown that the specific speech coding strategy plays an extremely important role in prosody recognition. Future research should therefore examine different coding strategies to extract the most important factors, irrespective of the make and model of the CI.

Chapter 5 Summary

Prosody Perception in Cochlear Implant Users: EEG evidence

Deepashri Agrawal

Cochlear implant (CI) devices provide an opportunity for hearing impaired individuals to perceive sounds through electrical stimulation of the auditory nerve. One feature of oral communication is semantics, while another feature, emotional prosody, encodes the emotional state of the speaker. It is currently unclear whether CI users are able to identify verbal emotions effectively. The main objective of this thesis is to compare two CI speech coding strategies, ACE (advanced combination encoder) and MP3000, with regard to emotional prosody perception. This was achieved through the use of behavioural tasks and electroencephalography (EEG). Semantically neutral sentences spoken with three prosodic variations (neutral, angry and happy) served as stimuli. The aim of the first study was to investigate the performance of normal-hearing (NH) participants in the perception of emotional prosody with vocoded ACE and MP3000 stimuli. The analysis showed that NH listeners achieved near-perfect performance with the original stimuli compared to the simulations. For the simulations, recognition was better for happy and angry prosody than for neutral prosody. A significantly larger P200 event-related potential after sentence onset was observed for happy prosody than for the other two emotions. Furthermore, the P200 amplitude was significantly more positive for the MP3000 strategy than for the ACE strategy. These results emphasise the importance of vocoded simulations for better understanding the prosodic cues that CI users may be utilising.

The second study investigated the ability of CI users to recognise emotional prosody with the ACE and MP3000 strategies. In addition to behavioural tasks, EEG gamma band power was also calculated. The results were similar to those from the first study: CI users fitted with the MP3000 strategy showed improved recognition of prosodic information compared to users of the ACE strategy. The ERP results demonstrated that emotional prosody elicited significant N100 and P200 peaks with both strategies. Furthermore, the P200 amplitude in response to happy prosodic information was significantly more positive for the MP3000 strategy than for the ACE strategy. In addition, significant gamma band activity was observed only with the MP3000 strategy, most likely reflecting better top-down cognitive control for prosody recognition. Taken together, the results presented in this thesis suggest that the MP3000 strategy is better than ACE with regard to emotional prosody perception, as confirmed by behavioural and electrophysiological responses. The P200 peak is an indicator of active differentiation and recognition of emotional prosody. It was shown that time-frequency analysis is a useful tool that can reveal differences between two CI processing strategies in the recognition of prosody-specific features of language. Furthermore, it provided several new insights, especially regarding the reflection of top-down processes in gamma band activity as a binding process, and its use as an effective tool for understanding the prognostic outcome of CI speech coding strategies.

Chapter 6 Zusammenfassung

Prosody Perception in Cochlear Implant Users: EEG evidence

Deepashri Agrawal

Cochlear implants (CIs) offer hearing impaired people the possibility of perceiving sounds through electrical stimulation of the auditory nerve. One feature of oral communication is semantics, while another conveys the emotional state of the speaker (emotional prosody). It is still unclear whether CI users can perceive verbal emotions effectively. The main subject of this thesis is the comparison of two CI speech coding strategies, ACE (Advanced Combination Encoder) and MP3000, with regard to the perception of emotional prosody. To achieve this goal, behavioural tests and EEG measurements were used. Semantically neutral sentences spoken with three varying prosodies (neutral, angry and happy) served as stimuli. The aim of the first experiment was to examine normal-hearing (NH) participants when emotional prosody was simulated with vocoders for ACE and MP3000. The analysis of emotional prosody perception for these modified stimuli showed that NH participants had near-perfect recognition for unmodified prosodies. For the simulations, better recognition was found for happy and angry prosodies compared with neutral prosody. A significantly larger P200 (event-related potential) was found after sentence onset for happy prosody compared with the two other emotions. Compared with the ACE strategy, the P200 amplitude was significantly more positive when the MP3000 strategy was used. Our results therefore underline the importance of vocoded simulations for gaining a better understanding of prosodic cues and of how CI users exploit them. A second experiment was conducted to examine the ability of CI users to recognise emotional prosody with ACE and MP3000. In addition to the behavioural data, EEG gamma band activity was computed. Consistent with our first experiment, we found that CI users fitted with the MP3000 strategy showed a higher rate of correctly identified prosodic information than users of the ACE strategy. Our ERP results demonstrate that emotional prosody elicits significant N100 and P200 peaks. The P200 amplitude in response to happy prosodic information was significantly more positive for the MP3000 strategy than for the ACE strategy. The spectral power analysis showed significant gamma band activity only for MP3000 users, presumably reflecting better cognitive top-down control in these users. Taken together, our results suggest that the MP3000 strategy performs better than the ACE strategy with regard to emotional prosody perception, as demonstrated by behavioural and EEG data. The P200 is an indicator of active differentiation and recognition of emotional prosody. Furthermore, we showed that time-frequency analysis is a useful tool that can reveal differences between CI processing strategies with respect to prosody-specific features of speech. In addition, we present new findings regarding the reflection of top-down processes in gamma band activity, as well as its effective use for making prognoses about CI speech coding strategies.

Chapter 7 References

BASAR, E. & B. GUNTEKIN (2008): A review of brain oscillations in cognitive disorders and the role of neurotransmitters. Brain Res 1235, 172-193
BOSTANOV, V. & B. KOTCHOUBEY (2004): Recognition of affective prosody: continuous wavelet measures of event-related brain potentials to emotional exclamations. Psychophysiology 41, 259-268
BOZIKAS, V. P., M. H. KOSMIDIS, D. ANEZOULAKI, M. GIANNAKOU, C. ANDREOU & A. KARAVATOS (2006): Impaired perception of affective prosody in schizophrenia. J Neuropsychiatry Clin Neurosci 18, 81-85
DAVID, O., J. M. KILNER & K. J. FRISTON (2006): Mechanisms of evoked and induced responses in MEG/EEG. Neuroimage 31, 1580-1591
DE ZUBICARAY, G., K. MCMAHON, M. EASTBURN, A. PRINGLE & L. LORENZ (2006): Classic identity negative priming involves accessing semantic representations in the left anterior temporal cortex. Neuroimage 33, 383-390
DESMEDT, J. E. & J. DEBECKER (1979): Wave form and neural mechanism of the decision P350 elicited without pre-stimulus CNV or readiness potential in random sequences of near-threshold auditory clicks and finger stimuli. Electroencephalogr Clin Neurophysiol 47, 648-670
DONCHIN, E. & E. F. HEFFLEY (1979): The independence of the P300 and the CNV reviewed: a reply to Wastell. Biol Psychol 9, 177-188
ERBER, N. P. (1972): Speech-envelope cues as an acoustic aid to lipreading for profoundly deaf children. J Acoust Soc Am 51, 1224-1227
FJELL, A. M. & K. B. WALHOVD (2003): Effects of auditory stimulus intensity and hearing threshold on the relationship among P300, age, and cognitive function. Clin Neurophysiol 114, 799-807
GALAMBOS, R. & S. MAKEIG (1992): Physiological studies of central masking in man. II: Tonepip SSRs and the masking level difference. J Acoust Soc Am 92, 2691-2697
HAGOORT, P., C. M. BROWN & T. Y. SWAAB (1996): Lexical-semantic event-related potential effects in patients with left hemisphere lesions and aphasia, and patients with right hemisphere lesions without aphasia. Brain 119 (Pt 2), 627-649
HAJCAK, G., A. MACNAMARA & D. M. OLVET (2010): Event-related potentials, emotion, and emotion regulation: an integrative review. Dev Neuropsychol 35, 129-155
HILLYARD, S. A., R. F. HINK, V. L. SCHWENT & T. W. PICTON (1973): Electrical signs of selective attention in the human brain. Science 182, 177-180

HILLYARD, S. A. & T. F. MUNTE (1984): Selective attention to color and location: an analysis with event-related brain potentials. Percept Psychophys 36, 185-198
HOUSE, W. F. (1994): Cochlear implants: it's time to rethink. Am J Otol 15, 573-587
HUGDAHL, K., T. HELLAND, M. K. FAEREVAAG, E. T. LYSSAND & A. ASBJORNSEN (1995): Absence of ear advantage on the consonant-vowel dichotic listening test in adolescent and adult dyslexics: specific auditory-phonetic dysfunction. J Clin Exp Neuropsychol 17, 833-840
JOHNSON, J. M. (2009): Late auditory event-related potentials in children with cochlear implants: a review. Dev Neuropsychol 34, 701-720
KEIL, A., M. M. BRADLEY, O. HAUK, B. ROCKSTROH, T. ELBERT & P. J. LANG (2002): Large-scale neural correlates of affective picture processing. Psychophysiology 39, 641-649
KOCHANSKI, G. & C. SHIH (2003): Prosody modeling with soft templates. Speech Communication 39, 311-352
KOTZ, S. A., M. MEYER & S. PAULMANN (2006): Lateralization of emotional prosody in the brain: an overview and synopsis on the impact of study design. Prog Brain Res 156, 285-294
KOTZ, S. A., B. OPITZ & A. D. FRIEDERICI (2007): ERP effects of meaningful and non-meaningful sound processing in anterior temporal patients. Restor Neurol Neurosci 25, 273-284
KRISTEVA-FEIGE, R., B. FEIGE, S. MAKEIG, B. ROSS & T. ELBERT (1993): Oscillatory brain activity during a motor task. Neuroreport 4, 1291-1294
KUTAS, M. & S. A. HILLYARD (1980): Event-related brain potentials to semantically inappropriate and surprisingly large words. Biol Psychol 11, 99-116
LANDAU, A. N., M. ESTERMAN, L. C. ROBERTSON, S. BENTIN & W. PRINZMETAL (2007): Different effects of voluntary and involuntary attention on EEG activity in the gamma band. J Neurosci 27, 11986-11990
LIU, T. Y., J. C. HSIEH, Y. S. CHEN, P. C. TU, T. P. SU & L. F. CHEN (2012): Different patterns of abnormal gamma oscillatory activity in unipolar and bipolar disorder patients during an implicit emotion task. Neuropsychologia 50, 1514-1520
LUTZENBERGER, W., F. PULVERMULLER, T. ELBERT & N. BIRBAUMER (1995): Visual stimulation alters local 40-Hz responses in humans: an EEG-study. Neurosci Lett 183, 39-42
MAKEIG, S. (1993): Auditory event-related dynamics of the EEG spectrum and effects of exposure to tones. Electroencephalogr Clin Neurophysiol 86, 283-293
MAKEIG, S., S. DEBENER, J. ONTON & A. DELORME (2004): Mining event-related brain dynamics. Trends Cogn Sci 8, 204-210
MEISTER, H., M. LANDWEHR, V. PYSCHNY, M. WALGER & H. VON WEDEL (2009): The perception of prosody and speaker gender in normal-hearing listeners and cochlear implant recipients. Int J Audiol 48, 38-48
MOST, T. & M. SHURGI (1993): The effect of listeners' experience on the evaluation of intonation contours produced by hearing-impaired children. Ear Hear 14, 112-117

NAATANEN, R. (1988): Implications of ERP data for psychological theories of attention. Biol Psychol 26, 117-163
NOGUEIRA, W., A. BUECHNER, T. LENARZ & B. EDLER (2005): A Psychoacoustic "NofM"-type Speech Coding Strategy for Cochlear Implants. EURASIP Journal on Applied Signal Processing 127, 3044-3059
PANTEV, C., T. ELBERT, B. ROSS, C. EULITZ & E. TERHARDT (1996): Binaural fusion and the representation of virtual pitch in the human auditory cortex. Hear Res 100, 164-170
PAUL, R., A. AUGUSTYN, A. KLIN & F. R. VOLKMAR (2005): Perception and production of prosody by speakers with autism spectrum disorders. J Autism Dev Disord 35, 205-220
PAULMANN, S. & S. A. KOTZ (2008a): Early emotional prosody perception based on different speaker voices. Neuroreport 19, 209-213
PAULMANN, S. & S. A. KOTZ (2008b): An ERP investigation on the temporal dynamics of emotional prosody and emotional semantics in pseudo- and lexical-sentence context. Brain Lang 105, 59-69
PAULMANN, S., D. V. OTT & S. A. KOTZ (2011): Emotional speech perception unfolding in time: the role of the basal ganglia. PLoS One 6, e17694
PAULMANN, S., M. D. PELL & S. A. KOTZ (2008): Functional contributions of the basal ganglia to emotional prosody: evidence from ERPs. Brain Res 1217, 171-178
PINHEIRO, A. P., S. GALDO-ALVAREZ, A. RAUBER, A. SAMPAIO, M. NIZNIKIEWICZ & O. F. GONCALVES (2011): Abnormal processing of emotional prosody in Williams syndrome: an event-related potentials study. Res Dev Disabil 32, 133-147
ROTH, W. T., B. S. KOPELL, J. R. TINKLENBERG, G. E. HUNTSBERGER & H. C. KRAEMER (1975): Reliability of the contingent negative variation and the auditory evoked potential. Electroencephalogr Clin Neurophysiol 38, 45-50
SANDMAN, C. A. & J. V. PATTERSON (2000): The auditory event-related potential is a stable and reliable measure in elderly subjects over a 3 year period. Clin Neurophysiol 111, 1427-1437
SAUTER, D. A. & M. EIMER (2010): Rapid detection of emotion from human vocalizations. J Cogn Neurosci 22, 474-481
SCHRODER, C., J. MOBES, M. SCHUTZE, F. SZYMANOWSKI, W. NAGER, M. BANGERT, T. F. MUNTE & R. DENGLER (2006): Perception of emotional speech in Parkinson's disease. Mov Disord 21, 1774-1778
SHARMA, A. & M. F. DORMAN (1999): Cortical auditory evoked potential correlates of categorical perception of voice-onset time. J Acoust Soc Am 106, 1078-1083
SHIBASAKI, H. & M. MIYAZAKI (1992): Event-related potential studies in infants and children. J Clin Neurophysiol 9, 408-418
SPRECKELMEYER, K. N., M. KUTAS, T. URBACH, E. ALTENMULLER & T. F. MUNTE (2009): Neural processing of vocal emotion and identity.

Chapter 8 Acknowledgement

It would not have been possible to write this doctoral thesis without the help and support of the kind people around me, only some of whom can be given particular mention here. This thesis would not have been possible without the help, support and patience of my principal supervisor, Prof. Reinhard Dengler. For guidance and direction I am indebted to my supervisor Dr Matthias Wittfoth, who has been accessible and helpful with every aspect of my research, and who has given careful criticism of various drafts of the thesis. I am grateful to Professor Stefan Debener for constructive comments at different stages of this work and for his constant encouragement and support, especially in EEG analysis and related issues. I extend my thanks to Prof. Andrej Kral and Prof. Elke Zimmermann for invaluable suggestions throughout this project. I would like to thank Dr. Filipa Viola, Dr. Jeremy Thorne and Dr. Pascal Sandmann for all their helpful suggestions and comments towards making this research a success. My thanks are also due to Mr. Armin Tagipor for helping us create the CI simulations, without which the first paper would not have been possible. I am thankful to Mr. Andreas Niesel for technical assistance in setting up the lab, providing all the software, and all the small technical help. I would also like to thank Dr Thorsten Schwizer for helping me keep track of all the deadlines and important paperwork, without which I would never have turned things in on time, and Dr Carolina Frömke for help with statistics and comments on a draft of the thesis. Sincere thanks to all the acousticians at the Hearing Centre MHH for mapping the different CI programs for the patients; without their help this entire project would not have been possible. I am very grateful to the participants and their families, who took part enthusiastically in this study. I would like to acknowledge the financial, academic and technical staff of the ZSN for all their help and support. I am also grateful to the Government of Lower Saxony, Germany, for awarding me the prestigious Georg Christoph Lichtenberg Scholarship for my PhD. Heartfelt thanks to my fellow postgraduate students in the ZSN and in the Department of Neurology, MHH, for their efforts in promoting a stimulating and welcoming academic and social environment. I would also like to thank my colleagues and friends from the PhD hearing program for all the discussions and the fun we had at summer school. I would also like to thank Dr. Olivier Commowick for his wonderful LaTeX template, which made my life so much easier while writing the thesis. I sincerely thank the external examiner for the careful evaluation of my thesis. My special thanks go to my husband, Dr. Mahesh Kakde, for his personal support and great patience at all times. Without your encouragement and unconditional love I would not have been able to see this day. I love you. Finally, my parents, my in-laws, and my brother and sister have given me their unequivocal support throughout, as always, for which my mere expression of thanks likewise does not suffice. Thank you one and all...

Declaration

I herewith declare that I autonomously carried out the PhD thesis entitled "Prosody Perception in Cochlear Implant Users: EEG Evidence". No third-party assistance has been used. I did not receive any assistance in return for payment from consulting agencies or any other person. No one received any kind of payment for direct or indirect assistance in connection with the content of the submitted thesis. I conducted the project at the following institution: Department of Neurology, Hannover Medical School, Hannover, Germany. The thesis has not been submitted elsewhere for an exam, as a thesis, or for evaluation in a similar context. I hereby affirm the above statements to be complete and true to the best of my knowledge.

Date, Signature