AN INSTRUMENT FOR THE MULTIPARAMETER ASSESSMENT OF SPEECH
by Paul Dean Sharp

A thesis submitted for the Degree of Doctor of Philosophy in Electronic Engineering at the University of Kent at Canterbury, 2000
ABSTRACT

Speech is the result of a highly complex and versatile system of co-ordinated muscular movements, and although perceptual assessments contribute valuable information to the process of diagnosing speech disorders, instrumental observation and measurement offer significant advantages. Increasingly, clinicians are beginning to appreciate the considerable benefits of instrumental analysis, which provides quantitative, objective data on a wide range of different speech parameters. In addition, such measures are becoming increasingly important as the need to prove efficacy grows. Although current instruments are extremely useful, giving excellent measures of individual articulatory function, few are able to measure the co-ordination of the main articulators. The instrumentation described in this thesis, namely SNORS+, encompasses several speech assessment techniques in a single PC-based clinical system. Both hardware and software provide a user-friendly interface that allows the simultaneous measurement of five key speech parameters: respiration, larynx excitation, velopharyngeal closure, tongue-palate contact and speech outcome. In addition, audio playback provides accurate identification of the recorded multiparameter data, and a synchronised video input enables simultaneous use with established imaging techniques. The results of a small trial conducted on 40 subjects, considered by the author to exhibit normal speech, are presented. The outcome of this trial has produced a series of baseline parameters that may be used to compare normal speech with pathological speech. To allow comparison with pathological data, two case studies are presented. The first examines the hypernasal speech production of a cleft palate subject, and the second investigates the speech production of a young boy with lateral misarticulations. The development of SNORS+ has given clinicians the unique ability to assess the contributory and co-ordinated effects of the main articulators on speech production. As a result, the system has proved to be extremely valuable in the assessment and treatment of various speech disorders.
TABLE OF CONTENTS

1 Introduction
  Speech and Language
  Speech Production
  Disorders of Speech and Language
  Speech Assessment Techniques
  Project Overview
  Thesis Structure

2 Speech
  Speech Production
  Speech Organs
  Articulatory Phonetics
  Place and Manner of Articulation
  Consonants
  Vowels
  Disorders of Speech and Language
  Developmental Disorders
  Cleft Lip and Palate
  Acquired Dysarthria
  Acquired Apraxia and Dyspraxia
  Aphasia
  Speech and Language Therapy
  Assessment of Articulatory Speech Disorders
  Treatment of Articulatory Speech Disorders

3 Instrumental Assessment Techniques
  Aerodynamic Assessment
  The Super Nasal Oral Ratiometry System (SNORS)
  Electrolaryngography
  Laryngograph
  Electropalatography
  Linguagraph
  Imaging Techniques
  Videofluoroscopy
  Endoscopy
  Acoustic Analysis
  Oscillographic Displays
  FFT Displays
  Spectrograms
  Summary

4 Project Specification
  System Specification
  EN and EC directive 93/42/EEC
  Technical Specification
  Operating System
  Data Acquisition Card
  Sound Card
  Video Acquisition Card
  PC Specification
  Hardware Overview
  External Module Connection
  Enveloped Lx Signal
  High and Low Waveform Resolution Switching
  Fundamental Frequency Derivation from Laryngograph
  Dual Channel EPG
  Audio Signal Conditioning
  Automatic Module Detection
  Auxiliary Channel
  Software Overview
  C and the Multiple Document Interface
  The Main Application Window
  Real-Time Windows
  Test Protocol
  Test Analysis Windows
  File Handling
  Printing
  Help

5 SNORS+ Hardware Implementation
  Linguagraph Interface
  Linguagraph Overview
  Input Buffers
  Dual Channel Multiplexers
  The Clock Generator
  Signal Conditioning
  Lx Envelope Generator
  High-Pass Filter
  Half Wave Rectifier and Low-Pass Filter
  Offset and Gain Adjustment
  Fx Generator
  Voltage Level Shifter and Frequency to Voltage Converter
  Low-Pass Filter and Gain Adjust
  Waveform Resolution Switching
  Automatic Module Detection
  Audio Signal Conditioning
  Power Supply
  PCB Design

6 Biofeedback Software Implementation
  The Multithreaded Environment
  Scheduling
  The Thread Architecture
  The DAS-1202 Data Acquisition Thread (Initialisation; Data Acquisition; Data Notification; Thread Termination)
  The Real-Time Bar Window (High-Level Function; Low-Level Function)
  The Real-Time Scope Window (High-Level Function; Low-Level Function)
  The Real-Time EPG Window (High-Level Function; Low-Level Function)
  The Wave Data Acquisition Thread (Initialisation; Data Acquisition; Data Notification; Thread Termination)
  The Real-Time Wave Window (High-Level Function; Low-Level Function)
  The Real-Time FFT Window (High-Level Function; Low-Level Function)
  The Real-Time Spectrogram Window (High-Level Function; Low-Level Function)
  The Real-Time Video Window (High-Level Function; Low-Level Function)

7 Analysis Software Implementation
  Test Protocol
  Word List
  The Word Display Period
  Sample Frequency
  Parameter Selection
  Display Options Setup
  The Test Data Acquisition Thread (Initialisation; Data Acquisition)
  The Test Analysis Windows
  The Test Scope Window
  Test Analysis Child Windows

8 Results and Analysis
  Qualitative Analysis of Multiparameter Data
  Analysis of Combined Acoustic, Airflow, Voicing and EPG Data
  Analysis of Combined Airflow and Videofluoroscopy Data
  Quantitative Analysis of Multiparameter Data
  The Trial
  Analysis Procedure
  Results
  A Single Aerodynamic Case Study
  Analysis of Electropalatography Data
  Qualitative Analysis of Electropalatography Data
  Electropalatography Data Reduction
  Quantitative Analysis of Electropalatography Data
  A Single Electropalatography Case Study

9 Conclusions and Further Work
  Clinical Evaluation
  Clinical Measurements
  Relating Speech Mechanism to Outcome
  Assessment of Velopharyngeal Incompetence
  Identification of Tongue-Palate Configurations
  Further Work
  Novel Clinical Applications
  Hardware and Software Enhancements

Bibliography

Appendices
PREFACE

The research documented in this thesis was funded by the Engineering and Physical Sciences Research Council (EPSRC), and conducted within the Medical Electronics Research Group, Electronic Engineering Laboratory, University of Kent at Canterbury. During the research period, aspects of the work contained in this thesis have been published in the Journal of Medical Engineering and Physics (Sharp et al., 1999), and presented at numerous conferences, symposia and study days. In addition, eleven SNORS+ systems are now in regular clinical use in the UK, Sweden and Iran.
ACKNOWLEDGEMENTS

I wish to express my particular appreciation towards my supervisor, Mr Steve Kelly, for his constant support and encouragement throughout the course of this research. I would like to extend my thanks to all those subjects who consented to participate in the clinical trial, and to the many clinicians whose constructive comments have made a significant contribution to this research. Particular mention must be made of:

- Ms. Alison Main, Speech and Language Therapist, for her rigorous clinical evaluation of the user interface.
- Mr. John Boorman, Plastic Surgeon, and Ms. Denise Dive, Speech and Language Therapist, for their collaboration on the technique of multiparameter assessment with combined videofluoroscopy.
- Dr. Graham Manley, Dental Surgeon, for introducing the work on speech assessment techniques to the Medical Electronics Research Group some 13 years ago, and for his collaboration throughout the course of this project.

I also acknowledge the debt I owe to all past and present members of the Medical Electronics Research Group for their help and assistance. Lastly and most importantly, I would like to thank my family and close friends for their patience and support during the last three years.
CHAPTER 1
INTRODUCTION

1.1 Speech and Language

The human ability to produce and understand speech is often taken for granted, and little thought is given to its nature and function. It is not surprising, therefore, that many people overlook the great influence of speech on the development and normal functioning of human society (Denes and Pinson, 1968). Speech is the conversion of a language into sound (Borden and Harris, 1984). A particular language is a rule-governed communication system composed of meaningful elements, which can be combined in many ways to produce sentences. Wherever human beings cohabit they develop a spoken language with which to communicate; even people in the most primitive societies use speech as a means of communication. The most important feature of human language, that which differentiates it from every other known mode of animal communication, is its flexibility, subtlety and infinite range of meanings. To a great extent, the development of human civilisation is made possible by man's ability to share experiences, to exchange ideas and to transmit knowledge from one generation to another. Man has developed many systems with which to communicate, such as Morse code, semaphore, or the written word. Unquestionably, however, man has found speech to be the most efficient and convenient form of communication. An example of the overwhelming importance of speech in human society is a comparison of the social attitudes of the blind to those of the deaf. Generally, blind people tend to integrate well with their fellow human beings despite their handicap. But the deaf, who can still read and write, often feel cut off from society. Deaf people, deprived of their primary means of communication, tend to withdraw from the world and live within themselves (Denes and Pinson, 1968). When most people stop to consider speech, they think only in terms of lip and tongue movement. In reality, speech is the result of a highly complex and versatile system of co-ordinated muscular movements (Borden and Harris, 1984).
1.2 Speech Production

Speech is produced by an air stream originating in the lungs, which is propelled upwards by the diaphragm through the trachea (the windpipe), oral cavity and nasal cavity. During its passage, various organs of speech (the articulators) modify this air stream to produce different speech sounds. Speech production may be divided into four separate but interrelated processes (Giegerich, 1992):

- The air stream generated in the lungs to power the speech process.
- Its phonation in the larynx through the operation of the vocal folds.
- Its direction by the velum into either the oral or nasal cavity.
- And finally its articulation, primarily by the tongue and lips in the oral cavity.

The speech production process is illustrated in Figure 1.1.

[Figure 1.1: The speech production process, taken from Sharp et al. (1999). The figure traces the air stream from the lungs past the vocal folds, pharynx, velum, tongue and lips, with the nasal and oral outputs combining into speech.]

When it is considered that the average rate of speech is up to four syllables per second, each of which may contain anything up to seven consonants and a vowel sound, the complexity of articulatory movement becomes apparent. It has been estimated that over 100 muscles are involved in the speech process (Lenneberg, 1967) and their controlled co-ordination requires around 140,000 neuromuscular events every second (Darley et al., 1975). If the timing and/or position of the articulators are not properly controlled, abnormal speech may occur.
1.3 Disorders of Speech and Language

Disorders of speech and language refer to problems in communication and related areas such as articulatory function. These range from simple sound substitutions to the inability to understand language or control the speech production mechanism. Speech disorders can be either developmental or acquired, and may be either physical or neurological. Causes include:

- Mislearning - where the speech mechanism is physically unaffected, but the individual is still unable to produce adequate speech. A classic example is the lisp.
- Sensory impairment - where there is an impairment in the interaction of speech and other senses; for example in a profoundly deaf subject who cannot hear the speech they produce.
- Neurological disorders - a common factor amongst these disorders is the lack of control over the speech production mechanism. For example, acquired dysarthria reduces muscular control of the articulators, and may result from Parkinson's disease, motor neurone disease and stroke (Darley et al., 1975).
- Structural defects - where there is a physical defect in one or more of the speech production organs, making it physically impossible to generate the appropriate speech sounds. In cleft palate speech, for example, the ability to achieve adequate velopharyngeal closure during oral sounds is often affected.

1.4 Speech Assessment Techniques

Assessment of speech defects is initially subjective, relying on the clinical judgement of the speech and language therapist. This will involve both assessment of the intelligibility and quality of the patient's speech, and observation of the visible aspects of articulation (e.g. lip and some tongue movement). However, the majority of the articulators are not visible during speech. Additionally, there is a growing need for evidence-based intervention. Therefore, objective quantitative assessment is increasingly important (Sharp et al., 1999). A number of individual instruments are available that can achieve this:

- Videofluoroscopy - records moving X-ray images onto videotape. This provides a view of the velum and tongue during speech, and yields useful dynamic information.
- Nasendoscopy - utilises an endoscope, passed through the nares and nasal cavity, to image the velum during speech. The vocal folds can also be viewed in this way.
- Electroglottography - measures vocal fold activity. This is achieved by placing a set of electrodes on the patient's neck, either side of the thyroid cartilage. By passing a small electric current through the vocal folds and measuring impedance changes, it is possible to detect their vibration as well as simple movements of the glottis.
- Nasal anemometry - allows the position of the velum to be inferred by measuring nasal airflow during speech.
- Electropalatography - determines tongue-palate contact by using a special artificial palate containing an array of electrodes embedded on its tongue-facing surface. A small electrical signal, fed to the patient, is conducted through the tongue to any touched electrodes and thence, via the electronics unit, to a computer where the tongue-palate contact is displayed.

Whilst the above techniques yield useful information about individual articulatory function, they provide little or no information relating to the synchronisation of the articulators. This is considered a major limitation when assessing speech disorders involving more than one articulator (Main, 1998).
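To make the form of electropalatography data concrete, the sketch below decodes a single palate frame into a printable contact pattern. It is a minimal illustration only: a 62-electrode layout (six contacts in the front row, eight in each of the remaining seven rows) and one-bit-per-electrode packing are assumptions for the example, and may differ from the Linguagraph's actual frame format.

    /* A minimal sketch, not the Linguagraph's actual frame format:
       a 62-electrode palate is assumed, packed one bit per electrode,
       with 6 contacts in the front row and 8 in each of the other
       seven rows. */
    #include <stdio.h>

    #define EPG_ROWS 8

    static const int row_len[EPG_ROWS] = { 6, 8, 8, 8, 8, 8, 8, 8 };

    /* Print one frame as a grid: 'X' marks tongue-palate contact. */
    static void print_epg_frame(const unsigned char frame[8])
    {
        int bit = 0;
        for (int r = 0; r < EPG_ROWS; r++) {
            for (int c = 0; c < row_len[r]; c++, bit++)
                putchar((frame[bit / 8] >> (bit % 8)) & 1 ? 'X' : '.');
            putchar('\n');
        }
    }

    int main(void)
    {
        /* A fabricated frame, for demonstration only. */
        unsigned char frame[8] = { 0xC0, 0xFF, 0x81, 0x81,
                                   0x81, 0x81, 0x81, 0x00 };
        print_epg_frame(frame);
        return 0;
    }

Successive frames of this kind, displayed at the acquisition rate, give the dynamic contact patterns referred to throughout the thesis.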
1.5 Project Overview

The instrumentation described in this thesis encompasses all of the above techniques in a single PC-based clinical system. Both hardware and software provide a user-friendly interface that allows the simultaneous measurement of five key speech parameters:

- Respiration.
- Larynx excitation.
- Velopharyngeal closure.
- Tongue-palate contact.
- Speech outcome.

These parameters may be displayed as trend waveforms over time, or as two-dimensional dynamic images. Data from the various instruments are synchronously combined within an interface unit, which facilitates a single connection to the host computer's data acquisition card. Audio playback provides accurate identification of sound elements within the featured waveforms. Synchronised video input allows simultaneous use with established and respected imaging techniques such as videofluoroscopy and nasendoscopy. In addition, the inclusion of spectral analysis software allows the rapid variations in the acoustic signal to be visualised, and hence a measure of speech outcome to be obtained. A simplified block diagram of the system is illustrated in Figure 1.2.

[Figure 1.2: A block diagram of the multiparameter system, showing the microphone, artificial palate (electropalatography unit), airflow transducer (anemometry unit), neck electrodes (electroglottography unit) and video source feeding the interface unit and PC.]

The result is a system capable of the simultaneous measurement of four major speech organs: the lungs, larynx, velum and tongue. Together with the resultant speech outcome, this presents the clinician with a comprehensive view of the speech production process.
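At their core, the spectral analysis displays mentioned above reduce to computing short-time magnitude spectra of the acoustic signal. The sketch below shows the idea in its simplest form, using a direct discrete Fourier transform of one Hamming-windowed frame; the frame length is an illustrative assumption, and a real-time implementation would substitute an FFT. A spectrogram is then simply a sequence of such spectra, computed from overlapping frames and displayed against time.

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    #define FRAME_LEN 256   /* assumed frame length, e.g. ~23 ms at 11 kHz */

    /* Magnitude spectrum of one frame by direct DFT (O(N^2), shown
       for clarity; real-time code would substitute an FFT). */
    static void magnitude_spectrum(const double x[FRAME_LEN],
                                   double mag[FRAME_LEN / 2])
    {
        for (int k = 0; k < FRAME_LEN / 2; k++) {
            double re = 0.0, im = 0.0;
            for (int n = 0; n < FRAME_LEN; n++) {
                /* Hamming window to reduce spectral leakage. */
                double w = 0.54 - 0.46 * cos(2.0 * M_PI * n / (FRAME_LEN - 1));
                double a = 2.0 * M_PI * k * n / FRAME_LEN;
                re += w * x[n] * cos(a);
                im -= w * x[n] * sin(a);
            }
            mag[k] = sqrt(re * re + im * im);
        }
    }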
1.6 Thesis Structure

The general structure of the thesis is detailed below.

Chapter 1: Presents a general overview of the project.

Chapter 2: Describes the physiological process of speech production, giving a detailed account of each major speech organ. Articulatory phonetics, which is the study of the individual speech sounds, is also discussed. The chapter then describes several disorders affecting speech and language, and concludes with a discussion on the role of speech and language therapy in the assessment of these disorders.

Chapter 3: Introduces a selection of instrumental techniques commonly used in the assessment of disordered speech.

Chapter 4: Outlines the main user requirements of a multiparameter speech workstation. It then provides a full technical specification and concludes with a system overview in terms of both hardware and software.

Chapter 5: Discusses the technical aspects of the hardware, giving a detailed account of the individual elements that comprise the interface unit.

Chapter 6: Introduces the concepts of real-time data acquisition under the Windows operating system, and explains how these have been implemented in the biofeedback software.

Chapter 7: Describes the test protocol used to conduct formal speech assessment. Techniques for the synchronised acquisition of multiple source data are then discussed. Finally, the methods used to format, display and analyse the synchronised multiparameter data are described.

Chapter 8: Presents a series of test results acquired using the multiparameter system. This chapter is divided into three main sections: qualitative analysis of multiparameter data, quantitative analysis of multiparameter data (excluding lingual parameters), and the analysis of electropalatography data.

Chapter 9: Draws conclusions from the work presented in the thesis, and suggests areas of further research.
CHAPTER 2
SPEECH

This chapter describes the physiological process of speech production, giving a detailed account of each major speech organ. Articulatory phonetics, which is the study of the individual speech sounds, is also discussed. The chapter then describes several disorders affecting speech and language, and concludes with a discussion on the role of speech and language therapy in the assessment of these disorders.

2.1 Speech Production

On choosing to speak, an individual must initially arrange their thoughts, decide on the message content, and then convert it into linguistic form (Denes and Pinson, 1968). The conversion is achieved by selecting the necessary words and phrases to express the meaning of the message, and by placing them in the correct order as dictated by the grammatical rules of the language. This process is associated with brain activity, and it is here that the appropriate instructions, in the form of impulses along the motor nerves, are generated and transmitted to the muscles of the vocal organs. These nerve impulses set the vocal muscles into motion, which in turn produce minute pressure changes in the surrounding air. The resultant sound waves produce similar pressure changes within the listener's ear, activating the hearing mechanism. Consequently, nerve impulses are produced which travel along the acoustic nerve to the brain, where the original message is reconstructed. In addition, to ensure the resultant speech approximates to the speaker's original intention, he or she must continually listen to themselves whilst speaking and make any necessary adjustments, such as to pitch level or voice intensity. In engineering terms, the speech production process represents a closed-loop feedback system.

Speech Organs

The gross components of the human speech production mechanism are:

- the lungs (air supply)
- the trachea (windpipe)
- the larynx (vocal cords)
- the pharyngeal cavity (throat)
- the oral cavity (mouth)
- the nasal cavity (nose)

These are illustrated in Figure 2.1.

[Figure 2.1: Schematic view of the human speech production mechanism, taken from Rabiner (1993). The labelled structures include the nasal cavity, hard and soft palate (velum), oral and pharyngeal cavities, lips, teeth, tongue, jaw, larynx, oesophagus, trachea, lungs and diaphragm.]

Generally, the pharyngeal and oral cavities are grouped into one unit referred to as the vocal tract, which begins at the output of the larynx (or glottis), and terminates at the input to the lips. The shape of the vocal tract can be varied extensively by moving the active articulators such as the tongue, lips and jaw. The nasal cavity is often called the nasal tract, which begins at the velum and ends at the nostrils. When the velum is lowered the nasal tract is acoustically coupled to the vocal tract to produce nasal sounds. During speech, the lungs and associated muscles produce the air source required to power the vocal mechanism. The muscle force pushes air out of the lungs and through the trachea.
When the vocal cords are tensed, the airflow causes them to vibrate, producing so-called voiced speech sounds. When the vocal cords are relaxed, in order to produce a sound, the airflow must pass through a constriction in the vocal tract and thereby become turbulent, producing so-called unvoiced sounds. Alternatively, the air can build up pressure behind a point of total closure within the vocal tract and cause a brief transient sound when the pressure is abruptly released. The sections that follow detail the individual speech organs and outline their respective roles in the speech production process.

The Respiratory System

The lungs are masses of spongy, elastic material contained within the rib cage. They supply oxygen to the blood and dispose of waste products such as carbon dioxide. The intercostal muscles, abdomen and diaphragm control the act of respiration. At rest, a balance of forces exists between the lungs and thoracic cavity, and the pressure within the lungs (pulmonary pressure) is at atmospheric level. During inspiration, the diaphragm and external intercostal muscles contract, which increases lung volume and hence decreases the pulmonary pressure. The reduction of pressure draws air into the lungs until atmospheric pressure is again reached. At the end of inhalation a state of equilibrium is reached, thus preventing further airflow. On relaxation of the inhalation muscles, equilibrium is lost and the elastic forces of the lung and thoracic cavity contract the lungs. This increases the pulmonary pressure above atmospheric and forces air out of the lungs to produce an exhaled air stream. On completion of the exhalation phase, rest is again reached and the cycle repeats itself (refer to Figure 2.2).
[Figure 2.2: The respiratory cycle, taken from Tortora and Grabowski (1993).]

Normal respiration rate is around fifteen times per minute, with the inspiration and expiration phases approximately equal in duration. However, during speech it is possible to influence the respiration rate in accordance with the length of the sentence or phrase. Since all English speech sounds are initiated by an egressive air stream (Giegerich, 1992), the duration of exhalation is increased and inhalation decreased. The respiration rate may reduce to as little as four times per minute during speech (Fry, 1994).
The Larynx

The air from the lungs flows through the trachea towards the larynx, which is a cartilaginous tube connecting the trachea and the pharynx (refer to Figure 2.1). Its main function is to protect the airway, by closing off during swallowing and by expelling anything that enters the larynx by coughing. As illustrated in Figure 2.3, the larynx contains two horizontal folds of tissue (the vocal folds), which extend from the arytenoid cartilage to the thyroid cartilage (Adam's apple). The gap between the vocal folds, through which the air stream passes upward into the oral cavity, is called the glottis.

[Figure 2.3: Diagram of the glottis shown from above with (a) vocal folds open and (b) vocal folds closed, taken from Plant (1999).]

During breathing the arytenoid cartilages are held outward, pulling the vocal folds to the side and thus keeping the glottis wide open. However, for many of the speech sounds, the vocal folds are used to interrupt the flow of air, causing periodic pulses of sound, or phonation (Main, 1998). This is achieved by moving the arytenoid cartilages inward to bring the vocal folds to a position of adduction. The lungs are caused to contract, generating a pressure head below the glottis. If the resulting force is sufficient, it overcomes the elastic force holding the vocal folds together, thereby causing the glottis to open and air to flow out. The vocal folds then close rapidly due to a combination of factors, including their elasticity, laryngeal muscle tension and the Bernoulli effect. Pressure below the glottis then builds up and the events repeat themselves.
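Because the cycle just described is quasi-periodic, its repetition rate (the fundamental frequency, Fx) can be estimated directly from a laryngograph-type waveform. The hardware Fx generator described in Chapter 5 performs an analogous frequency-to-voltage conversion; the software sketch below is illustrative only, and assumes an idealised, noise-free voiced signal.

    /* Illustrative fundamental-frequency estimate: count positive-going
       zero crossings in a buffer of n samples taken at fs Hz. Real Lx
       data would first need filtering and crossing hysteresis. */
    static double estimate_fx(const double *x, int n, double fs)
    {
        int crossings = 0;
        for (int i = 1; i < n; i++)
            if (x[i - 1] <= 0.0 && x[i] > 0.0)  /* one per glottal cycle */
                crossings++;
        return crossings * fs / n;  /* cycles per second, i.e. Hz */
    }

For the typical frequencies quoted below, a 100 ms buffer would span roughly 12 glottal cycles for an adult male and around 30 for a child.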
The mass and length of the vocal folds vary; in men they are more substantial and longer, whereas in women they are finer and shorter. The differing mass and length lead to different fundamental frequencies of vibration: around 125 Hz in men, 200 Hz in women and 300 Hz in children (Borden and Harris, 1984). As mentioned, the vocal folds can be manipulated by the speaker and brought into a variety of different positions, thus altering the shape of the glottis. At least three such positions are linguistically significant:

- Closed glottis. The vocal folds are brought close together so that no air can pass between them. The resulting speech sound from the closure of the glottis and subsequent release is called the glottal stop, and is sometimes heard in English preceding a forcefully pronounced vowel (as in "Out!").
- Narrow glottis. When the vocal folds are brought together in such a way that makes them vibrate, the resulting sound waves characterise the voiced sounds of speech. All vowels are voiced, as are sounds like /m/, /l/, /v/, /b/ etc.
- Open glottis. The glottis assumes this state in normal breathing as well as in the production of voiceless sounds. The vocal folds are spread and do not vibrate; the glottis is of sufficient width as to allow the air stream to pass through without obstruction. Voiceless sounds include, for example, the /st/ sequence in "stone" (the rest of the word is voiced).

The resultant sound waves may be further modified by the configuration of the vocal tract.

The Pharynx

The pharynx is the section of the vocal tract nearest the glottis. It is a muscular tube connecting the trachea and oesophagus to the oral and nasal cavities (refer to Figure 2.1). For speech, the pharynx serves as an acoustic filter that suppresses the passage of sound at certain frequencies while allowing its passage at other frequencies. The resonant properties of the pharynx and other cavities together modify the voice source and help characterise the individual speech sounds (phonemes). The overall shape, length and volume of the pharynx determine the transfer function of the filter. The flexibility of the human vocal tract, in which the articulators can easily adjust to form a variety of shapes, results in the potential to produce a wide range of sounds.
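The filtering action described above is often introduced through the textbook idealisation of the vocal tract as a uniform tube, closed at the glottis and open at the lips. Such a tube resonates at odd quarter-wavelength frequencies, Fn = (2n - 1)c / 4L. The figures below come from this idealised model only (the speed of sound and tract length are assumed values), not from measurements made in this work.

    /* Resonances of an idealised uniform tube, closed at one end and
       open at the other: Fn = (2n - 1) * c / (4 * L). */
    #include <stdio.h>

    int main(void)
    {
        const double c = 354.0;  /* speed of sound in warm moist air, m/s */
        const double L = 0.17;   /* assumed vocal-tract length, m */
        for (int n = 1; n <= 3; n++)
            printf("F%d = %.0f Hz\n", n, (2 * n - 1) * c / (4.0 * L));
        return 0;  /* prints approximately 520, 1560 and 2600 Hz */
    }

Changing the effective shape and length of the tract moves these resonances, which is precisely how the articulators differentiate one phoneme from another.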
The Velum

Having passed through the larynx and pharynx, the air stream may flow through the oral and nasal cavities. In normal breathing the air stream will usually pass through the nasal cavity. However, during many speech sounds the nasal cavity is blocked and the air stream is directed into the oral cavity. This is achieved with the velum, a muscular structure extending from the posterior border of the hard palate (refer to Figure 2.1). The nasal cavity is also obstructed during swallowing to prevent food or liquids from being forced through the nose. The velum, which may be adjusted, has two linguistically significant positions:

- Elevated. When raised and pressed against the back of the pharynx, the velum prevents the entry of air into the nasal cavity. Since the air stream emerges through the oral cavity, the speech sounds produced in this manner are called oral sounds. The vast majority of consonants in the English language are produced in this manner.
- Lowered. When the velum is lowered the air stream has access to the nasal cavity. If at the same time the oral cavity is occluded, causing the entire air stream to pass through the nasal cavity, the result is a nasal sound. A purely nasal escape of this type occurs in the nasal consonants /m/, /n/ and /ŋ/. The occlusion of the oral cavity is different for each of these sounds, thus altering the dimensions of the resonating cavity.

The velum has five pairs of muscles, which control its position within the pharynx. The lateral and posterior pharyngeal walls are made up of various pharyngeal muscles, and can move medially and anteriorly respectively to vary the cross-sectional area of the pharynx. To achieve velopharyngeal closure the velum is elevated and moves posteriorly towards the walls of the pharynx, such that the mass of the velum blocks much of the cross-section of the pharynx. In addition, the pharyngeal walls move medially. The movements of the velum and pharyngeal walls combine to form a closure of the velopharyngeal valve, required for most speech sounds. Borden and Harris (1984) suggest that, in general, a small gap in velopharyngeal closure (c. 20 mm²) will not affect the resultant sound. However, larger gaps will cause audible nasal resonance.
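Since the state of the velopharyngeal valve governs how the air stream partitions between the oral and nasal cavities, the ratio of nasal to total airflow provides a simple index of closure; this is the principle behind nasal/oral ratiometry, on which SNORS is based. The sketch below computes such a ratio over a buffer of simultaneously sampled airflow signals. It is a minimal illustration, and the actual SNORS processing is described in later chapters.

    /* Illustrative nasal/oral ratio: the fraction of total airflow
       emerging nasally over a buffer of paired samples. A value near 0
       suggests full velopharyngeal closure; a value near 1 suggests a
       predominantly nasal escape. */
    static double nasal_ratio(const double *oral, const double *nasal, int n)
    {
        double sum_oral = 0.0, sum_nasal = 0.0;
        for (int i = 0; i < n; i++) {
            if (oral[i]  > 0.0) sum_oral  += oral[i];  /* egressive flow only */
            if (nasal[i] > 0.0) sum_nasal += nasal[i];
        }
        double total = sum_oral + sum_nasal;
        return total > 0.0 ? sum_nasal / total : 0.0;
    }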
The Tongue

The most versatile of the articulators is the tongue, which is involved in the production of all vowels and the vast majority of consonants (Crystal, 1989). Different sounds require different tongue configurations. By altering tongue position and shape, the size of the oral cavity, and therefore its resonating characteristics, are changed. Besides making the movements required for speech, the tongue mixes food with saliva during chewing, forms the food into a bolus and initiates swallowing. The tongue covers the majority of the oral cavity floor. It consists primarily of connective tissue and muscle, covered with a mucous membrane. Both intrinsic and extrinsic skeletal muscle fibres form the tongue. The intrinsic muscles are confined to the tongue and are unattached to bone. They include bundles of fibres that run in three planes: longitudinal, transverse and vertical. These enable the fine, rapid control of tongue shape and positioning which is necessary for the articulation of speech sounds. The extrinsic muscles extend from the tongue to their points of origin on the hyoid, skull and mandible. Their purpose is to alter the gross tongue position within the mouth; they protrude, retract and elevate it. The extrinsic muscles also form the body of the floor of the mouth and hold the tongue in position. The tongue is divided by a midline connective tissue septum, and each half contains identical groupings of the intrinsic and extrinsic muscles. In order to speak, the complex movements made by the tongue must be co-ordinated with the controlled movement of the other articulators. Any errors in co-ordination, speed of movement, shape, or place of tongue contact will cause articulation to be distorted and speech intelligibility to be affected (Crystal, 1989).

The Lips

The lips (labia) are composed mainly of the orbicularis oris muscle, covered by skin on the outside and a mucous membrane on the inside. Many other muscles are attached to the orbicularis oris muscle, and together they work to create the movements and power required for speech, eating and forming facial expressions. For speech production the lips have three functions:

- A place of closure. By closing and subsequently opening the lips, sounds such as the plosives /p/ and /b/ are produced.
- A resonance modifier. Variations in the size and shape of the resonating cavities can be achieved by altering lip shape. For example, lip rounding and protrusion lengthen the oral cavity, as in the articulation of the sound //.
- A sound source, where the lips are held sufficiently close together that friction occurs between them. For example, during the sound /f/, the lower lip is raised against the upper incisors and air passing through the gap under pressure causes friction.

The Teeth and Hard Palate

The speech organs that are not mobile are called passive articulators; these include the teeth (or, more precisely, the incisors) and the hard palate. The hard palate forms the anterior portion of the roof of the mouth and is divided into two sections:

- The alveolar ridge - a hard ridge that can be felt behind the upper incisors.
- The palate - a hard bony structure in the front part of the roof.

Whilst not regarded as active articulators, the teeth and hard palate do contribute to the articulation of many speech sounds (Main, 1998).

2.2 Articulatory Phonetics

Articulatory phonetics is the study of the individual speech sounds, identifying how and where they are articulated. To enable the various speech sounds to be accurately transcribed, the International Phonetic Association has developed the International Phonetic Alphabet (IPA).
[Table 2.1: International Phonetic Alphabet, taken from Roach (1991).]

The following sections introduce the major classes into which speech sounds are divided according to the IPA system.

Place and Manner of Articulation

The distinction between place and manner of articulation is particularly important for the classification of consonants. The manner of articulation is defined by a number of factors:

- Voiced vs. Voiceless: Whether there is vibration of the vocal cords. For example /v/, /d/ and /g/ are voiced, whereas /s/, /t/ and /k/ are unvoiced.
- Consonant vs. Vowel: Whether there is obstruction of the air stream at any point above the glottis.
- Nasal vs. Oral: Whether the air stream passes through the nasal cavity in addition to the oral cavity. In English the only nasal consonants are /m/, /n/ and /ŋ/. All other speech sounds are described as oral.
- Non-Lateral vs. Lateral: Whether the air stream passes through the middle of the oral cavity or along the sides. For example /s/ is non-lateral, whereas /l/ is lateral.

The place of articulation is the point at which the air stream is obstructed, as illustrated in Figure 2.4. In the majority of cases it is possible to characterise the place of articulation in terms of the passive articulators involved (Giegerich, 1992).

[Figure 2.4: Places of articulation, taken from O'Connor (1991): bilabial, labio-dental, dental, alveolar, palatal, velar, uvular, pharyngeal and glottal.]

The place of articulation can be any of the following:

- The lips (bilabials), examples include /p/ and /b/.
- The teeth (dentals), examples include /θ/ and /ð/.
- The lips and teeth (labio-dentals), examples include /f/ and /v/.
- The alveolar ridge (alveolar articulations), examples include /t/, /d/ and /l/.
- The hard palate (given its large size, it is possible to distinguish between palato-alveolars and palatals), for example /ʃ/ and /j/ respectively.
- The soft palate (or velum - velar articulations), examples include /k/, /g/ and /ŋ/.
- The uvula (uvulars), for example //.
- The pharynx (pharyngeals), for example //.
- The glottis (glottals), examples include /h/ and /ʔ/.

Consonants

Consonants may be further categorised as:

- Stops: A stop sound is produced by completely blocking the airflow within the oral cavity. For example, in the sound /t/ air pressure builds up in the oral cavity behind the tongue and its subsequent rapid release causes an explosive sound. For this reason stops are often referred to as plosives.
- Fricatives: A fricative sound is produced when two articulators are placed in close proximity, generating turbulent noise when air passes between them. For example, the /s/ sound is produced when a groove is present between the tongue and the hard palate, creating a hissing sound as air passes between them. Other examples include /z/, /f/ and /v/.
- Affricates: The production of an affricate can be characterised by a stop closure followed by a fricative-like release of air. An example of this sound is /tʃ/ in church. The initial sound begins as a stop but ends like a fricative.
- Nasals: A nasal is a sound made with the velum lowered so that air flows through the nasal cavity. Examples include /m/, /n/ and /ŋ/.
- Liquids: In the production of these sounds there is some obstruction to the airflow within the oral cavity, but it is not sufficient to cause any real constriction or friction. In English the only liquids are /l/ and /r/ (Fromkin and Rodman, 1993).
- Glides: These sounds lie somewhere in between a consonant and a vowel, since they provide almost no obstruction to the airflow within the oral cavity. Glides such as /w/ and /y/ are also called semivowels.
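For software that must handle speech material, the voicing, place and manner dimensions described above conveniently identify each consonant as a compact feature triple. A minimal sketch of such a lookup table follows; the entries are illustrative rather than an exhaustive IPA encoding.

    /* A small, illustrative consonant feature table: each entry pairs a
       symbol with its voicing, place and manner of articulation. */
    enum place  { BILABIAL, LABIODENTAL, DENTAL, ALVEOLAR,
                  PALATAL, VELAR, GLOTTAL };
    enum manner { STOP, FRICATIVE, AFFRICATE, NASAL, LIQUID, GLIDE };

    struct consonant {
        const char *symbol;
        int         voiced;  /* 1 = voiced, 0 = voiceless */
        enum place  place;
        enum manner manner;
    };

    static const struct consonant table[] = {
        { "p", 0, BILABIAL,    STOP      },
        { "b", 1, BILABIAL,    STOP      },
        { "f", 0, LABIODENTAL, FRICATIVE },
        { "v", 1, LABIODENTAL, FRICATIVE },
        { "t", 0, ALVEOLAR,    STOP      },
        { "d", 1, ALVEOLAR,    STOP      },
        { "s", 0, ALVEOLAR,    FRICATIVE },
        { "z", 1, ALVEOLAR,    FRICATIVE },
        { "k", 0, VELAR,       STOP      },
        { "g", 1, VELAR,       STOP      },
        { "m", 1, BILABIAL,    NASAL     },
        { "n", 1, ALVEOLAR,    NASAL     },
        { "l", 1, ALVEOLAR,    LIQUID    },
        { "h", 0, GLOTTAL,     FRICATIVE },
    };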
Vowels

The main characteristic of vowels is the freedom with which the air stream, once out of the glottis, passes through the speech organs. The acoustic quality of the vowel is dependent on the size and shape of the oral cavity, which is largely determined by the general position of the tongue within the mouth. This divides the vowels into three classes:

- Front vowels: Tongue body in the pre-palatal region, for example //.
- Central vowels: Tongue body in the medio-palatal region, for example //.
- Back vowels: Tongue body in the post-palatal or velar region, for example /a/.

Although by no means exhaustive, the lists of terms described in this section allow the characteristics of individual speech sounds to be accurately described.

2.3 Disorders of Speech and Language

Disorders of speech and language refer to problems in communication and related areas such as articulatory function. These range from simple sound substitutions to the inability to understand language or control the speech production mechanism. Disorders can be either developmental or acquired, and may be either physical or neurological. Causes include hearing impairment, brain injury, mental retardation, Parkinson's or motor neurone disease, drug abuse, cleft lip or palate, and vocal abuse or misuse. Frequently, however, the cause is unknown. The following section introduces the developmental speech and language disorders associated with children, placing particular emphasis on cleft lip and palate. The text then details four commonly acquired disorders of speech and language: dysarthria, apraxia, dyspraxia and aphasia.

Developmental Disorders

There are many potential causes of speech and language disorders in children, including hearing impairment, cognitive impairment, autism, lack of stimulation and structural abnormalities such as cleft lip and palate. A child's communication is considered delayed when they are noticeably behind their peers in the acquisition of speech and/or language skills. Disorders of speech refer to difficulties in the production of speech sounds. These may be characterised as:

- Dysfluency - an interruption in the flow or rhythm of speech, such as stuttering.
- Articulatory - difficulties associated with the way sounds are formed.
- Phonation - problems associated with pitch level, volume and voice quality.

A language disorder is an impaired ability to understand and/or use words in context, both verbally and non-verbally. Children may have receptive language impairments (understanding), expressive language impairments (speaking) or both. Characteristics of language disorders include:

- Improper use of words and their meanings.
- Inability to express ideas.
- Inappropriate grammatical patterns.
- Reduced vocabulary.
- Inability to follow directions.

A child affected by language learning disabilities or developmental language delay may exhibit one or a combination of the above characteristics. Since all communication disorders carry the potential to isolate individuals from their social and educational surroundings, it is essential to find appropriate and timely intervention.

Cleft Lip and Palate

A cleft lip is a developmental defect that occurs in the womb in the fourth to sixth week of gestation. As can be seen in Figure 2.5 (left), the defect results in a separation of the two sides of the upper lip, often including the bones of the upper jaw and/or upper gum. A cleft lip may be unilateral (affecting one side of the mouth) or bilateral (affecting both). A cleft palate is a birth defect that occurs in the eighth to twelfth week after conception. It is an opening in the roof of the mouth where the two sides of the palate have failed to fuse; refer to Figure 2.5 (right). As described in section 2.1.1, the roof of the mouth is divided into two parts, the hard palate and the soft palate. In mild forms of cleft palate there may be only a slight notching of the soft palate. However, most defects involve both the soft and hard portions of the roof of the mouth.
[Figure 2.5: A unilateral cleft lip (left) and a cleft palate (right), taken from the American Society of Plastic Surgeons official Web site.]

Because the lip and palate develop separately, it is possible for the child to have a cleft lip, a cleft palate, or both. Errors in articulation are common in cleft palate patients, especially those involving affricates and fricatives (Hegde, 1996). Other errors affect stops, glides, and nasal semivowels. Nasal air emissions during the production of pressure sounds are often associated with velopharyngeal incompetence, which is an impairment of the velopharyngeal valving mechanism (Haapanen, 1992). Velopharyngeal competence is an important determinant of articulatory performance in cleft palate speech. It has been estimated that 75% of patients achieve velopharyngeal competence following primary cleft palate surgery, increasing to 90-95% with directed secondary procedures (Quinn, 1998).

Acquired Dysarthria

Dysarthria is the most common of the acquired disorders of speech and language (Enderby and Emerson, 1996). It is a neuromotor speech disorder resulting from a lesion or lesions in the nervous system, which cause movement or postural disturbances that affect the strength, timing and/or tone of the muscle functions used for speech (Netsell and Daniel, 1979). These muscular aberrations may result in the following:

- Articulatory difficulties due to interference with lip, jaw, and tongue musculature.
- Phonation problems due to laryngeal and respiratory musculature involvement.
- Inappropriate resonance when the muscles of the soft palate and pharynx are affected.
Malfunction of the peripheral speech mechanism is often the first, and perhaps the only, symptom of neurological disease. Acquired dysarthria may be caused by:

- Vascular diseases, including intracerebral haemorrhage, thromboses and embolisms.
- Infectious diseases, such as meningitis.
- Metabolic diseases, including blood diseases such as sickle-cell anaemia and leukaemia, or disorders of amino acid or carbohydrate metabolism.
- Tumours, including those within the brain, the spinal cord, and the cranial and spinal nerves.
- Trauma, such as closed-head injuries or depressed-skull fractures, which result in concussion, contusion or laceration.
- Toxins, including metabolic toxins associated with diphtheria, tetanus and botulism; inorganic metals, such as lead or mercury poisoning; and organic substances, such as barbiturates and carbon monoxide.
- Neurological disorders, such as Parkinson's and motor neurone disease.

There is a cause and effect relationship between the location of damage within the nervous system and the type of dysarthria that results. Types of dysarthria may be classified into six specific subgroups: flaccid, spastic, ataxic, hypokinetic, hyperkinetic and mixed (Darley et al., 1975).

Flaccid Dysarthria

Flaccid dysarthria results from lesions of the peripheral nervous system or the lower motor neuron system. Flaccidity refers to flabbiness and implies weakness, lack of normal muscle tone and reduced or absent reflexes associated with paresis or paralysis. These conditions occur when a lesion is present anywhere along the motor unit. Depending on the affected area, the larynx, pharynx, tongue and soft palate may show flaccid impairment. Speech symptoms include a breathy, harsh or weak voice, hypernasality, nasal air emission and distorted consonants (Darley et al., 1975).

Spastic Dysarthria

Symptoms include muscular weakness, greater than normal muscular tone, slow movements, limited range of motion, and hyperactive reflexes. Spastic dysarthria results
from damage to the upper motor neuron system (Love and Webb, 1996). In unilateral involvement, the dysarthria is usually not severe and has significant consequences only for the lips, lower face, and tongue. Bilateral lesions cause more severe speech deviations, where all components of the speech mechanism may be involved and muscles on both sides of the body are affected.

Ataxic Dysarthria

Damage to the cerebellum (usually bilateral) causes difficulties in regulating the force, speed, range, timing, and direction of voluntary movements (Love and Webb, 1996). There is also lower than normal tone in the muscular system, essentially normal reflexes, and tremor during voluntary efforts. The respiratory, laryngeal and articulatory systems are involved to varying degrees, but the velopharyngeal mechanism is rarely affected. In severe cases there may be major respiratory problems and sudden changes in pitch and loudness. In mild cases only articulation may be affected.

Hypokinetic Dysarthria

Hypokinesia causes slow movements, movements of limited extent, abnormal posturing, loss of automatic movements, increased muscle tone and rhythmic resting tremor in different structures. Patients often have difficulty starting and stopping movements and seem to move rigidly. Parkinsonism is the primary syndrome involving hypokinesia, and speech involvement may encompass the entire speech musculature, affecting pitch, loudness and voice quality (Adams, 1997). Severity may range from mildly imprecise articulation to almost complete unintelligibility.

Hyperkinetic Dysarthria

Hyperkinesia results mainly from lesions of the extrapyramidal system that cause abnormal involuntary movements, ranging from slow to very fast, which are difficult or impossible to inhibit (Love and Webb, 1996). Symptoms can be either unilateral or bilateral. The soft palate is commonly abnormal, but phonation is the major speech problem, including momentary interruptions of phonation as a result of involuntary movements in the larynx or diaphragm. These interruptions are apparent on vowels but usually have no significant effect on intelligibility.
Mixed Dysarthria

Mixed dysarthria may involve any combination of the above, and usually results from diffuse neurological damage. The greater the degree of diffusion, the greater the number of motor components involved.

Acquired Apraxia and Dyspraxia

Darley (1982) describes apraxia of speech as a disorder in which the patient has trouble speaking because of a cerebral lesion that prevents them executing, voluntarily and on command, the complex motor activities involved in speaking, despite the fact that muscle strength is undiminished. The disorder may be characterised by a variety of abnormalities, including:

- Slow speaking rate with prolonged transitions, steady states and inter-syllable pauses.
- Reduced movement of articulators.
- Voicing that is unco-ordinated with other articulations.
- Initiation difficulties.
- Errors of selection or sequencing of segments.

Where impaired muscle strength is present, however, the condition is known as dyspraxia. The difficulties in speech which result from conditions such as articulatory dyspraxia are inconsistent, and often accompanied by struggle behaviour as the speaker attempts to execute the appropriate movements (Main, 1998). Each attempt at the same sound may be different. Stressed words and initial sounds have been found to give the most difficulty, with the same sounds being clearly articulated in other word positions. Articulatory dyspraxia affects only volitional speech. Automatic speech, for example counting or reciting a well-known phrase, may be unaffected (Darley et al., 1975). These characteristics are used to differentiate dyspraxia from dysarthria.

Aphasia

The term aphasia (also known as dysphasia) denotes a class of acquired language disorders caused by damage to the cerebral hemispheres of the brain. The nature and severity of aphasia is dependent on the site and extent of damage (Main, 1998). Aphasia may be classified into the following categories:
- Anomic aphasia is characterised by an inability to recall proper names. Speech is fluent and grammatical.
- Broca's aphasia is named after the French neurologist Paul Broca, who in the nineteenth century identified an area within the frontal lobe that is important in the control of speech. Broca's aphasia results from damage to this area and is characterised by slow, laborious, hesitant speech, with little intonation and obvious articulation difficulties. It is also identified by impairments in word order, and is generally interpreted as a difficulty in sequencing the units of language.
- Conduction aphasia is characterised by an impaired repetition of speech, accompanied by naming difficulty and comprehension impairment.
- Global aphasia is a linguistic disorder in which spontaneous speech, repetition and naming are all severely impaired.
- Transcortical aphasia is characterised by various impairments consistent with the inability to repeat words.
- Wernicke's aphasia is caused by damage to the left temporal lobe just posterior to the primary auditory cortex, and is characterised by a deficiency in speech comprehension and meaningless but somewhat grammatical speech.

2.4 Speech and Language Therapy

The speech and language therapist is concerned with the study, assessment and treatment of communicative disorders. Specialist areas include developmental language disorders, neurogenic speech and language, fluency, voice, articulation, swallowing and alternative communication methods. Assessment of an individual with a communication disorder may involve a wide variety of diagnostic procedures, some of which are detailed in this section. Treatment procedures also vary, and may involve group or individual approaches, instrumentation for biofeedback or alternative/augmentative communication, and the use of dental prostheses.

Assessment of Articulatory Speech Disorders

As a basis for diagnosis and treatment planning, the exact nature of each articulatory speech disorder should be understood, beginning with the anatomic and physiological changes created in the vocal tract and their movement patterns during speech. This
information should be available in order to interpret the acoustic and perceptual characteristics of the disorder. It is often apparent that more than one vocal tract gesture can produce the same acoustic or perceptual result (Logemann, 1985). In addition to a basic understanding of the nature of each disorder, its natural course must be documented in order for therapists to diagnose and treat it optimally. Many acquired speech disorders result from surgery or trauma (e.g. stroke or head injury), from which some recovery can be anticipated. Others are related to progressive neurological disease (e.g. multiple sclerosis or Parkinson's disease), which will cause increasing degradation in articulatory function. Thus, in order to determine the anatomic and physiologic changes that can be anticipated with recovery or with degeneration, data collected on the nature of the disorder must document its entire course. Finally, with detailed information on the nature of the disorder and its natural course, therapists can design optimal treatment strategies and determine the best time for their introduction in the recovery or degenerative process. In order to monitor patient progress (or degeneration) and permit baseline comparisons, assessment techniques must be repeatable. These may include:

- Measurement of articulatory strength, function and co-ordination.
- Perceptual assessment of speech intelligibility.
- Instrumental assessment.

Measurement of Articulatory Strength, Function and Co-ordination

Measurement of the strength and function of individual articulators is often useful, particularly in diagnosing the aetiology of impairment (Netsell, 1983). For example, assessment of the range and strength of repetitive tongue movement, commonly part of dysarthria assessment, should help the therapist to discriminate between spastic and flaccid, unilateral and bilateral weakness (Main, 1998). This approach could, however, oversimplify the complexities of the speech mechanism. Due to compensation by other systems, the severity of articulatory impairment may not always affect the speech outcome as expected (Netsell, 1983). For example, an apparently severe unilateral spastic weakness of the tongue, shown in repetitive movement, may have a minimal effect on articulation. It has also been recognised that patients not achieving
velopharyngeal closure during speech often use the tongue to support the velum or to fill the space between it and the posterior pharyngeal wall (Buck and Harrington, 1949). This improves velopharyngeal competence but limits the tongue's articulatory capacity (McWilliams, 1966). For the above reasons, measurement of the co-ordinated movement of the articulators is considered to be of great use and importance (Warren et al., 1997).

Perceptual Assessment of Speech Intelligibility

Meaningful speech production is a vital part of any speech disorder assessment, since it involves the entire motor control system (Netsell, 1983) and includes important aspects of speech, such as prosody and rate. Formal methods used for assessing speech intelligibility include the use of standardised phonetic, phonological and language assessment techniques. These assessment protocols require the patient to undertake well-defined speech tasks that are recorded using high-quality audio equipment. Trained listeners assess the resulting speech samples using transcription techniques, and properties such as nasal emission, resonance and quality of articulation are rated subjectively. For example, hypernasality could be rated as being not present, mild or severe. These qualitative perceptual criteria are converted, by the therapist, into point scores (defined by the test protocol) to provide a form of quantitative data. An example of such a system is the Frenchay Dysarthria Assessment, where a five-point rating scale, from "no difficulty" to "unable", is provided for each individual task. It has been recognised, however, that motor speech disorders often cause distortions of speech which are difficult to assess and quantify perceptually using phonetic transcription methods (Forrest and Weismer, 1997; Yorkston and Beukelman, 1980).
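The conversion from perceptual ratings to point scores has a direct software representation, which matters later when such scores are compared with instrumental measures. The sketch below shows the general shape of a five-point protocol in the spirit of the Frenchay assessment; the category names and scoring are invented for the example and are not those of the published instrument.

    /* Illustrative only: a five-point perceptual rating scale mapped to
       point scores, in the general style of formal assessment protocols.
       The category names and weighting are invented for the example. */
    enum rating { NO_DIFFICULTY = 0, MILD, MODERATE, SEVERE, UNABLE };

    /* Sum the item ratings of one assessment into a single score;
       lower totals indicate better performance. */
    static int total_score(const enum rating *items, int n_items)
    {
        int total = 0;
        for (int i = 0; i < n_items; i++)
            total += (int)items[i];
        return total;
    }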
Instrumental Assessment

Although perceptual assessments contribute valuable information to the process of diagnosing speech disorders, instrumental observation and measurement of speech offer significant advantages over unaided perceptual judgements (Baken, 1987). By including the use of instrumental procedures in the process of diagnosing speech disorders, therapists are able to extend their senses and objectify their perceptual observations (Peterson and Marquardt, 1981). In particular, instrumentation has given the clinician the ability to determine the contributory effect of the various defective components on speech production. Modern instrumentation enables the therapist to assess and obtain information about the integrity and functional status of the muscle groups at each stage of the speech production process. The process of diagnosing and understanding speech disorders can, therefore, only benefit from the use of instrumentation (Thompson-Ward and Murdoch, 1998). The current emphasis in the management of speech disorders has been placed on improving objective measures of speech production (Abbs and DePaul, 1989). Increasingly, therapists are beginning to appreciate the considerable advantages of instrumental analysis, which provides quantitative, objective data on a wide range of different speech parameters, far beyond the scope of an auditory-based impressionistic judgement (Hardcastle et al., 1985). Instrumental assessment can enhance the abilities of the therapist in all stages of clinical management, including:

- Increasing the precision of diagnosis through a more valid specification of abnormal speech functions.
- The provision of positive identification and documentation of therapeutic efficacy.
- Short-term assessment and long-term monitoring of the speech production mechanism.
- The expansion of therapy modality options, including the use of instrumentation as a biofeedback tool (Baken, 1987).

However, the disadvantages of current instrumental assessment techniques include the difficulty of measuring several parameters simultaneously, and the fact that measurements tend to be of individual articulators rather than of their co-ordinated use.

Treatment of Articulatory Speech Disorders

Traditionally, treatment sessions for speech and language disorders are held immediately after the initial assessment is made (Hegde, 1985). The selection of treatment procedures is determined by the physiologic nature of the affected speech organs. For example, malfunction of a given articulator (or articulators) could be due to:

- Reductions, or excesses, in muscle strength.
- Reductions, or excesses, in muscle tone.
- Abnormalities in the timing of muscle contractions.
Markedly different procedures would be used in the treatment of the above conditions. The sequencing of treatment procedures is determined, to a large extent, by which components are malfunctioning and the severity of the problem (Netsell and Daniel, 1979). Treatment may be administered in the form of articulatory exercises, purposeful activity, self-monitoring techniques, biofeedback techniques, prosthetic intervention or a combination of these.

Articulatory Exercises

Compensation for muscle weakness often takes the form of exercises to improve strength and stamina in the affected muscles (Main, 1998). In tongue weakness, for example, repetitive tongue protrusion, such as lip licking, is often recommended (Cannito and Marquardt, 1997). Resistance exercises, such as pushing against a spatula, may be introduced as strength improves. Generally, work on strengthening the articulators precedes work on articulation (Robertson and Thompson, 1993).

Purposeful Activity and Self-Monitoring Techniques

Purposeful activity and self-monitoring are often combined. Slowing down the rate of speech is a fairly simple behavioural change, which enables articulatory gestures to be carried out with a full range of motion. Such gestures are often difficult to achieve at normal speaking rates. Exaggerated articulation has a similar effect (Netsell and Rosenbek, 1986).

Biofeedback Techniques

It has been observed that most dysarthric patients improve control of their articulatory function when given biofeedback (Netsell and Daniel, 1979). Biofeedback allows the individual to focus upon the key elements of a more general problem by providing an instantaneous and simplified comparison between their articulatory functions and those that are considered normal. Numerous instances have been documented where biofeedback has assisted in the return of a particular function after the individual has plateaued with more conventional forms of therapy (Daniel and Guitar, 1978; Hanson and Metter, 1980; Netsell and Cleeland, 1973). Techniques used for biofeedback range in complexity. For example, with the simple use of a mirror, the patient can monitor movement of facial muscles. The use of a camcorder
connected to a TV monitor provides a more sophisticated means of achieving the same, but with the added advantage of being able to replay and pause the sequence. Pacing boards, where the patient must deliberately move their fingers from one marked area to another as they utter each word, may be helpful in reducing the rate of speech (Robertson and Thompson, 1993). Oscilloscopic displays have been used as more sophisticated biofeedback systems to achieve the same results (Netsell and Rosenbek, 1986).

Prosthetic Intervention

Prosthetic techniques are increasingly being used as intervention strategies for patients with articulatory disorders (Logemann, 1985). These may include:

- Palatal lift prostheses, used to elevate the soft palate to facilitate velopharyngeal closure.
- Palatal reshaping prostheses, used to lower the hard palate to assist in tongue-palate contact.

In each case, the prosthesis is designed to increase the functional capacity of a particular articulator within the vocal tract. In addition, researchers in Sweden have recently investigated the use of artificial palates with built-in stimulation and exercise devices to encourage tongue mobility. The use of similar devices with increasing degrees of resistance may provide a means of combining strengthening exercises with articulatory practice (McAllister, 1998).
CHAPTER 3
INSTRUMENTAL ASSESSMENT TECHNIQUES

This chapter discusses several instrumental techniques commonly used for the assessment of speech disorders. These techniques were identified in a survey, conducted on existing speech assessment techniques, to verify the need for a multiparameter system. The survey included a review of current literature, extensive World Wide Web searches, and discussions with clinicians and others working in relevant fields. These searches revealed a total of 72 speech assessment techniques in current use. Of these, six are generic, general-purpose methods that have application in speech assessment, with the remaining 66 instruments designed specifically for speech assessment. From this vast array of techniques, three distinct categories emerged:

- Instrumental techniques that measure, either directly or indirectly, the mechanism of the major speech organs, i.e. the lungs, larynx, velum and tongue.
- Imaging techniques that allow direct visualisation of the articulators and their co-ordinated function.
- Acoustic analysis, which measures the combined effort of the articulators and their effect on the perceived speech outcome.

From the categories listed above, this chapter introduces the instrumental techniques considered relevant to a multiparameter system. However, since a detailed description of each technique is beyond the scope of this thesis, only those applicable to the project are discussed in detail.

3.1 Aerodynamic Assessment

Aerodynamics is a branch of mechanics that deals with the motion of air and other gases, and with the effect of that motion on bodies in the air (Flexner, 1987). The modification of the air stream generated in the lungs gives rise to the acoustic events that are perceived as meaningful speech utterances (Zajac and Yates, 1997). Anatomic and/or functional abnormalities, which may involve the lower airways, the larynx or the supralaryngeal structures, may adversely affect this process. Aerodynamic assessment methods, therefore, provide valuable information relative to the entire vocal tract. Airflow measurements
associated with respiration, phonation and articulation are vital to an understanding of both normal and pathological speech. The survey revealed ten systems currently used to measure airflow during speech:

- Aerophone II
- E.V.A.
- Physiologica
- Perci-SARS
- Exeter Anemometer
- Airflow 3
- C-Scape
- Rothenberg Mask
- NORS
- SNORS

Aerophone II, E.V.A., Physiologica and Perci-SARS all use the invasive pressure-flow technique to measure the size of an orifice, in this case the velopharyngeal port. While inherently accurate, this technique is expensive and requires frequent calibration. Also, since it requires a nasal cannula, it modifies the very parameter under investigation, thus limiting its accuracy in practice. The Exeter Anemometer and Airflow 3 are both simple nasal anemometers. These provide a direct measure of nasal airflow, giving an indication of velopharyngeal closure. However, since nasal airflow depends on both velopharyngeal closure and speech volume, accurate comparative measures of closure cannot be obtained. C-Scape, or See-Scape, is a simple variable gap flowmeter, which is connected to the nares via a tube. It provides a simple visual indication only and also exhibits the limitations of the above nasal anemometers. The Rothenberg Mask measures both nasal and oral airflow during speech, which eliminates the volume factor and provides an accurate measure of velopharyngeal closure. However, since pneumotachographs are used as flow sensors, this design can only be regarded as an extension to a generic technique that is expensive and cumbersome to use.
NORS and its successor, SNORS, measure nasal and oral airflow using small, inexpensive, solid-state sensors that do not require frequent calibration. Unlike pneumotachographs, these systems utilise flexible, transparent masks that have a minimal effect on speech production. While the response of NORS is slow, SNORS overcomes this limitation and so provides a simple, accurate assessment of velopharyngeal function. Due to its low cost, simplicity and overall accuracy, SNORS was selected from the above instrumentation to provide the airflow element of the multiparameter system. Also, since SNORS is developed and manufactured at the University of Kent, technical details such as system operation and circuit board layout are easily obtained.

The Super Nasal Oral Ratiometry System (SNORS)

SNORS is a commercially available system that measures both nasal and oral airflow, allowing measurement of the rapid movement of the velum and also respiratory effort during speech. The airflow sensors are housed in a two-chamber facemask, which is held over the nose and mouth. In addition to measuring airflow, both nasal and oral speech sounds are detected using small microphones inserted in the mask. Signals derived from these microphones are presented on a computer screen as envelope waveforms, which provide a clear indication of speech intensity over time. The airflow signals appear on the screen below the speech intensities, allowing correlation of the airflow with the resultant sound. An estimation of velopharyngeal closure, which is independent of speech intensity, is achieved by calculating values of aerodynamic nasalance. This is the percentage of the total positive airflow that is nasal and is given by:

    Aerodynamic Nasalance = [Nasal Airflow / (Nasal Airflow + Oral Airflow)] x 100    (3.1)

Aerodynamic nasalance should not be confused with acoustic nasalance, which is the percentage of the total acoustic energy that is nasal. SNORS can be used for objective measurement, requiring the subject to utter a number of words selected to demonstrate velopharyngeal function, and for feedback, using a simple real-time display of nasal and oral airflow. The following sections give a detailed description of the SNORS system.
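As a concrete illustration of equation (3.1), the short sketch below computes aerodynamic nasalance from lists of sampled airflow values. It is not the SNORS source code; the function name, the list-based representation and the clipping of negative (inspiratory) flow to zero are assumptions made for this example.

```python
# Hypothetical illustration of equation (3.1); not the SNORS implementation.

def aerodynamic_nasalance(nasal_flow, oral_flow):
    """Percentage of the total positive airflow that is nasal."""
    # Only positive (expiratory) flow is counted - an assumption here.
    nasal = sum(max(f, 0.0) for f in nasal_flow)
    oral = sum(max(f, 0.0) for f in oral_flow)
    total = nasal + oral
    if total == 0.0:
        return 0.0  # no airflow in the analysis window
    return 100.0 * nasal / total

# A largely oral utterance gives a low value, as for the normal
# subject discussed later in the text (flow values are illustrative).
print(aerodynamic_nasalance([0.01, 0.02], [0.40, 0.45]))  # ~3.4%
```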
Airflow Transducers

The airflow transducers implemented in the SNORS mask are AWM3300V microbridge mass airflow sensors, manufactured by Honeywell Inc. The AWM series of transducers operate by measuring the rate of relative heat transfer from a heater to either of two temperature sensors, placed on each side of the heater. The heat transfer is proportional to the mass airflow. These transducers exhibit a considerably faster response time than conventional thermistor techniques. This is due to the substantially reduced mass of the sensing element within the transducer, which is made possible by thin-film fabrication techniques. The transducers are lightweight, robustly packaged and bi-directional, allowing measurement of both magnitude and direction of flow. The AWM3300V transducer has a typical bandwidth of 500 Hz and a flow range of up to approximately 0.017 ls⁻¹ (1000 sccm)¹. Signal conditioning circuitry is incorporated within the package to produce a high-level, low-noise output signal. The flow path of the transducer is of a small diameter and exhibits high resistance to flow. To overcome this limitation a bypass method has been adopted, where only a portion of the air stream within the flow path passes through the sensor (refer to Figure 3.1). The bypass facility provides an effective dynamic flow range of 0 to approximately 0.8 ls⁻¹, which has been found by experimentation to be adequate for this application (McLean et al., 1997).

Figure 3.1: Diagram of the airflow transducer housed in the bypass section (side and front elevations).

¹ sccm: standard cubic centimetres per minute
The response of the AWM3300V transducer is non-linear in nature, and this is exaggerated by the inclusion of the bypass section, which introduces some turbulence. However, the inclusion of a data lookup table embedded within the software compensates for this.

Mask Design

The mask used in the SNORS design is based on a commercially available resuscitation mask. In its unmodified state, the mask consists of a flexible silicone cuff and a rigid polysulfone body, which is similar to Perspex. The silicone cuff ensures that a good airtight seal is present between the face and mask. To facilitate air escape, an outlet port exists at approximately mouth level; this also houses the oral airflow transducer. In order to isolate the nasal and oral airstreams, the mask has been partitioned into two chambers using a silicone rubber separator. A second outlet port, at approximately nose level, has been included to facilitate nasal air escape, and also to house the nasal airflow transducer. To enable the simultaneous acquisition of the acoustic signals emitted from the nose and mouth, a small electret microphone has been bonded to the casing of each airflow transducer. The modified mask design is illustrated in Figure 3.2.

Figure 3.2: Photograph of the mask, showing outlets and attached airflow transducers. Note: the transducers shown here are fitted with red dust caps.
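The lookup-table compensation mentioned above (correcting the transducer's non-linear response, as exaggerated by the bypass section) can be sketched as a simple interpolation between calibration points. The voltage-to-flow pairs below are invented for illustration; a real table would be measured against a reference flowmeter.

```python
# Hypothetical linearisation table; the calibration values are invented.
from bisect import bisect_left

CAL_VOLTS = [0.0, 1.0, 2.5, 3.8, 5.0]    # raw sensor output (V)
CAL_FLOWS = [0.0, 0.05, 0.2, 0.45, 0.8]  # corresponding flow (l/s)

def linearise(volts):
    """Convert a raw sensor voltage to flow via the lookup table."""
    if volts <= CAL_VOLTS[0]:
        return CAL_FLOWS[0]
    if volts >= CAL_VOLTS[-1]:
        return CAL_FLOWS[-1]
    i = bisect_left(CAL_VOLTS, volts)
    # Interpolate linearly between the bracketing calibration points.
    frac = (volts - CAL_VOLTS[i - 1]) / (CAL_VOLTS[i] - CAL_VOLTS[i - 1])
    return CAL_FLOWS[i - 1] + frac * (CAL_FLOWS[i] - CAL_FLOWS[i - 1])
```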
Electronic Subsystems

Figure 3.3 shows a block diagram of the SNORS system, which can be divided into flow subsystems and microphone subsystems.

Figure 3.3: Block diagram of SNORS and its subsystems (nasal and oral airflow and microphone subsystems feeding the PC).

The outputs from the airflow sensors are sampled in two different formats: a full-band signal (with a 1 kHz anti-aliasing filter) and a 35 Hz low-pass filtered signal. The 35 Hz filter removes the unwanted acoustic component of voiced speech, while retaining the respiratory information relating to the velopharyngeal mechanism. The additional unfiltered signals are provided for future development and are not implemented in this system. To provide a reference for the interpretation of speech, the computer also acquires the acoustic signals derived from both microphones. The developers (McLean et al., 1997) felt it beneficial to provide only the speech envelope to convey this information, giving a less cluttered display. The use of enveloped signals also permits a lower sampling frequency, eliminating the problems of both sampling and screen aliasing. The envelope generator has been implemented using an active half-wave rectifier followed by a 35 Hz low-pass filter. In addition, to facilitate direct speech recordings to tape, both microphones supply an amplified audio output. This is useful for comparable speech intelligibility assessment.
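A digital counterpart of this analogue envelope generator might look like the following sketch: half-wave rectification followed by first-order low-pass smoothing. The sample rate, filter order and function name are assumptions of this example, not details of the SNORS hardware.

```python
import math

def speech_envelope(samples, fs=8000.0, cutoff=35.0):
    """Envelope of a speech signal: rectify, then low-pass at ~35 Hz."""
    # First-order IIR coefficient for the chosen cutoff frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)
    env, y = [], 0.0
    for x in samples:
        rectified = max(x, 0.0)       # active half-wave rectifier
        y += alpha * (rectified - y)  # smoothing stage
        env.append(y)
    return env
```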
Software

The SNORS software has been specifically designed for the IBM or compatible PC running a DOS operating system. The software can be used for therapy, assessment and for the measurement of outcome. For therapy, real-time visual feedback indicates the nasal and oral components of airflow on a simple bar display, allowing patients to visualise their nasal air escape during speech (refer to Figure 3.4). The upper bar reflects the amount of nasal airflow, which moves upwards away from the centre. The lower bar indicates oral airflow and moves downward away from the centre. In addition, maximum airflow indicators and target markers are also provided.

Figure 3.4: The SNORS real-time therapy bar.

To objectively measure change, SNORS can also record and then compare data acquired at different sessions. The standard assessment technique consists of the patient saying the words: begin, type, fight, seat, cheese, shoot, smoke, king, missing, end, as prompted on the computer screen, typically two seconds apart. These words have been specifically chosen to demonstrate the efficiency of velopharyngeal closure. Each word contains either oral obstruents, requiring velopharyngeal closure, or a combination of nasal and oral
consonants, requiring co-ordinated opening and closing (Ellis et al., 1978). However, a customised word list can be used if required. Limited analysis can be carried out on the resulting data, which is displayed in graphical form at the end of each test (see Figure 3.5).

Figure 3.5: A typical SNORS analysis display.

The SNORS software partitions the screen into three sections. The top section displays the entire test, with the speech envelope at the top, followed by the nasal and oral airflows. The x-axis represents time, with a full-scale value of 20 seconds. The y-axes depict amplitude and are optimally scaled to provide a clear trace. To aid comparability, the nasal and oral airflow scales are always identical. A zoom view is provided in the bottom section of the screen, and to allow ease of interpretation, both nasal and oral airflows are shown on the same axis. As can be seen, the nasalance value relating to the displayed airflow is presented at the bottom of the screen. The time scale for the zoom view is variable and dependent on the width of the zoom rectangle in the top section. Finally, the third section, to the right of the screen, displays basic user functions and brief patient-related data.
Interpretation of SNORS Data

Interpretation of the traces presented within the SNORS analysis screen is relatively simple if it is considered that the only nasal consonants in English are /m/, /n/ and /ŋ/, and that vowels should be largely non-nasal unless occurring adjacent to nasalised sounds. Figure 3.6 shows the airflow patterns associated with the word cheese, for a normal subject, and a subject suffering from motor neurone disease (MND), prior to and after prosthetic intervention. The subject, assessed by a dental surgeon, was considered to have very little voluntary movement of the soft palate. Referring to Figure 3.6a, there is little nasal air emission associated with a normal subject uttering the word cheese. It can be seen from Figure 3.6b, however, that the nasal and oral airflows obtained from the MND subject prior to intervention are almost equal in magnitude. To produce the initial affricate /tʃ/ in cheese, a pressure build-up in the oral cavity is required. This is difficult if the patient is unable to achieve velopharyngeal closure, since air will leak through the nasal cavity, as in this case. The fricative /s/ at the end of the utterance is also affected in a similar manner. A palatal lift was used to artificially elevate the soft palate in an attempt to facilitate adequate velopharyngeal closure. As can be seen in Figure 3.6c, after insertion of the palatal lift, nasal airflow has been substantially reduced. In addition to the airflow trace analysis, this example also demonstrates the effectiveness of aerodynamic nasalance values. The normal subject achieved an aerodynamic nasalance value of 2%, whereas the MND subject had a relatively high nasalance of 46% prior to prosthetic intervention, reducing to 28% with the dental prosthesis fitted. In trials conducted by Main et al. (1997), normal aerodynamic nasalance values for the word cheese were found to range between 0 and 10%. If more detailed calculations are required, the SNORS analysis package NASE can be used. This allows accurate measurement of aerodynamic nasalance, speed of velar movement and duration of closure.
Figure 3.6: Airflow traces for the utterance cheese for (a) a normal subject (nasalance = 2%), (b) an MND subject prior to prosthetic intervention (nasalance = 46%) and (c) the MND subject with a dental prosthesis fitted (nasalance = 28%). Each panel shows the oral and nasal airflow traces over a 2 second time scale.
3.2 Electrolaryngography

It is of considerable theoretical interest and practical value to be able to monitor, measure and display aspects of larynx activity in both normal and disordered speech (Abberton and Fourcin, 1997). Endoscopic imaging techniques are frequently used to examine the vocal folds during voiced speech. This is achieved by introducing a rigid endoscope through the mouth, then positioning it just above the larynx. The use of stroboscopic illumination allows the rapid movement of the vocal folds to be visualised. Although the resultant image quality is good, this technique is expensive, invasive and often poorly tolerated by the patient. Electrolaryngography is a non-invasive technique of monitoring vocal fold vibration by means of passing a small electrical current through two electrodes placed on the neck, and measuring any impedance changes. Fabre (1957) first introduced the notion of electrical impedance monitoring in his glottograph. The speech instrumentation survey revealed four systems currently employing electrolaryngography to measure vocal fold activity:

- Aerophonoscope
- DR Portable Electroglottograph
- E.V.A.
- Laryngograph

Of these, Laryngograph was considered the most established system, being used in many voice clinics and research laboratories around the world. In addition, the system is developed and manufactured locally by Laryngograph Ltd, who have links with the University of Kent. For these reasons, the Laryngograph was chosen to provide the voicing component of the multiparameter system.

Laryngograph

The Laryngograph is used to provide qualitative and quantitative information on vocal fold vibration, and also forms the basis of PC-based interactive voice therapy (Abberton and Fourcin, 1997). It operates by sensing the electrical conductance between two electrodes placed on the neck, either side of the thyroid cartilage (refer to Figure 3.7). Each gold-plated electrode consists of an inner disk surrounded by an outer guard-ring, and is held in
position by means of an elastic neckband. On application of a constant voltage, the Laryngograph measures the varying electrical conductance between the electrodes in terms of the current flowing between them. Its output waveform Lx (larynx excitation) depicts this current flow as a function of time, which will be at a maximum when the vocal folds are in contact and at a minimum when they are apart.

Figure 3.7: Laryngograph system configuration (neck electrodes, Laryngograph processor and PC).

Interpretation of the Lx Waveform

During voiced sounds the vocal folds close and open many times per second, resulting in a quasi-periodic Lx waveform, as shown in Figure 3.8.
Figure 3.8: The Lx waveform with stroboscopic views of the vibrating vocal folds, taken from McGlashen (1998).

The above screen dump was taken from Laryngograph's LxStrobe software. The images are of the actual vocal folds, taken with a rigid endoscope positioned just above the larynx. The waveforms beneath each image are derived from the Laryngograph processor. The red vertical lines indicate the point on the Lx waveform where the endoscopic image was taken. The four main features of the Lx waveform can be seen quite clearly and are characterised as:

I. Closing Phase - a steep rising edge.
II. Maximum Closure - a maximum peak.
III. Opening Phase - a shallow falling edge.
IV. Open Phase - a trough.

The relative length of these phases can change with voice quality. Phases I to III are often referred to collectively as the closed phase, since the vocal folds are in contact, to some degree, from the start of the closing phase (I) to the end of the opening phase (III). As can be seen from Figure 3.8, there is no vocal fold contact in the open phase (IV). The
time interval between successive closures also allows a direct measure of the fundamental period, and hence the fundamental frequency Fx (fundamental frequency of excitation). The Fx value extracted from the Lx waveform is extremely reliable, as the waveform is unaffected by vocal tract resonance and environmental noise. Figure 3.9 shows speech (Sp) and Lx waveforms for three phonation types, each produced by a male adult and taken from the vowel in the word dart. In each example, the upper Sp waveform represents the acoustic output from a microphone and the lower Lx waveform is derived from the simultaneous output of a standard Laryngograph.
Figure 3.9: A selection of voice quality types: (a) creaky voice, (b) falsetto voice and (c) breathy voice. Each panel shows the Sp waveform above the Lx waveform over a 200 ms span.

Figure 3.9a shows the Sp and Lx waveforms produced by a creaky voice. Although the Lx waveform phases (I-IV) can be identified, the cycles are clearly irregular, giving the voice its creaky quality. As creaky voice occurs at the lower end of the fundamental
frequency range, the time taken for each cycle is generally longer than that found in modal voice (refer to Figure 3.8). Figure 3.9b illustrates the waveforms of a falsetto voice. This type of voice occurs at the higher end of the fundamental frequency range and hence the cycle time is considerably shorter. As the vocal folds in falsetto voice only make contact at their upper edges, the shape of each Lx cycle is rather different from those of a modal voice. There is little contact in the vertical plane, resulting in the generation of a smaller acoustic excitation pulse (Abberton et al., 1989). Thus, in terms of vocal fold contact information, falsetto contrasts with modal and creaky voice in that it has approximately equal open and closed phases. The gradient of the Lx waveform at closure is also less steep and approximately equal to that of the opening phase. Finally, an example of breathy voice is shown in Figure 3.9c. Breathy voice does not always have an oscillatory Lx waveform associated with it, since the vocal folds can vibrate without making contact (referred to as "flapping in the breeze" by Catford, 1977). In these cases there will be no increase in current flow between the electrodes, since there is no change in soft tissue contact area. However, when breathy voice does involve vocal fold contact there is an associated oscillatory Lx waveform. Fourcin (1978) describes such a quality as vigorous breathy voice, which is characterised by small closure peaks, as the open phase in each cycle is extended, allowing more air to escape and giving the subjective breathy quality. Several quantitative techniques have been developed for the measurement of voice quality. These include:

- Closed Quotient - the percentage of each cycle during which the vocal folds are closed.
- Jitter factor - the percentage frequency variation of vocal fold vibration, over time.
- Shimmer factor - the percentage amplitude variation of vocal fold vibration, over time.
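Once the instants of vocal fold closure have been located on the Lx waveform (for example by peak-picking), these measures reduce to simple per-cycle arithmetic. The sketch below assumes that closure times, closed-phase durations and per-cycle peak amplitudes have already been extracted; the mean-absolute-difference formulations of jitter and shimmer used here are one common definition among several, not necessarily those used by any particular product.

```python
def fundamental_frequencies(closure_times):
    """Per-cycle Fx from successive closure instants (in seconds)."""
    periods = [b - a for a, b in zip(closure_times, closure_times[1:])]
    return [1.0 / p for p in periods]

def jitter_percent(closure_times):
    """Mean absolute cycle-to-cycle period variation, % of mean period."""
    periods = [b - a for a, b in zip(closure_times, closure_times[1:])]
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer_percent(peak_amplitudes):
    """Mean absolute cycle-to-cycle amplitude variation, % of mean."""
    diffs = [abs(b - a) for a, b in zip(peak_amplitudes, peak_amplitudes[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(peak_amplitudes) / len(peak_amplitudes))

def closed_quotients(closed_durations, periods):
    """Percentage of each cycle for which the folds are in contact."""
    return [100.0 * c / p for c, p in zip(closed_durations, periods)]
```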
3.3 Electropalatography

The tongue is an important organ for the production of speech, and it is of both theoretical and practical interest to the speech and language therapist to record details of tongue activity in both normal and pathological speech (Hardcastle and Gibbon, 1997). Techniques for measuring tongue placement during speech have been employed for many years. Abercrombie (1957) examined tongue-palate contact by coating the surface of the roof of the mouth with a dark powder consisting of charcoal and chocolate. The powder is removed when the tongue makes contact with the hard palate, giving an indication of tongue-palate contact. However, since the powder is removed with each subsequent tongue contact, only single gestures can be observed. In addition, a permanent record of the contact pattern can only be obtained by photographing the hard palate. Modern imaging techniques have been used to monitor the tongue and other articulators during speech. These include Computed Tomography (CT), Magnetic Resonance Imaging (MRI), X-ray microbeams and Electromagnetic Articulography (EMA). Although CT and MRI scans allow accurate visualisation of tongue placement, they cannot record dynamic tongue movement due to a lengthy image acquisition period. X-ray microbeams overcome this limitation by using a narrow beam of radiation to track the movement of small gold pellets attached to the midline of the tongue. Unfortunately, this technique is invasive, exposes the patient to ionising radiation, and the resulting image is only a two-dimensional view of the tongue's midline. EMA gives similar information to X-ray microbeams, but without the radiation. Pellets are again placed on the midline of the tongue and their movements tracked. But in this case the pellets are transducers and the subject is placed within a magnetic field. As the transducers move, a voltage is induced and recorded. However, this technique is cumbersome to use and is considered suitable for research only. A technique known as electropalatography (EPG) offers considerable advantages over the above methods. This technique records spatial and temporal details of tongue contact with the hard palate during continuous speech, thus providing qualitative and quantitative data on the place of articulation. Electropalatography determines tongue-palate contact by using a special artificial palate containing an array of electrodes embedded on its tongue-facing surface. A small electrical signal, fed to the patient, is conducted through the tongue to any touched electrodes and thence, via an electronics unit, to the display device where the tongue-palate contact is shown.
The speech instrumentation survey revealed seven electropalatography systems currently in use:

- Palatometer
- Palatograph
- EPG.4
- Linguaview
- EPG.2
- EPG.3
- Linguagraph

The Palatometer and Palatograph, developed in America and Japan respectively, have had limited use in the UK. This can be attributed to difficulties in obtaining the customised palates, which do not conform to standards adopted in the UK and the rest of Europe. EPG.4 and Linguaview are electropalatography systems manufactured in the UK, which use a well-established palate developed at the University of Reading. These systems provide a real-time view of tongue-palate contact and are used for biofeedback only. EPG.2, EPG.3 and Linguagraph are the full assessment and therapy versions of the preceding two systems. They all provide real-time biofeedback and comprehensive off-line analysis. Again, these systems use the Reading palate, making them suitable for European use. The principles of operation are also very similar and they all lend themselves to multiparameter adaptation. However, since Linguagraph is developed and manufactured at the University of Kent, it was considered the most appropriate instrument for this application.

Linguagraph

Linguagraph was developed to meet the need for a clinical, user-friendly, low-cost electropalatography system (McLean, 1997). The system consists of an artificial palate and an electronics unit, which interfaces to a digital I/O card installed within a standard PC. Custom software, running under DOS, provides visual real-time displays and off-line analysis screens.
Artificial Palate

To ensure compatibility with other European EPG systems, Linguagraph utilises the artificial palate developed at the University of Reading. The palate is created from a dental impression of the subject's mouth and is made from a thin acrylic plate (approximately 0.8 mm), which is moulded to fit the hard palate. A total of 62 miniature silver electrodes are embedded within the tongue-facing surface of the palate. To allow inter-subject comparability, the 62 electrodes are placed according to strictly defined anatomical landmarks, such as the front incisors and the junction between the hard and soft palates (Hardcastle and Gibbon, 1997). Individual insulated copper wires provide the electrical connection to each electrode; these are bundled together and exit via the corners of the mouth. Figure 3.10 illustrates the Reading EPG palate.

Figure 3.10: The Reading EPG palate, fitted (left) and showing lead-out wires and connector (right).

Electronics Unit

The Linguagraph electronics unit, illustrated in Figure 3.11, contains multiplexing, threshold, isolation and voltage limiting circuits.
Figure 3.11: The Linguagraph system (palate wires feeding comparators and multiplexers, with an isolation barrier between the body clock signal and the PC).

The signals obtained from the Linguagraph system are derived from the 62 electrodes embedded within the artificial palate. These electrodes act as switch contacts when touched by the tongue, which conducts a voltage-limited 800 Hz body clock signal. The body clock, generated by the PC's digital I/O card, is opto-isolated and applied to the patient via an electrode worn on the wrist. Due to the large number of signal lines obtained from the artificial palate, it is impractical to sample each one simultaneously. Therefore, to limit the required number of sample lines, eight 8-to-1 analogue multiplexers have been employed, presenting only eight signals to the digital I/O card at any one time. The multiplexer select lines are again generated by the PC's digital I/O card. The resulting signal acquired by the computer is a digital representation of eight contacts on the artificial palate. A total of eight scans are required to capture one frame of Linguagraph data, which results in 100 display frames per second. Each 8-bit scan is uniquely identified by a 3-bit address derived from the multiplexer select lines. Using this address, the software can accurately reconstruct each frame. The analogue signals are converted into digital form using eight voltage comparators, which differentiate between the returned body clock signal and background noise. However, the quality of the body clock signal obtained from the palate is largely dependent on the individual's resistivity and is therefore variable. To overcome this variability, the comparator's voltage threshold may be adjusted externally by a potentiometer.
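The frame reconstruction described above can be illustrated with a short sketch. Each scan is assumed to arrive as an (address, byte) pair - a 3-bit multiplexer address plus an 8-bit contact pattern - and, since the mapping of the 64 scan positions onto the 62 palate electrodes is hardware-specific, the last two positions are simply discarded here as an assumption.

```python
def reconstruct_frame(scans):
    """Rebuild one 62-electrode contact frame from (address, byte) scans."""
    contacts = [False] * 64
    for address, byte in scans:            # eight scans per frame
        for bit in range(8):
            contacts[address * 8 + bit] = bool(byte & (1 << bit))
    return contacts[:62]                   # assumed 62-of-64 mapping

# Example: electrodes 0 and 9 touched; all other scan bytes empty.
scans = [(0, 0b00000001), (1, 0b00000010)] + [(a, 0) for a in range(2, 8)]
frame = reconstruct_frame(scans)
print(frame[0], frame[9])  # True True
```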
Software

The Linguagraph software has been specifically designed for the IBM or compatible PC running a DOS operating system. The software can be used for both therapy and assessment. In therapy mode, a real-time display of the subject's tongue-palate contact is used as a biofeedback tool, allowing patients to visualise tongue movement during speech. Segments arranged in palatal zones and rows depict the individual electrode contacts. To allow the study of a particular contact pattern, the display may be paused with a single key press. An additional Linguagraph channel can also be attached to the system and simultaneously viewed in real-time. This allows the therapist to produce model contact patterns for the patient to mimic. Figure 3.12 shows the Linguagraph software in dual channel mode.

Figure 3.12: The real-time dual channel Linguagraph display.

Optionally, an envelope of the speech signal may be recorded. This is used as a reference for articulation and enables identification of the individual words and phonemes. To enable off-line assessment, the tongue-palate contact patterns, together with the speech envelope, may be recorded to disk. The Linguagraph analysis package allows the measurement of several important parameters relating to tongue-palate contact. These are listed below.
- Alveolar. The amount of tongue-palate contact in the front two rows of the palate.
- Palatal. The amount of tongue-palate contact in the middle three rows of the palate.
- Velar. The amount of tongue-palate contact in the rear three rows of the palate.
- Left lateral. The amount of tongue-palate contact in the left two columns of the palate.
- Right lateral. The amount of tongue-palate contact in the right two columns of the palate.
- Midline. The amount of tongue-palate contact in the centre four columns of the palate.
- Centre of Gravity. The linear centre of gravity of the total contact region, specified as a row number, from front (row 1) to back (row 8) of the palate.
- Balance. The balance of tongue-palate contact from left to right.
- Weight. The number of tongue-palate contacts over the entire palate.

The analysis package depicts the above parameters as waveforms plotted over time. A snapshot of tongue-palate contact may be displayed by moving a cursor along the waveform axis. In addition, the numerical values of the parameters, calculated at the cursor position, are shown alongside the trace and to the right of the screen (refer to Figure 3.13).
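As a rough sketch of how a few of these parameters could be computed, the functions below assume a frame represented as eight front-to-back rows of boolean contacts, with zone boundaries taken from the definitions above (rows 1-2 alveolar, rows 3-5 palatal, rows 6-8 velar). The representation and function names are assumptions of this example, not the Linguagraph implementation.

```python
def zone_totals(rows):
    """Contact counts for the alveolar, palatal and velar zones."""
    alveolar = sum(sum(row) for row in rows[0:2])
    palatal = sum(sum(row) for row in rows[2:5])
    velar = sum(sum(row) for row in rows[5:8])
    return alveolar, palatal, velar

def weight(rows):
    """Total number of contacts over the entire palate."""
    return sum(sum(row) for row in rows)

def centre_of_gravity(rows):
    """Linear centre of gravity of contact, as a row number (1 = front)."""
    total = weight(rows)
    if total == 0:
        return 0.0
    return sum((i + 1) * sum(row) for i, row in enumerate(rows)) / total
```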
Figure 3.13: The Linguagraph analysis screen.

Interpretation of Linguagraph Data

A number of idealised tongue-palate contact patterns that have been found to occur during normal speech production are illustrated in Figure 3.14. Hardcastle and Gibbon (1997) characterise these contact patterns in the following manner:

(a) The alveolar stop pattern typically occurs during the closure phase of alveolar plosives and nasals. It is characterised by contact along the lateral margins of the palate and complete closure across the first two or three rows.

(b) The velar stop pattern occurs during the closure phase of velar plosives and nasals in the environment of back open sounds. This pattern has minimal contact along the margins of the palate and complete contact across the posterior row.

(c) The palatal stop pattern exhibits more extensive lateral contact than the velar stop pattern, with some contact extending as far forward as rows 2 and 3. Central contact occurs in the posterior two or three rows. This pattern occurs during the closure phase
of velar plosives and nasals in the environment of close front vowels, for example in key.

Figure 3.14: Idealised EPG patterns found in normal speech: (a) alveolar stop, (b) velar stop, (c) palatal stop, (d) double alveolar-velar, (e) alveolar grooved, (f) palatal grooved and (g) apical.

(d) The double alveolar-velar pattern occurs during velar-alveolar or alveolar-velar consonant sequences, such as in the word catkin.

(e) The alveolar grooved pattern is typical of the stricture during an /s/ or /z/. Contact is complete along both lateral margins and there is a narrow grooved configuration in the anterior two or three rows. The amount of side contact varies with the phonetic content; for example, there is more side contact when /s/ is followed by a close front vowel, such as in see.

(f) The palatal grooved pattern is typical of stricture during /ʃ/ or /ʒ/ and, in comparison with /s/ or /z/, has a wider and more posteriorly placed groove.

(g) The apical pattern occurs during a /l/ in an open or back vowel environment and is characterised by minimal anterior, central contact.
With the above analysis it is possible to clearly identify the typical tongue-palate contact patterns associated with normal spontaneous speech.

3.4 Imaging Techniques

There are several major advantages of using imaging techniques in the assessment of speech disorders. For example, the need to infer the position or activity of the various anatomical structures during speech is removed; a limitation often associated with other assessment techniques. Rather than focusing on a single area, an image of the entire oropharynx or oral cavity can often be obtained. In addition, imaging techniques can provide real-time pictorial representations of the actual oropharyngeal and laryngeal anatomy during motion (Sonies, 1991). Techniques such as computed tomography and X-rays are preferable when imaging bones and hard tissue structures, whereas soft tissues are best examined with magnetic resonance imaging and ultrasound. The capabilities for viewing the oropharyngeal structure during speech differ with each technique. There is often a trade-off between safety and image quality, or between resolution and speed of image acquisition. The inclusion of a synchronised video input in the multiparameter system was considered useful, if only to instil confidence, by allowing established imaging techniques to be viewed alongside the less established multiparameter data. However, since many imaging techniques show only a two-dimensional view of the articulators, the additional parameters may help to reveal phenomena not evident in the image alone. A detailed discussion of all the major imaging techniques is beyond the scope of this thesis, and therefore only the frequently used techniques of videofluoroscopy and endoscopy will be discussed here. A good reference to other imaging techniques is given in Sonies and Stone (1997).

Videofluoroscopy

Videofluoroscopy records moving X-ray images onto videotape. This provides a view of the lips, jaw, velum and tongue during speech, and yields useful information on the dynamics of these articulators. Typically, lateral, frontal and basal views are performed. Figure 3.15 illustrates a lateral videofluoroscopic image of the velopharyngeal mechanism and tongue.
Figure 3.15: Lateral videofluoroscopic image of the velopharyngeal mechanism and tongue (left), and an identical image with both the velum and tongue outlined (right).

As can be seen from Figure 3.15 (left), the skeletal structures of the spine, skull and mandible are clearly visible. Slightly less clear, but still visible, are the structures of the soft palate, pharyngeal wall and tongue. To aid interpretation, an identical image with the soft palate and tongue outlined is also shown (right). It should be noted that the circular disc positioned near the larynx is a Laryngograph electrode. For assessment purposes, videofluoroscopy is often performed during defined speech tasks, and the movement of the structures is interpreted in relation to the expected outcome attached to that task (McLean, 1997). The perceived motion qualities of the structures are rated by the clinician to provide a qualitative diagnosis of articulatory function. Therefore, the assessment technique relies on the clinician's ability to visually interpret the images. In addition to motion studies, frame-by-frame analysis is also used to assess articulatory function. This is often combined with synchronised audio to identify acoustic phenomena. For many purposes the standard video capture rate of 25 frames per second is satisfactory. However, articulatory movements can occur between frames recorded at this speed. Björk (1961) found the velum to move as much as 3 mm between frames recorded at 50 frames per second. For research purposes, frame rates as high as 150 frames per second have been used. Disadvantages of this imaging technique include:

- The use of videofluoroscopy is limited by the need to employ ionising radiation.
- Perceptual judgement is notoriously subject to bias, and the ability to share information between clinicians is often hampered by the general vagueness with which the basic parameters are defined (Fletcher, 1970).
- Videofluoroscopy presents only a two-dimensional image of three-dimensional structures. This can be misleading when analysing certain structures.
- The image quality of soft tissue structures is low due to limitations inherent in the radiographic technique.
- Videofluoroscopy is an expensive procedure and may not be available to all clinical centres.

Endoscopy

Endoscopes are optical instruments consisting of a viewing lens, a rigid or flexible shaft, and an eyepiece that can also be attached to a camera. They require a powerful light source to illuminate internal structures and are often used in conjunction with a television monitor and video recording equipment. Endoscopes can be passed through body openings and pathways to reach internal organs that cannot otherwise be directly visualised. These instruments have a wide application in medicine and are especially useful in viewing the velopharyngeal valve and the larynx. For velopharyngeal examination, a flexible endoscope is typically passed through one of the nostrils and positioned above the velopharyngeal structures (refer to Figure 3.16).

Figure 3.16: Three nasal-endoscopy images of the velopharyngeal mechanism.

In the above example, the velum appears in the lower left-hand quadrant of each image, and in consecutive frames moves posteriorly, while the lateral pharyngeal walls move together. As mentioned, the interpretation of radiographic images can be notoriously
difficult, but the direct endoscopic image of the velopharyngeal valve greatly clarifies the clinical diagnosis of velopharyngeal incompetence. This allows the clinician or surgeon to define the dynamics of velopharyngeal closure and decide if surgery is appropriate. To image the vocal folds, a rigid endoscope is often introduced through the mouth and positioned just above the larynx (refer to Figure 3.8). When imaging the larynx during phonation, the rapid opening and closing of the vocal folds will appear blurred on a continuously illuminated endoscopic image. However, stroboscopic illumination, during which short, intense bursts of light are presented at regular intervals, can give the illusion of a freezing or slow-motion effect. If the phonation is perfectly regular, and the frequency of the stroboscopic illumination matches the frequency of vocal fold vibration, then the vibration will appear frozen. If there is a slight mismatch in frequency, then a slow-motion effect is produced. This technique allows the clinician to clearly examine the motion of the vocal folds during phonation. The disadvantages of endoscopy include:

- The procedure is uncomfortable, requiring a local anaesthetic to enable its use. This makes the technique only suitable for patients who can tolerate endoscopic insertion.
- Due to the physical dimensions of the endoscope, the technique is not suitable for small children.
- Mucosa and other obstructions may obscure the images.
- Positioning of the endoscope is difficult during imaging, requiring a high level of skill.
- The technique is invasive and interferes with the production of certain speech sounds.
- The procedure is expensive and may not be available to all clinical centres.

3.5 Acoustic Analysis

The field of acoustics involves the study of the generation, transmission and modification of sound waves (Minifie et al., 1973). Since acoustic analysis is relatively inexpensive and non-invasive, its clinical use has become widespread. Acoustic assessment can provide the clinician with important information relating to the speech production process, such as laryngeal function, articulatory function and intersystem co-ordination. In addition, it can highlight aspects of the speech signal that may be contributing to the perception of deviant speech production (Thompson-Ward and Theodoros, 1998).
A number of systems specifically designed for the acoustic analysis of speech are commercially available:

- CSpeech
- CSRE
- DSP Sona-Graph
- ILS-PC
- MacSpeech Lab II
- MSL
- CSL
- Signalyze

The above systems differ in features such as cost, range of capabilities and speed, but they all apply digital signal processing techniques to the acquired speech signal. Rather than focus on a particular system, this section introduces the features common to all, namely the oscillographic, FFT and spectrographic displays. These displays are also relevant to a multiparameter system because they yield information on the speech outcome rather than the mechanism alone.

Oscillographic Displays

An oscillographic display is a two-dimensional waveform, plotting amplitude on the y-axis and time on the x-axis. Produced by an oscilloscope or computer, the oscillographic display is used to measure acoustic energy and classify speech into defined phonetic units. With this type of acoustic display, sounds produced with a relatively open vocal tract and vibrating vocal folds are easily distinguished from unvoiced sounds produced with a constricted vocal tract. This allows clear distinctions to be made between the voiceless consonants, voiced consonants and vowel sounds produced during an utterance (Weismer, 1984). Figure 3.17 illustrates an oscillographic display of the utterance missing.
Figure 3.17: Oscillographic display of the utterance missing (amplitude against time; 1 second time scale).

Oscillographic displays are easy to generate and can provide information on a variety of acoustic parameters such as word duration, amplitude, fundamental frequency and the presence of some articulatory features such as voice onset time and voiced-voiceless distinctions. However, this type of display yields little or no information on the resonant frequencies of the vocal tract or how these frequencies change over time.

FFT Displays

A fast Fourier transform (FFT) is based on the theorem that complex periodic waveforms can be decomposed into a series of sinusoidal components of certain amplitude and phase. Each sinusoidal component derived from a complex periodic waveform is an integer multiple of the fundamental frequency. Fourier's theorem permits the transformation of a waveform into a spectrum where the amplitude of each component frequency is represented. On a typical FFT display the frequency components are plotted on the x-axis and their relative amplitude on the y-axis. The importance of this theorem for speech analysis is its ability to extract the fundamental frequency and associated harmonics, together with an approximation of the vocal tract resonances. Figure 3.18 illustrates an FFT derived from the utterance king.
Figure 3.18: An FFT derived from the utterance king (amplitude in dB against frequency, 0 Hz to 10 kHz, with formants F1 to F3 marked).

The sharp spectral peaks evident in Figure 3.18 represent the harmonics of the fundamental frequency that are associated with vocal fold vibration. However, some spectral peaks have greater amplitude than others, since they lie near a vocal tract resonance, which further amplifies these frequencies. The regions of the spectrum where a group of harmonics exhibits relatively greater amplitude are known as formants, and are labelled F1, F2 and F3 in Figure 3.18. It should also be noted that the formant structure appears to break down above 5 kHz, as the noise introduced by turbulent airflow through the vocal tract begins to dominate. Although FFTs yield useful information relating to fundamental frequency and the resonant frequencies of the vocal tract, they provide no information on how these frequencies change with time.

Spectrograms

A spectrogram provides an alternative method of visually representing an acoustic signal. Again, it is created by applying an FFT to the sampled acoustic signal, where it is separated into sinusoidal waveforms of different frequency. However, the resultant frequencies are represented vertically on the spectrogram, with time plotted horizontally. Amplitude, or loudness, is depicted by grey scale or colour intensities.
It is the spectrogram's ability to display the rapid variations in the acoustic signal that makes it such a valued instrument. Spectrograms provide analysis of the frequency components of the acoustic signal, either in terms of the harmonics it comprises or of the peaks of resonance that it contains. The relative degree of darkness also conveys information about the signal strength. Generally, two forms of the spectrogram are used, namely wideband and narrowband.

Narrowband Spectrograms

These are useful for making measurements of fundamental frequency and intonation. A prominent characteristic of the narrowband spectrogram is the narrow horizontal bands, which represent the harmonics of the glottal source (refer to Figure 3.19). The darker bands represent harmonics that are closest to the peaks of resonance in the vocal tract. The lighter bands represent harmonics whose frequencies are further away from the resonance peaks. Narrowband spectrograms exhibit good frequency resolution at the expense of time resolution, and are therefore not suitable for making temporal measurements such as voice onset time. The actual bandwidth of this type of spectrogram is usually somewhere between 30 and 50 Hz (Borden and Harris, 1984).

Figure 3.19: A narrowband spectrogram of the utterance speech and hearing science (0-8 kHz over 4 seconds), taken from Borden and Harris (1984).

Wideband Spectrograms

Measurements of the changing resonance of the vocal tract (formants) are generally made with this type of spectrogram. The most noticeable feature of the wideband spectrogram is
the relatively broad bands of energy that depict the formants (refer to Figure 3.20). The centre of each energy band is taken to be the frequency of the formant, and the range of frequencies occupied by the band is taken to be the bandwidth. Although adjacent harmonics may appear smeared together, wideband spectrograms do exhibit good time resolution. Therefore, information relating to the timing of changes in the vocal tract resonance is more reliably obtained than with narrowband spectrograms. The bandwidths used to generate wideband spectrograms are generally between 200 and 500 Hz (Borden and Harris, 1984).

Figure 3.20: A wideband spectrogram of the utterance speech and hearing science (0-8 kHz over 4 seconds), taken from Borden and Harris (1984).
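The difference between the two displays comes down to the length of the analysis window: the analysis bandwidth is roughly the reciprocal of the window duration, so a window of around 30 ms yields a narrowband display (about 30-50 Hz bands) and one of around 3 ms a wideband display (about 200-500 Hz bands). The sketch below illustrates this with a plain short-time FFT; the window type, hop size and NumPy-array input are assumptions of this example.

```python
import numpy as np

def spectrogram(signal, fs, window_s, hop_s=0.005):
    """Magnitude short-time FFT: one row per frame, one column per bin."""
    n = int(window_s * fs)                 # window length in samples
    hop = int(hop_s * fs)
    window = np.hanning(n)
    frames = []
    for start in range(0, len(signal) - n, hop):
        frames.append(np.abs(np.fft.rfft(signal[start:start + n] * window)))
    return np.array(frames)

# narrow = spectrogram(speech, fs, 0.030)  # ~33 Hz bands: resolves harmonics
# wide = spectrogram(speech, fs, 0.003)    # ~330 Hz bands: resolves formants
```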
3.6 Summary

From this discussion on existing instrumental techniques, it is evident that many speech and language therapists would benefit from the proposed multiparameter system. Although current instruments are extremely useful, giving excellent measures of individual articulatory function, few are able to measure the co-ordination of the main articulators. In addition to providing this crucial function, a multiparameter system offers several other major advantages. For example:

- The relationship between mechanism and outcome can be established.
- The need to learn a variety of individual systems is removed, significantly reducing training overheads.
- A modular design allows the system to be tailored to suit individual requirements.
- Data from the various modules share the same file format, thus offering 100% compatibility.
- Archiving time is significantly reduced because of single media data storage.

From the vast array of instruments assessed, the following were chosen to provide the core components of the multiparameter system:

- SNORS, measuring respiration and velopharyngeal closure.
- Laryngograph, measuring larynx excitation.
- Linguagraph, measuring tongue-palate contact.

Since both acoustic analysis and synchronised video input can be achieved with standard computer interface cards, their provision requires no specialised instrumentation.
CHAPTER 4
PROJECT SPECIFICATION

Chapter 3 concluded by stating that the multiparameter system would be based on SNORS, Linguagraph and Laryngograph, with provision for acoustic analysis and direct video input. This chapter outlines the main user requirements of such a system, derived from discussion with clinicians and others working in relevant fields. It then provides a full technical specification, and concludes with a system overview in terms of both hardware and software.

4.1 System Specification

As part of a feasibility study, conducted by members of the Medical Electronics Research Group, a questionnaire was sent out to clinicians in five different countries: France, Greece, Holland, Sweden and the United Kingdom. The questionnaire addressed multiparameter issues such as:

- Parameters to be measured.
- Perceived usefulness.
- Main areas of use.

The majority of clinicians consulted felt that the parameters intended to be measured (i.e. respiration, larynx excitation, velopharyngeal closure, tongue-palate contact and speech outcome) were the most useful. The inclusion of a direct video input was also considered beneficial, not only for imaging techniques such as videofluoroscopy and nasendoscopy, but also for monitoring the lips, jaw, posture and facial grimaces during speech. Although one or two people initially questioned the clinical usefulness of multiparameter measurement, the vast majority of clinicians were very much in favour of such a system. They commented on problem patients, where conventional therapy had failed to identify the exact nature of the condition. Indeed, most felt that large numbers of patients on their caseloads could potentially benefit from such a system. Furthermore, they felt that multiparameter assessment and biofeedback could potentially save clinicians' time, improve targeting of treatment, improve therapy and could well improve outcome. However, virtually all clinicians expressed concern that the system should be easy to use
and affordable. They mentioned that the extensive capabilities found on research-orientated equipment detract from the clinical usefulness and usability of such instruments. From questionnaire feedback, and discussion with clinicians and other members of the Medical Electronics Research Group, the following system specification was compiled:

- Comprehensive yet easy to use.
- Clinician friendly.
- Patient friendly.
- Non-invasive.
- Inexpensive.
- Modular.
- Allow the measurement of multiple articulatory functions.
- Offer ease of correlation between the various parameters.
- Allow objective assessment of articulatory interaction.
- Permit real-time displays for biofeedback.
- Feature high quality audio/video recording and playback.
- Capable of permanent data storage and hard copy printouts.
- On-line help and full documentation should be available.
- Comply with EN 60601-1 and EC directive 93/42/EEC.

The above system specification provides a clear account of the main user requirements. The following section discusses EN 60601-1 and EC directive 93/42/EEC in greater detail.

4.1.1 EN 60601-1 and EC directive 93/42/EEC

Since the multiparameter system is intended for clinical use in European hospitals and universities, it must comply with EN 60601-1 and EC directive 93/42/EEC. EN 60601-1 addresses the general requirements for the safety of medical electrical equipment. It aims to ensure that adequate protection for the user, patient and environment
exists without restricting the normal function of the instrument. According to EN 60601-1 (section 2.2), the multiparameter system is classified as:

- Class I: Equipment in which protection against electric shock does not rely solely on the basic insulation, but also on the connection of all accessible conductive parts to the protective earth of the mains supply. This ensures that these parts cannot become live in the event of basic insulation failure.
- Type B: Equipment that is suitable for external and internal connection to patients, excluding direct cardiac applications.

The EC directive 93/42/EEC sets minimum harmonised safety standards for medical devices and their accessories sold in the EC. Since June 1998, only devices that meet these standards and carry a CE mark may be sold in the EC. Products conforming to these standards are entitled to affix a CE mark, and may then be sold throughout Europe without restriction. Devices which are custom-made, or which are intended for clinical investigation, need not bear a CE mark but may still be sold throughout Europe without restriction; however, equally stringent procedures and documentation apply. The directive relates to any instrument, apparatus, appliance, material or other article, whether used alone or in combination, for the purpose of:

- Diagnosis, prevention, monitoring, treatment or alleviation of disease.
- Diagnosis, monitoring, treatment, alleviation of or compensation for an injury or handicap.
- Investigation, replacement or modification of the anatomy or of a physiological process.
- Control of conception.

The minimum safety standards are given in Annex I (Medical Devices Directive, 1993). They include general requirements for safe product manufacture, design and construction. Devices are divided into four classes, depending on the risks they pose to humans. Since the multiparameter system is minimally invasive, it falls within Class I, the lowest risk classification. In this case the manufacturer must follow the procedure referred to in Annex VII (Medical Devices Directive, 1993) and draw up the EC declaration of conformity required before placing the device on the market.
4.2 Technical Specification

In addition to the hardware units already introduced (i.e. SNORS, Linguagraph and Laryngograph), the multiparameter system requires a number of other supporting modules to function. These include a computer, operating system, data acquisition card, sound card and a video acquisition card. This section outlines the technical specification for each of these modules.

4.2.1 Operating System

Since interaction with the system is via computer, the operating system and user interface form a vital component of the multiparameter system. The major requirements, as outlined in the above system specification, are that the system should be easy to use, flexible and user friendly. A number of computer operating systems are available, such as DOS, Windows 95/98/NT, Unix and Linux. Of these, Microsoft Windows has emerged as the most popular graphical user interface environment (Petzold, 1996). Programs written for Windows have a consistent appearance and command structure. They are often easier to learn and use than conventional DOS programs, which are now becoming obsolete. In addition, Windows provides a wealth of built-in routines that allow the use of menus, dialogue boxes, scroll bars and other components of a friendly user interface. Windows programs also run on a variety of hardware configurations, allowing the programmer to treat peripherals such as the keyboard, mouse, video display and printer in a device-independent manner. Certain aspects of the Windows operating system also lend themselves to multi-channel data acquisition. Since a number of Microsoft products, such as Word and Excel, are already well established in speech and language clinics, and based on the properties described above, Windows 95/98 was considered the most appropriate operating system for this application.

4.2.2 Data Acquisition Card

The computer uses a data acquisition card to sample the various analogue and digital signals produced by SNORS, Linguagraph and Laryngograph. There are numerous multifunction data acquisition cards commercially available, such as:

- DAQCard-700, National Instruments.
- DAS-1202, Keithley Metrabyte.
- AD1200, Brain Boxes.
- PCI-ADC, Blue Chip Technology.

They all offer multiple analogue inputs, extensive digital I/O capabilities and programmable timers, which are essential features for this type of application. However, the Medical Electronics Research Group has used the DAS-1202 card in many of its projects, including SNORS. Made by Keithley Metrabyte, this multifunction analogue and digital interface board installs directly into an ISA expansion slot on the host computer. The DAS-1202 offers 16 single-ended analogue inputs with 12-Bit resolution at up to 100 ksamples/s. A three-channel programmable timer provides timing for the analogue-to-digital conversions. Data transfer may take place in one of three ways: program control, interrupt service routine (ISR) or direct memory access (DMA). DMA mode is considered the most appropriate for use with pre-emptive operating systems such as Windows 95/98, and it also allows maximum data transfer rates. In addition to the analogue channels, the DAS-1202 features 32 digital I/O lines that can be used to control peripheral devices, such as multiplexers, and to read or set external status lines. Not only does the established DAS-1202 offer the speed and flexibility required for this project, its use also allows backward compatibility with current users of the SNORS system.

4.2.3 Sound Card

The high quality audio required for playback and detailed waveform analysis is captured via a standard 16-Bit stereo sound card. Sound cards use DMA data transfer techniques to record audio at sample rates of up to 44.1 kHz per channel. However, they can also be programmed to record at lower rates, such as 22 kHz, 11 kHz and 8 kHz, which is often useful when limited storage space is available. The sound card may be controlled in a number of ways; for example, the Windows Media Control Interface (MCI) provides standard commands for recording and playing multimedia resource files. Using interfaces such as MCI ensures system compatibility with any 16-Bit sound card supported under the Windows operating system.

4.2.4 Video Acquisition Card

There are various ways of acquiring video images with a standard PC. The so-called WinTV cards offer a low-cost method of obtaining large amounts of video data at up to
25 frames a second. This is achieved by incorporating hardware compression techniques and transferring data directly to disk. Unfortunately, many of these compression algorithms remove subtle image changes between consecutive frames. The resulting distortion is acceptable when viewing TV broadcasts, but the subtle changes considered essential in videofluoroscopic interpretation could be lost. Frame grabber cards provide an alternative method of acquiring video data. Rather than transferring images directly to disk, they store them efficiently in memory. Although this technique requires a considerable amount of memory, it does permit higher frame rate acquisition without data compression. For this reason, the frame grabber option was considered more suitable for this application. From the frame grabbers available, the Matrox Meteor was chosen to acquire the video data. The Meteor, which installs directly into a PCI expansion slot on the host computer, is capable of recording colour or monochrome images at rates of up to 30 frames a second. Image data are transferred directly to the host computer's memory using interrupt service routines. The card supports the following camera formats: PAL, SECAM, CCIR, NTSC and RS-170, which meet the majority of clinical requirements. The Matrox Imaging Library (MIL) package is used to program the Meteor, enabling customised software to quickly grab, display and enhance images.

4.2.5 PC Specification

What is now considered to be an entry-level IBM or compatible PC is more than adequate to support a multimedia application of this type. At present, the specification for this system is as follows:

- Pentium II 400 MHz processor.
- 64 MB RAM (expandable to 256 MB for the video option).
- 10 GB hard disk drive.
- 8 MB AGP graphics card supporting 24-Bit true colour.
- 17 inch SVGA colour monitor.
- Removable storage device, such as a PD drive or CD writer.
- 16-Bit stereo sound card.
- A spare ISA expansion slot (for the DAS-1202 data acquisition card).
- A spare PCI expansion slot (for the optional Meteor frame grabber card).

4.3 Hardware Overview

With the key modules introduced, it is now possible to revise the system block diagram illustrated in chapter 1.

Figure 4.1: A revised system block diagram, showing the two Linguagraph units (each with palate and wrist strap), the SNORS mask, the Laryngograph electrodes, the auxiliary input and the video source connected to the PC's sound card, DAS-1202 and Meteor cards.

In terms of hardware, the main design issues concern the interface module, which provides the following functions:

- External module connection point.
- Enveloped Laryngograph Lx signal.
- High and low waveform resolution switching.
- Fundamental frequency derivation from Laryngograph.
- Dual channel EPG.
- Audio signal conditioning.
- Automatic module detection.
- Auxiliary channel input.

The sections that follow give an overview of each function listed above; full technical details are discussed in chapter 5.

4.3.1 External Module Connection

With the exception of audio and video, all signal and control lines are transferred to and from the computer via a 37-way connector mounted on the DAS-1202. In addition, this connector provides the power requirements for the interface unit, SNORS and Linguagraph. The power for Laryngograph is derived from its own internal battery. These signal, control and power lines are made available to the individual modules through the interface unit, which connects to the DAS-1202 via a multi-core cable. With the exception of video, the interface unit provides a common termination point for all external modules and facilitates a single connection to the host computer's data acquisition card. All data signals transferred to the interface unit are routed through additional circuitry to the analogue input lines of the DAS-1202, where they are sampled under software control.

4.3.2 Enveloped Lx Signal

As discussed in chapter 3 (section 3.1), in addition to the unmodified waveforms, SNORS also produces an envelope of the speech signals. This provides a measure of waveform intensity over time, resulting in a less cluttered display that is often considered to ease data interpretation. Given the wealth of information available, this method of data reduction is especially relevant to the multiparameter system. To take advantage of this feature, and also to provide compatibility with SNORS data, an enveloped version of the Lx waveform is generated within the interface unit. The resulting voice intensity waveform is useful for measuring parameters such as voice onset time and voice duration.

4.3.3 High and Low Waveform Resolution Switching

Although envelope detection and low-pass filtering are extremely useful data reduction methods, they do remove any high frequency components that may be of interest to the clinician. Therefore, to enable clinicians to choose between the original waveform and its envelope or filtered counterpart, a switching mechanism has been incorporated within the design. Table 4.1 identifies the channels concerned and their relative switch positions.
High Resolution               Low Resolution
Nasal speech waveform         Nasal speech envelope
Oral speech waveform          Oral speech envelope
Combined speech waveform      Combined speech envelope
Lx waveform                   Lx envelope
Nasal airflow waveform        Low-pass filtered nasal airflow
Oral airflow waveform         Low-pass filtered oral airflow
Combined airflow waveform     Low-pass filtered combined airflow

Table 4.1: Switchable data channels.

4.3.4 Fundamental Frequency Derivation from Laryngograph

Speech and language therapists often use the measure of fundamental frequency in the assessment of voice disorders. Although this can be achieved with acoustic analysis, the fundamental frequency (Fx) derived from the Laryngograph is extremely reliable, as the waveform is unaffected by vocal tract resonances and environmental noise. The inclusion of a frequency-to-voltage circuit, triggered by the Lx waveform, provides a simple means of generating this important parameter.

4.3.5 Dual Channel EPG

As described in chapter 3 (section 3.3), dual channel EPG allows the therapist to produce model contact patterns for the patient to mimic. This useful feature has been implemented by incorporating two Linguagraph sockets that route EPG data through a digital multiplexer. This technique enables the acquisition of both Linguagraph channels using a single set of data lines.

4.3.6 Audio Signal Conditioning

The unmodified nasal and oral speech signals, derived from the SNORS microphones, are combined using a resistor network that is optimised to match the input impedance of the sound card. The resulting signal is fed directly to the left channel of the sound card, and is used to record high quality audio for playback and acoustic analysis. The right sound card channel is used to record an impedance-matched Lx waveform. Again, this can be used for playback but is primarily intended for waveform analysis and the derivation of additional voicing parameters, such as shimmer, jitter and closed quotient.
4.3.7 Automatic Module Detection

Ease of operation is an essential design consideration for any instrument intended for clinical use. Therefore, to automate the parameter selection process, the connection status of SNORS, Laryngograph and both Linguagraphs is made available to the digital inputs of the DAS-1202. The software is then able to interrogate these status lines and enable (or disable) the various parameters as appropriate.

4.3.8 Auxiliary Channel

To add a degree of system flexibility, provision for a single auxiliary channel input has been included in the design. The auxiliary channel is fully synchronised and can accommodate analogue signals in the range of ±2.5 V. Useful additional parameters could include:

- Intra-oral pressure.
- Lip movement.
- Jaw movement.

4.4 Software Overview

The software forms the most complex element of the multiparameter system. Its primary function is to allow formal assessment of articulatory function during defined speech tasks, such as those performed in conventional speech assessment. The facility to analyse the results of the assessment is also desirable, as is the ability to save, retrieve and print them. A further requirement is the provision of biofeedback through the use of real-time visual displays. This section gives an overview of the program structure and discusses how the above requirements have been implemented. A more in-depth technical description of the software is given in chapters 6 and 7.

4.4.1 C and the Multiple Document Interface

Written in C, the software has been specifically designed for Windows 95/98. The program interacts directly with Windows via the Application Programming Interface (API), which is the most fundamental, versatile and powerful way to program Windows (Petzold, 1996). User interaction is accomplished via a Multiple Document Interface (MDI), which
is a specification that defines the standard user interface for applications written for Microsoft Windows. An MDI application enables the user to work with more than one document at a time. Each document is displayed in a separate child window within the client area of the application's main window. This allows the clinician to view a large number of parameters simultaneously, in a variety of formats. The example illustrated in Figure 4.2 shows three child windows displaying a selection of speech parameters. The window to the right of the display features the following waveforms (top to bottom): speech intensity, nasal airflow, oral airflow, voicing intensity and fundamental frequency. A single EPG frame showing tongue-palate contact is illustrated in the bottom left-hand window. Finally, nasal and oral airflow intensities are shown in the top left-hand window.

Figure 4.2: The Multiple Document Interface.

The software can be partitioned into seven main groups:

- Main application window.
- Real-time windows.
- Test protocol.
- Test analysis windows.
- File handling.
- Printing.
- Help.

Figure 4.3 illustrates the interaction of these groups.

Figure 4.3: High-level software structure, showing the main application window linked to the file handling, real-time window, test protocol, help, printing and test analysis groups.

4.4.2 The Main Application Window

The main application window is launched at the start of program execution and is responsible for much of the system initialisation. However, its primary function is to provide a means of initiating the various child windows and a general workspace in which they can function. A simple menu bar allows access to all available program options. As the user navigates the mouse over the various menu options, helpful text prompts appear in the status bar at the bottom of the window. A toolbar shortcut is also available for the more frequently used options.
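As a hedged illustration of the MDI structure described above, the following C fragment sketches how a Win32 frame window might create the MDICLIENT workspace that hosts the child windows. This is a minimal sketch of the standard Windows pattern, not the SNORS+ source; the function name and menu-item ID are assumptions.

    /* Minimal Win32 MDI frame sketch (illustrative, not the SNORS+ code). */
    #include <windows.h>

    LRESULT CALLBACK FrameWndProc(HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam)
    {
        static HWND hwndClient;

        switch (msg)
        {
        case WM_CREATE:
            {
                CLIENTCREATESTRUCT ccs;
                ccs.hWindowMenu  = NULL;    /* menu to list child windows on */
                ccs.idFirstChild = 1000;    /* assumed first child menu ID   */

                /* The MDICLIENT window manages all child (document) windows
                   within the frame's client area. */
                hwndClient = CreateWindow(TEXT("MDICLIENT"), NULL,
                                          WS_CHILD | WS_CLIPCHILDREN | WS_VISIBLE,
                                          0, 0, 0, 0, hwnd, (HMENU)1,
                                          ((LPCREATESTRUCT)lParam)->hInstance, &ccs);
            }
            return 0;

        case WM_DESTROY:
            PostQuitMessage(0);
            return 0;
        }
        /* Unhandled messages must go to DefFrameProc, not DefWindowProc. */
        return DefFrameProc(hwnd, hwndClient, msg, wParam, lParam);
    }

Each real-time or analysis display then becomes an MDI child created within hwndClient, which is what allows several parameter views to coexist on screen.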
4.4.3 Real-Time Windows

In therapy mode, biofeedback displays are used to view the various speech parameters in real-time. Instantaneous two-dimensional displays, showing single or multiple parameters, are generally the most useful for therapy as they are uncluttered and easy to interpret. They provide clear visual feedback to the patient, allowing the effect of any speech corrections to be observed. The multiparameter system provides the following instantaneous displays:

- Bar: speech intensity, airflow, larynx excitation and fundamental frequency.
- EPG: tongue-palate contact.
- FFT: speech intensity, fundamental, harmonic and formant frequencies.
- Video: lip movement, jaw movement, facial grimace, posture, etc.

In addition to these instantaneous displays, trend displays are also available. These show information about the dynamics of articulation. Traces scan across the display from left to right in a fashion similar to that of an oscilloscope. These are often useful in therapy because they allow patients to observe the dynamics of their speech during the utterance of complete words or phrases. The traces may be frozen at any point during the scan, allowing the clinician to discuss the display with the patient. The following trend displays are available:

- Scope: speech intensity, airflow, larynx excitation and fundamental frequency.
- Wave: speech intensity and larynx excitation.
- Spectrogram: speech intensity, fundamental, harmonic and formant frequencies.

All of the above displays execute within their own child windows, and can be launched quickly from the application window's toolbar.

4.4.4 Test Protocol

To conduct a formal assessment, it is necessary to define a protocol that allows comparable measurements to be performed. The test protocol adopted by the multiparameter system is similar to that of SNORS, which requires the patient to utter a number of words as prompted by the computer. At the beginning of each new test the user is presented with a
dialogue box that contains a variety of options. Using these options the user can tailor each test to suit individual patient requirements. The available options include:

- Word list selection.
- Word display period adjustment.
- Sample frequency adjustment.
- Parameter selection.
- Airflow sensor calibration.
- Sound card configuration.
- Video frame grabber configuration.
- Patient information data entry.
- Customised analysis display options.

4.4.5 Test Analysis Windows

Depending on the parameters selected in the test protocol, a multiple window display is generated at the end of each test (refer to Figure 4.2). Analysis is performed on the main Test Scope window by positioning a cursor over areas of interest. A variety of waveform analyses, relating to cursor position, can be displayed to the right of the window and also below, on the application window's status bar. Test analysis versions of the two-dimensional and trend displays, described in the real-time section, are also available. As the cursor is positioned over the test waveforms, any active two-dimensional displays change to reflect the data at the current cursor position. Active trend displays show an additional cursor that mimics the main test cursor. In the example illustrated in Figure 4.2, tongue-palate contact and nasal/oral airflow are shown for a cursor positioned 1.05 seconds into the utterance of the word missing. To aid waveform interpretation, audio playback of the displayed waveforms may also be activated. When a cursor is active, audio playback will commence from the cursor position. This is often useful for isolating specific speech sounds.

4.4.6 File Handling

The ability to save and retrieve files is an essential feature of the system, and has been implemented using the standard Windows Save and Open dialogue boxes. These allow the
user to create meaningful file names and to navigate through the Windows file structure. It is possible to save all real-time and test windows to disk. When saving the contents of a real-time window, the user simply suspends the display prior to initiating the Save dialogue. On file retrieval, the user is presented with a child window that is identical to the original. When saving the main Test window, the layout of the entire application window is attached to the file. Therefore, a retrieved Test file restores the original layout, complete with any child analysis windows that may have been saved with it. In order to prevent accidental data loss, a warning message is generated if the user attempts to close an unsaved test window.

4.4.7 Printing

The standard Windows Print dialogue is invoked to perform this useful function. This allows the user to select the printer, paper size, orientation, number of copies, etc. It is possible to produce a hard copy of all the available child windows. The program has adopted the popular WYSIWYG (what you see is what you get) convention for printing the various windows. Printouts are not only invaluable for clinical records but are often retained by the patient as a personal record of achievement.

4.4.8 Help

An important element of a user-friendly application is readily available online help. A comprehensive help document can be accessed from the Help menu option located on the application window's menu bar. The help document provides the user with guidelines on how to use the system and also defines the parameters measured. The user can access information either by looking through the contents page or by typing key words into the search index.
4.5 SNORS+

In an effort to reduce the overall system cost and complexity, SNORS and the interface unit have been combined to form a single multipurpose board. In view of this, the author felt that SNORS+ would be a suitable name for the multiparameter system, since it is based on SNORS plus other instruments. A photograph of the complete SNORS+ system is shown in Figure 4.4.

Figure 4.4: The SNORS+ system.
CHAPTER 5
HARDWARE IMPLEMENTATION

A functional description of the project's hardware, namely the SNORS+ interface unit, was given in chapter 4. This chapter discusses the technical aspects of the hardware, giving a detailed account of the individual elements that comprise the interface unit. These elements, and their interconnections, are illustrated in Figure 5.1.

Figure 5.1: Block diagram of the SNORS+ interface unit, showing the DAS-1202, auxiliary, Linguagraph, Laryngograph, SNORS and sound card connectors, together with the module status lines, Linguagraph interface, power supply, Fx generator, Lx envelope generator, waveform resolution switching and audio signal conditioning blocks.

Appendix A contains a complete set of circuit diagrams.
5.1 Linguagraph Interface

The main advantage of SNORS+ over standalone instrumentation is its ability to measure and display the various speech parameters simultaneously. It is therefore essential to the successful operation of SNORS+ that the acquired signals are properly synchronised, since loss of synchronisation may result in the misinterpretation of data. The primary function of the Linguagraph interface is to ensure that the digital EPG data are acquired in complete synchrony with the analogue data. It also provides signal conditioning and facilitates dual channel mode. A block diagram of the Linguagraph interface is illustrated in Figure 5.2.

Figure 5.2: The Linguagraph interface, comprising input buffers for the two Linguagraph units, a multiplexer and signal conditioning for the EPG data, and a clock generator providing the common body clock and address lines, channel select, sync pulse, dual enable and clock clear signals to and from the DAS-1202.

5.1.1 Linguagraph Overview

The signals obtained from Linguagraph are derived from the 62 electrodes embedded within the artificial palate. The electrodes act as switch contacts when touched by the tongue which, depending on the required frame rate, conducts an 800 Hz or 1600 Hz body clock signal that is relayed to the host computer. The body clock, generated by the DAS-1202 programmable timer, is opto-isolated and applied to the patient via an electrode worn on the wrist. Using the programmable timer to generate the body clock ensures that the Linguagraph data are synchronised with the SNORS and Laryngograph data. Due to the large number of signal lines obtained from the artificial palate, it is impractical to sample each one simultaneously. Therefore, to limit the required number of sample lines, eight 8-to-1 analogue multiplexers present only eight signals to the data acquisition card at any one
time. The multiplexer select lines are generated within the Linguagraph interface by a 4-Bit synchronous counter attached to the body clock. The resulting signals sampled by the DAS-1202 are a digital representation of eight of the contacts on the artificial palate. A total of eight scans are required to capture one frame of Linguagraph data, which results in 100 (or 200) display frames per second. Each 8-Bit scan is uniquely identified by a 3-Bit address derived from the multiplexer select lines. Using this address, the software can accurately reconstruct each EPG frame. The sections that follow describe how the Linguagraph interface synchronously controls both Linguagraph units, and how the resulting data have been adapted for multiparameter use.

5.1.2 Input Buffers

Since the digital signals obtained from each Linguagraph are derived from open collector opto-isolators, pull-up resistors are required to generate the necessary signal voltages. Once generated, these signal voltages are buffered and reshaped by Schmitt-triggered inverters before being routed to the digital multiplexers.

5.1.3 Dual Channel Multiplexers

It is expected that future Linguagraph designs will measure tongue-palate proximity. This method should increase the overall sensitivity of the system, since the tongue often approaches the palate electrodes but fails to make contact. An analogue system would be required to facilitate this approach and therefore, to ensure future compatibility, analogue sampling techniques have been employed here. However, due to the limited number of DAS-1202 analogue inputs, data multiplexing techniques have been used to reduce the necessary sample lines. The multiplexing and associated control logic are shown in Figure 5.3.
Figure 5.3: Multiplexers and clock generator.

Two 74158 4-Bit, 2-to-1 multiplexers combine to produce the 8 data lines required for a single Linguagraph channel. Jointly, the multiplexers control which Linguagraph channel is presented to the DAS-1202 analogue inputs at any given time. Data from Linguagraph 1 are routed through the multiplexers when their data select pins are low, and from Linguagraph 2 when they are high. In single channel mode the data select pins are held low to ensure that Linguagraph 1 is permanently active. However, in dual channel mode, the select pins are controlled by a signal derived from the clock generator.

5.1.4 The Clock Generator

A pacer clock generated on board the DAS-1202 triggers each analogue conversion scan. The DAS-1202 is programmed to acquire analogue data in burst mode, which ensures that the conversion of all 16 analogue channels is initiated with each rising edge of the pacer clock. Therefore, to ensure that the Linguagraph data are fully synchronised with the SNORS and Laryngograph data, the pacer clock also acts as the EPG body clock. In addition, the pacer clock triggers a 4-Bit synchronous binary counter. The counter is responsible for generating the 3-Bit address lines that control Linguagraph's internal multiplexers. These are derived from the three least significant bits
of the counter (refer to Figure 5.3). The fourth bit provides two functions: dual channel enable and synchronisation checking.

5.1.5 Dual Channel Enable

As described in section 5.1.3, the multiplexers facilitate dual channel Linguagraph mode. This is achieved by controlling the multiplexer select pins with the fourth bit of the counter. The timing diagram relating the outputs of the binary counter to the Linguagraph control lines is illustrated in Figure 5.4.

Figure 5.4: Timing diagram for the 4-Bit binary counter, showing the pacer/body clock, address lines 0-2 (QA, QB, QC), the channel select line (QD) and the sync pulse for channels 1 and 2.

As can be seen from Figure 5.4, the Linguagraph data acquired while the fourth bit of the counter is low relate to channel 1, switching to channel 2 when it is high. Multiplexing the Linguagraph data lines in this manner effectively halves the frame rate. However, in real-time this still represents 50 or 100 frames a second, which appears flicker-free to the human eye. To permit maximum frame rates when acquiring Linguagraph data in analysis mode, only channel 1 is enabled. This is achieved by clocking the channel select pins via a 2-input AND gate whilst holding one input low, under software control, to inhibit its output (refer to Figure 5.3).

5.1.6 Synchronisation Checking

In order to reconstruct each EPG frame, the software must establish from which of the eight possible addresses the data were derived. Since the addresses are generated by the counter, which is incremented by the pacer clock, it is possible to predict them in software.
Although this technique eliminates the need to sample all three address lines, synchronisation may be lost should a spurious clock pulse occur. However, by reading the fourth bit of the counter it is possible to check for synchronisation errors and to correct them. Recall from section 5.1.1 that a total of eight scans are required to build a single frame of EPG data, so each complete frame represents eight cycles of the pacer/body clock. As can be seen from the timing diagram in Figure 5.4, the fourth bit of the counter should remain constant throughout each frame. Therefore, if the software detects a fourth-bit transition part-way through a frame, it responds by resetting the counter and reinitiating the sequence. Deriving the address in this manner releases two analogue inputs, which are used to sample additional parameters.

5.1.7 Signal Conditioning

The operating principle of Linguagraph relies on its ability to reconstruct the body clock when tongue-palate contact is made, and to relay this information to the host computer. By checking for the presence of this clock, the software can determine whether or not contact has been made. However, on close examination of the data received from Linguagraph, it was evident that the reconstructed body clock was often distorted. This is illustrated in Figure 5.5.

Figure 5.5: The original and reconstructed body clocks.

The irregularity in the reconstructed body clock is caused by the low-level signals obtained from the palate drifting above and below the comparator threshold levels. This contributes to the flicker often observed on the current Linguagraph display, and is caused by the software sampling the distorted body clock during false low periods. To overcome this problem in SNORS+, each Linguagraph data line is routed through a non-retriggerable monostable multivibrator. When triggered by a rising edge, each monostable produces a 0.5 ms pulse, thus reproducing an undistorted version of the body clock. The monostable arrangement is shown in Figure 5.6.
Figure 5.6: Linguagraph signal conditioning circuit.

Finally, as the digital Linguagraph signals are sampled via the analogue inputs of the DAS-1202, they are buffered using unity gain differential op-amps. This configuration (shown in Figure 5.6) separates the digital and analogue grounds, thus preventing the switching noise generated by the logic from contaminating the analogue signals.
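To make the frame reconstruction and synchronisation checking described above concrete, the following C fragment sketches how software might assemble one EPG frame from eight successive 8-channel scans, using the sampled fourth counter bit (QD) to detect a spurious clock pulse. This is a minimal illustration, not the SNORS+ source: the buffer layout, threshold value and function name are assumptions.

    /* Hypothetical sketch of EPG frame reconstruction with synchronisation
       checking. scans[8][8] holds the analogue samples of each scan, and
       qd[8] holds the sampled fourth counter bit for each scan. */
    #define SCANS_PER_FRAME   8
    #define LINES_PER_SCAN    8
    #define CONTACT_THRESHOLD 128   /* assumed ADC level indicating body clock */

    /* Returns 1 on success, 0 if a QD transition (sync error) was detected,
       in which case the caller resets the counter and restarts the sequence. */
    int reconstruct_frame(const int scans[SCANS_PER_FRAME][LINES_PER_SCAN],
                          const int qd[SCANS_PER_FRAME],
                          unsigned char frame[SCANS_PER_FRAME][LINES_PER_SCAN])
    {
        int scan, line;

        for (scan = 0; scan < SCANS_PER_FRAME; scan++) {
            /* QD should remain constant throughout the frame. */
            if (qd[scan] != qd[0])
                return 0;

            for (line = 0; line < LINES_PER_SCAN; line++)
                frame[scan][line] = (scans[scan][line] > CONTACT_THRESHOLD);
        }
        return 1;
    }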
5.2 Lx Envelope Generator

The Lx envelope detection circuit produces an outline of the Lx waveform, which gives a measure of voicing intensity over time. As discussed in chapter 4 (section 4.3), this is considered useful for measuring parameters such as voice onset time and voice duration. A block diagram of the Lx envelope generator is shown in Figure 5.7.

Figure 5.7: Block diagram of the Lx envelope generator: a 75 Hz high-pass filter, followed by an active half wave rectifier, a 30 Hz low-pass filter, and offset and gain adjustment.

5.2.1 High-Pass Filter

In addition to conveying information on vocal fold vibration, the Lx waveform derived from Laryngograph also tracks gross movement of the larynx. These movements are generally slower than the rapid vibration of the vocal folds and are responsible for much of the baseline drift evident in many Lx waveforms. As such, waveform characteristics relating to larynx movement are easily distinguishable from those of vocal fold vibration. However, since the parameter of interest here is voicing intensity, this lower frequency information must be removed, as it would otherwise also contribute to the intensity of the voicing envelope. The adult male produces the lowest fundamental frequency, at around 125 Hz, which may fall below 100 Hz for a particularly creaky voice. Therefore, a high-pass filter with a cut-off frequency of 75 Hz has been selected to remove the low frequency information relating to laryngeal movement, whilst retaining any voiced components. Since it offers maximum flatness in the pass-band and exhibits a relatively sharp roll-off, an active two-pole Butterworth filter was considered suitable for this application. The filter circuit, taken from Horowitz and Hill (1989), is shown in Figure 5.8.
Figure 5.8: A high-pass Butterworth filter.

A simple voltage follower isolates the filter's output from the next stage. The transfer characteristic of the above circuit is plotted in Figure 5.9.

Figure 5.9: Transfer function of the high-pass filter (2 dB/div, 0-250 Hz).
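As a quick sanity check on the choice of a 75 Hz cut-off, the magnitude response of an ideal two-pole Butterworth high-pass filter, |H(f)| = (f/fc)^2 / sqrt(1 + (f/fc)^4), can be evaluated at the frequencies of interest. The short C fragment below does this; it is an idealised calculation, not a simulation of the actual circuit of Figure 5.8.

    /* Idealised two-pole Butterworth high-pass response, used to verify that
       a 75 Hz cut-off attenuates laryngeal drift while passing voicing. */
    #include <stdio.h>
    #include <math.h>

    static double hp_gain_db(double f, double fc)
    {
        double r2  = (f / fc) * (f / fc);        /* (f/fc)^2           */
        double mag = r2 / sqrt(1.0 + r2 * r2);   /* |H(f)| for n = 2   */
        return 20.0 * log10(mag);
    }

    int main(void)
    {
        const double fc = 75.0;                  /* cut-off frequency, Hz */
        const double freqs[] = { 10.0, 30.0, 75.0, 125.0 };
        size_t i;

        for (i = 0; i < sizeof freqs / sizeof freqs[0]; i++)
            printf("%6.1f Hz: %6.1f dB\n", freqs[i], hp_gain_db(freqs[i], fc));
        /* Expected: roughly -35 dB at 10 Hz, -16 dB at 30 Hz, -3 dB at the
           75 Hz cut-off, and under -1 dB at a 125 Hz male fundamental. */
        return 0;
    }

The numbers bear out the design choice: sub-30 Hz laryngeal movement is attenuated by 16 dB or more, while a typical 125 Hz male voicing component passes essentially unchanged.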
5.2.2 Half Wave Rectifier and Low-Pass Filter

The actual voicing envelope is produced by an active half wave rectifier followed by a low-pass filter. The half wave rectifier simply removes all negative signal components. An active rectifier is used in preference to a more conventional passive arrangement, since the diode voltage drop then makes no contribution to the rectifier's output voltage. The rectifier circuit, also taken from Horowitz and Hill (1989), is shown in Figure 5.10. Again, the inclusion of a voltage follower isolates the rectifier from the next stage and ensures a low circuit output impedance.

Figure 5.10: The envelope detector circuit.

The low-pass filter forms the next stage (refer to Figure 5.10), removing the higher frequency ripple from the half wave rectified Lx signal. This produces the smooth intensity contours of the voicing envelope. The filter employed to perform this function is the MAX280 5th-order switched capacitor filter, supplied by Maxim. It has been designed to produce a maximally flat pass band with a 3 dB cut-off at 30 Hz. This arrangement is identical to that used in the SNORS system (McLean, 1997), and was chosen to produce an equal phase delay on the resulting signal. The transfer function of this 5th-order filter is plotted in Figure 5.11.
Figure 5.11: Transfer function of the 5th-order low-pass filter (2 dB/div, 0-100 Hz).

5.2.3 Offset and Gain Adjustment

The circuit illustrated in Figure 5.12 has been designed to remove any residual offset introduced by the previous stages, and also to provide the gain adjustment necessary for calibration.
Figure 5.12: Offset and gain adjustment.

The residual offset is removed by connecting the input of the envelope circuit (Lx In) to ground and adjusting R4 to produce 0 V at the output of the voltage follower shown in Figure 5.12 (Lx Env). Calibration of the envelope generator is achieved by injecting a 150 Hz, 3 V peak-to-peak sine wave and adjusting the gain, via R2, to produce an envelope signal intensity of +2.5 V. This represents full scale on the DAS-1202 analogue input and was found by experimentation to produce optimal voice intensity levels. However, the user can compensate for various intensities by adjusting the gain control mounted on the Lx processor.

5.3 Fx Generator

The Fx generator circuit converts a series of pulses, representative of the fundamental frequency, into a linearly proportional DC voltage. This produces an instantaneous measure of fundamental frequency, which is ideal for simple biofeedback applications. A block diagram of the Fx generator is shown in Figure 5.13.
Figure 5.13: Block diagram of the Fx generator: a voltage level shifter, followed by a frequency-to-voltage converter, a 30 Hz low-pass filter and gain adjustment.

5.3.1 Voltage Level Shifter and Frequency to Voltage Converter

In addition to the Lx waveform, the auxiliary output on the Lx processor provides a TTL-compatible pulsed signal with a period that tracks the fundamental frequency. These pulses provide an ideal trigger for the frequency-to-voltage circuit because they are free from baseline drift and other irregularities characteristic of the Lx waveform. However, because the input signal must cross through zero to trigger the frequency-to-voltage circuit, the TTL pulses must be offset. This is achieved with the AC level shifter circuit illustrated in Figure 5.14.

Figure 5.14: Level shifter and frequency-to-voltage converter.

The diode clamp ensures the input signal remains within the specified triggering voltage range. The potentiometer R4 can be adjusted to remove any DC offset present in the output signal. The TC9400 frequency-to-voltage converter generates an output voltage that is linearly proportional to the frequency of the input waveform. Each zero crossing at its threshold
detector's input causes a precise amount of charge (q = C2 x VREF) to be dispensed into an internal op-amp's summing junction. This charge in turn flows through a feedback resistor, generating voltage pulses at the output of the op-amp. The capacitor C3, placed across R8, averages these pulses into a DC voltage that is linearly proportional to the input frequency. The output voltage is related to the input frequency (FIN) by the transfer function:

VOUT = (VREF x C2 x R8) x FIN    (5.1)

The response time to a change in FIN is equal to R8 x C3. This parameter is crucial to the operation of the Fx generator, since a slow response would obscure the distinction between voiced and unvoiced components of a particular utterance. Selecting values of R8 = 1 MΩ and C3 = 1 nF ensures a circuit response time of approximately 1 ms, which is considered adequate for simple biofeedback applications.

5.3.2 Low-Pass Filter and Gain Adjust

The amount of ripple on VOUT is inversely proportional to C3 and the input frequency. On construction of the above circuit, the ripple on VOUT was found to be considerable at frequencies within the human voice range. It could be removed by increasing C3, but this reduced the circuit response time to unacceptable levels. Therefore, to reduce the output ripple whilst retaining a good circuit response, an active 5th-order low-pass filter was connected to VOUT (refer to Figure 5.15).
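Equation 5.1, together with the full-scale figure quoted below (+2.5 V at 500 Hz), fixes the product VREF x C2 x R8 at 5 mV per Hz; with the quoted R8 = 1 MΩ, this implies a charge per cycle of q = VREF x C2 = 5 nC (for example, VREF = 5 V with C2 = 1 nF, although the actual values of these two components are not stated in the text). The fragment below simply evaluates equation 5.1 and the R8 x C3 response time under these assumed values.

    /* Worked check of equation 5.1. Only R8 and C3 are quoted in the text;
       VREF and C2 are assumptions chosen to give +2.5 V full scale at 500 Hz. */
    #include <stdio.h>

    int main(void)
    {
        const double VREF = 5.0;      /* reference voltage, V (assumed)   */
        const double C2   = 1.0e-9;   /* charge capacitor, F (assumed)    */
        const double R8   = 1.0e6;    /* feedback resistor, ohms (quoted) */
        const double C3   = 1.0e-9;   /* averaging capacitor, F (quoted)  */
        double fin;

        for (fin = 100.0; fin <= 500.0; fin += 100.0)
            printf("Fin = %3.0f Hz -> Vout = %.2f V\n",
                   fin, VREF * C2 * R8 * fin);              /* equation 5.1 */

        printf("Response time = R8 x C3 = %.1e s\n", R8 * C3);  /* ~1 ms */
        return 0;
    }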
Figure 5.15: Low-pass filter and gain adjust.

With a cut-off frequency of 30 Hz, the filter design is identical to that used in the Lx envelope circuit. A typical child produces the highest fundamental frequencies, at around 300 Hz. Therefore, a maximum frequency of 500 Hz was considered optimum for the Fx generator circuit. This represents a full-scale signal of +2.5 V and corresponds to a convenient 50 Hz per division on the Scope display. Calibration is achieved by injecting a 500 Hz TTL pulse train into the circuit and adjusting R5 on the output stage amplifier (refer to Figure 5.15) to produce the required +2.5 V.

5.4 Waveform Resolution Switching

To enable the clinician to choose between the unmodified waveforms derived from SNORS and Laryngograph and their envelope or filtered counterparts, a switching mechanism has been incorporated within the interface unit. Table 4.1 in chapter 4 identified the channels concerned and their relative switch positions. The switching mechanism is shown in Figure 5.16.
Figure 5.16: Waveform resolution switching mechanism.

Three miniature double pole changeover relays (K1, K2 and K3) are used to implement the switching mechanism. As can be seen from Figure 5.16, each waveform pair (original and modified) is connected to mutually exclusive relay switch contacts. In their de-energised state the relays connect the envelope or filtered waveforms to the analogue sample lines of the DAS-1202. When the relays are energised, by the closure of SW1, the contacts switch to replace these waveforms with their unmodified counterparts. The spare contacts on relay K1 provide a digital status flag that can be interrogated by the software to determine the position of SW1. This enables the software to automatically select suitable analogue sample rates and to place the correct title beneath each waveform. In its energised state, relay K1 connects the status line (Env/Raw) to 0 V, causing the software to read a logic low. When de-energised, the relay contacts open, and the status line, which is held at 5 V by R1, reads as a logic high. The capacitors C1 and C2 suppress any noise spikes caused by the switching mechanism.
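The way the software might act on the Env/Raw flag, together with the module detection lines described in the next section, can be sketched as follows. This is an illustrative fragment, not the SNORS+ source: the bit assignments are hypothetical, and only the logic levels (low meaning raw waveforms selected, or a module connected) follow the text.

    /* Hypothetical sketch of status-line interrogation. Bit positions are
       assumptions; the active-low convention follows the text. */
    #include <stdio.h>

    #define STATUS_ENV_RAW   0x01   /* low = unmodified (raw) waveforms */
    #define STATUS_SNORS     0x02   /* low = SNORS connected            */
    #define STATUS_LARYNX    0x04   /* low = Laryngograph connected     */

    /* status would come from a DAS-1202 digital-input read. */
    void configure_channels(unsigned int status)
    {
        int raw_selected   = !(status & STATUS_ENV_RAW);
        int snors_present  = !(status & STATUS_SNORS);
        int larynx_present = !(status & STATUS_LARYNX);

        if (raw_selected)
            printf("High resolution: sampling unmodified waveforms\n");
        else
            printf("Low resolution: sampling envelope/filtered waveforms\n");

        if (!snors_present)  printf("SNORS not detected: airflow channels disabled\n");
        if (!larynx_present) printf("Laryngograph not detected: Lx channels disabled\n");
    }

    int main(void)
    {
        configure_channels(0x07);   /* example: all lines high (nothing connected) */
        return 0;
    }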
5.5 Automatic Module Detection

To automate the parameter selection process, the connection status of SNORS, Laryngograph and both Linguagraphs is made available to the digital inputs of the DAS-1202. The software is then able to interrogate these status lines and enable (or disable) the various parameters as appropriate. A typical module detection circuit is illustrated in Figure 5.17.

Figure 5.17: A typical module detection circuit.

The switch (SW1) is actually a wire link fitted within the peripheral device plug. In the absence of a device, the switch is effectively open circuit, causing the status point to be pulled up to 5 V by R1. On connection of a peripheral device, the wire link is effectively placed across C1 and the status line is joined to 0 V via the current limiting resistor R2. C1 suppresses any noise spikes caused by the switching mechanism. The status line is coupled to a digital input on the DAS-1202 and is read by the software when attempting to initialise the specified device. The software reads a logic low when a device is connected and a logic high when it is disconnected.

5.6 Audio Signal Conditioning

The unmodified nasal and oral speech signals, derived from the SNORS microphones, are combined using a resistor network that is optimised to match the line input impedance of a standard sound card. The resulting signal is routed directly to the left channel of the sound card and used to record high quality audio for playback and acoustic analysis. The right sound card channel is used to record an impedance-matched Lx waveform. Again,
this can be used for playback but is primarily intended for waveform analysis. The audio signal conditioning circuit is illustrated in Figure 5.18.

Figure 5.18: Audio signal conditioning and sound card connector.

5.7 Power Supply

The power source for the interface unit, SNORS and both Linguagraphs is derived from the host computer, via the DAS-1202 data acquisition card. However, since the available supply rail is +5 V, a DC-DC converter has been employed to generate the ±5 V supply necessary to drive the analogue circuits described. In addition, the DC-DC converter limits the effects of the switch-mode and digital noise commonly associated with PC power supplies.

Figure 5.19: The power supply.
5.8 PCB Design

The electronic circuits described in this chapter were produced using the schematic design editor WinDraft. Using a compatible program known as WinBoard, these designs were then used to generate a PCB layout. Once constructed and thoroughly tested, the interface PCB was combined with the existing SNORS design to form a single multipurpose board. Due to its size and complexity, this board was made professionally by Minnitron Ltd. Since SNORS+ is primarily intended for clinical use, a robust case in which to house the PCB was considered essential. Particular attention was also given to the connectors, which needed to be durable yet easy to use. The following images show the multipurpose PCB housed in a sturdy plastic case, together with the front and rear panel assemblies.

Figure 5.20: The SNORS+ PCB and casing.
Figure 5.21: Front panel assembly.

Figure 5.22: Rear panel assembly.
CHAPTER 6
BIOFEEDBACK SOFTWARE IMPLEMENTATION

A functional overview of the software was presented in chapter 4. This chapter introduces the concepts of real-time data acquisition under the Windows operating system, and explains how these have been implemented in the biofeedback software. The techniques used to format and display the acquired multiparameter data are also discussed. A block diagram of the biofeedback software structure is given in Figure 6.1.

Figure 6.1: Biofeedback software structure. The main application window launches three acquisition threads (DAS-1202, video and wave), which feed the real-time Bar, Scope, EPG, Video, Wave, FFT and Spectrogram windows.

The gross components of the biofeedback software can be divided into three main categories: data acquisition, real-time displays and supporting functions. The structures of the first two categories are illustrated above. The supporting functions, which include menus, toolbars, file handling, printing and online help, are not discussed here, as they are considered standard in most Windows applications. However, an excellent reference to these features may be found in Petzold (1996).
6.1 The Multithreaded Environment

The ability of SNORS+ to simultaneously acquire and display a variety of parameters in real-time is made possible by the multithreaded nature of Windows 95/98. In a multithreaded environment, programs can divide themselves into individual sections (called threads of execution) that run concurrently. Using threads, the CPU appears to be executing several program functions simultaneously. By creating an individual thread for each data acquisition routine, it is possible to simultaneously control the data flow from each input device (i.e. DAS-1202, sound card and frame grabber). As can be seen in Figure 6.1, each thread supplies all associated child windows with new data, and the windows concurrently update their displays in real-time.

6.1.1 Scheduling

The operating system's scheduler is responsible for executing the individual threads. When created, each thread is given a priority and placed on a queue to await execution. Depending on priority, the scheduler selects a thread from the queue and executes it for a short period of time (typically 20 ms). After the elapsed time, the scheduler suspends the currently executing thread, places it back on the queue and activates the next awaiting thread (if any). This cycle is continually repeated, giving each thread a slice of CPU time. Placing the data acquisition routines in high priority threads ensures that they take precedence over many other threads awaiting execution. However, during periods of inactivity, the data acquisition threads relinquish the CPU to allow the execution of other instructions, such as operating system tasks. This ensures that Windows remains responsive even during busy data acquisition periods.

6.1.2 The Thread Architecture

In terms of computer code, a thread is represented by a simple function, which might also call other functions. Generally, the code executes within an endless loop that terminates only when the thread is destroyed. An outline of a typical thread is given below in pseudo code.
    Begin
        Initialise Variables
        Repeat
            Perform Required Function
        Forever
    End

The Windows operating system provides a wealth of functions to create, prioritise, suspend and terminate threads.
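As a concrete (if simplified) illustration of the pseudo code above, the following fragment shows how such an acquisition thread might be created and prioritised using the Win32 API. It is a minimal sketch rather than the SNORS+ implementation; the AcquisitionThread body and the g_running flag are placeholders.

    /* Minimal Win32 sketch of the thread pattern described above. */
    #include <windows.h>

    static volatile LONG g_running = 1;   /* cleared to destroy the thread */

    DWORD WINAPI AcquisitionThread(LPVOID param)
    {
        /* Initialise variables here. */
        while (g_running)                 /* Repeat ... Forever            */
        {
            /* Perform required function: poll the device, then pass any
               new data to the associated child windows. */
            Sleep(1);                     /* relinquish the CPU when idle  */
        }
        return 0;
    }

    int StartAcquisition(void)
    {
        DWORD id;
        HANDLE h = CreateThread(NULL, 0, AcquisitionThread, NULL, 0, &id);
        if (h == NULL)
            return 0;                     /* creation failed               */

        /* A raised priority ensures acquisition takes precedence over most
           other threads awaiting execution. */
        SetThreadPriority(h, THREAD_PRIORITY_ABOVE_NORMAL);
        CloseHandle(h);                   /* the thread continues to run   */
        return 1;
    }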
6.2 The DAS-1202 Data Acquisition Thread

The DAS-1202 data acquisition card samples data from SNORS, Laryngograph and Linguagraph. As shown in Figure 6.1, a single acquisition thread supplies data to the real-time Scope, Bar and EPG windows, all of which can execute concurrently. When the user initiates any one of these windows, the data acquisition thread is created (if not already executing). This runs continuously in the background, collecting data from the acquisition card and passing it to the various windows. A flowchart reflecting the structure of the DAS-1202 data acquisition thread is illustrated in Figure 6.2.

Figure 6.2: Structure of the DAS-1202 data acquisition thread. The thread initialises the DAS-1202 (exiting on failure), then repeatedly checks the number of samples acquired and passes any new data to all real-time windows.

6.2.1 Initialisation

Depending on the type of window (or windows) activated, the initialisation stage configures the DAS-1202 to sample the required number of analogue channels at the appropriate sample rate. If the initialisation stage is unsuccessful, the thread self-terminates and the child window is not created. The DAS-1202 analogue input configuration is illustrated in Figure 6.3.
Figure 6.3: The DAS-1202 analogue input configuration, mapping the SNORS signals (nasal speech, oral speech, nasal airflow, oral airflow), the Lx and fundamental frequency signals, the auxiliary input, and the Linguagraph data scan and channel select/sync pulse lines onto the available channels.

If the Scope or Bar window is activated, a sample rate of 100 Hz is selected and only analogue channels 0 to 6 are sampled. All 16 channels are sampled if an EPG window is activated, and the sample rate is increased to 800 Hz. However, to improve resolution, these default sample rates may be increased to 200 Hz and 1600 Hz respectively. This is achieved by amending the SNORS+ configuration file, which is interrogated at the start of program execution.

6.2.2 Data Acquisition

Since it allows maximum transfer rates, the DAS-1202 acquires data using direct memory access (DMA) techniques. Additionally, DMA operations execute as background tasks, which allows the application (and operating system) to execute other instructions while a DMA operation is in progress. Since the DMA buffer is managed by the DAS-1202 device driver and is limited in size, a circular buffer has been implemented. This buffer is progressively filled to capacity and then reset, continually overwriting any previously stored data. However, during normal operation, data from the circular buffer are copied to the display routines well in advance of being overwritten.
6.2.3 Data Notification

Once the required number of samples has been captured, the data acquisition thread notifies all active child windows by passing a pointer to the new data. Before returning this pointer, each window copies the data to a local buffer for processing and biofeedback display. The size and structure of the data passed with each new message is dependent on the type of window (or windows) currently active.

6.2.4 Thread Termination

When the last real-time Scope, Bar or EPG window has been terminated, the DAS-1202 data acquisition thread is destroyed and the DMA buffer released.
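A circular buffer of the kind described in section 6.2.2 can be sketched in a few lines of C. The fragment below is illustrative only (the buffer size, types and names are assumptions, not the SNORS+ source): the DMA hardware is imagined writing at one index while the display code drains from another, and the reader must stay ahead of the writer's wrap-around or samples are overwritten.

    /* Illustrative circular acquisition buffer. */
    #include <stddef.h>

    #define BUF_SAMPLES 4096            /* assumed buffer length */

    typedef struct {
        short  data[BUF_SAMPLES];
        size_t head;                    /* next position written by DMA   */
        size_t tail;                    /* next position read for display */
    } CircularBuffer;

    /* Called as new samples arrive from the acquisition card. */
    void cb_write(CircularBuffer *cb, const short *samples, size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++) {
            cb->data[cb->head] = samples[i];
            cb->head = (cb->head + 1) % BUF_SAMPLES;   /* wrap to start */
        }
    }

    /* Called by the display routines; returns the number of samples copied. */
    size_t cb_read(CircularBuffer *cb, short *out, size_t max)
    {
        size_t n = 0;
        while (cb->tail != cb->head && n < max) {
            out[n++] = cb->data[cb->tail];
            cb->tail = (cb->tail + 1) % BUF_SAMPLES;
        }
        return n;
    }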
6.3 The Real-Time Bar Window

Bar is a very useful biofeedback tool, since it provides a clear and simple display that reflects the intensity of various speech parameters. This allows the patient to monitor articulatory function, make corrections and observe the result. Bar is particularly useful when working with sustained sounds or single phonemes. Figure 6.4 illustrates a typical real-time Bar window.

Figure 6.4: The real-time Bar window.

6.3.1 High-Level Function

In this particular window, the upper section reflects the amount of nasal airflow, moving upwards away from the centre as nasal airflow increases. The lower section indicates oral airflow, moving downwards away from the centre as oral airflow increases. As with all the child windows featured in SNORS+, the displayed data may be
saved, retrieved or printed. The user can select a variety of Bar options from the Settings dialogue box illustrated in Figure 6.5.

Figure 6.5: The Bar Settings dialogue box.

6.3.1.1 Channel Select

The Channel Select option allows the user to choose any two of the following parameters:

- Speech: nasal, oral, combined and acoustic nasalance.
- Airflow: nasal, oral, respiration, aerodynamic nasalance and ratio.
- Voicing: larynx excitation and fundamental frequency.
- Auxiliary: examples include lips, jaw or intra-oral pressure.

Should the user wish to view more than two parameters simultaneously, a number of real-time Bar windows can be activated, each displaying a different set of parameters.

6.3.1.2 Appearance

This group box allows certain features, such as text and scale ticks, to be turned on or off. Additionally, peak hold markers and target levels may be enabled or disabled. The peak
hold markers are narrow bands that persist for a short duration at peak values. This feature is useful for observing small transitional peaks that may otherwise be missed. Targets are adjustable horizontal bars, drawn within the active Bar area, which the subject attempts to reach or avoid. This is useful, for example, when attempting to keep nasal air emissions below a certain level.

6.3.1.3 Bar

The Response of the Bar display reflects how quickly it responds to changes in the input signal. Reducing the response time makes it easier to observe peak values, but small transitional peaks may be lost. The Peak Delay adjusts the persistence of the peak hold markers described above. The Sensitivity option allows gain adjustment of the Bar display. This is useful for low intensity levels that may require amplification.

6.3.1.4 Background Colour

Finally, the background colour can be changed from grey to black. This provides a greater contrast, which may be useful for patients with reduced visual acuity.

6.3.2 Low-Level Function

On activation, the main Bar window procedure scans the digital status lines to determine which peripheral devices are connected. Depending on the available parameters, two default channels are selected. The window procedure then lies dormant until it receives a data notification message from the data acquisition thread. On receipt of new data, samples from channels 0 to 6 are extracted and copied to a local buffer. This buffer is passed to various functions that format the data, calculate additional parameters and display the actual intensities. In addition, the digital status lines are again scanned to determine the nature of the sampled signals (i.e. modified or unmodified). If a change is detected, the Bar titles are altered to reflect the new parameters. Once the display has been updated, the window function returns to an idle state to await the next notification message.
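One of the "additional parameters" calculated from the sampled channels is nasalance. As a hedged illustration only (the thesis does not spell out its exact formula at this point), the conventional ratio of nasal to total (nasal plus oral) intensity, expressed as a percentage, could be computed as follows:

    /* Illustrative nasalance calculation from nasal and oral intensities.
       The conventional definition nasal / (nasal + oral) x 100% is assumed. */
    double nasalance_percent(double nasal, double oral)
    {
        double total = nasal + oral;
        if (total <= 0.0)
            return 0.0;            /* silence: avoid division by zero */
        return 100.0 * nasal / total;
    }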
120 Chapter 6: Biofeedback Software Implementation

6.4 The Real-Time Scope Window

The real-time Scope window provides information about the dynamics of articulation during the utterance of complete words or phrases. Used as a biofeedback tool, Scope can be executed singly or in conjunction with another window such as Bar. A five channel Scope window is illustrated in Figure 6.6.

Figure 6.6: The real-time Scope window.

High-Level Function

Traces scan across the display from left to right in a fashion similar to that of an oscilloscope. Up to ten traces can be viewed simultaneously, which may be frozen at any point during the scan to allow discussion with the patient. As with a Bar window, the user can select a variety of options from a Settings dialogue box similar to that illustrated in Figure 6.5. Many of the options discussed in section 6.3 also apply to the Scope window. However, the features unique to Scope are detailed below. 110
121 Chapter 6: Biofeedback Software Implementation

Trigger Mode

In Continuous mode the traces repeatedly scan across the display, which is refreshed at the end of each scan. However, in Single Shot mode the traces are suspended once the end of the Scope display is reached. This is useful for obtaining complete real-time Scope traces. To reactivate the scan, the user may either toggle the pause key or press the A (Activate) key.

Trace

Using the Speed option, the trace time-base can be varied between 1 and 10 seconds. A default time-base of 5 seconds was considered suitable for most applications. Negative components of the featured waveforms can be revealed using the Offset option. This increases the zero baseline from its default position at the bottom of each trace window. All traces are one pixel thick by default. However, with the Thickness option it is possible to increase this to 10 pixels in one pixel increments. This may be useful for small children or patients having reduced visual acuity.

Grid

To aid interpretation, a combination of horizontal and vertical grid lines can be enabled. Alternatively, a single Zero Line can be selected. This is useful when an offset has been applied and the baseline no longer resides at the bottom of each trace window.

Channel Select

The Channel Select dialogue box allows the user to choose a maximum of ten traces from the parameters listed in section 6.3. As shown in Figure 6.7, these parameters have been conveniently grouped and allocated their own respective property page. 111
122 Chapter 6: Biofeedback Software Implementation

Figure 6.7: The Channel Select dialogue box.

Low-Level Function

The low-level operation of the real-time Scope window is similar to that of the Bar window. However, two functions common to both windows and not previously discussed are airflow calibration and additional parameter derivation.

Airflow Calibration

The response of the AWM3300V airflow transducer used within the SNORS mask is nonlinear in nature. Additionally, the inclusion of the bypass section introduces further nonlinearity and changes the effective dynamic range. In order to allow accurate airflow measurement, compensation for this non-linearity has been implemented in software. However, due to the lack of a theoretical algebraic expression, a model approximating the airflow transfer function was required. A simple wind tunnel similar to that described by McLean (1997) was constructed for this purpose. It employed a standard velocity transducer to produce a measure of flow velocity proportional to flow rate. The AWM3300V airflow transducer, housed in its bypass section, was attached to the end of the tunnel. The flow rate within the tunnel was then incrementally increased over a suitable range, while taking flow velocity and transducer samples at each point. From the acquired data, the mean and standard deviation were calculated. Non-linear regression was 112
123 Chapter 6: Biofeedback Software Implementation

then performed on the data, assuming an approximation of the curve by a rational function in the form:

f(x) = (n5 x^5 + n4 x^4 + n3 x^3 + n2 x^2 + n1 x) / (d4 x^4 + d3 x^3 + d2 x^2 + d1 x + 1) (6.1)

where the n and d terms are the coefficients of the numerator and denominator polynomials. The software package MATLAB was utilised to calculate the actual coefficients. Once obtained, the polynomial approximation was used to generate a data lookup table that is embedded within the software. A graphical representation of the lookup table is shown in Figure 6.8.

Figure 6.8: The airflow calibration curve (raw and corrected data).

The units on the above calibration curve represent the sample values obtained from the DAS-1202 data acquisition card, where 4096 = +2.5 V, 2048 = 0 V and 0 = -2.5 V. All airflow data are mapped through the lookup table before further processing or prior to actual display. 113
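As an illustration, the lookup-table generation might be implemented along the following lines. This is a sketch only: the coefficient arrays, the assumed denominator normalisation and the function names are not taken from the SNORS+ source, and the real coefficients were produced by the MATLAB fit.

    #include <array>

    // Evaluate the fitted rational function of equation 6.1 (Horner form).
    double rationalFit(double x, const double n[5], const double d[4])
    {
        double num = ((((n[4] * x + n[3]) * x + n[2]) * x + n[1]) * x + n[0]) * x;
        double den = (((d[3] * x + d[2]) * x + d[1]) * x + d[0]) * x + 1.0;
        return num / den;
    }

    // Build one corrected value per possible 12-bit sample code (0..4095,
    // where 2048 corresponds to 0 V on the DAS-1202).
    std::array<int, 4096> buildAirflowLut(const double n[5], const double d[4])
    {
        std::array<int, 4096> lut;
        for (int code = 0; code < 4096; ++code)
            lut[code] = static_cast<int>(rationalFit(code, n, d) + 0.5);
        return lut;
    }

Every incoming airflow sample is then a single array index, which keeps the per-sample correction cost negligible compared with evaluating the polynomial at run time.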
124 Chapter 6: Biofeedback Software Implementation

Additional Parameters

The following additional parameters are derived from the speech and airflow channels:

Combined Speech represents the total sound emanating from both nasal and oral ports:

Combined Speech = Nasal Speech + Oral Speech (6.2)

Acoustic Nasalance is the percentage of the total sound intensity that is nasal:

Acoustic Nasalance = Nasal Speech / (Nasal Speech + Oral Speech) × 100 (6.3)

Respiration represents the total airflow emanating from both nasal and oral ports:

Respiration = Nasal Airflow + Oral Airflow (6.4)

Aerodynamic Ratio is the ratio of the difference between nasal and oral airflow to the total airflow:

Aerodynamic Ratio = (Nasal Airflow − Oral Airflow) / (Nasal Airflow + Oral Airflow) × 100 (6.5)

Aerodynamic Nasalance is the percentage of the total airflow that is nasal:

Aerodynamic Nasalance = Nasal Airflow / (Nasal Airflow + Oral Airflow) × 100 (6.6)

The above parameters are available in both Scope and Bar windows. 114
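Equations 6.2 to 6.6 translate directly into code. The sketch below assumes the intensities and flows arrive as doubles; the zero-denominator guards are an addition for safety and are not described in the text.

    double combinedSpeech(double nasalSp, double oralSp)         // eq. 6.2
    {
        return nasalSp + oralSp;
    }

    double acousticNasalance(double nasalSp, double oralSp)      // eq. 6.3
    {
        double total = nasalSp + oralSp;
        return total > 0.0 ? nasalSp / total * 100.0 : 0.0;
    }

    double respiration(double nasalAir, double oralAir)          // eq. 6.4
    {
        return nasalAir + oralAir;
    }

    double aerodynamicRatio(double nasalAir, double oralAir)     // eq. 6.5
    {
        double total = nasalAir + oralAir;
        return total > 0.0 ? (nasalAir - oralAir) / total * 100.0 : 0.0;
    }

    double aerodynamicNasalance(double nasalAir, double oralAir) // eq. 6.6
    {
        double total = nasalAir + oralAir;
        return total > 0.0 ? nasalAir / total * 100.0 : 0.0;
    }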
125 Chapter 6: Biofeedback Software Implementation

6.5 The Real-Time EPG Window

The real-time EPG display representing the subject's tongue-palatal contact is used as a biofeedback tool. This allows patients to visualise their tongue movement during speech. A typical EPG display is shown in Figure 6.9.

Figure 6.9: A typical EPG display.

High-Level Function

The left-hand display in Figure 6.9 shows a real-time EPG window. By asking the patient to produce a certain sound or word, the tongue-palate contact pattern can be observed on the display. To encourage the patient to produce certain tongue positions, model contact patterns may be retrieved from a disk file. Figure 6.9 illustrates a patient's articulation of /s/ (left) and the file model provided (right). Alternatively, to allow the clinician to produce the model contact pattern, a second real-time EPG window may be activated. To permit the study of a particular contact pattern, either or both displays may be paused with a single key press. The user can select a variety of EPG options from the Settings dialogue box illustrated in Figure 6.10. 115
126 Chapter 6: Biofeedback Software Implementation

Figure 6.10: The EPG Settings dialogue box.

Segment Colour

The segment colours allow ease of identification between the various EPG palates. By default, red is assigned to channel one, with channel two and the model palate assigned as blue. The Regional option enables colour coding of the EPG segments, i.e. blue for alveolar, green for palatal and red for velar. This has been found to simplify regional targeting, especially in young children.

Segment Outline

Merging or hiding segment outlines can further customise the EPG display. By default, the segment outlines are drawn (refer to Figure 6.9).

Channel Select

The channel selection process is automatic, with the initial EPG window assigned to channel 1 and the second window to channel 2 (if connected). However, using the Channel Select option these defaults may be overridden. 116
127 Chapter 6: Biofeedback Software Implementation

Low-Level Function

During initialisation, the EPG window procedure determines the number of Linguagraphs connected then assigns the window to the first available channel. The subsequent EPG window is assigned to the next available channel and so on. On receipt of the data notification message, the window procedure must initially unpack the data. This is contained within a memory block that represents eight scans of all 16 analogue inputs (refer to Figure 6.3). Since analogue sampling techniques are used to acquire the digital EPG data, each sample is converted back to its digital equivalent prior to local storage. Once unpacked, the entire EPG frame is represented by a 64-bit array. Figure 6.11 illustrates the relationship between the individual bits of the array and the palatal segments.

Figure 6.11: The EPG palatal map.

The above palatal map represents a data lookup table that is embedded within the software. To ensure the data are intended for the active window, the channel number preceding each EPG scan is also examined. This channel number should remain constant for each scan. If not, a synchronisation error has occurred. In this case, the counter is reset and the current data discarded. Once unpacked, the data are mapped to the appropriate EPG segments via the lookup table and displayed. 117
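A sketch of this unpacking step is given below. The scan layout (a channel-number sample followed by eight data samples) and the logic threshold are assumptions made for illustration; the real bit-to-segment mapping is held in the palatal map of Figure 6.11.

    #include <cstdint>

    const int SCANS_PER_FRAME = 8;
    const int BITS_PER_SCAN   = 8;
    const int ON_THRESHOLD    = 3072;   // ADC code treated as a logic '1' (assumed)

    // Thresholds the analogue samples back to bits and packs one EPG frame.
    // Returns false on a channel mismatch so the caller can resynchronise.
    bool unpackEpgFrame(const int* samples, int expectedChannel, uint64_t& frame)
    {
        frame = 0;
        for (int s = 0; s < SCANS_PER_FRAME; ++s) {
            const int* scan = samples + s * (BITS_PER_SCAN + 1);
            if (scan[0] != expectedChannel)
                return false;                      // synchronisation error
            for (int b = 0; b < BITS_PER_SCAN; ++b)
                if (scan[1 + b] >= ON_THRESHOLD)
                    frame |= uint64_t(1) << (s * BITS_PER_SCAN + b);
        }
        return true;
    }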
128 Chapter 6: Biofeedback Software Implementation

6.6 The Wave Data Acquisition Thread

Windows provides several methods for controlling audio devices such as the standard sound card. These include the media control interface (MCI) and the waveform-audio interface. However, since it offers the greatest possible control over audio devices, the waveform-audio interface was considered most appropriate for this application. As can be seen from Figure 6.1, the wave acquisition thread supplies data to the real-time windows of Wave, FFT and Spectrogram. In a manner similar to that previously described, this thread runs continuously in the background, collecting data from the sound card and passing them to the various child windows. A flow chart reflecting the structure of the wave data acquisition thread is illustrated in Figure 6.12.

Initialisation

Before recording audio with a sound card, the following information must be supplied to the waveform-audio interface:

- Data format type.
- Number of channels.
- Sample rate.
- Bits per sample.

Format Type

Waveform-audio data use the pulse code modulation (PCM) format type. This is the only format category defined for the common .wav file, which is used by many Windows applications to store audio data.

Number of Channels

The number of recorded audio channels is dependent on the Laryngograph's connection status. In the absence of a Laryngograph, the sound card is configured in mono mode and speech is recorded on the left channel only. However, if a Laryngograph is detected, the sound card is configured in stereo mode and the Lx signal is recorded on the right channel. 118
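A minimal sketch of this initialisation using the Win32 waveform-audio interface is shown below, with the sample rate and resolution that are justified in the sections that follow. The function name and the error-handling policy are illustrative assumptions.

    #include <windows.h>
    #include <mmsystem.h>   // link against winmm.lib

    // Open the wave input device: 16-bit PCM, stereo when a Laryngograph is
    // detected, mono otherwise. The event handle is signalled by the driver
    // whenever an audio block completes, allowing the thread to suspend.
    HWAVEIN openWaveInput(bool laryngographPresent, HANDLE blockDoneEvent)
    {
        WAVEFORMATEX fmt = {0};
        fmt.wFormatTag      = WAVE_FORMAT_PCM;
        fmt.nChannels       = laryngographPresent ? 2 : 1;
        fmt.nSamplesPerSec  = 22050;
        fmt.wBitsPerSample  = 16;
        fmt.nBlockAlign     = (WORD)(fmt.nChannels * fmt.wBitsPerSample / 8);
        fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

        HWAVEIN hwi = NULL;
        MMRESULT rc = waveInOpen(&hwi, WAVE_MAPPER, &fmt,
                                 (DWORD_PTR)blockDoneEvent, 0, CALLBACK_EVENT);
        return (rc == MMSYSERR_NOERROR) ? hwi : NULL;
    }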
129 Chapter 6: Biofeedback Software Implementation

[Flow chart: Start → Initialise audio device → Initialisation successful? (No → Exit) → Yes → Send initial buffer to audio device → Suspend thread until audio buffer is full → when re-enabled: Send next buffer to audio device → Pass new data to all real-time windows → Prepare next audio buffer → repeat.]

Figure 6.12: Structure of the wave data acquisition thread. 119
130 Chapter 6: Biofeedback Software Implementation

Sample Rate

Sample rates commonly used for the PCM format are 8.0 kHz, 11.025 kHz, 22.05 kHz, and 44.1 kHz. However, a sample rate of 22.05 kHz was considered appropriate for this application. This enables the acoustic analysis windows to resolve frequencies up to 11.025 kHz, whilst maintaining the real-time response necessary for biofeedback. At higher sample rates, slower computers were unable to process and display the vast amount of data in the allocated time period.

Bits per Sample

As it is supported by the majority of sound cards, and yields 65,536 discrete sample steps, a resolution of 16 bits per sample was chosen. If the installed sound card does not support any of the above parameters, the thread self-terminates and the program prevents any wave windows from being created.

Data Acquisition

Once initialised, the application is responsible for supplying the audio device with all necessary data record buffers (referred to as audio blocks). For 16-bit PCM data, recorded in stereo, each sample is represented by a 16-bit signed integer. The structure of the audio block is illustrated in Figure 6.13.

[Layout: each stereo sample comprises a left channel low-order byte and high-order byte followed by a right channel low-order byte and high-order byte, repeated for sample 1, sample 2, and so on.]

Figure 6.13: The PCM audio block structure.

Once an audio block is full, the sound card notifies the application to request another block. The waveform-audio interface supports several notification methods. However, the most efficient is the event callback method, which allows the acquisition thread to suspend itself whilst awaiting data notification. This allows other threads and system processes to 120
131 Chapter 6: Biofeedback Software Implementation

execute during the DMA controlled recording phase. Once an audio block is full, the data acquisition thread is reactivated to supply the next block and process the new data.

Data Notification

Each audio block contains 100 ms of data and is passed to all applicable real-time windows. Before returning the audio block, each window copies the data to a local buffer for processing and biofeedback display. However, to maintain a real-time response, the audio block must be returned within 100 ms.

Thread Termination

When the last real-time Wave, FFT or Spectrogram window has been terminated, the wave data acquisition thread is destroyed and the audio blocks released.

6.7 The Real-Time Wave Window

The real-time Wave window provides information on a variety of acoustic parameters such as amplitude and fundamental frequency. In addition, the presence of some articulatory functions such as voice onset time and voiced-voiceless distinctions may be observed. Figure 6.14 illustrates a typical real-time Wave window.

High-Level Function

In this particular window, the upper section displays the acoustic waveform derived from the SNORS microphones. The lower section displays the Lx waveform obtained from the Laryngograph processor. Many of the real-time Scope options, described previously, also apply to the Wave window. However, to resolve finer detail, the Wave time-base is much shorter, and can be varied from 10 to 100 ms in 10 ms increments. The default time-base is 100 ms. 121
132 Chapter 6: Biofeedback Software Implementation

Figure 6.14: The real-time Wave window.

Low-Level Function

When the main window procedure receives the data notification message, it initially unpacks the data from the audio block and stores them in a local buffer prior to display. To display 100 ms of audio data, recorded in stereo at 22.05 kHz, requires 4410 discrete line draw operations. Unfortunately, during initial trials this was found to take approximately 300 ms using direct screen draw techniques. In order to maintain a real-time response, this required two in every three audio blocks to be discarded. However, by constructing the entire waveform in memory then copying the image to the screen in one bitmap operation, the display period was drastically reduced to just 30 ms. This technique was implemented in the display module to allow maximum screen refresh rates without data loss.

6.8 The Real-Time FFT Window

The real-time FFT window yields a wealth of acoustic information relating to the sampled speech signal. This ranges from the fundamental frequency and its associated harmonics to the actual resonance of the vocal tract. A typical real-time FFT window is shown in Figure 6.15. 122
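The memory-bitmap technique described above can be made concrete with a short GDI sketch. This is not the SNORS+ code itself; the function name and scaling are illustrative.

    #include <windows.h>
    #include <vector>

    // Draw the whole trace into an off-screen bitmap, then copy it to the
    // display in a single BitBlt rather than thousands of direct line draws.
    void drawWaveform(HDC screen, const short* samples, int count, int w, int h)
    {
        HDC     memDc  = CreateCompatibleDC(screen);
        HBITMAP bmp    = CreateCompatibleBitmap(screen, w, h);
        HGDIOBJ oldBmp = SelectObject(memDc, bmp);

        PatBlt(memDc, 0, 0, w, h, WHITENESS);             // clear the backbuffer

        std::vector<POINT> pts(count);
        for (int i = 0; i < count; ++i) {
            pts[i].x = i * w / count;                     // sample index to x
            pts[i].y = h / 2 - (samples[i] * h) / 65536;  // 16-bit sample to y
        }
        Polyline(memDc, &pts[0], count);                  // draw off screen

        BitBlt(screen, 0, 0, w, h, memDc, 0, 0, SRCCOPY); // one bitmap copy

        SelectObject(memDc, oldBmp);
        DeleteObject(bmp);
        DeleteDC(memDc);
    }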
133 Chapter 6: Biofeedback Software Implementation

Figure 6.15: The real-time FFT window.

High-Level Function

A fast Fourier transform (FFT) is based on the theory that complex periodic waveforms can be decomposed into a series of sinusoidal components of certain amplitude and phase. Each sinusoidal component, derived from a complex periodic waveform, is an integer multiple of the fundamental frequency. Fourier's theorem permits the transformation of a waveform into a spectrum where the amplitude of each component frequency is represented. On the FFT display, the frequency components of the acoustic signal are plotted on the x-axis and their relative amplitude on the y-axis. The user can select a variety of FFT options from the Settings dialogue box illustrated in Figure 6.16. 123
134 Chapter 6: Biofeedback Software Implementation

Figure 6.16: The FFT Settings dialogue box.

Many of the available options are similar to those previously described; however, features unique to the FFT window are detailed below.

Maximum Frequency

The maximum frequency that can be represented on the FFT display is restricted by the audio sample rate. The Nyquist sampling theorem states that only sampled data with frequencies up to sample rate / 2 can accurately be reproduced. Since the sound card has been programmed to sample at 22.05 kHz, the maximum displayable frequency is 11.025 kHz. For clarity, this frequency has been rounded down to 10 kHz. The user is able to select any frequency range between 1 and 10 kHz in 1 kHz steps. The default frequency range is 10 kHz.

Window Functions

The fast Fourier transform separates the acoustic signal into sinusoidal waveforms of different frequency. Once separated, each sinusoid is placed in its own respective frequency bin. The FFT length determines the number of frequency bins representing the acoustic signal. The relationship between frequency bins and FFT length is given by:

frequency bins = FFT length / 2 (6.7) 124
135 Chapter 6: Biofeedback Software Implementation

In general, spectral power in one bin may contain leakage from frequency components in neighbouring bins; this is often referred to as spectral leakage. This leakage can be seen as a wide spectral energy smear around the centre frequency, refer to Figure 6.17 (left).

Figure 6.17: An FFT response to a 1 kHz sine wave, without windowing (left) and with the Hanning window function applied (right).

Spectral leakage occurs when a frequency component of the acoustic signal does not slot exactly into one of the frequency bins. By applying a window function to the sampled data this effect can be significantly reduced. Using this technique, data samples are multiplied by a function that tapers towards zero at either end. This ensures that the signal fades in and out rather than starting and stopping abruptly. The effect is to reduce discontinuity at data boundaries and hence the amount of leakage. However, window functions do broaden the spectral peaks of the acoustic signal, making it difficult to distinguish certain frequency components. A number of window functions have been devised. Some are effective in reducing spectral leakage, at the expense of spectral detail, while others try to achieve a compromise. Figure 6.18 illustrates a selection of data windowing functions. 125
136 Chapter 6: Biofeedback Software Implementation

[Plot of the Hanning, Hamming and Blackman window functions, W(n) against n.]

Figure 6.18: A selection of data windowing functions.

Figure 6.17 (right) demonstrates the effect of applying a Hanning window function to a 1 kHz sine wave. It is apparent that the spectral smearing around the centre frequency has been significantly reduced. The real-time FFT window supports the following windowing functions: square (null), Welch, Bartlett and Hanning.

Vertical Scaling

The acoustic signal intensity may be represented on a logarithmic or linear scale. By default, the intensity scale is logarithmic, which generally boosts the lower intensity high frequency components of the acoustic signal. This makes it useful for examining harmonics of the fundamental frequency or the higher frequency components produced by voiceless fricatives such as /s/. Unfortunately, the logarithmic scale also increases background noise intensity. Linear intensity scaling produces a much cleaner FFT display but low level acoustic information is often lost. Linear scaling is useful when examining the high intensity signals usually associated with fundamental frequency.

Low-Level Function

Once the data notification message has been received, the main window procedure must perform an FFT on the audio block and display the result within 100 ms. Failure to achieve this results in the loss of real-time response. The algorithm used to implement the FFT centres around equation 6.8, taken from Press et al. (1992). 126
137 Chapter 6: Biofeedback Software Implementation

H_n = Σ_{k=0}^{N−1} h_k e^{2πikn/N} (6.8)

where N is the number of consecutive samples, n is the sample index and h_k the actual sample. This efficient algorithm requires N to be an integer power of 2. Therefore, since each audio block contains 2205 speech samples, it must be rounded down to 2048 samples. This represents the FFT length and determines the overall spectral resolution as indicated by the number of frequency bins (refer to the Window Functions section above). Spectral resolution denotes the level of detail in the resulting frequency spectrum, and is given by:

spectral resolution = sample rate / FFT length (6.9)

If the spectral resolution is too coarse, fine detail in the spectrum may be lost. However, if the resolution is too fine the computational time can be unacceptably long. Using equations 6.7 and 6.9 it can be shown that the frequency spectrum resolved by the real-time FFT window contains 1024 frequency components ranging from 0 to 11.025 kHz, at intervals of approximately 10.8 Hz. This level of detail was considered adequate for biofeedback applications, and takes approximately 10 ms to compute on a Pentium 233 MHz processor. To keep the display time to an absolute minimum, the bitmap screen draw techniques discussed in section 6.7 have also been implemented here. As a result, the time taken to process each data notification message is approximately 40 ms. This allows SNORS+ to simultaneously display both Wave and FFT windows and still maintain a real-time response. 127
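The window functions listed earlier (square, Welch, Bartlett and Hanning) are applied to the sample block before the transform. The sketch below uses the standard textbook definitions of these tapers; the enumeration and function names are illustrative.

    #include <cmath>
    #include <vector>

    enum WindowType { WINDOW_SQUARE, WINDOW_WELCH, WINDOW_BARTLETT, WINDOW_HANNING };

    // Multiply the sample block by the selected taper before the FFT.
    void applyWindow(std::vector<double>& x, WindowType type)
    {
        const double pi   = std::acos(-1.0);
        const double half = 0.5 * (x.size() - 1);
        for (std::size_t n = 0; n < x.size(); ++n) {
            double u = (n - half) / half;        // -1 .. +1 across the block
            double g = 1.0;                      // square (null) window
            switch (type) {
            case WINDOW_WELCH:    g = 1.0 - u * u;                    break;
            case WINDOW_BARTLETT: g = 1.0 - std::fabs(u);             break;
            case WINDOW_HANNING:  g = 0.5 * (1.0 + std::cos(pi * u)); break;
            case WINDOW_SQUARE:                                       break;
            }
            x[n] *= g;   // taper towards zero at both ends to cut leakage
        }
    }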
138 Chapter 6: Biofeedback Software Implementation

6.9 The Real-Time Spectrogram Window

The real-time Spectrogram provides an alternative method of visually representing an acoustic signal. This window allows analysis of the frequency components contained within the acoustic signal, either in terms of the harmonics it comprises or of the peaks of resonance that it contains. The spectrogram also provides useful information on how these frequencies change with time. A typical Spectrogram window is illustrated in Figure 6.19.

Figure 6.19: The real-time Spectrogram window (with cursor enabled).

High-Level Function

A spectrogram is produced by repeatedly applying an FFT to the sampled speech signal over a period of time. The resultant frequencies are represented vertically on the spectrogram with time plotted horizontally. Amplitude, or loudness, is depicted by grey scale or colour intensities. Features unique to the Spectrogram window may be selected from the Settings dialogue box illustrated in Figure 6.20. 128
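A minimal sketch of this construction is given below: the signal is cut into consecutive frames, each frame is transformed, and the log-compressed magnitudes form one column of the image. A naive discrete Fourier transform is used here purely for clarity; the implementation itself uses the FFT described in the previous section.

    #include <cmath>
    #include <complex>
    #include <vector>

    // One spectrogram column of log magnitudes per frame of the input signal.
    std::vector<std::vector<double>> spectrogramColumns(
        const std::vector<double>& x, std::size_t frameLen)
    {
        const double pi = std::acos(-1.0);
        std::vector<std::vector<double>> columns;
        for (std::size_t start = 0; start + frameLen <= x.size(); start += frameLen) {
            std::vector<double> col(frameLen / 2);
            for (std::size_t bin = 0; bin < col.size(); ++bin) {
                std::complex<double> sum = 0.0;
                for (std::size_t n = 0; n < frameLen; ++n)
                    sum += x[start + n] *
                           std::polar(1.0, -2.0 * pi * bin * n / frameLen);
                // Log-compress so low-level detail remains visible.
                col[bin] = 20.0 * std::log10(std::abs(sum) + 1e-9);
            }
            columns.push_back(col);   // time runs left to right, column by column
        }
        return columns;
    }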
139 Chapter 6: Biofeedback Software Implementation

Figure 6.20: The Spectrogram Settings dialogue box.

Trace Window

The time-base determines how long it takes the spectrogram trace to travel the width of the window. The default window is 10 seconds but this may be adjusted from 1 to 60 seconds in 1 second steps.

Bandwidth

As discussed in chapter 3 (section 3.5.3), bandwidth affects the appearance of the spectrogram. Narrower bandwidths are useful for resolving finer frequency information but yield poor timing information. Wider bandwidths exhibit the reverse characteristics, displaying good timing resolution at the expense of frequency resolution. The following table summarises the available bandwidths and their relationship to the spectrogram's timing resolution. 129
140 Chapter 6: Biofeedback Software Implementation

Bandwidth (Hz)   Timing Resolution (ms)   FFT Length
11               92.9                     2048
22               46.4                     1024
43               23.2                     512
86               11.6                     256
172              5.8                      128

Table 6.1: Available bandwidths, their respective timing resolution, and FFT length (bandwidth = sample rate / FFT length; timing resolution = FFT length / sample rate).

The user may select any of the tabulated bandwidths. Alternatively, two predefined bandwidths have been included: narrowband and wideband, which have bandwidths of 43 Hz and 172 Hz respectively.

Trace Appearance

Spectrograms commonly employ grey scales to represent the varying speech intensities. The darker the image the greater the intensity, with white areas representing silence. However, an alternative to grey scale is pseudo colour, which represents the maximum intensity as red gradually decreasing through orange, yellow, green and blue. Silence on the coloured spectrogram is represented by black as shown in Figure 6.21.

Figure 6.21: A narrowband spectrogram featuring colour intensity scales. 130
141 Chapter 6: Biofeedback Software Implementation

Colour Intensity

The colour intensity slider may be used to adjust colour or grey scale levels. This is useful for increasing the intensities of low level acoustic signals that may otherwise be concealed. A gain of up to 30 dB is provided by the colour intensity control.

Noise Reduction

A noise reduction slider has also been included which enables the user to minimise the effects of background noise. Noise on the spectrogram is characterised by a snowy background, and whilst generally not presenting a problem, the user may wish to improve image quality by removing this. Up to 20 dB of attenuation is provided by the noise reduction control.

The Crosshair Cursor

The crosshair cursor allows quick identification of spectrogram-related parameters such as frequency, timing and intensity. Once activated, the user simply places the cursor over areas of interest and examines the parameters displayed in the status bar. The crosshair cursor may be activated via the toolbar or menu option. Figure 6.19 illustrates a spectrogram with the crosshair cursor enabled.

Low-Level Function

The spectrogram trace is constructed from a series of individual FFTs. For each data notification message, the number of FFT calculations performed is dependent on the selected bandwidth and the FFT length. The relationship between bandwidth and FFT length is given in Table 6.1. For example, to generate a narrowband spectrogram with a bandwidth of 43 Hz requires a series of FFTs consisting of 512 samples each. Since each audio block contains 2205 speech samples, it is possible to execute four FFT calculations during each data notification message. For the narrowband spectrogram, the results from each FFT are converted into 256 discrete pixel intensities (each representing a single frequency bin). These are mapped to a 4 x 256 bitmap image which is maintained in memory. Depending on the size of the Spectrogram window, this bitmap is stretched (or shrunk) to accommodate the height and width of the actual display, then copied to the relevant trace position on the screen. This procedure is repeated with each data notification message until the entire Spectrogram image has been constructed. Depending on the selected bandwidth, and hence the number of FFT calculations, it may take between 131
142 Chapter 6: Biofeedback Software Implementation

10 and 20 ms to process each audio block. This allows SNORS+ to simultaneously display Wave, FFT and Spectrogram windows while still maintaining a real-time response.

6.10 The Real-Time Video Window

The operation of the real-time video window differs from those previously described. All functions used to control the frame grabber, including the data acquisition thread, are incorporated within the Matrox imaging library (MIL). This library is contained within a dynamic link library (DLL), which SNORS+ launches at run time. Accessing the device in this way isolates the application from many of the low-level programming issues, and enables the frame grabber to be treated in a device independent manner. The real-time video window is a useful biofeedback tool that allows the patient to view facial expressions, posture, jaw extension and lip movements during speech. It is also useful for adjusting brightness and contrast levels prior to recording videofluoroscopic images. As illustrated in Figure 6.22, video images can be viewed alongside other parameters such as voice and speech intensities. 132
143 Chapter 6: Biofeedback Software Implementation

Figure 6.22: The real-time video window (left) and real-time Bar (right).

High-Level Function

The video image displayed in Figure 6.22 was obtained with a standard PAL camcorder, connected directly to the frame grabber's composite video input. The images within the video window are updated at up to 25 frames per second (depending on the processor speed). For a closer examination, the display may be paused at any point during the image acquisition. In addition, the user can select a variety of video options from the Settings dialogue box illustrated in Figure 6.23. 133
144 Chapter 6: Biofeedback Software Implementation

Figure 6.23: The video Settings dialogue box.

Camera

The camera format allows the user to select the appropriate video source. Formats supported by the Matrox frame grabber include PAL, SECAM, CCIR, NTSC and RS170. The default camera format is specified within the SNORS+ configuration file and may be edited by the user.

Image Resolution

Selecting High, Medium or Low from the resolution group box allows the image size to be varied. The actual size is dependent on the selected camera type and is represented in pixels. Resolution also affects image acquisition speed and may be reduced to compensate for slower computers. 134
145 Chapter 6: Biofeedback Software Implementation

Zoom

The zoom control allows the image magnification to be altered. Whenever the image is larger than the available window area, horizontal and vertical scroll bars appear. These allow the user to view the image section not currently visible in the display area.

Contrast and Brightness

The contrast and brightness sliders can be used to alter the relevant parameters on the displayed image. These controls operate in real-time, allowing the user to monitor the effect as they are applied.

Appearance

The appearance option allows the display of either standard monochrome (default) or pseudo colour images. Pseudo colour images have false colour applied, which highlights edges and contours in some cases. This option is only available for monochrome camera formats.

Low-Level Function

When executed, the SNORS+ application checks to determine whether a Matrox frame grabber is present. If a functional device is detected, it is initialised through the MIL interface. However, if a frame grabber is not present, the creation of the video child window is prohibited. During initialisation, the real-time video window creates a blank display corresponding to the default image size. A handle to this window is then passed to the MIL interface, which it uses to direct all video images to the allocated display area. The actual real-time video grab and display sequences are initiated through a single MIL function call. These operations run continuously in the background and require no additional commands from the main application. Further calls to the MIL interface control additional features such as image zoom and pan. When the window is destroyed, the real-time video grab is cancelled and the MIL system reset. Only a single real-time window is supported by the MIL interface. 135
146 CHAPTER 7

ANALYSIS SOFTWARE IMPLEMENTATION

This chapter introduces the test protocol implemented by SNORS+ to conduct formal speech assessment. Having defined a protocol, techniques for the synchronised acquisition of multiple source data are then discussed. Finally, the methods used to format, display and analyse the synchronised multiparameter data are described. The structure of the analysis software is illustrated in Figure 7.1.

[Diagram: the Main Application Window hosts the Test Protocol and the Test Acquisition Thread, which feed the Test Scope, Test Bar, Test EPG, Test Video, Test Wave, Test FFT and Test Spectrogram windows.]

Figure 7.1: Analysis software structure. 136
147 Chapter 7: Analysis Software Implementation

7.1 Test Protocol

To conduct a formal speech assessment, it is necessary to define a protocol that allows comparable measurements to be performed. The test protocol adopted by SNORS+ is similar to that of SNORS, which requires the patient to utter a number of words as prompted by the computer. At the beginning of each new test the user is presented with a dialogue box that contains a variety of options. Using these options the user can tailor each test to suit the individual patient requirements. The Test Settings dialogue box is illustrated in Figure 7.2.

Figure 7.2: The Test Settings dialogue box.

During initialisation, the Test Settings dialogue procedure determines what devices are attached to the system and configures the protocol accordingly. In many cases, the user 137
148 Chapter 7: Analysis Software Implementation

initiates the test simply by clicking the Start button without modifying the default protocol. However, should the user wish to adjust the default settings, a number of options are available. Once initiated, a series of prompt words appear on the computer screen. As the patient utters each word the recorded data are stored in memory. At the end of the sequence a comprehensive test analysis display is automatically generated.

Word List

The prompt words that appear on the computer screen during a test sequence are contained within a word list file. A number of word lists are available, and can be selected by clicking the Select New List button. The lists contain a variety of words, which aim to emphasise individual articulatory function. The standard list routinely used by the author consists of the words: begin, type, fight, seat, cheese, shoot, smoke, king, missing, end. These contain an assortment of nasalised and non-nasal words (Ellis et al., 1978). Clicking the Create New List button launches a simple text editor that allows the user to create customised word lists. Each word, sound or sentence is entered on a new line and will appear on the test screen exactly as typed.

The Word Display Period

The display period for each word, and hence the time allowed for the patient to utter the word, can be varied from 1 to 90 seconds. This is often useful when working with young children or stroke patients who may require longer to utter each word. The display period may also be increased to accommodate short passages or sentences.

Sample Frequency

The sample rate of the DAS-1202 data acquisition card can be adjusted using this option. The default sample rate, as specified in the SNORS+ configuration file, may be either 100 Hz or 200 Hz. If EPG is enabled this figure cannot be altered. Without EPG, the sample rate may be adjusted from 100 Hz to 10 kHz. Generally, for the low frequency envelope and filtered signals, a sample rate of 100 Hz is sufficient. However, in order to resolve the finer detail when acquiring unmodified signals, a higher sample rate is required. To accommodate this, a sample rate of 2 kHz is automatically selected when the hardware is configured in high-resolution mode. 138
149 Chapter 7: Analysis Software Implementation

Parameter Selection

As shown in Figure 7.2, the available test parameters are SNORS, Lx, EPG (channel 1), auxiliary channel, audio and video. Selecting the appropriate tick box ensures the corresponding parameter is recorded during the test sequence. With the exception of video, all available parameters will have their corresponding tick box selected by default. This automated process is achieved by reading the digital status line of each peripheral device during initialisation. Due to the large amount of system resources required for video acquisition and the resulting file size, this option must always be selected manually. By disabling the appropriate tick boxes, the user is prevented from selecting unavailable parameters.

Display Options

The test display layout is largely dependent on the parameters selected in the test protocol. For example, if SNORS, Lx and EPG are selected, the resulting display contains a five channel Scope window featuring speech, airflow and voicing data, together with a palate window displaying the EPG data. However, by selecting parameters in the Display Options group box, the user can request additional windows to be displayed. These include test versions of Bar, Wave, FFT and Spectrogram. At the end of each test, all windows are automatically arranged to provide optimum visibility.

Setup

These buttons provide access to general set-up functions and also the facility to enter patient information.

Sensors

This option removes any residual offset present in the airflow subsystem. When initiated, the user is prompted to cover both sensor ports and to place the mask on a flat surface. Once this has been acknowledged, the offset level is determined and then subtracted from all subsequent airflow data. Since even a small offset can significantly affect aerodynamic calculations such as nasalance and ratio, the user is automatically prompted to calibrate the system prior to the initial test. This calibration procedure can also be initiated from the real-time Bar and Scope windows. 139
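A sketch of this zeroing step is given below. The structure and function names are illustrative assumptions; only the principle (average the covered-sensor readings, then subtract the mean from later samples) comes from the text.

    #include <cstddef>

    struct AirflowOffset {
        double nasal;
        double oral;
    };

    // With both sensor ports covered, average a short run of samples to
    // estimate the residual offset on each airflow channel.
    AirflowOffset measureOffset(const double* nasal, const double* oral,
                                std::size_t count)
    {
        AirflowOffset off = {0.0, 0.0};
        for (std::size_t i = 0; i < count; ++i) {
            off.nasal += nasal[i];
            off.oral  += oral[i];
        }
        off.nasal /= count;      // mean level with the sensors covered
        off.oral  /= count;
        return off;
    }

    // Applied to every subsequent nasal airflow sample (likewise for oral).
    inline double correctNasal(double sample, const AirflowOffset& off)
    {
        return sample - off.nasal;
    }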
150 Chapter 7: Analysis Software Implementation

Audio

This button launches the Audio Settings dialogue box, which allows the user to adjust the audio record quality. The default quality, termed Radio, uses a sample rate of 22.05 kHz, which is considered adequate for most purposes. If required, the audio can be increased to CD quality (44.1 kHz), or reduced to Telephone quality (11.025 kHz). However, high quality audio recordings require large amounts of system resource and also generate large .wav files. An Advanced button executes the Windows audio recording level program. This provides a means for optimising the Windows recording levels so they match the speech and Lx intensity levels.

Video

This button executes the Video Record dialogue box, which enables the user to set the appropriate video record parameters. This dialogue box is similar to that illustrated in Figure 6.23. The available image resolution and camera formats are identical to those discussed in the real-time video section. However, an additional feature unique to this dialogue box is the frame rate adjustment control. The default rate is 25 frames per second but this may be reduced to suit the computer's capability. Due to memory constraints, only monochrome images are recorded during test mode.

Information

This button invokes the Information dialogue box illustrated in Figure 7.3. The patient's personal details, together with test specific information, may be entered here. This information is attached to the test record and may be modified at a later stage. 140
151 Chapter 7: Analysis Software Implementation

Figure 7.3: The Information dialogue box.

When the required selections have been made, and the user clicks the Start button, the protocol is passed to the test data acquisition thread.

7.2 The Test Data Acquisition Thread

The main advantage of SNORS+ over standalone instrumentation is its ability to synchronously acquire a variety of speech-related parameters. It is therefore essential to the successful operation of SNORS+ that the acquired signals are properly synchronised, since loss of synchronisation may result in the misinterpretation of data. Synchronised capture of multiple source data forms the primary function of the test data acquisition thread. An outline of the thread is given below in pseudo code. 141
152 Chapter 7: Analysis Software Implementation

BEGIN
    initialise input devices;
    initiate audio acquisition;
    initiate DAS-1202 data acquisition;
    initiate first video frame grab;
    display first word in list;
    REPEAT
        check number of DAS-1202 samples;
        if next video frame due, initiate new grab;
        if word data captured, display next word;
    UNTIL all words in list have been displayed;
    stop audio acquisition;
    stop DAS-1202 data acquisition;
END

The above example assumes both audio and video were selected in the test protocol.

Initialisation

The various input devices are initialised in the following manner.

DAS-1202

Depending on the type of parameters selected, the DAS-1202 is configured to sample the required number of channels at the appropriate sample rate. The analogue input configuration is identical to that illustrated in Figure 6.3. The DAS-1202 acquires data in DMA mode and implements a circular buffer similar to that used in real-time acquisition. Samples from this buffer are transferred to a main test buffer during the acquisition period.

Audio

Since it offers the greatest possible control over audio devices, the waveform-audio interface has been implemented. Again, the 16-bit PCM data format is used, with the channel number and sample rate obtained from the test protocol. A single audio block, sufficient to hold the complete test, is transferred to the waveform-audio interface prior to data capture. 142
153 Chapter 7: Analysis Software Implementation

Video

Using the camera format and image resolution obtained in the test protocol, the Meteor frame grabber is programmed (via MIL) to capture video in single frame mode. This allows the data acquisition thread to control the actual frame rate as specified in the test protocol. A large buffer, sufficient to contain the entire video sequence, must be allocated for each test. For example, a 55.3 Megabyte buffer is required for 20 seconds of video containing medium resolution CCIR images (384 x 288) recorded at 25 frames per second. When all input devices have been initialised, the various acquisition routines are executed. The order in which this occurs forms a crucial factor in data synchronisation. Audio followed by the DAS-1202 and then video was found to give the best results. During initial trials, audio and DAS-1202 data were synchronised to 2 ms with video to the nearest frame. Once the acquisition sequences are in progress, the patient is prompted to utter the first word in the list. This is achieved by launching a simple window that displays the word in large font (refer to Figure 7.4).

Figure 7.4: A typical test prompt. 143
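With the devices initialised and the first prompt displayed, the acquisition loop keeps everything aligned to the DAS-1202 pacer clock, as detailed in the next section. The following sketch fleshes out the REPEAT loop of the earlier pseudo-code outline; the accessor and prompt helpers are hypothetical names, and the timing figures anticipate the description that follows.

    // Hypothetical helpers: the sample counter read from the DAS-1202 driver,
    // the single-frame video grab, and the large-font word prompt window.
    long DasSampleCount();
    void GrabNextVideoFrame();
    void ShowPromptWord(int index);

    void runTestSequence(int sampleRate, int wordCount, int displaySeconds)
    {
        const long wordPeriod  = (long)sampleRate * displaySeconds;
        const long framePeriod = (long)sampleRate * 40 / 1000;   // 25 fps

        long nextFrameAt = framePeriod;
        int  wordIndex   = 0;
        ShowPromptWord(wordIndex);                // display first word in list

        while (wordIndex < wordCount) {
            long count = DasSampleCount();        // samples acquired so far
            if (count >= nextFrameAt) {           // next video frame due
                GrabNextVideoFrame();
                nextFrameAt += framePeriod;
            }
            if (count >= (long)(wordIndex + 1) * wordPeriod) {
                if (++wordIndex < wordCount)      // word data captured;
                    ShowPromptWord(wordIndex);    // prompt the next word
            }
        }
    }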
154 Chapter 7: Analysis Software Implementation

Data Acquisition

By continually monitoring the number of acquired samples, timing for the data acquisition sequence can be derived from the DAS-1202 pacer clock. For example, if the sample count equates to the word display period (e.g. 2 seconds), the display window is prompted to draw the next word. Also, in order to record video at 25 frames per second, the sample count is used to trigger a new video grab every 40 ms. During the acquisition period, DAS-1202 samples are continually transferred from the DMA buffer to the main test buffer. Similarly, the image contained within the MIL buffer is copied to the main video buffer prior to initiating each new grab. However, should this image be incomplete, its transfer is delayed and the new grab abandoned. To prevent synchronisation drift, the abandoned frame is then tagged as dropped. This feature is especially useful when acquiring video on slower computers. When all words in the list have been displayed, both the sound card and DAS-1202 routines are reset. The video grab terminates automatically when the current frame has been acquired. The recorded data are then passed to the main test window for processing and eventual display.

7.3 The Test Analysis Windows

Once the acquisition sequence is complete, the user is presented with a test analysis display. The display format is dependent on the parameters selected in the test protocol and is constructed from one or more individual windows. Analysis centres on the main Test Scope window, which contains the data buffers captured during the acquisition phase. This window supplies the data to various child analysis windows including Bar, EPG, Wave, FFT, Spectrogram and Video. Figure 7.5 illustrates a typical test analysis display. 144
155 Chapter 7: Analysis Software Implementation

Figure 7.5: A typical test display featuring a selection of airflow, voicing and lingual parameters.

The above example illustrates three test windows displaying a selection of speech-related parameters. The Test Scope window to the right of the display features the following waveforms (top to bottom): speech intensity, nasal airflow, oral airflow, alveolar contact, palatal contact and velar contact. The patient information and test analysis group boxes are also shown. A single EPG frame showing tongue-palate contact at the cursor position is illustrated in the bottom left-hand window. Finally, high-resolution speech and Lx waveforms are displayed in the top left-hand window.

The Test Scope Window

The initial appearance of the Test Scope window is very similar to its real-time counterpart. In this window however, the words identifying each trace are drawn below the title bar. Many features of the Test Scope window are identical to those discussed in chapter 6 (section 6.4), and therefore only features unique to this window are detailed below. 145
156 Chapter 7: Analysis Software Implementation

Analysis

Analysis is performed on a Test Scope window by positioning track or block cursors over areas of interest. When a track cursor is activated, calculations reflect its waveform interception point. Calculations performed with the block cursor are averaged over its entire width. A variety of waveform analyses relating to cursor position can be displayed to the right of the window, and also below on the application window's status bar. The calculations may be classified into two main categories: general and specialised. General analyses, performed on all displayed waveforms, are detailed below:

- Time is given in milliseconds or seconds and reflects the current track cursor position or the time period covered by the block cursor.
- Amplitude is given as a percentage of full scale and represents the waveform's amplitude at the track cursor position or the mean amplitude covered by the block cursor. A value is given for each of the displayed waveforms and is identified by a colour key.
- Slope is the rate of change expressed as a percentage of full scale per millisecond, with direction indicated by ±. Single and mean values are generated for track and block cursors respectively. Again, a colour key identifies each value.

Calculations performed in the specialised category are dependent on the parameters selected in the test protocol. These may be a combination of the following:

- Airflow: Aerodynamic Nasalance and Ratio.
- Voicing: Fundamental Frequency, Closed Quotient, Jitter and Shimmer.
- EPG: Alveolar Contact, Palatal Contact and Velar Contact.

In addition to the traces available in real-time Scope, the Test Scope generates several other useful waveforms derived from the above analysis.

Additional Voicing Parameters

The high-resolution Lx waveform, sampled by the right sound card channel, is used to derive additional voicing parameters. Key features relating to the Lx waveform are illustrated in Figure 7.6. 146
157 Chapter 7: Analysis Software Implementation

[Annotated Lx waveform showing, for each glottal cycle i, the amplitude Ai, the closed phase CPi and the period Pi.]

Figure 7.6: Key Lx waveform features.

A total of 10 complete glottal cycles are used to derive each of the following parameters:

Fundamental Frequency (Fx) - Deriving Fx in this manner produces a more accurate measurement than its hardware-generated equivalent.

Fx = 1 / P̄ (7.1)

where P̄ = ( Σ_{i=1}^{N} Pi ) / N (7.2)

and N = 10.

Closed Quotient (Qx) - This is the percentage of each cycle during which the vocal folds are closed.

Qx = ( Σ_{i=1}^{N} CPi / Pi ) / N × 100 (7.3)

Jitter Factor (Jx) - This is the percentage frequency variation of vocal fold vibration over 10 glottal cycles.

Jx = ( Σ_{i=1}^{N−1} |P(i+1) − Pi| ) / ( (N−1) × P̄ ) × 100 (7.4)

Shimmer Factor (Sx) - This is the percentage amplitude variation of vocal fold vibration over 10 glottal cycles. 147
158 Chapter 7: Analysis Software Implementation

Sx = ( Σ_{i=1}^{N−1} |A(i+1) − Ai| ) / ( (N−1) × Ā ) × 100 (7.5)

where Ā = ( Σ_{i=1}^{N} Ai ) / N (7.6)

Additional Lingual Parameters

Several useful parameters relating to tongue-palate contact are derived from the electropalatography data. As shown in Figure 7.7, the palate may be divided into various regions.

[Diagram of the palatal regions: A (alveolar), P (palatal) and V (velar) from front to back; L (left lateral), M (midline) and R (right lateral) across the palate; LB and RB denote the left and right balance regions.]

Figure 7.7: Palatal regions.

The following lingual parameters are derived from a single frame of EPG data:

Alveolar - The amount of tongue-palate contact in the front two rows of the palate expressed as a percentage. 148
159 Chapter 7: Analysis Software Implementation

Alveolar = N_A / T_A × 100 (7.7)

where N_A is the number of contacts and T_A the total number of segments in the alveolar region.

Palatal - The amount of tongue-palate contact in the middle three rows of the palate expressed as a percentage.

Palatal = N_P / T_P × 100 (7.8)

where N_P is the number of contacts and T_P the total number of segments in the palatal region.

Velar - The amount of tongue-palate contact in the rear three rows of the palate expressed as a percentage.

Velar = N_V / T_V × 100 (7.9)

where N_V is the number of contacts and T_V the total number of segments in the velar region.

Left Lateral - The amount of tongue-palate contact in the left two columns of the palate expressed as a percentage.

Left Lateral = N_L / T_L × 100 (7.10)

where N_L is the number of contacts and T_L the total number of segments in the left lateral region.

Right Lateral - The amount of tongue-palate contact in the right two columns of the palate expressed as a percentage.

Right Lateral = N_R / T_R × 100 (7.11)

where N_R is the number of contacts and T_R the total number of segments in the right lateral region.

Midline - The amount of tongue-palate contact in the centre four columns of the palate expressed as a percentage. 149
160 Chapter 7: Analysis Software Implementation

Midline = N_M / T_M × 100 (7.12)

where N_M is the number of contacts and T_M the total number of segments in the midline region.

Centre of Gravity - The linear centre of gravity of the total contact region, specified as a row number, from front (row 1) to back (row 8) of the palate.

Centre of Gravity = ( Σ_{r=1}^{8} (8 − r) × N_r ) / ( 7 × Σ_{r=1}^{8} N_r ) × 100 (7.13)

where N_r is the number of contacts in the various rows. A result of 100% equates to total front contact and 0% to total rear contact.

Balance - The balance of tongue-palate contact from left to right expressed as a percentage.

Balance = ( N_RB − N_LB ) / ( N_RB + N_LB ) × 100 (7.14)

where N_RB and N_LB represent the number of contacts in the right balance and left balance regions respectively. A result of −100% equates to total left contact, 0% is symmetrical and +100% is total right contact.

Weight - The number of tongue-palate contacts over the entire palate expressed as a percentage.

Weight = N / T × 100 (7.15)

where N is the number of contacts over the entire palate and T the total number of segments.

Audio

Audio playback of the displayed waveforms may be activated from standard controls located on the application window's toolbar. These controls are similar to those found on most audio equipment and include fast forward, rewind, play, pause, stop and loop. Depending on the cursor status, playback may commence in one of three ways: 150
161 Chapter 7: Analysis Software Implementation

- No cursor active: playback commences from the start of the display window, and unless interrupted by the user, continues for the window duration.
- Track cursor active: playback commences from the start of the track cursor and continues to the end of the display window.
- Block cursor active: playback commences from the left-hand cursor and continues to the right-hand cursor. This feature is useful for isolating specific speech sounds, and when used with the audio loop facility, greatly enhances waveform interpretation.

During audio playback an animation cursor appears in each trace window to indicate the current playback position. Also, the data displayed in any active child window changes to reflect the animated cursor position.

Zoom

The user can zoom into a specific word by simply double clicking on the relevant display area. This may be on the actual word itself or anywhere on the displayed trace. The width of the resulting zoom window is dependent on the word display period (e.g. 2 seconds). An additional feature allows the user to zoom into any portion of the display window by dragging a rectangle over areas of interest. This is useful for analysing small word groups or for isolating specific speech sounds. When created, each zoom window inherits the same functionality as its parent window.

Data Export

The individual data samples of the displayed waveforms, together with the numerical parameters available in the analysis group box, can be exported to either a text file or the Windows clipboard. The ASCII format of the exported data is compatible with many proprietary software packages such as Microsoft Excel, which allow further statistical, numerical or graphical analysis to be performed.

Test Analysis Child Windows

As illustrated in Figure 7.1, test analysis versions of Bar, EPG, Wave, FFT, Spectrogram and Video are also available. These windows inherit their data from the main Test Scope window. The amount of data displayed in each child window is largely dependent on the active cursor type. As the cursor is positioned over the test waveforms, any two- 151
162 Chapter 7: Analysis Software Implementation

dimensional displays change to reflect the data at the current cursor position. Trend displays show an additional cursor that mimics the main test cursor.

The Test Bar Window

This window reflects the intensity of the selected parameters at any given point on the Test Scope trace. The Test Bar display remains empty until a cursor is activated and moved. At this point, the Test Scope window sends a data notification message to every test child window. Attached to this message is a pointer to a structure containing the following elements:

- Pointer to the processed DAS-1202 data.
- Pointer to the Audio data.
- Pointer to the Video data.
- Type of cursor active.
- Relative position of cursor (or cursors).

On receipt of this data notification message, the Test Bar window determines the active cursor type. If a track cursor is enabled, its relative position within the DAS-1202 buffer is determined. A single value for each parameter is then extracted and passed directly to the display routine. However, if the block cursor is enabled, the number of samples extracted corresponds to the cursor width. These samples are averaged prior to display. With the exception of the data notification routine, the program structure of the Test Bar window is identical to its real-time counterpart. Both windows also share the same Settings dialogue box, with options not applicable to the test version greyed out.

The Test EPG Window

The Test EPG window processes the data notification message in a manner similar to that previously described. However, this window supports a variety of display options not available in its real-time equivalent. When a track cursor is enabled, the EPG data changes to reflect the tongue-palate contact at the relative cursor position. A number of alternative display options are available when the block cursor is enabled. As illustrated in Figure 7.8, these include grey scale, spectrum and multiple frames. 152
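The grey scale and spectrum options just listed both rest on the same block-cursor averaging that the Test Bar window applies to its samples: each segment's contact is averaged over every frame spanned by the cursor, and the resulting duty cycle drives the colour density (the exact colour mapping is described below). A sketch of that averaging is given here; a full 64-bit frame layout is assumed for illustration.

    #include <array>
    #include <cstdint>
    #include <vector>

    // Per-segment contact duty cycle across the frames spanned by the block
    // cursor: 0.0 means no contact, 1.0 means contact in every frame.
    std::array<double, 64> contactDutyCycle(const std::vector<uint64_t>& frames)
    {
        std::array<double, 64> duty = {};
        if (frames.empty())
            return duty;
        for (std::size_t f = 0; f < frames.size(); ++f)
            for (int bit = 0; bit < 64; ++bit)
                if (frames[f] & (uint64_t(1) << bit))
                    duty[bit] += 1.0;
        for (int bit = 0; bit < 64; ++bit)
            duty[bit] /= frames.size();          // fraction of frames contacted
        return duty;
    }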
163 Chapter 7: Analysis Software Implementation Figure 7.8: Test EPG display options: grey scale (top left), spectrum (top right) and multiple frames (bottom). The grey scale option (top left), which is actually red in this example, is generated by calculating the average number of segment contacts over the entire block cursor width. The colour density therefore reflects the total duration of contact in each particular segment. In the above example, dark reds represent prolonged contact, pink brief contact and grey no contact. The spectrum option (top right) implements the same algorithm but uses a selection of colours to represent varying degrees of contact. In order of contact strength, the spectrum colours are red, orange, yellow, green and blue. The segment merge option has been applied in the above example. Finally, the multiple frame option allows the user to analyse each frame contained within the block cursor. In this example, the data represented in the grey scale and spectrum windows are displayed as sequential frames in 153
164 Chapter 7: Analysis Software Implementation

the multiple view (bottom window). When viewing a large number of frames, horizontal and vertical scroll bars are used to access frames not currently visible in the display area. The frame size may be adjusted to suit individual requirements. Small frames allow greater detail but large frames reveal the associated palate number. The multiple frame option is useful for analysing sequential frames that represent individual sounds or complete words.

The Test Wave Window

The Test Wave window allows the user to analyse any portion of the speech and Lx waveforms recorded with the audio device. When a track cursor is enabled, the displayed data represent 20 ms of audio, centred on the cursor. However, this default time-base may be varied from 10 ms to 100 ms. When a block cursor is active, the time-base changes to reflect the cursor width. In this case, the displayed data represent the audio contained within the cursor.

The Test FFT Window

The Test FFT window reveals the frequency spectrum at any point in the recorded speech signal. The type of cursor active in the Test Scope window determines the actual FFT length. A default length of 100 ms is selected for the track cursor. However, by altering the block cursor width, the FFT window can be adjusted to any length. Where necessary, zero padding is used to generate the required 2^n data boundary.

The Test Spectrogram Window

When activated, the Test Spectrogram automatically generates its display from the entire speech recording. Depending on the test duration, this may take several seconds to compute. To enable the direct comparison between speech mechanism and outcome, this window is usually drawn above the Test Scope window. If enabled, the actions of the main test cursors are simply mimicked in the Test Spectrogram window.

The Test Video Window

When a track cursor is active, a single frame within the Test Video window changes to reflect the cursor position. This is especially useful when analysing videofluoroscopy data, since it allows direct comparison between the velopharyngeal structure and the 154
165 Chapter 7: Analysis Software Implementation accompanying nasal airflow. When a block cursor is enabled, all video frames contained within the cursor are displayed. This is illustrated in Figure 7.9. Figure 7.9: A videofluoroscopy sequence of the velopharyngeal mechanism, generated with the block cursor option. This feature is useful when analysing subtle changes between consecutive video frames. The horizontal and vertical scroll bars may be used to view images not currently visible in the display area. Due to processing and memory constraints, the maximum number of visible frames is restricted to
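To make the zero-padding step of the Test FFT window concrete, the following minimal sketch pads an arbitrary-length cursor segment up to the next power-of-two boundary before transforming. It is illustrative only, not the SNORS+ implementation: the function and argument names are assumptions.

```python
import numpy as np

def cursor_fft(audio, sample_rate, start, length):
    """Magnitude spectrum of the audio segment under a cursor.

    The segment length follows the cursor width, so it is rarely an
    exact power of two; as in the Test FFT window, the segment is
    zero-padded up to the next 2**n boundary before transforming.
    """
    segment = np.asarray(audio[start:start + length], dtype=float)
    n_fft = 1 << int(np.ceil(np.log2(len(segment))))  # next 2**n boundary
    padded = np.zeros(n_fft)
    padded[:len(segment)] = segment          # zero padding
    spectrum = np.abs(np.fft.rfft(padded))   # one-sided magnitude spectrum
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    return freqs, spectrum
```

Padding with zeros adds no spectral information of its own; it simply satisfies the radix-2 length requirement of the FFT and interpolates the displayed spectrum more finely.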
166 CHAPTER 8 RESULTS AND ANALYSIS This chapter presents a series of test results acquired using the SNORS+ system. The chapter is divided into three main sections: qualitative analysis of multiparameter data, quantitative analysis of multiparameter data (excluding lingual parameters), and the analysis of electropalatography data. The analysis displays presented in the first section reveal the relationship between the speech mechanism and the actual speech outcome. To demonstrate how these displays can be qualitatively interpreted, several examples are discussed in detail. Section two discusses the outcome of a small trial conducted on 40 subjects considered by the author to exhibit normal speech. The aim of the trial was to establish a series of baseline parameters that may be used to compare normal speech with pathological speech. The procedures used to quantify the acquired multiparameter data are described, and the results tabulated for each speech parameter. A selection of these parameters are also illustrated graphically and discussed in greater detail. Both inter and intra-subject variability has been investigated. Finally, to allow comparison with pathological data, a single case study of a cleft palate subject is presented. Unfortunately, due to the high cost of electropalatography palates, lingual data were not collected during the trial. Therefore, the third section discusses the normative electropalatography data obtained from four speech and language therapists. The results are illustrated graphically for a number of lingual parameters, and both inter and intra-subject variability has been investigated. Finally, a single electropalatography case study of a young boy with a lateral /s/ is presented. 8.1 Qualitative Analysis of Multiparameter Data SNORS+ is able to simultaneously acquire and display a wealth of speech related parameters. To illustrate how these parameters can be qualitatively interpreted, this section discusses several analysis displays in detail. 156
167 Chapter 8: Results and Analysis Analysis of Combined Acoustic, Airflow, Voicing and EPG Data The analysis displays presented in this section reveal the relationship between the speech mechanism, as depicted in the Scope and EPG windows, and the actual speech outcome, as illustrated in the spectrogram window. Figure 8.1 shows the results obtained from a normal subject uttering the word cheese during a standard test sequence. Figure 8.1: The word cheese, as uttered by a normal subject. With reference to the wideband spectrogram, the initial affricate /tʃ/ produces a relatively low-level, broadband trace. With vowel onset, the acoustic energy increases and the dominant frequency content decreases. The vertical striations produced by the glottal source are also clearly visible. Finally, the voiceless fricative /s/ produces another low-level signal but with a slightly higher frequency content. The deflection in the speech intensity trace is minimal for the low-level sound produced by the initial affricate. The intensity then peaks at the start of the vowel and continues at a 157
168 Chapter 8: Results and Analysis relatively high level throughout its duration. Again, the low-level sound associated with the final fricative is barely visible. There is minimal nasal airflow throughout the word, except for a small peak prior to release of the affricate, which is due to flexing of the velum (refer to section 8.1.2). The oral airflow, however, peaks considerably during the affricate and remains at a relatively high level during the vowel and the final consonant. After the word, both oral and nasal airflow rise due to exhalation. The voicing starts with the vowel, continues throughout its duration, and subsides before the final consonant. This is to be expected since the vowel is voiced but the affricate and fricative are unvoiced. The red EPG frame represents the tongue-palate contact for the /ʃ/ part of the initial affricate, as indicated by the red cursor. This is a typical palatal grooved pattern with contact along both lateral margins. The production of /tʃ/ is characterised by a stop closure followed by a fricative-like release of oral airflow, which is clearly visible in the Scope window. The blue EPG frame illustrates the tongue-palate contact for the final consonant /s/, as indicated by the blue cursor. Contact is complete along both lateral margins and there is a narrow groove configuration in the anterior rows. This pattern creates the characteristic hissing sound as air passes between the tongue and hard palate. Again, the accompanying oral airflow is clearly visible in the Scope window. Figure 8.2 illustrates the results obtained from a normal subject uttering the word seat during a standard test sequence. 158
169 Chapter 8: Results and Analysis Figure 8.2: The word seat, as uttered by a normal subject. The characteristic low-level, high-frequency acoustic signal accompanying the initial fricative /s/ is clearly visible in the spectrogram. The vowel /i/ produces a brief high-intensity, lower-frequency trace that is typical of many voiced sounds. There is then a silence prior to release of the plosive /t/, which is followed by another low-level, high-frequency acoustic signal on its release. With reference to the speech intensity, the initial low-level sound produced by the fricative is barely visible. A short high-level period is generated for the vowel, which is followed by silence during alveolar closure. Finally, a small deflection representing release of the final /t/ is observed. The nasal airflow is minimal, except for the characteristic peak before the initial fricative and the usual exhalation at the end of the word. The oral airflow is high during the /s/ but falls to a low level during the vowel. This is because the resonant vowel sound results in little DC airflow, comprising mainly the AC perturbations producing the sound pressure wave. The oral airflow finally peaks as the plosive /t/ is released. 159
170 Chapter 8: Results and Analysis Voicing is apparent only during the vowel. Again, this is to be expected since /s/ is an unvoiced fricative sound and /t/ is an unvoiced plosive sound. The two EPG frames show tongue-palate contact during the production of the fricative /s/ and during closure for the plosive /t/. Again, for the fricative /s/, the tongue contacts the hard palate around the dental arch and alveolar ridge, leaving just a small groove through which air is blown to produce the sound. The accompanying oral airflow is clearly indicated by the red cursor. The pattern for the /t/ is similar, except there is no groove. Hence there is a complete seal, which allows air pressure to build up in the oral cavity. It is the transient oral airflow resulting from the release of this pressure that produces the sound. This transient airflow can be observed to the right of the blue cursor Analysis of Combined Airflow and Videofluoroscopy Data In collaboration with the Cleft Palate team at Queen Victoria Hospital, East Grinstead, SNORS+ has been combined with the X-ray technique of videofluoroscopy. This combination reveals the relationship between the actual speech mechanism and the structures displayed within the videofluoroscopic image. Figure 8.3 illustrates a lateral videofluoroscopic image of the velopharyngeal mechanism (right), combined with speech intensity, airflow and voicing data (left). This example was taken from a subject uttering the word fight during a standard test sequence. 160
171 Chapter 8: Results and Analysis Figure 8.3: A SNORS+ analysis display featuring a lateral videofluoroscopic image of the velopharyngeal mechanism. It should be noted that the circular discs positioned near the larynx in the videofluoroscopic image are the Laryngograph electrodes. To aid image interpretation, approximate outlines of the soft palate and posterior pharyngeal wall have been traced. As can be seen, during the vowel /aɪ/, the velum is raised and clearly makes contact with the posterior pharyngeal wall. It is apparent, therefore, that successful velopharyngeal closure has been achieved in this case. If this expectation is compared with the airflow present at the cursor position, it is seen to be exclusively through the oral cavity, thus supporting the radiographic finding of correct velopharyngeal function. In this particular case, it is likely that no further imaging assessment need be undertaken, since confidence is much greater in the multiparameter assessment than with the imaging technique alone. However, if a significant amount of nasal airflow had been present, it would have suggested that either the videofluoroscopic image had been misinterpreted or the defect in the mechanism is not visible from the view being studied. In this case, further assessment would be required to determine the cause of velopharyngeal incompetence. As another example to illustrate the advantages of combined videofluoroscopy and anemometry, consider the analysis display shown in Figure 8.4. 161
172 Chapter 8: Results and Analysis Figure 8.4: Flexing of the velum prior to affricate release. With reference to the nasal airflow trace, a significant burst of airflow can be observed prior to release of the initial affricate /tʃ/, as indicated by the cursors. This characteristic is also often observed before the release of plosive and fricative sounds. Previously, investigators have explained this characteristic in one of two ways: leakage through the velopharyngeal port due to a pressure build-up in the oral cavity, or flexing of the velum due to a pressure build-up in the oral cavity, thus expelling any residual air in the nasal cavity. When examining the videofluoroscopic image prior to the nasal air emission, as indicated by the purple cursor, it is evident that the velum is raised and makes contact with the posterior pharyngeal wall. Again, to aid interpretation, an approximate outline of the soft palate has been traced in purple. The subsequent video frame, acquired 40 ms later, reveals a definite flexing of the velum towards the pharyngeal wall (as indicated by the blue outline). The author considers this movement to be of sufficient magnitude to produce the characteristic nasal air emission observed in many anemometry studies. 162
173 Chapter 8: Results and Analysis 8.2 Quantitative Analysis of Multiparameter Data The displays described in the previous section provide the user with detailed qualitative data. However, for the purpose of quantifying the various speech parameters, it is necessary to extract a numerical index that is representative of each. A selection of the numerical analyses supported by SNORS+, and the speech functions they describe, are tabulated below. The actual parameters were defined in chapters 6 and 7.
Parameter - Related Speech Function
Word/Sentence Duration - Fluency
Acoustic Nasalance - Velopharyngeal closure
Oral Airflow - Respiratory effort
Nasal Airflow - Respiratory effort & velopharyngeal closure
Respiration - Respiratory effort
Aerodynamic Ratio - Velopharyngeal closure
Aerodynamic Nasalance - Velopharyngeal closure
Fundamental Frequency - Pitch
Closed Quotient - Voice quality
Shimmer Factor - Voice quality
Jitter Factor - Voice quality
Lingual Parameters - Lingual articulation
Table 8.1: A selection of the numerical analyses supported by SNORS+, and their related speech function.
In order to establish a baseline with which to compare the numerical indices associated with normal speech with those of pathological speech, a small trial was undertaken. During the trial, data were obtained from 40 speakers considered by the author to exhibit normal speech. Unfortunately, due to the high cost of electropalatography palates, the lingual parameters were not included in this trial. The following sections describe the test protocol implemented in the trial and discuss its outcome The Trial A total of 40 subjects, judged to be normal speakers by the author, were selected for the trial. Measurements were made on both male and female speakers, with ages ranging from late teens to mid fifties. It should be noted, however, that the majority of these subjects were not formally assessed by a trained listener, and were drawn from a range of nationalities including English, German, Israeli and Kuwaiti. No attempt has been made 163
174 Chapter 8: Results and Analysis within this preliminary investigation to relate outcome to any of these important factors. Each subject was assessed with the standard SNORS+ test protocol, using the Exeter wordlist. All recordings were made under laboratory conditions in a dedicated measurement clinic. Since it is known that a degree of variability exists between repeated recordings of an individual (Folkins, 1986), three recordings of each subject were made. This generated 120 recordings of each word in the test sequence. To assess inter-subject variability, data from each recording were combined and the mean parameters extracted. Intra-subject variability was investigated using additional recordings made on two subsequent occasions for eight of the subjects. This gave a total of nine recordings for each word per subject Analysis Procedure Numerical analysis in SNORS+ is conducted on the main Test Scope window by positioning track or block cursors over areas of interest. When a track cursor is activated, calculations reflect its waveform interception point. Calculations performed with the block cursor are averaged over its entire width. A variety of waveform analyses relating to cursor position can be displayed to the right of the window in the analysis group box, and also below on the application window's status bar. The numerical parameters displayed in the analysis group box can be exported to either a text file or the Windows clipboard. The ASCII format of the exported data is compatible with many proprietary software packages such as Microsoft Excel, which allow further statistical, numerical or graphical analysis to be performed. During the trial, analysis was performed on the Test Scope window by activating the block cursor then zooming into a particular word. Since any respiration associated with an utterance can significantly affect the accuracy of certain measurements, it was essential to isolate each word before conducting the data analysis. This was achieved by positioning the block cursor around the speech intensity waveform to isolate the sound associated with each word. To aid in the identification of low intensity sounds, such as voiceless fricatives, an additional wideband spectrogram was used. Finally, correct word isolation was confirmed by listening to the audio contained within the block cursor. Once each word had been correctly isolated, the associated numerical analyses were exported to the Windows clipboard then pasted into Microsoft Excel for further statistical processing. 164
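As an illustration of this procedure, the sketch below mirrors the block-cursor averaging and the per-word statistics reported in the following tables. It is not the SNORS+ code; the data layout (NumPy arrays of parameter samples plus hypothetical word-boundary indices) is an assumption for illustration.

```python
import numpy as np

def block_average(trace, start, end):
    """Average a parameter trace across the block cursor width, as the
    analysis group box does for block-cursor calculations."""
    return float(np.mean(trace[start:end]))

def word_statistics(recordings):
    """Reduce repeated recordings of one word to a mean and standard
    deviation, as reported in the results tables.

    `recordings` is a hypothetical list of (trace, start, end) tuples,
    one per repetition, where start/end mark the isolated word.
    """
    values = [block_average(trace, s, e) for trace, s, e in recordings]
    return float(np.mean(values)), float(np.std(values, ddof=1))
```

Using the sample standard deviation (ddof=1) is the natural choice here, since each word is represented by only a handful of repetitions.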
175 Chapter 8: Results and Analysis Results As previously stated, three recordings of each subject were made on one occasion, with eight subjects making three additional recordings on two subsequent occasions. This allowed analysis of both inter and intra-subject variability. With the exception of the lingual parameters, the parameters specified in Table 8.1 were calculated for all subjects. For inter-subject variability, the parameters obtained from each of the three recordings were combined and an average found. In terms of mean and standard deviation, Table 8.2 summarises the results for each word. (Table rows: mean and standard deviation of word duration, acoustic nasalance, oral airflow, nasal airflow, respiration, aerodynamic ratio, aerodynamic nasalance, fundamental frequency, closed quotient, shimmer factor and jitter factor; columns: the words begin, type, fight, seat, cheese, shoot, smoke, king, missing and end.) Table 8.2: Summary of inter-subject variability in terms of mean and standard deviation for each word. Intra-subject variability was assessed for eight individual subjects. The parameters were calculated for each word, recorded three times on each of three occasions. In terms of mean and standard deviation, Table 8.3 summarises the results obtained for a 33-year-old male with a Midlands accent. Additional intra-subject results are tabulated in appendix B. 165
176 Chapter 8: Results and Analysis (Table rows and columns as in Table 8.2.) Table 8.3: Summary of intra-subject variability in terms of mean and standard deviation for a 33-year-old male. To provide a graphical illustration of inter and intra-subject variability, a selection of the measured parameters are plotted in the following sections. Unfortunately, a detailed discussion on all the parameters listed in Table 8.1 is beyond the scope of this thesis. Therefore, attention is given to aerodynamic nasalance since comparatively little normative data is available for this parameter Aerodynamic Nasalance Aerodynamic nasalance, which provides an extremely useful measure of velopharyngeal closure, is defined as the percentage of total positive airflow that is nasal (refer to chapter 6, section ) Inter-Subject Variability A plot of aerodynamic nasalance for the 40 subjects exhibiting normal speech is given in Figure 8.5. 166
177 Chapter 8: Results and Analysis Figure 8.5: Plot of mean aerodynamic nasalance, and standard deviation (error bars), for the 40 subjects exhibiting normal speech. The data points in Figure 8.5 represent the mean aerodynamic nasalance calculated for each word in the test sequence. Additionally, the error bars depict the standard deviation from the mean. This plot clearly reveals the difference in aerodynamic nasalance between purely oral words (type, fight, seat, cheese, shoot) and those having a nasal element (begin, smoke, king, missing, end). The mean values for the oral words lie between 6.5% and 13.5%, whereas the values for the nasal words vary from 21.8% to 52.4%. As expected, these results agree with the theoretical prediction: a higher aerodynamic nasalance accompanies words containing nasal elements. However, the diversity of inter-subject variability was not expected, and it is perhaps surprising that subjects having very different aerodynamic nasalance values produce very similar and normal speech. It is also apparent from the graph that most speakers do not have zero nasal airflow during the production of purely oral words. This suggests that for a word to sound hypernasal, a significant amount of nasal airflow must be present. Oral words containing plosives, which require high intra-oral pressure (type, fight, seat, shoot), all have higher and more 167
178 Chapter 8: Results and Analysis diverse aerodynamic nasalance values than the word containing an affricate (cheese), though this also requires high intra-oral pressure. This suggests the following: The required intra-oral pressure for affricates is lower, though more sustained. The high pressure associated with plosive sounds either causes the velopharyngeal port to leak or simply flexes the velum, expelling any residual air in the nasal cavity Intra-Subject Variability A plot of mean aerodynamic nasalance for a 33-year-old male exhibiting normal speech is shown in Figure 8.6. Figure 8.6: Plot of mean aerodynamic nasalance, and standard deviation, for a 33-year-old male exhibiting normal speech. Although the intra-subject data exhibit markedly less variability than the inter-subject values, there is still a significant variation. As there was no perceived difference in 168
179 Chapter 8: Results and Analysis outcome during the different test sessions, this suggests that considerable changes in aerodynamic nasalance are necessary before the resulting speech is affected Conclusion Due to the uniformity of aerodynamic nasalance, the results obtained in this preliminary investigation suggest that purely oral words should be used for the quantitative analysis of excessive nasal air emission. In particular, the values associated with the word cheese were consistently low amongst individuals, producing a mean of just 6.5%. Although a reduction in aerodynamic nasalance may represent an improvement in hypernasality, zero values must not be expected and inter and intra-subject variations should always be considered. Also of interest is the wide range of aerodynamic nasalance values within and between normal speakers. Because in each of these cases the perceived speech outcome was the same, this observation suggests that analysis of the acoustic signal alone is an unreliable method of assessing hypernasality. Figure 8.7 illustrates the mean acoustic nasalance obtained from the 40 normal speakers. 169
180 Chapter 8: Results and Analysis Figure 8.7: Plot of mean acoustic nasalance for 40 subjects exhibiting normal speech. Acoustic nasalance is defined as the percentage of total sound intensity that is nasal (refer to chapter 6, section ). Although the general trend is similar to that found in aerodynamic nasalance, the word cheese consistently produced the greatest acoustic nasalance of the purely oral words. The conflict between the two parameters may be explained by the intense oscillatory oral airflow associated with the sustained vowel sound. This causes a flexing of the velum, which in turn generates an audible acoustic resonance in the nasal cavity. Figure 8.8 clearly illustrates this effect. 170
181 Chapter 8: Results and Analysis Figure 8.8: The effect of acoustic coupling via the soft palate. Figure 8.8 shows the test results obtained for the word cheese when recorded in high-resolution mode. The following waveforms are presented (top to bottom): nasal speech, oral speech, unfiltered nasal airflow, unfiltered oral airflow and larynx excitation. In this mode, both positive and negative waveform portions are visible, and a baseline is drawn for zero reference. The DC airflow associated with the voiceless affricate is clearly visible on the oral airflow waveform. A good velopharyngeal seal is indicated by very little nasal airflow. Although low-level, the resulting high-frequency acoustic signal can be seen in the oral speech waveform but is absent in the nasal speech waveform. However, with voice onset, the intense oscillations in oral airflow are coupled to the nasal cavity via the soft palate, which vibrates under high intra-oral pressure. The absence of a DC component on the nasal airflow trace suggests an effective velopharyngeal closure has been maintained. Also clearly visible are the resultant acoustic signals generated in both the oral and nasal cavities. The presence of this nasal speech signal increases the magnitude of acoustic nasalance and is misleading since a good velopharyngeal seal has been achieved. 171
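The two nasalance indices compared above can be summarised in a short sketch. Both follow the definitions quoted in the text (aerodynamic nasalance as the percentage of total positive airflow that is nasal; acoustic nasalance as the percentage of total sound intensity that is nasal); the exact signal conditioning used by SNORS+ is defined in chapter 6, so the details below are illustrative assumptions.

```python
import numpy as np

def aerodynamic_nasalance(nasal_flow, oral_flow):
    """Percentage of total positive airflow that is nasal.

    Negative (inspiratory) samples are clipped so that only positive
    airflow contributes, following the definition in the text.
    """
    nasal = np.clip(nasal_flow, 0.0, None).sum()
    oral = np.clip(oral_flow, 0.0, None).sum()
    total = nasal + oral
    return 100.0 * nasal / total if total > 0 else 0.0

def acoustic_nasalance(nasal_speech, oral_speech):
    """Percentage of total sound intensity that is nasal.

    Intensity is taken here as the mean-square signal level; the exact
    weighting used by SNORS+ is defined in chapter 6.
    """
    nasal = float(np.mean(np.square(nasal_speech)))
    oral = float(np.mean(np.square(oral_speech)))
    total = nasal + oral
    return 100.0 * nasal / total if total > 0 else 0.0
```

Applied to the cheese example above, the first index would remain low (little DC nasal airflow), while the second would be inflated by the acoustically coupled nasal speech signal.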
182 Chapter 8: Results and Analysis The observation that acoustic nasalance can be inflated despite a good velopharyngeal seal adds to the growing body of evidence that the relationship between speech mechanism and outcome is not straightforward. Further work is needed to study this relationship and to investigate the reasons for the diversity in aerodynamic and acoustic nasalance Fundamental Frequency The perceived pitch of voiced sounds is directly related to the rate at which the speaker's vocal folds vibrate, and hence to the fundamental frequency. The range of fundamental frequency is an important parameter in the assessment of vocal performance, and can be directly investigated using SNORS+. Fundamental frequency is dependent on the length and tension of the vocal folds: an adult male may have a fundamental frequency range of about 80 Hz to 200 Hz, and an adult female may range from 140 Hz to 310 Hz, in normal speech (Abberton and Fourcin, 1984) Inter-Subject Variability Figure 8.9 illustrates the range of fundamental frequencies obtained for both male and female speakers during the trial. 172
183 Chapter 8: Results and Analysis Figure 8.9: Plot of mean fundamental frequency, and standard deviation, for both male and female speakers exhibiting normal speech. The data points in Figure 8.9 represent the mean fundamental frequency calculated for each word in the test sequence. Additionally, the error bars depict the standard deviation from the mean. The graph clearly reveals the difference in fundamental frequency between male and female speakers. The values for male speakers range between 82 Hz and 131 Hz, whereas the values for female speakers vary from 163 Hz to 222 Hz. These results lie within the range of fundamental frequencies quoted for normal speakers. On average, the word shoot produced the highest fundamental frequency in both groups. Interestingly, the word end produced the lowest value, which may be attributed to the relief of uttering the final word in the test sequence! Intra-Subject Variability A plot of mean fundamental frequency for a 33-year-old male and a 23-year-old female is shown in Figure 8.10. 173
184 Chapter 8: Results and Analysis Figure 8.10: Plot of mean fundamental frequency, and standard deviation, for a 33-year-old male and a 23-year-old female. As can be seen, the intra-subject data exhibit markedly less variability than the inter-subject values. Again, the difference in fundamental frequency between the male and female speaker is clear. Plots of additional speech parameters are illustrated in appendix B. 174
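As an illustration of the laryngeal measures used in this trial, the sketch below derives fundamental frequency, jitter and shimmer from a sequence of glottal cycle periods and amplitudes, as would be obtained from the Lx waveform. The mean-based definitions shown are common ones; the exact SNORS+ formulas for the shimmer and jitter factors are defined in chapter 6.

```python
import numpy as np

def fundamental_frequency(periods):
    """Mean fundamental frequency (Hz) from glottal cycle periods (s),
    as derived from the Laryngograph (Lx) waveform."""
    return 1.0 / float(np.mean(periods))

def jitter_factor(periods):
    """Mean cycle-to-cycle period variation as a percentage of the
    mean period (a common definition of jitter)."""
    diffs = np.abs(np.diff(periods))
    return 100.0 * float(np.mean(diffs)) / float(np.mean(periods))

def shimmer_factor(amplitudes):
    """Mean cycle-to-cycle amplitude variation as a percentage of the
    mean amplitude (the analogous definition for shimmer)."""
    diffs = np.abs(np.diff(amplitudes))
    return 100.0 * float(np.mean(diffs)) / float(np.mean(amplitudes))
```

Expressing both perturbation measures as percentages of the mean makes them independent of the speaker's absolute pitch and loudness, which is what allows comparison against normative limits.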
185 Chapter 8: Results and Analysis A Single Aerodynamic Case Study Velopharyngeal insufficiency (VPI) is the inability to make adequate velopharyngeal closure, and may be the result of either neurological or structural abnormalities (Main et al., 1999). VPI results in abnormal speech characteristics such as omissions, substitutions or weak articulation of consonants, and hypernasality. This case study examines the hypernasal speech production of a young man with a repaired cleft lip and palate Case History P.C., a 23-year-old male, had a primary repair of his right unilateral cleft of the lip and palate in New York at an early age. He had a moderate hearing loss in the right ear, and a mild loss in the left. Two years prior to referral, P.C. had a pharyngoplasty in France, and despite having conventional speech and language therapy he still remained moderately hypernasal. His therapist eventually referred him for SNORS+ assessment and treatment, as it was felt he would respond well to this type of therapy Assessment The initial assessment revealed that P.C. was able to make velopharyngeal closure but was unable to maintain it. When observed using a pen torch, he could voluntarily contract his velum/pharynx, and appeared to make closure when doing so. The airflow patterns for the word cheese, recorded during the initial assessment, are illustrated in Figure 8.11. 175
186 Chapter 8: Results and Analysis Figure 8.11: Speech and airflow patterns for the word cheese, as uttered by a cleft palate subject prior to SNORS+ therapy. With reference to the nasal airflow trace, there is considerable escape during release of the initial affricate. A brief pause is observed following the affricate and all airflow stops. Nasal airflow increases throughout the rest of the word and peaks during exhalation at the end of the word. An aerodynamic nasalance of 31% was recorded for this utterance, which is outside normal limits according to the preliminary trial on normal speakers Therapy Regime The requirement of velopharyngeal closure for each sound was initially explained to P.C. Work then started on sustained vowels, and he was encouraged to increase the oral airflow while maintaining and/or reducing the nasal airflow. Therapy employed the real-time Bar display to provide visual biofeedback. Any small adjustments that P.C. tried whilst attempting these exercises could be seen to either work or fail. Gradually, as this task became easier, P.C.'s target level of nasal airflow was reduced. 176
187 Chapter 8: Results and Analysis During therapy, P.C. was repeatedly asked to focus on his velar function, and to associate any sensation with the resultant sound. He was then encouraged to reproduce these sensations and the accompanying sounds, first with, and as he became more practised, without visual biofeedback. Therapy progressed from sustained sounds, to words, and finally to sentences. Because he was able to observe any improvements made, P.C. remained highly motivated throughout each hour-long session Outcome Figure 8.12 illustrates the airflow patterns recorded for the word cheese during the final assessment. Figure 8.12: Speech and airflow patterns for the word cheese, as uttered by a cleft palate subject post SNORS+ therapy. After six sessions of SNORS+ therapy, the nasal air escape during the affricate had significantly reduced, and there was now complete closure during the vowel. An aerodynamic nasalance of just 1.2% was recorded for this utterance, which is within normal limits according to the preliminary trial on normal speakers. During each session, three recordings of the standard test sequence were made. Figure 8.13 illustrates the 177
188 Chapter 8: Results and Analysis resultant mean aerodynamic nasalance plotted for each word, pre- and post-SNORS+ therapy. Figure 8.13: Plot of mean aerodynamic nasalance, pre and post-therapy, for a cleft palate subject. As can be seen, the reduction in P.C.'s nasal air emission after six weeks of SNORS+ therapy is quite dramatic. A follow-up assessment one month later showed slight further improvement: virtually no nasal airflow during the word cheese and an aerodynamic nasalance of less than 1%. Since clinical observations have suggested that individuals with cleft palate and/or velopharyngeal inadequacy are prone to various voice disorders (Zajac, 1995), assessment was extended to include P.C.'s laryngeal parameters. Warren (1986) described the larynx as one of several potential valves for regulating speech aerodynamic events when velopharyngeal inadequacy is present. It has been suggested that in order to maintain constant pressures for speech production, an individual may increase laryngeal resistance to compensate for reduced velopharyngeal resistance. This can lead to structural abnormalities in the vocal folds, which may produce abnormal vocal qualities. The laryngeal parameters collected during P.C.'s assessment (i.e. fundamental frequency, 178
189 Chapter 8: Results and Analysis closed quotient, shimmer and jitter) were used to verify normal vocal fold activity. When compared to the normative data collected during the trial, these parameters were all found to be within normal limits Conclusion In this particular case, SNORS+ biofeedback therapy proved significantly more effective than conventional therapy in improving velopharyngeal function. Additionally, SNORS+ provided clear graphical and numerical evidence of progress, which assisted in therapy and proved extremely motivating for both patient and therapist. Such measures of outcome are becoming increasingly important as the need to prove efficacy grows. Finally, the additional laryngeal parameters were found extremely useful in the verification of normal vocal fold activity; a function sometimes affected in cleft palate speech. 179
190 Chapter 8: Results and Analysis 8.3 Analysis of Electropalatography Data In order to evaluate the system's ability to analyse electropalatography data, both qualitatively and quantitatively, normative multiparameter data were recorded from four speech and language therapists. This section discusses the various methods used to present and analyse the vast amounts of data produced by Linguagraph Qualitative Analysis of Electropalatography Data The recorded Linguagraph data can conveniently be analysed qualitatively by viewing single EPG frames at various points of interest. As described in chapter 7 (section ), this is achieved by positioning a track cursor within the main Test Scope window, using the available multiparameter data as a guide to correct placement. Figure 8.14 illustrates a selection of the recorded contact patterns. Figure 8.14: A selection of EPG patterns recorded during normal speech. Taken from a normal subject, the above contact patterns were produced for the following sounds: 180
191 Chapter 8: Results and Analysis
/k/ - This is a typical velar stop pattern and occurs during the closed phase of velar plosives. This pattern has minimal contact along the margins of the palate and complete contact across the posterior row. The initial closure is often accompanied by the characteristic burst of nasal airflow due to flexing of the velum. Release is characterised by a steep rise in oral airflow producing a low-level, broadband, acoustic signal. Due to its low intensity, this signal is often difficult to detect on the acoustic waveform but is clearly visible on a spectrogram. There is no voicing associated with this sound; however, when voiced this pattern produces the /g/ sound.
/s/ - This alveolar grooved pattern also occurs during the voiced sound /z/. Contact is complete along both lateral margins and there is a narrow grooved configuration in the anterior palatal rows. A relatively constant level of oral airflow accompanies this sound, which produces a low-level, broadband, acoustic signal that is clearly visible on a spectrogram. Again, there is no voicing associated with this sound.
/ʃ/ - This palatal grooved pattern also occurs during the voiced sound /ʒ/, and in comparison with /s/ or /z/ has a wider and more posteriorly placed groove. The accompanying multiparameter data is very similar to that described for /s/.
/t/ - This alveolar stop pattern is characterised by contact along the lateral margins of the palate and complete closure across the front row. The pattern typically occurs during the closed phase of alveolar plosives, which includes the voiced sound /d/. Again, the initial closure is often accompanied by the characteristic burst of nasal airflow due to flexing of the velum. Release is characterised by a steep rise in oral airflow producing a low-level, broadband, acoustic signal that is clearly visible on a spectrogram. There is no voicing associated with this sound.
/l/ - This apical pattern occurs during an /l/ in an open or back vowel environment, and is characterised by minimal posterior and central contact. This voiced sound produces a high-level audio signal that is clearly visible in both the acoustic waveform and spectrogram. A significant amount of oral airflow is also present.
Alveolar/velar - This pattern occurs during velar/alveolar or alveolar/velar consonant sequences, such as in the word Kitkat. There is no voicing and very little airflow associated with this pattern.
From contact information such as this, it is possible to identify a number of specific patterns that appear in normal speech production. These contact patterns should be 181
192 Chapter 8: Results and Analysis compared with the idealised target patterns presented in chapter 3 (section ). Although the identification of normal tongue-palate configurations is a relatively simple task, the accompanying multiparameter data allows the user to locate each pattern with increased speed, accuracy and confidence. Also, the supplementary data is often useful when identifying the duration of certain lingual sequences that may otherwise be difficult to measure. For example, the palatal groove configuration for /ʃ/ is not clearly defined, but its duration may be revealed by the accompanying oral airflow. In addition, the multiparameter data has been found extremely useful when identifying abnormal tongue-palate configurations (and their duration), which frequently bear little resemblance to normal contact patterns. For example, a lateral /s/ may produce an abnormal amount of contact in all palatal zones, making it extremely difficult to identify with EPG data alone. However, with the aid of airflow and/or acoustic information, identification becomes a relatively simple task. Although the above technique is extremely useful when examining the place of articulation, it conveys little or no information about the dynamics of articulation. This limitation may be overcome by using the block cursor option described in chapter 7 (section ), which enables multiple EPG frames to be displayed for a selected speech segment. These frames are numbered consecutively and read from left to right. Figure 8.15 shows a multiple printout of a normal subject uttering the word Kitkat. 182
193 Chapter 8: Results and Analysis /k/ /ɪ/ /t/ /k/ /æ/ /t/ Figure 8.15: Consecutive EPG frames for the word Kitkat, as uttered by a normal subject. In addition to examining place of articulation, a record such as this is useful for measuring lingual timing details. Since the original data were recorded using a 100 Hz frame rate, the individual EPG frames appear at 10 ms intervals. It should be noted that the top left-hand electrode on this particular palate was defective and therefore remains inactive throughout the utterance. The main features of the above sequence may be summarised as follows: 183
194 Chapter 8: Results and Analysis
Frame - Tongue Position
- formation of velar closure for initial /k/
76 - release
- forward and upward movement of tongue for alveolar closure
88 - onset of closure for /t/; brief simultaneous alveolar/velar closure
97 - release of alveolar closure
110 - release of velar closure; evidence of low tongue position for /æ/ vowel
- forward and upward movement of tongue for alveolar closure; full alveolar and lateral closure for final /t/
147 - release
Electropalatography Data Reduction When analysis involves a large corpus and/or a large number of speakers, it is beneficial to transform the EPG recordings into a more manageable set of data. The grey scale option discussed in chapter 7 (section ) is one method of EPG data reduction implemented within SNORS+. The display is generated by calculating the average number of segment contacts over the entire block cursor width. The colour density therefore reflects the total duration of contact in each particular segment. Figure 8.16 illustrates the results obtained from a normal subject (left) and an abnormal subject (right) uttering the word seat. Each display is representative of 100 individual EPG frames. 184
195 Chapter 8: Results and Analysis Figure 8.16: Grey scale EPG frames obtained from a normal subject (left) and an abnormal subject (right) uttering the word seat. This example serves to illustrate the spatial distortion immediately evident in the abnormal EPG frame. Spatial distortions occur when the configuration of contacted electrodes in the EPG is unlike that seen in normal speakers. For instance, studies have shown that in English, the central palatal region remains relatively free of contact during normal speech production (Fletcher, 1988). Consequently, if the whole of the palate is contacted during the production of /s/ or /t/ (as in this case), the EPG is considered to exhibit spatial distortion. With the use of grey scales, the time required to diagnose abnormal EPGs is often reduced Quantitative Analysis of Electropalatography Data The displays described above provide the user with detailed qualitative EPG data. However, for the purpose of quantifying the contact patterns, it is necessary to extract relevant parameters from them. The lingual contact parameters supported by SNORS+ include: alveolar, palatal, velar, left lateral, right lateral, midline, centre of gravity, balance and weight. These parameters were defined in chapter 7 (section ). In analysis mode, these data can be displayed as a function of time to reveal the dynamic articulatory function, or as a single numerical index for statistical processing. Figure 8.17 illustrates a typical lingual waveform display featuring additional numerical analysis and multiple EPG frames. 185
196 Chapter 8: Results and Analysis Figure 8.17: A typical lingual waveform display featuring additional numerical analysis. The Test Scope window to the right of the display features the following waveforms (top to bottom): speech intensity, alveolar contact, palatal contact and velar contact. The patient information and numerical analysis group boxes are also shown. A corresponding set of EPG frames is presented in the left-hand window; the zones are colour-coded to match the displayed lingual waveforms. This example illustrates the results obtained from a normal subject uttering the word seat. With reference to the lingual waveforms, there is initially a steep rise in alveolar contact during the formation of the fricative /s/. A relatively constant amount of alveolar contact is maintained throughout the fricative, which lasts for approximately 260 ms. The alveolar contact subsides during the vowel /i/ and a rise in palatal/velar contact is observed. As the seal for the plosive /t/ is formed, another steep rise in alveolar contact can be seen. This seal is maintained for approximately 130 ms. Finally, the alveolar contact falls rapidly on release of the plosive. In addition to revealing the dynamics of tongue movement, the lingual waveforms are often quicker and easier to interpret than the plethora of individual contact patterns. 186
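To make the grey-scale reduction and the lingual contact parameters concrete, the sketch below reduces a sequence of EPG frames to per-electrode contact fractions and zone contact percentages of the kind discussed in this section. The 8 x 8 electrode layout and the row-to-zone grouping are assumptions for illustration; the actual zone definitions used by SNORS+ are given in chapter 7.

```python
import numpy as np

# A recording is assumed to be a (frames x 8 x 8) boolean array: eight
# rows of eight electrodes, with the front (alveolar) row first.  The
# row-to-zone grouping below is illustrative only.
ALVEOLAR_ROWS = slice(0, 2)
PALATAL_ROWS = slice(2, 5)
VELAR_ROWS = slice(5, 8)

def grey_scale(frames):
    """Per-electrode contact fraction over the block cursor width: the
    quantity mapped to colour density in the grey-scale display."""
    return frames.mean(axis=0)

def zone_contact(frames, rows):
    """Mean percentage of contacted electrodes within one zone."""
    return 100.0 * float(frames[:, rows, :].mean())

def lingual_weight(frames):
    """Overall percentage of the palate contacted (all zones)."""
    return 100.0 * float(frames.mean())
```

Averaging over frames collapses the temporal dimension, which is exactly the data reduction that makes large corpora manageable; the time-varying zone traces shown in the lingual waveform display are obtained by evaluating the same zone means frame by frame instead.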
197 Chapter 8: Results and Analysis Numerical Electropalatography Analysis Numerical EPG analysis is performed on the main Test Scope window in an identical manner to that described in section The bar chart in Figure 8.18 illustrates a selection of lingual parameters that represent the closed phase of the plosive /t/, as uttered by four normal subjects (AM, DD, NM and TT). Figure 8.18: Lingual parameters representing the closed phase of the plosive /t/, as uttered by four normal subjects. As can be seen, there is a significant amount of inter-subject variability for each of the parameters displayed. For example, the amount of alveolar contact ranges considerably from 57% to 95% with a mean of 77%. However, the relative amounts of contact in each lingual zone are consistent for every subject, and agree with theoretical expectation, i.e. greatest contact in the alveolar region and reduced contact in both the palatal and velar regions. To illustrate intra-subject variability, the bar chart shown in Figure 8.19 represents the same lingual parameters for the plosive /t/, as uttered by a normal subject. The data were gathered from five tests recorded over a two-month period. 187
198 Chapter 8: Results and Analysis Figure 8.19: Lingual parameters representing the closed phase of the plosive /t/, as uttered by a normal subject. As can be seen, the amount of intra-subject variability is minimal for each of the displayed parameters. In this case, the amount of alveolar contact ranges from just 64% to 76% with a mean of 71%. Again, the amount of contact in each lingual zone is consistent for every recording and agrees with the theoretical expectation. Appendix C provides additional bar charts illustrating the amount of alveolar, palatal, velar and overall contact for the sounds /s/, /k/, /ʃ/ and /t/. 188
199 Chapter 8: Results and Analysis A Single Electropalatography Case Study Spatial distortions frequently occur during abnormal fricative production, particularly in young children with functional articulation disorders (Gibbon and Hardcastle, 1987). EPG has been found to be especially valuable in the description of lateralised fricatives. These are a common type of distortion, and may present the clinician with a particular problem because they do not always resolve either spontaneously or as a result of therapy. These distortions are often referred to as lateral /s/, lateral lisp, lateral misarticulations or lateralised articulations (Hardcastle and Gibbon, 1997). This case study examines the speech production of a young boy with an articulatory disorder resulting in lateral misarticulations Case History A.S., a 9-year-old boy, had been known to his local speech and language therapy department since the age of 3, and had attended several blocks of therapy for work on his articulation. In the past he had suffered from fluctuating hearing loss, a mildly delayed speech sound system, lateral misarticulations, and frequent omission of final obstruents. Therapy and maturation resolved all of his speech difficulties except for the lateralisation of /s, ʃ, tʃ/, which persisted despite all efforts. To resolve this difficulty A.S. was offered a course of electropalatography therapy in October Electropalatography Therapy Regime On initial assessment of A.S.'s /s/ there was little evidence of an anterior groove, and there was minimal contact in the first alveolar row. Contact in the palatal and velar regions was found to be abnormally high (refer to Figure 8.20). Therefore, the general aim of therapy was to reduce the amount of contact across the palate and concentrate awareness of the tongue-tip on the alveolar ridge patterns. This was combined with auditory discrimination, contrasting his own fricative production with that of his therapist. By using Linguagraph as a biofeedback tool with visual targeting, A.S. was able to observe the target and compare it with his own contact pattern. This enabled him to visualise how far he was from achieving correct tongue placement, and thus to determine what changes were required. Gradually, as he became more practised at achieving correct tongue placement, the visual image was withdrawn. Despite his articulatory difficulties, A.S.'s 189
200 Chapter 8: Results and Analysis interest and motivation were maintained throughout each hour-long session. As can be seen from Figure 8.20, the improvement in contact patterns for /s/ is quite dramatic. Figure 8.20: Contact patterns for /s/ illustrating A.S.'s progress throughout therapy. The therapist's target pattern is also shown for comparison Outcome At approximately mid-therapy, A.S. had achieved an increase in alveolar contact with evidence of an anterior groove. Both palatal and velar contact in the medial region had also reduced. Finally, at the end of therapy, A.S. consistently produced a clear anterior groove configuration. Both palatal and velar contact had also subsided, yet a complete seal along both lateral margins was maintained. Perceptually, the /s/ sound produced by this tongue-palate configuration was considered an improvement by both his parents and therapist. In terms of alveolar, palatal, velar and overall contact, the bar chart illustrated in Figure 8.21 summarises A.S.'s progress throughout his eight weeks of therapy. For comparison, the equivalent tongue-palate contact produced by his therapist is also shown. 190
201 Chapter 8: Results and Analysis Figure 8.21: Lingual parameters for a lateral /s/, which reflect A.S.'s progress throughout eight weeks of therapy. Due to his initial lack of contact in the first alveolar row (refer to Figure 8.20), the above bar chart clearly reveals the increase in alveolar contact as A.S. learned to move his tongue-tip forward. Although the amount of contact in this region gradually extends above the therapist's target, the difference recorded during the final session is relatively small. The gradual reduction towards a more normal level of palatal, velar and overall contact is also apparent. A similar chart reflecting A.S.'s progress for the sound /ʃ/ is given in appendix D Conclusion Although at the end of therapy A.S. still produced the sounds /s, ʃ, tʃ/ with a lateralised quality, they did appear to be less pronounced. In discussion with A.S. and his parents, it was agreed that he was reasonably satisfied with his speech and able to communicate effectively. A.S. now participates in drama and music groups without being inhibited by his speech. Following this latest block of therapy he was finally discharged from the speech and language therapy clinic. 191
202 Chapter 8: Results and Analysis The data gathered during this case study suggest that electropalatography is an extremely useful additional tool when used in conjunction with conventional therapy techniques. Electropalatography allows objective assessment that enables appropriate targeting of therapy. It provides visual feedback, which assists in therapy, and can be extremely motivating for both patient and therapist. The data also provide an objective measure of outcome, which is an increasingly important consideration for the therapist. Finally, the additional multiparameter data were found particularly useful when identifying the abnormal tongue-palate configurations encountered in this case study. 192
203 CHAPTER 9 CONCLUSIONS AND FURTHER WORK Speech is the result of a highly complex and versatile system of co-ordinated muscular movements, and although perceptual assessments contribute valuable information to the process of diagnosing speech disorders, instrumental observation and measurement offer significant advantages. Increasingly, clinicians are beginning to appreciate the considerable benefits of instrumental analysis, which provides quantitative, objective data on a wide range of different speech parameters. In addition, such measures are becoming increasingly important as the need to prove efficacy grows. Although current instruments are extremely useful, giving excellent measures of individual articulatory function, few are able to measure the co-ordination of the main articulators. The instrumentation described in this thesis, namely SNORS+, allows the simultaneous measurement of five key speech parameters: Speech outcome. Respiration. Larynx excitation. Velopharyngeal closure. Tongue-palate contact. The development of SNORS+ has given clinicians the unique ability to assess the contributory and co-ordinated effects of the main articulators on speech production. As a result, the system has proved to be extremely valuable in the assessment and treatment of various speech disorders. 9.1 Clinical Evaluation Clinical opinion on the usefulness of SNORS+ has been evaluated in three ways: From a questionnaire distributed to clinicians in five different countries: France, Greece, Holland, Sweden and the United Kingdom. From demonstrations, where SNORS+ has been presented to large numbers of speech and language therapists, specialising in different client groups. 193
204 Chapter 9: Conclusions and Further Work Feedback from over eleven establishments where SNORS+ is now in permanent clinical use. The response has been extremely positive and enthusiastic. When consulted, the majority of clinicians felt that the parameters measured by SNORS+ were the most useful. The inclusion of a direct video input has also been well received, not only for imaging techniques such as videofluoroscopy and nasendoscopy, but also for monitoring lips, jaw, posture and facial grimaces during speech. Although one or two people initially questioned the clinical usefulness of multiparameter measurement, the vast majority of clinicians seem very much in favour of SNORS+. They commented on problem patients, where conventional therapy had failed to identify the exact nature of the condition. Most clinicians felt that large numbers of patients on their caseloads could potentially benefit from SNORS+. They felt that multiparameter assessment and biofeedback could save clinicians' time, improve targeting of treatment, improve therapy and could well improve outcome. In addition to increased diagnosis and treatment proficiency, SNORS+ offers several other major advantages over standalone instrumentation: The need to learn a variety of individual systems is removed, significantly reducing training overheads. A modular design allows the system to be tailored to suit individual requirements. Data from the various modules share the same file format, thus offering 100% compatibility. Archiving time is significantly reduced because of single media data storage. Many clinicians have also commented on the user interface, which they found very intuitive and easy to use. Even those with relatively little computer experience have encountered few difficulties when using SNORS+ Clinical Measurements The use of multiparameter data is potentially very effective in the assessment and treatment of speech disorders involving several articulators. It also provides invaluable articulatory clues when analysing the abnormal data produced by individual articulators. 194
205 Chapter 9: Conclusions and Further Work The following sections highlight some of the benefits encountered with multiparameter assessment Relating Speech Mechanism to Outcome A selection of the analysis displays presented in the results chapter revealed the relationship between the speech mechanism, as depicted in the Scope and EPG windows, and the actual speech outcome, as illustrated in the spectrogram window. The speech instrumentation survey discussed in chapter 3 revealed no other system capable of this versatility. The measurement of outcome, as depicted by the spectrogram, has proved invaluable for revealing subtle acoustic changes produced by certain articulatory configurations. These include the low-level sounds generated by unvoiced fricatives, affricates and plosives. The multiparameter data were also found extremely useful when identifying the various vocal tract gestures that produce the same acoustic result. In addition, the multiparameter data can significantly ease spectrogram interpretation, since its complex acoustic information can be clearly attributed to the individual articulatory functions, such as airflow, voicing and lingual configurations Assessment of Velopharyngeal Incompetence Aerodynamic and acoustic nasalance provide an extremely useful measure of velopharyngeal closure in both normal and hypernasal speech. To obtain normative data on these, and other parameters, a small trial was conducted on 40 subjects considered by the author to exhibit normal speech. During this preliminary investigation, multiparameter data were collected from each subject as they uttered a series of nasalised and non-nasal words. Due to the low uniform values of both aerodynamic and acoustic nasalance, the results suggested that purely oral words should be used for the quantitative analysis of excessive nasal air emission. In particular, the aerodynamic nasalance values associated with the word cheese were consistently low amongst individuals. In contrast, however, this word consistently produced the highest acoustic nasalance for the same individuals. This conflict can be linked to the intense oscillatory oral airflow associated with the sustained vowel sound. These oscillations cause a flexing of the velum, which in turn generates an audible acoustic resonance in the nasal cavity. The resultant increase in acoustic nasalance can be misleading, especially when a good velopharyngeal seal has been achieved. This observation highlights the benefits of simultaneous aerodynamic and acoustic measurement in the assessment of velopharyngeal incompetence. However, 195
206 Chapter 9: Conclusions and Further Work further work is needed to study the relationship between aerodynamic and acoustic nasalance, and to investigate the effects of phonetic context on these two important parameters. According to clinicians working with cleft palate subjects, multiparameter assessment of the velopharyngeal mechanism could also prove extremely useful. When lateral videofluoroscopy images are combined with multiparameter data (airflow in particular), the resultant displays generate an enhanced diagnostic tool for assessment of the velopharyngeal mechanism. It has been shown by example that this technique has the potential to increase confidence in clinical assessment and thus reduce the number of errors in diagnosis, which often result in further invasive assessment. However, to establish the reliability of this technique, a more detailed and rigorous analysis of the data is required. It is also hoped to combine other imaging techniques with the multiparameter data, such as nasendoscopy. Finally, it has been suggested that the level of intelligibility attained by cleft palate speakers is determined more by the manner in which the various articulatory structures of the vocal tract react to the velopharyngeal incompetence, rather than by the specific degree of incompetence or error present (Warren, 1986). Increased respiratory effort, alteration in tongue position, interruption of the air stream by the vocal cords and nasal grimace have all been recognised as compensatory mechanisms during the speech production of subjects with velopharyngeal incompetence (Books et al., 1966; Trost, 1981; Warren, 1986). The multiparameter acquisition supported by SNORS+ enables the clinician to assess all of these compensatory mechanisms quickly and reliably Identification of Tongue-Palate Configurations Although the identification of normal EPG tongue-palate configurations is a relatively simple task, the accompanying multiparameter data allows the user to locate each pattern with increased speed, accuracy and confidence. Also, the supplementary data is often useful when identifying the duration of certain lingual sequences that may otherwise be difficult to measure. For example, the palatal groove configuration for /ʃ/ is not clearly defined, but its duration may be revealed by the accompanying oral airflow. In addition, the multiparameter data has been found extremely useful when identifying abnormal tongue-palate configurations (and their duration), which frequently bear little resemblance to normal contact patterns. For example, a lateral /s/ may produce an abnormal amount of 196
9.3 Further Work

The discussion of further work is divided into two distinct sections. The first introduces two novel clinical applications that feature SNORS+; the second outlines a number of proposed hardware and software enhancements.

Novel Clinical Applications

Since the successful completion of SNORS+, preliminary work has been undertaken on two novel clinical applications: the assessment of swallowing disorders, and the assessment of suck-swallow-breathe synchrony.

Assessment of Swallowing Disorders

In collaboration with the Speech & Language Therapy department at East Kent Hospitals NHS Trust, SNORS+ has been combined with videofluoroscopy for the assessment of swallowing disorders. The proper diagnosis of swallowing disorders (dysphagia) presents a continuing problem in the rehabilitation of patients with stroke, head injury and neurological disease (Reddy et al., 1994). Aspiration, defined as the penetration of food, fluid or oral secretions below the level of the vocal folds, is a major problem associated with dysphagia. When undetected, aspiration can be potentially life threatening, so early diagnosis is essential.

The speech articulators (i.e. the lips, jaw, tongue, velum and larynx) all play a vital role in swallowing. A normal swallow is divided into three distinct phases: oral, pharyngeal and oesophageal. The oral phase, under voluntary control, involves the preparation of the bolus utilising the lips, mandible/maxilla, teeth and tongue. The bolus is then pushed back in the oral cavity and the swallow is triggered as it passes the anterior faucial arches. The ensuing pharyngeal phase is involuntary and requires laryngeal elevation and closure, velopharyngeal closure and the propulsion of the bolus by the tongue base and pharyngeal constrictors. The cricopharyngeal sphincter then opens to allow the third phase, in which the bolus passes into the oesophagus and peristalsis moves it to the stomach. This complex, co-ordinated sequence of events prevents the aspiration of food, fluid and saliva into the airway.

The current clinical methods of diagnosis are qualitative, based on clinical (bedside) evaluation and videofluoroscopic examination (VFE). The videofluorographic examination involves ionising radiation and therefore has limitations as a diagnostic tool. Other techniques that have been used in the assessment of swallowing include:

- Fibreoptic endoscopic evaluation of swallowing (FEES) to view the pharynx (Langmore et al., 1991).
- Ultrasound to provide information on tongue, pharyngeal and laryngeal movement (Sonies, 1991).
- Electrolaryngography to measure activity of the vocal folds (Firmin et al., 1997; Perlman and Grayhack, 1991; Schultz et al., 1994).
- Electropalatography to assess tongue movement (Chi-Fishman and Stone, 1996).
- Anemometry to measure changes in nasal and/or oral airflow (Rogers et al., 1993).

However, there continues to be a need for a non-invasive multiparameter assessment technique to identify patients at risk of aspiration, and to aid in the treatment of dysphagia. The assessment of swallowing requires a measure of the interaction and synchronisation between the contributing mechanisms; the simultaneous multiparameter measurements supported by SNORS+ are therefore extremely well suited to this application. Preliminary investigations using SNORS+ have successfully combined the established technique of videofluoroscopy with acoustic, airflow and laryngeal data during swallowing. However, further work is needed to establish the effectiveness of SNORS+ as a stand-alone system in the assessment of dysphagia, and to determine whether it is possible to enhance the usefulness of videofluoroscopy. One simple airflow-derived feature is sketched below.
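During a normal swallow, respiration, and with it nasal airflow, briefly ceases. The sketch below locates such a swallow-related apnoeic interval as the longest near-zero run in an airflow trace; this is an illustrative heuristic only, not the analysis performed by SNORS+, and the 5%-of-peak threshold is an assumption.

    import numpy as np

    def swallow_apnoea(airflow, fs, thresh=0.05):
        """Return (start_s, duration_s) of the longest near-zero run in
        an airflow trace: a crude proxy for the swallow apnoeic interval.
        fs is the sample rate in Hz; thresh is the fraction of the peak
        magnitude below which flow counts as 'zero'."""
        x = np.abs(np.asarray(airflow, dtype=float))
        quiet = x < thresh * x.max()
        best_start, best_len, run_start = 0, 0, None
        for i, q in enumerate(np.append(quiet, False)):  # sentinel closes the last run
            if q and run_start is None:
                run_start = i
            elif not q and run_start is not None:
                if i - run_start > best_len:
                    best_start, best_len = run_start, i - run_start
                run_start = None
        return best_start / fs, best_len / fs

    # Tidal breathing with a ~1.2 s pause mimicking a swallow:
    fs = 500
    t = np.arange(0, 10, 1 / fs)
    flow = np.sin(2 * np.pi * 0.3 * t)
    flow[2500:3100] = 0.0
    print(swallow_apnoea(flow, fs))   # approximately (5.0, 1.2)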
Assessment of Suck-Swallow-Breathe Synchrony

In collaboration with the Speech and Language Therapy department at Croydon and Surrey Downs Community Health NHS Trust, a selection of modified toys has been coupled to the SNORS airflow sensors. This combination enables the assessment of suck-swallow-breathe synchrony in young children.

Suck-swallow-breathe (SSB) synchrony, as the primary oral motor mechanism and the first developmental pattern that requires timing and sequenced movements, is fundamental to the development of oral sensorimotor and speech production skills in speech disordered children (Oetter, Richter and Frick, 1995). Preliminary research has shown that facilitating SSB synchrony by providing children with increasingly complex oral play experiences can produce a marked change in oral motor and speech function (Pate, 2000).

During preliminary investigations using SNORS+ to assess SSB synchrony in young children, the facemask was often poorly tolerated. Therefore, to encourage co-operation, a selection of toys has been modified and coupled directly to the airflow sensors. These toys were selected to promote the suck/blow function naturally during play. This combination enables quantitative measurement of breath control, such as suck/blow effort, consistency and duration. In addition, simultaneous video recordings allow graphical features to be correlated with visual events. A small trial, which aims to quantify SSB synchrony objectively in speech disordered subjects, has been planned for the near future. This cross-sectional study is intended to reveal links between the sensorimotor deficits of speech disordered children and their oral motor and speech output skills.
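A minimal sketch of how suck/blow effort, duration and consistency might be quantified from a single airflow trace; the 10%-of-peak activity gate and the metric definitions are illustrative assumptions, not the measures SNORS+ would necessarily report.

    import numpy as np

    def breath_control_metrics(flow, fs):
        """Crude suck/blow breath-control metrics from an airflow trace:
        effort      -- peak absolute flow,
        duration    -- total time (s) spent above the activity gate,
        consistency -- coefficient of variation of the burst durations
                       (lower means more evenly sized bursts)."""
        x = np.abs(np.asarray(flow, dtype=float))
        active = (x > 0.1 * x.max()).astype(int)    # 10%-of-peak gate
        edges = np.flatnonzero(np.diff(np.concatenate(([0], active, [0]))))
        starts, ends = edges[::2], edges[1::2]      # paired burst boundaries
        lengths = (ends - starts) / fs
        return {
            "effort": float(x.max()),
            "duration": float(lengths.sum()),
            "consistency": float(lengths.std() / lengths.mean()) if lengths.size else 0.0,
        }

    # Three identical one-second blow bursts sampled at 500 Hz:
    fs = 500
    flow = np.concatenate([np.zeros(250), np.ones(500), np.zeros(250)] * 3)
    print(breath_control_metrics(flow, fs))   # duration 3.0 s, consistency 0.0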
Hardware and Software Enhancements

Although SNORS+ is now a fully integrated multiparameter speech workstation, additional features are proposed which may further enhance the system. These enhancements, comprising both hardware and software, are described in the following sections.

Upgrade of Data Acquisition Card

The computer uses a DAS-1202 data acquisition card to sample the various analogue and digital signals produced by SNORS, Linguagraph and Laryngograph. Although this card provides the speed and flexibility required for the project, it uses the ISA bus interface. Increasingly, this type of interface is being superseded by the PCI format, which offers greater bus transfer rates. In addition, many of the new PCI data acquisition cards support plug and play, a specification that enables automatic configuration upon installation. An upgrade to this type of data acquisition card would enable SNORS+ to take advantage of these features, and would also allow compatibility with the Windows NT operating system (currently not supported by the DAS-1202).
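Whichever card is fitted, raw conversions arrive as integer codes that must be scaled to volts before analysis. A minimal sketch follows, assuming a 12-bit converter with an offset-binary bipolar range; the actual bit width, coding scheme and input range depend on the card and its jumper/gain configuration.

    def counts_to_volts(code, bits=12, v_min=-5.0, v_max=5.0):
        """Scale a raw ADC code (0 .. 2**bits - 1, offset binary) to volts.
        The 12-bit width and the +/-5 V range are illustrative assumptions."""
        span = v_max - v_min
        return v_min + (code / float(2 ** bits - 1)) * span

    # Mid-scale maps to roughly 0 V on a symmetric bipolar range:
    print(counts_to_volts(2048))                      # ~0.00 V
    # The auxiliary channel (next section) accepts +/-2.5 V, so a module's
    # output can be validated against that window after scaling:
    print(abs(counts_to_volts(2900)) <= 2.5)          # True: ~2.08 V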
Additional Parameters

To take full advantage of the auxiliary channel, hardware modules measuring the following parameters could be constructed:

- Intra-oral pressure.
- Lip movement.
- Jaw movement.

The auxiliary channel is fully synchronised and can accommodate analogue signals in the range of ±2.5 V.

Software Adaptations for Young Children

Several speech and language therapists have suggested the use of visual images to aid in the assessment of young children. For example, GOS.SP.ASS. (Great Ormond Street Speech Assessment), a speech assessment protocol for the speech disorders associated with cleft palate and/or velopharyngeal dysfunction, contains a set of picture stimuli frequently used by clinicians. During assessment, an image is shown to the child while the clinician recites a simple phrase; the child is then asked to repeat that phrase. Each phrase relates to a specific picture. In the assessment of the /s/ sound, for example, a picture of a red bus accompanies the phrase "I saw Sam sitting on a bus". This type of approach could be implemented in SNORS+ by simply replacing the word list with a set of bitmap images. It has also been suggested that animated displays may encourage young children to participate in therapy sessions. For example, the Bar display could be replaced by a balloon image that drifts up or down in response to the selected parameter. As a further incentive, audio/visual rewards could be triggered when a child achieves a certain target.

Linear Predictive Coding

Although FFTs reveal the fine spectral detail associated with the fundamental frequency and its harmonics, the formant frequencies (vocal tract resonances) can often be obscured. However, a technique known as linear predictive coding (LPC) highlights formant structure by removing the fundamental and harmonic frequencies. Linear predictive coding derives a series of coefficients that describe the time-varying waveform; these coefficients, if properly calculated, correspond to the formant frequencies. The inclusion of an LPC window, for both assessment and therapy, would greatly enhance the acoustic analysis capabilities of SNORS+.

There are, however, some inherent disadvantages in the LPC procedure. Primary among these is that LPC analysis assumes there are no side-branch resonators in the vocal tract: only resonances are modelled, with no provision for antiresonances, or zeros, in the signal. Antiresonances, most commonly associated with nasal coupling in speech, interact with resonances to affect the spectral output. When antiresonances are present, errors may be made in the LPC estimates of formant frequency and bandwidth. To overcome this limitation it may be beneficial to use the FFT as an accurate reference and simply overlay the LPC to highlight the formant structure; in this case, the ability to switch between waveforms would also be desirable. The basic estimation procedure is sketched below.
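A minimal NumPy sketch of the autocorrelation/Levinson-Durbin form of LPC, with formants read off as the pole angles of the fitted model. The model order, frame length and absence of pre-emphasis are illustrative choices, not settings an LPC window in SNORS+ would necessarily adopt.

    import numpy as np

    def lpc(x, order):
        """LPC coefficients [1, a1..ap] by the autocorrelation method
        and the Levinson-Durbin recursion."""
        x = np.asarray(x, dtype=float)
        r = np.correlate(x, x, mode="full")[x.size - 1 : x.size + order]
        a, err = np.zeros(order + 1), r[0]
        a[0] = 1.0
        for i in range(1, order + 1):
            k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err
            prev = a.copy()
            a[1:i] += k * prev[i - 1 : 0 : -1]   # reflect previous coefficients
            a[i] = k
            err *= 1.0 - k * k
        return a

    def formants(frame, fs, order=12):
        """Formant estimates: upper-half-plane LPC pole angles in Hz."""
        a = lpc(frame * np.hamming(frame.size), order)
        poles = np.roots(a)
        poles = poles[np.imag(poles) > 0.01]
        return np.sort(np.angle(poles)) * fs / (2 * np.pi)

    # Synthetic 'vowel' with resonances near 700 Hz and 1200 Hz:
    fs = 8000
    t = np.arange(0, 0.032, 1 / fs)
    frame = np.sin(2 * np.pi * 700 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)
    print(formants(frame, fs))   # values near 700 and 1200 appear among the estimates

Because the all-pole model contains no zeros, exactly as discussed above, estimates obtained from nasalised frames should be treated with caution.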
Video Compression Algorithms

At present the video acquisition module is very inefficient in terms of memory, requiring a 55-Megabyte buffer for each 20-second sequence. Incorporating video compression algorithms in software could significantly reduce this memory requirement. Unfortunately, many of these algorithms remove the subtle image changes between consecutive frames. The resulting distortion is acceptable for most applications, but the subtle changes considered essential in videofluoroscopic interpretation could be lost. However, some compression algorithms use a lossless technique, making them suitable for this application. The inclusion of such an algorithm would permit the following:

- Reduced memory overheads.
- Longer video sequences.
- Increased image resolution.
- The ability to record colour images.

In terms of processing time, it is not known whether these compression algorithms will support the current acquisition rate of 25 frames per second. However, as computer performance increases, this consideration will diminish.
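The key requirement here is exact reconstruction. The sketch below uses zlib as a stand-in for whatever lossless codec might be adopted; the frame size is illustrative, and real videofluoroscopy frames, with their large static regions, would typically compress better than arbitrary images.

    import zlib
    import numpy as np

    # Illustrative greyscale frame; the actual capture format may differ.
    y, x = np.mgrid[0:288, 0:384]
    frame = ((x + y) % 256).astype(np.uint8)

    packed = zlib.compress(frame.tobytes(), level=9)
    print(f"raw {frame.nbytes} bytes -> {len(packed)} bytes")

    # Lossless: decompression restores every pixel bit-for-bit, so no
    # diagnostically relevant detail can be discarded.
    restored = np.frombuffer(zlib.decompress(packed), np.uint8).reshape(frame.shape)
    assert np.array_equal(restored, frame)

Differencing consecutive frames before compression, which is equally reversible, would usually improve the ratio further, since successive videofluoroscopy frames differ very little.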
Annotative Test Displays

The ability to annotate the main Test Scope window would enable clinicians to label the various waveforms at points of interest. For example, the different speech sounds could be accurately transcribed using the International Phonetic Alphabet (IPA), or attention could be drawn to abnormal features such as excessive nasal air emission. For a permanent record, these annotations could be saved to disk along with the test file. The facility to switch annotation on or off would also be desirable.
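A minimal sketch of one way such annotations might be persisted alongside a test file; the record layout and file-naming convention are hypothetical rather than an existing SNORS+ format.

    import json

    # Hypothetical records: a time offset (seconds from the start of the
    # test), the waveform being labelled, and the clinician's note.
    annotations = [
        {"time_s": 0.42, "channel": "nasal airflow", "label": "excessive nasal emission"},
        {"time_s": 1.10, "channel": "audio", "label": "IPA: [s]"},
    ]

    # Saved next to the test file so the pair can be reloaded together.
    with open("subject01.ann.json", "w", encoding="utf-8") as f:
        json.dump(annotations, f, indent=2, ensure_ascii=False)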
BIBLIOGRAPHY

Abberton, E. R. M. and Fourcin, A. (1997): Electrolaryngography. Chapter 5 in Code, C. and Ball, M. (Ed): Instrumental Clinical Phonetics. Whurr Publishers.

Abberton, E. R. M., Howard, D. M. and Fourcin, A. J. (1989): Laryngographic assessment of normal voice: a tutorial. Clinical Linguistics and Phonetics, 3.

Abbs, J. H. and De Paul, R. (1989): Assessment of Dysarthria: The Critical Pre-requisition to Treatment in Disorders of Communication. Taylor and Francis Publishers.

Abercrombie, D. (1957): Direct palatography. Zeitschrift für Phonetik, 10.

Adams, S. G. (1997): Hypokinetic Dysarthria in Parkinson's Disease. Chapter 12 in McNeil, M. R. (Ed): Clinical Management of Sensorimotor Speech Disorders. Thieme Publishers.

American Society of Plastic Surgeons Official Web document: org/

Anthony, J. and Hewlett, N. (1984): Electrolaryngography. Chapter 4 in Code, C. and Ball, M. (Ed): Experimental Clinical Phonetics. Croom Helm Publishers.

Baken, R. J. (1987): Clinical Measurement of Speech and Voice. College-Hill Press.

Björk, L. (1961): Velopharyngeal function in connected speech. Acta Radiol, supplement 202.

Borden, G. J. and Harris, K. S. (1984): Speech Science Primer, 2nd Edition. Williams and Wilkins Publishers.

Brooks, A. R., Shelton, R. L. and Youngstrom, K. A. (1966): Tongue-palate contact in persons with palate defects. Journal of Speech and Hearing Disorders, 31.

Buck, M. W. and Harrington, R. (1949): Organised speech therapy for cleft palate rehabilitation. Journal of Speech and Hearing Disorders, 14.
Cannito, M. P. and Marquardt, T. P. (1997): Ataxic Dysarthria. Chapter 10 in McNeil, M. R. (Ed): Clinical Management of Sensorimotor Speech Disorders. Thieme Publishers.

Catford, J. C. (1977): Fundamental Problems in Phonetics. Edinburgh.

Chi-Fishman, G. and Stone, M. (1996): A new application for electropalatography: swallowing. Dysphagia, 11.

Council Directive 93/42/EEC on Medical devices (1993): Annex I: Essential Requirements. Medical Devices Directive.

Council Directive 93/42/EEC on Medical devices (1993): Annex VII: EC Declaration of Conformity. Medical Devices Directive.

Crystal, D. (1989): Introduction to speech and language pathology. Whurr Publishers.

Daniel, B. and Guitar, B. (1978): EMG Feedback and Recovery of Facial and Speech Gestures Following Neural Anastomosis. Journal of Speech and Hearing Disorders, 43.

Darley, F. L. (1982): Aphasia. Saunders Publishers.

Darley, F. L., Aronson, A. E. and Brown, J. R. (1975): Motor Speech Disorders. WB Saunders Publishers.

Denes, P. B. and Pinson, E. P. (1968): The Speech Chain, 5th Edition. Bell Telephone Laboratories.

Ellis, R. E., Flack, F. C., Curle, H. J. and Selly, W. G. (1978): A system for the assessment of nasal airflow during speech. British Journal of Disorders of Communication, 13.

EN (1990): Medical electrical equipment. General requirements for safety. British Standards Institute.

Enderby, P. M. and Emerson, J. (1996): Speech and Language Therapy: Does it work? British Medical Journal, 312.
Fabre, P. (1957): Un Procédé Electrique Percutané d'Inscription de l'Accolement Glottique au Cours de la Phonation: Glottographie de Haute Fréquence. Premiers Résultats. Bull. Acad. Nat. Méd., 141.

Firmin, H., Reilly, S. and Fourcin, A. (1997): Non-invasive monitoring of reflexive swallowing. Speech Hearing and Language: work in progress (UCL), 10.

Fletcher, S. (1988): Speech production following partial glossectomy. Journal of Speech and Hearing Disorders, 53.

Fletcher, S. G. (1970): Theory and Instrumentation for Quantitative Measurement of Nasality. Cleft Palate Journal.

Flexner, S. B. (1987): The Random House dictionary of the English language, 2nd Edition. Random House Publishers.

Forrest, K. and Weismer, G. (1997): Acoustic Analysis of Dysarthric Speech. Chapter 4 in McNeil, M. R. (Ed): Clinical Management of Sensorimotor Speech Disorders. Thieme Publishers.

Fourcin, A. J. (1978): Acoustic patterns and speech acquisition. In Waterson, N. and Snow, C. (Ed): The Development of Communication. John Wiley Publishers.

Fromkin, V. and Rodman, R. (1993): An Introduction to Language, 5th Edition. Harcourt Brace Jovanovich College Publishers.

Fry, D. B. (1994): The Physics of Speech. Cambridge University Press.

Gibbon, F. and Hardcastle, W. (1987): Articulatory description and treatment of lateral /s/ using electropalatography: a case study. British Journal of Disorders of Communication, 22.

Giegerich, H. J. (1992): English Phonology: An Introduction, 1st Edition. Cambridge University Press.

Haapanen, M. L. (1992): Factors Affecting Speech in Patients with Isolated Cleft Palate: A clinical and instrumental study. Scandinavian Journal of Plastic and Reconstructive Surgery and Hand Surgery, Supplement.
Hanson, W. and Metter, E. (1980): DAF As Instrumental Treatment for Dysarthria in Progressive Supranuclear Palsy: A Case Report. Journal of Speech and Hearing Disorders, 45.

Hardcastle, W. J. and Gibbon, F. (1997): Electropalatography and its Clinical Applications. Chapter 6 in Ball, M. and Code, C. (Ed): Instrumental Clinical Phonetics. Whurr Publishers.

Hardcastle, W. J., Morgan-Barry, R. A. and Clark, C. J. (1985): Articulatory and Voicing Characteristics of Adult Dysarthric and Verbal Dyspraxic Speakers: An Instrumental Study. British Journal of Disorders of Communication, 20.

Hegde, M. N. (1985): Treatment Procedures in Communicative Disorders. College-Hill Press.

Hegde, M. N. (1996): Pocket Guide to Treatment in Speech-Language Pathology. Singular Publishing Group, Inc.

Horowitz, P. and Hill, W. (1989): The Art of Electronics, 2nd Edition. Cambridge University Press.

Kent, R. D. and Rosenbek, J. C. (1983): Acoustic patterns of apraxia of speech. Journal of Speech and Hearing Research, 26.

Langmore, S. E., Schatz, K. and Olsen, N. (1991): Endoscopic and videofluoroscopic evaluations of swallowing and aspiration. Ann Otol Rhinol Laryngol, 100.

Lenneberg, E. H. (1967): Biological Foundations of Language. John Wiley and Sons Publishers.

Logemann, J. A. (1985): Assessment and Treatment of Articulatory Disorders in Adults. Chapter 1 in Costello, J. (Ed): Speech Disorders in Adults. College Hill Press.

Love, R. J. and Webb, W. G. (1996): Neurology for the Speech-Language Pathologist, 3rd Edition. Butterworth Heinemann Publishers.
Main, A. (1998): The use of Electropalatography in the Treatment of Acquired Dysarthria. Masters Thesis, University of Kent.

Main, A., Kelly, S. W. and Manley, G. (1999): Instrumental assessment and treatment of hypernasality, following maxillofacial surgery, using SNORS: A single case study. Journal of Language and Communication Disorders, 2.

Main, A., Kelly, S. W. and Manley, G. (1997): Assessment of velopharyngeal closure using SNORS. An internal report. University of Kent at Canterbury.

McAllister, A. (1998): Personal Communication with Danderyd Hospital, Sweden.

McGlashan, J. A. (1998): The LxStrobe. Web Document:

McLean, C. C. (1997): Instrumentation for the Multiparameter Assessment of Speech Defects. Ph.D. Thesis, University of Kent.

McLean, C. C., Kelly, S. W. and Manley, M. C. G. (1997): An instrument for the noninvasive objective assessment of velar function during speech. Journal of Medical Engineering and Physics, 19.

McWilliams, B. J. (1966): Speech and language problems in children with cleft palate. J Am Med Wom Assoc, 21.

Minifie, F. D., Hixon, T. J. and Williams, F. (1973): Perspectives in Normal Aspects of Speech, Hearing and Language. Prentice-Hall Publishers.

Netsell, R. and Cleeland, C. (1973): Modification of Lip Hypertonia in Dysarthria Using EMG Feedback. Journal of Speech and Hearing Disorders, 38.

Netsell, R. (1983): Speech Motor Control: Theoretical Issues with Clinical Impact. Chapter 1 in Berry, M. (Ed): Clinical Dysarthria. College Hill Publishers.

Netsell, R. and Daniel, B. (1979): Dysarthria in Adults: Physiologic Approach to Rehabilitation. Archives of Physical Medical Rehabilitation, 60.

Netsell, R. and Rosenbek, J. (1986): Treating the Dysarthrias. Chapter 6 in Netsell, R. (Ed): A Neurobiologic View of Speech Production and the Dysarthrias. College-Hill Press.
O'Connor, J. D. (1991): Phonetics. Penguin Publishers.

Oetter, P., Richter, E. and Frick, S. (1995): M.O.R.E. Integrating the Mouth with Sensory and Postural Functions. PDP Press Inc.

Pate, O. (2000): Suck Swallow Breathe Synchrony: a means for assessing and managing sensorimotor difficulties in speech disordered children. An internal report. Croydon and Surrey Downs Community Health NHS Trust.

Perlman, A. L. and Grayhack, J. P. (1991): Use of the electroglottograph for measurement of temporal aspects of the swallow: preliminary observations. Dysphagia, 6.

Peterson, H. A. and Marquardt, T. P. (1981): Appraisal and Diagnosis of Speech and Language Disorders. Prentice Hall Publishers.

Petzold, C. and Yao, P. (1996): Programming Windows 95. Microsoft Press.

Plant, R. L. (1999): The Larynx, Basic Anatomy. Web Document:

Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (1992): Numerical Recipes in C: The Art of Scientific Computing, 2nd Edition. Cambridge University Press.

Quinn, F. B. (1998): Cleft Lip and Palate. Web Document: clefpalt.htm.

Rabiner, L. and Juang, B. H. (1993): Fundamentals of Speech Recognition. Prentice Hall Publishers.

Reddy, N. P., Thomas, R., Enrique, P. C. and Casterline, J. (1994): Classification of dysphagic patients using biomechanical measurements. Journal of Rehabilitation Research and Development, 31(4).

Roach, P. (1991): English Phonetics and Phonology, 2nd Edition. Cambridge University Press.

Robertson, S. J. and Thomson, F. (1993): Working with Dysarthrics. Winslow Press.
Rogers, B., Msall, M. and Shucard, D. (1993): Hypoxemia during oral feedings in adults with dysphagia and severe neurological disabilities. Dysphagia, 8(1).

Schultz, J. L., Perlman, A. L. and VanDaele, D. J. (1994): Laryngeal movement, oropharyngeal pressure, and submental muscle contraction during swallowing. Arch Phys Med Rehabil, 75(2).

Sharp, P. D., Kelly, S. W., Main, A. and Manley, G. (1999): An Instrument for the Multiparameter Assessment of Speech. Journal of Medical Engineering and Physics, 21.

Sonies, B. C. (1991): Instrumental procedures for diagnosis. Seminars in Speech and Language, 12(3).

Sonies, B. C. (1991): Ultrasound imaging and swallowing. In Jones, B. and Donner, M. W. (Eds): Normal and Abnormal Swallowing: Imaging in Diagnosis and Therapy. New York. Springer-Verlag.

Sonies, B. C. and Stone, M. (1997): Speech Imaging. Chapter 8 in McNeil, M. R. (Ed): Clinical Management of Sensorimotor Speech Disorders. Thieme Publishers.

Thompson-Ward, E. C. and Murdoch, B. E. (1998): Instrumental Assessment of the Speech Mechanism. Chapter 3 in Murdoch, B. E. (Ed): Dysarthria: A Physiological Approach to Assessment and Treatment. Stanley Thornes Publishers.

Thompson-Ward, E. C. and Theodoros, D. G. (1998): Acoustic analysis of dysarthric speech. Chapter 4 in Murdoch, B. E. (Ed): Dysarthria: A Physiological Approach to Assessment and Treatment. Stanley Thornes Publishers.

Tortora, G. J. and Grabowski, S. R. (1993): Principles of Anatomy and Physiology, 7th Edition. HarperCollins College Publishers, New York.

Trost, J. E. (1981): Articulatory additions to the classical description of the speech of persons with cleft palate. Cleft Palate Journal, 18.

Warren, D. W. (1986): Compensatory speech behaviours in cleft palate: a regulation/control phenomenon. Cleft Palate Journal, 23.
Warren, D. W., Rochet, A. P. and Hinton, V. A. (1997): Aerodynamics. Chapter 5 in McNeil, M. R. (Ed): Clinical Management of Sensorimotor Speech Disorders. Thieme Publishers.

Weismer, G. (1984): Acoustic descriptions of dysarthric speech: perceptual correlates and physiological inferences. Seminars in Speech and Language, 5.

Yorkston, K. M. and Beukelman, D. R. (1980): A Clinician Judged Technique for Quantifying Dysarthric Speech Based on Single-Word Intelligibility. Journal of Communication Disorders, 13.

Zajac, D. J. (1995): Laryngeal airway resistance in children with CP and adequate VP function. Cleft Palate Craniofacial Journal, 33.

Zajac, D. J. and Yates, C. C. (1997): Electrolaryngography. Chapter 4 in Code, C. and Ball, M. (Ed): Instrumental Clinical Phonetics. Whurr Publishers.
APPENDICES
Appendix A: SNORS+ Circuit Diagrams

[Circuit diagrams not reproduced in this transcription.]
Appendix B: Normative Speech Parameters

[Table: intra-subject variability (mean and standard deviation) of word duration, acoustic nasalance, oral airflow, nasal airflow, respiration, aerodynamic ratio, aerodynamic nasalance, fundamental frequency, closed quotient, shimmer factor and jitter factor, measured across the words Begin, Type, Fight, Seat, Cheese, Shoot, Smoke, King, Missing and End, for a 23-year-old female exhibiting normal speech. Numerical values are not recoverable from this transcription.]

[Plot: mean word duration, and standard deviation, for 40 subjects exhibiting normal speech.]

[Plot: mean aerodynamic ratio, and standard deviation, for 40 subjects exhibiting normal speech.]

[Plot: mean closed quotient, and standard deviation, for 40 subjects exhibiting normal speech.]
Appendix C: Normative Lingual Parameters

[Chart: lingual parameters (alveolar contact, palatal contact, velar contact, lingual weight, % contact) for the fricative /s/, as uttered by four normal subjects (AM, DD, NM, TT), with mean.]

[Chart: lingual parameters for the closed phase of the velar plosive /k/, as uttered by four normal subjects.]

[Chart: lingual parameters for the fricative /ʃ/, as uttered by four normal subjects.]

[Chart: lingual parameters for the affricate /tʃ/, as uttered by four normal subjects.]
Appendix D: Abnormal Lingual Parameters

[Chart: lingual parameters for a lateral /ʃ/, reflecting a patient's progress throughout eight weeks of EPG therapy (initial, week 3, week 6, week 8, therapist).]

[Chart: lingual parameters for a lateral /tʃ/, reflecting a patient's progress throughout eight weeks of EPG therapy.]