Telephone Based Voice Pathology Assessment using Automated Speech Analysis and VoiceXML




ISSC 2004, Belfast, June 30 - July 2

Rosalyn Moran φ, Richard B. Reilly φ, Philip de Chazal φ and Peter Lacy*

φ Department of Electronic and Electrical Engineering, University College Dublin, Belfield, Dublin 4, IRELAND. E-mail: φ Richard.Reilly@ucd.ie
* St James's Hospital, Dublin 8, IRELAND. E-mail: *etdoc@iol.ie

Abstract -- A system for remotely detecting vocal fold pathologies using telephone quality speech is presented. Using VoiceXML, a database of 631 clean speech files of the sustained phonation of the vowel sound /a/ (58 normal subjects, 573 pathologic) from the Disordered Voice Database Model 4337 was transmitted over telephone channels to produce a test corpus. Pitch perturbation features, amplitude perturbation features and a set of measures of the harmonic-to-noise ratio are extracted from the clean and transmitted speech files. These feature sets are used to train and test automatic classifiers, employing the method of Linear Discriminant Analysis. Cross-fold validation was employed to measure classifier performance. While a sustained phonation can be classified as normal or pathologic with accuracy greater than 90%, results indicate that telephone quality speech can be classified as normal or pathologic with an accuracy of 74.5%, with amplitude perturbation features proving most robust to channel transmission. This study highlights the real possibility for remote diagnosis of voice pathology.

Keywords: Voice Pathology, speech analysis, VoiceXML.

I. INTRODUCTION

Patients with voice disorders can possess a variety of vocal fold pathologies. Voice pathologies are relatively common, affecting almost 5% of the population [1]. These pathologies can be found in varying degrees of severity and development. They can be classed as physical, neuromuscular, traumatic and psychogenic, and all directly affect the quality of the voice.

Developments in non-invasive methods for voice pathology diagnosis have been motivated by a need to conduct both objective and efficient analysis of a patient's vocal function. At present a number of diagnostic tools are available to otolaryngologists and speech pathologists, such as videostroboscopy [2] and videokymography. However, these current methods are time and personnel intensive and lack objectivity.

Research has been reported on the development of reliable and simple methods to aid in early detection, diagnosis, assessment and treatment of laryngeal disorders. This research has led to the development of feature extraction from acoustic signals to aid diagnosis. Much focus has been centred on perturbation analysis measures such as jitter and shimmer and on signal-to-noise ratios of voiced speech, which reflect the internal functioning of the voice. Through this research it has been shown that these features can discriminate between normal and pathologic speakers [3], [4], [5], [6]. Voice pathology detection systems using high quality voice recordings have achieved classification accuracies of over 90% in discriminating between normal and pathologic speakers [7], [8].

I. AIM

The aim of this research was to investigate the performance of a telephone based voice pathology classifier to categorise a sustained phonation of the vowel sound /a/ into either a normal or pathologic class. The goal of this project was to produce a voice pathology classifier providing remote diagnosis in a non-intrusive and objective manner.

II. METHODOLOGY

The steps involved in a voice pathology classification system, as shown in Figure 1, are discussed below.

Figure 1. Processes involved in Voice Pathology Classification (Audio Data, Acquisition, Feature Extraction, Classifier, Decision)

Audio Data: The labelled voice pathology database Disordered Voice Database Model 4337 [9], acquired at the Massachusetts Eye and Ear Infirmary Voice and Speech Laboratory and distributed by Kay Elemetrics, is often cited in the literature for voice pathology assessment. A detailed description of the database can be found at [9]. The mixed gender database contains 631 voice recordings (Pathological: 573 and Normal: 58), each with an associated clinical diagnosis. The types of pathologies are diverse, ranging from Vocal Fold Paralysis to Vocal Fold Carcinoma. Attention in this study was focused on the sustained phonation of the vowel sound /a/ (as in the English word cap). This database was originally recorded at a sampling rate of 25kHz. For this study the database was downsampled to 10kHz.

Acquisition: A voice recording is typically acquired using a microphone and digital storage in a clean audio environment. As the aim of this study is a remote diagnosis classification system, each of the 631 voice recordings was played over a long distance telephone channel using a specifically written VoiceXML script and recorded under the control of another VoiceXML script. The VoiceXML development was carried out using the online development system VoxBuilder [10]. This process created a telephone quality voice pathology database for all 631 voice recordings in the Disordered Voice Database Model 4337. A VoiceXML application currently exists to allow new telephone quality audio samples to be gathered.

Feature Extraction: Features typically extracted from the audio data for voice pathology analysis include the fundamental frequency (F0), jitter (short-term, cycle to cycle, perturbation in the fundamental frequency of the voice), shimmer (short-term, cycle to cycle, perturbation in the amplitude of the voice), signal-to-noise ratios and harmonic-to-noise ratios [7]. The features used in this study include Pitch Perturbation Features, Amplitude Perturbation Features and Harmonic to Noise Ratio (HNR).

a) Pitch and Amplitude Measures

Pitch and Amplitude Perturbation measures were calculated by segmenting the speech waveform (3-5 seconds in length) into overlapping epochs. Each epoch is 20 msec long with an overlap of 75% between adjacent epochs. A 20 msec epoch is necessary to give an accurate representation of pitch.

Table 1. Pitch Perturbation Features (F_i is the F0 value of epoch i, n the number of epochs, \bar{F} the mean F0)

No.  Description                                                    Formula
1    Mean F0                                                        \bar{F} = \frac{1}{n}\sum_{i=1}^{n} F_i
2    Maximum F0                                                     \max(F_i)
3    Minimum F0                                                     \min(F_i)
4    Standard Deviation of F0 contour                               \sqrt{\frac{1}{n}\sum_{i=1}^{n}(F_i - \bar{F})^2}
5    Phonatory Frequency Range                                      12 \, \frac{\log_{10}(F0_{hi} / F0_{lo})}{\log_{10} 2}
6    Mean Absolute Jitter (MAJ)                                     MAJ = \frac{1}{n-1}\sum_{i=1}^{n-1} |F_{i+1} - F_i|
7    Jitter (%)                                                     \frac{MAJ}{\bar{F}} \times 100
8    Relative Average Perturbation smoothed over 3 pitch periods    \frac{\frac{1}{n-2}\sum_{i=2}^{n-1} \left| \frac{F_{i+1} + F_i + F_{i-1}}{3} - F_i \right|}{\bar{F}} \times 100
9    Pitch Perturbation Quotient smoothed over 5 pitch periods      \frac{\frac{1}{n-4}\sum_{i=3}^{n-2} \left| \frac{1}{5}\sum_{k=i-2}^{i+2} F(k) - F_i \right|}{\bar{F}} \times 100
10   Pitch Perturbation Quotient smoothed over 55 pitch periods     \frac{\frac{1}{n-54}\sum_{i=28}^{n-27} \left| \frac{1}{55}\sum_{k=i-27}^{i+27} F(k) - F_i \right|}{\bar{F}} \times 100
11   Pitch Perturbation Factor                                      \frac{N_{p \geq threshold}}{N_{voice}} \times 100
12   Directional Perturbation Factor                                \frac{N_{\pm}}{N_{voice}} \times 100
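As a rough illustration of the epoch segmentation and of the short-term perturbation measures described above, the following sketch computes a mean absolute perturbation, its percentage form, a smoothed perturbation quotient and a decibel shimmer from a contour held in a plain Python list. The function names, the default arguments and the exact normalisation are assumptions made for this example; they follow common textbook definitions of jitter and shimmer and are not taken from the authors' implementation.

```python
import math


def epoch_bounds(n_samples, sample_rate=10_000, epoch_s=0.020, overlap=0.75):
    """Start/end sample indices of overlapping analysis epochs."""
    length = int(round(epoch_s * sample_rate))
    hop = max(1, int(round(length * (1.0 - overlap))))
    return [(s, s + length) for s in range(0, n_samples - length + 1, hop)]


def mean(values):
    return sum(values) / len(values)


def mean_absolute_perturbation(contour):
    """Mean absolute difference between consecutive contour values."""
    return mean([abs(b - a) for a, b in zip(contour, contour[1:])])


def percent_perturbation(contour):
    """Mean absolute perturbation relative to the contour mean, in percent."""
    return 100.0 * mean_absolute_perturbation(contour) / mean(contour)


def smoothed_quotient(contour, window):
    """Deviation of each value from its local mean over an odd window,
    averaged and expressed as a percentage of the contour mean."""
    half = window // 2
    deviations = [
        abs(mean(contour[i - half:i + half + 1]) - contour[i])
        for i in range(half, len(contour) - half)
    ]
    return 100.0 * mean(deviations) / mean(contour)


def shimmer_db(amplitudes):
    """Mean absolute amplitude ratio of consecutive values, in decibels."""
    return mean([abs(20.0 * math.log10(b / a))
                 for a, b in zip(amplitudes, amplitudes[1:])])
```

With the default arguments, a 20 msec epoch at 10kHz is 200 samples long and a 75% overlap gives a hop of 50 samples. Calling `smoothed_quotient` with a window of 3, 5 or 55 corresponds to the three smoothing lengths named in the feature list, and the same functions apply to an amplitude contour to obtain the shimmer counterparts.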

Table 2. Amplitude Perturbation Features (A_i is the amplitude value of epoch i, n the number of epochs, \bar{A} the mean amplitude)

No.  Description                                                                 Formula
1    Mean Amp                                                                    \bar{A} = \frac{1}{n}\sum_{i=1}^{n} A_i
2    Maximum Amp                                                                 \max(A_i)
3    Minimum Amp                                                                 \min(A_i)
4    Standard Deviation of Amp contour                                           \sqrt{\frac{1}{n}\sum_{i=1}^{n}(A_i - \bar{A})^2}
5    Mean Absolute Shimmer (MAS)                                                 MAS = \frac{1}{n-1}\sum_{i=1}^{n-1} |A_{i+1} - A_i|
6    Shimmer (%)                                                                 \frac{MAS}{\bar{A}} \times 100
7    Shimmer: Decibels                                                           \frac{1}{n-1}\sum_{i=1}^{n-1} \left| 20 \log_{10} \frac{A_{i+1}}{A_i} \right|
8    Amplitude Relative Average Perturbation smoothed over 3 pitch periods       \frac{\frac{1}{n-2}\sum_{i=2}^{n-1} \left| \frac{A_{i+1} + A_i + A_{i-1}}{3} - A_i \right|}{\bar{A}} \times 100
9    Amplitude Perturbation Quotient smoothed over 5 pitch periods               \frac{\frac{1}{n-4}\sum_{i=3}^{n-2} \left| \frac{1}{5}\sum_{k=i-2}^{i+2} A(k) - A_i \right|}{\bar{A}} \times 100
10   Amplitude Perturbation Quotient smoothed over 55 pitch periods              \frac{\frac{1}{n-54}\sum_{i=28}^{n-27} \left| \frac{1}{55}\sum_{k=i-27}^{i+27} A(k) - A_i \right|}{\bar{A}} \times 100
11   Amplitude Perturbation Factor                                               \frac{N_{p \geq threshold}}{N_{voice}} \times 100
12   Amplitude Directional Perturbation Factor                                   \frac{N_{\pm}}{N_{voice}} \times 100

b) Harmonic to Noise Ratio

Mel Frequency Cepstral Coefficient (MFCC) features are commonly used in Automatic Speech Recognition (ASR) and also Automatic Speaker Recognition systems [11]. The Cepstral domain is employed in speech processing as the lower valued cepstral quefrencies model the vocal tract spectral dynamics, while the higher valued quefrencies contain pitch information, seen as equidistant peaks in the spectra. The Harmonic to Noise Ratio is calculated in the Cepstral domain, as follows:

1. The speech signal is processed to have zero mean and unit variance.
2. A 100 msec epoch is extracted.
3. A peak-picking algorithm locates the peaks at multiples of the fundamental frequency.
4. A bandstop filter in the Cepstral domain is applied to the signal. The stopband of the filter is limited to the width of each peak. The remaining signal is known as the rahmonics (harmonics in the cepstral domain) comb-liftered signal and contains the noise information.
5. The Fourier transform of this comb-liftered signal is taken, generating an estimate of the noise energy present, N(f). Similarly, the Fourier Transform of the original cepstral-domain signal, including rahmonics, is taken, O(f).
6. The HNR for a given frequency band B is then calculated as per

   HNR_B = \mathrm{mean}(O(f))_B - \mathrm{mean}(N(f))_B

Eleven HNR measures were calculated, one for each of the bands in Table 3.

Table 3. HNR bands

Band number   Incorporating frequencies (Hz)
1             0-500
2             0-1000
3             0-2000
4             0-3000
5             0-4000
6             0-5000
7             500-1000
8             1000-2000
9             2000-3000
10            3000-4000
11            4000-5000

Classifier: Linear discriminants (LD) [12] partition the feature space into the different classes using a set of hyper-planes. The parameters of this classifier model were fitted to the available training data by using the method of maximum likelihood. Using this method the processing required for training is achieved by direct calculation and is extremely fast relative to other classifier building methods such as neural networks. This model assumes that the feature data has a Gaussian distribution for each class. In response to input features, linear discriminants provide a probability estimate of each class. The final classification is obtained by choosing the class with the highest probability estimate.

The cross-validation scheme [13] was used for estimating the classifier performance. The variance of the performance estimates was decreased by averaging results from multiple runs of cross validation, where a different random split of the training data into folds is used for each run. In this study ten repetitions of ten-fold cross-validation were used to estimate classifier performance figures. For each run of cross-fold validation the total normal population and a randomly selected group of abnormals equal in size to the normal population were utilised. This results in a more realistic reflection of the predictive ability of the system. In this study the performance of the classifier is quoted using the class sensitivities, predictivities and the overall accuracy.
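The classifier and evaluation procedure described above can be sketched for a single feature as follows. This is a minimal illustration under stated assumptions, not the authors' code: it handles one scalar feature at a time, fits two Gaussians with a shared variance by maximum likelihood, balances the classes by randomly subsampling the pathologic group to the size of the normal group, and pools the confusion counts over all runs (a simplification of averaging the per-run results). All function names and the fixed random seed are hypothetical.

```python
import math
import random


def fit_lda_1d(normal, pathologic):
    """Maximum-likelihood fit: one mean per class, one shared variance."""
    mean_n = sum(normal) / len(normal)
    mean_p = sum(pathologic) / len(pathologic)
    total = len(normal) + len(pathologic)
    squared = sum((x - mean_n) ** 2 for x in normal)
    squared += sum((x - mean_p) ** 2 for x in pathologic)
    variance = max(squared / total, 1e-12)  # guard against zero variance
    return mean_n, mean_p, variance, len(pathologic) / total


def prob_pathologic(x, model):
    """Posterior probability of the pathologic class for feature value x."""
    mean_n, mean_p, variance, prior_p = model
    score_n = math.log(1.0 - prior_p) - (x - mean_n) ** 2 / (2.0 * variance)
    score_p = math.log(prior_p) - (x - mean_p) ** 2 / (2.0 * variance)
    diff = max(-700.0, min(700.0, score_p - score_n))  # avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-diff))


def summarise(tp, fp, fn, tn):
    """Sensitivity, specificity, predictivities and accuracy from counts."""
    def ratio(a, b):
        return a / b if b else float("nan")
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "positive_predictivity": ratio(tp, tp + fp),
        "negative_predictivity": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


def repeated_balanced_cv(normal, pathologic, folds=10, repeats=10, seed=0):
    """Repeated k-fold cross-validation on class-balanced subsets."""
    rng = random.Random(seed)
    tp = fp = fn = tn = 0
    for _ in range(repeats):
        # Draw as many pathologic samples as there are normal samples.
        subset = rng.sample(pathologic, len(normal))
        data = [(x, False) for x in normal] + [(x, True) for x in subset]
        rng.shuffle(data)  # a different random split into folds per run
        for k in range(folds):
            train = [item for i, item in enumerate(data) if i % folds != k]
            model = fit_lda_1d([x for x, is_p in train if not is_p],
                               [x for x, is_p in train if is_p])
            for x, is_p in data[k::folds]:
                predicted = prob_pathologic(x, model) > 0.5
                if predicted and is_p:
                    tp += 1
                elif predicted:
                    fp += 1
                elif is_p:
                    fn += 1
                else:
                    tn += 1
    return summarise(tp, fp, fn, tn)
```

Calling `repeated_balanced_cv(normal_values, pathologic_values)` with two lists of feature values returns a dictionary of the five performance figures. Extending the sketch to several features at once would require a shared covariance matrix in place of the single shared variance.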

Table 4. Classification Matrix

                            Diagnosed Pathology     Diagnosed Normal
Classification Pathology    True Positive (TP)      False Positive (FP)
Classification Normal       False Negative (FN)     True Negative (TN)

Definitions of sensitivity, specificity, predictivities and the overall accuracy are given below:

1. Sensitivity = \frac{TP}{TP + FN}: fraction of speech files from the set of all pathologic files correctly classified.
2. Specificity = \frac{TN}{TN + FP}: fraction of speech files from the set of all normal voices correctly classified.
3. Positive Predictivity = \frac{TP}{TP + FP}: fraction of speech files detected as pathologic that are correctly classified.
4. Negative Predictivity = \frac{TN}{TN + FN}: fraction of speech files detected as normal that are correctly classified.
5. Overall Accuracy = \frac{TP + TN}{TP + TN + FP + FN}: fraction of the total number of subjects' voices that are classified correctly.

III. RESULTS

Each feature was tested for class sensitivities, predictivities and the overall accuracy. The contributions provided by each of the pitch perturbation measures, amplitude perturbation measures and HNR, on the subsampled (10kHz) and telephone quality databases are given in Tables 5-10. All values are percentages.

Table 5. Pitch Perturbation Measures: Clean 10kHz database

Feature number   Accuracy   Sensitivity   Specificity   Pos. Pred.   Neg. Pred.
1                52.82      50.7          55.50         53.30        52.35
2                65.75      53.0          78.53         7.46         62.33
3                5.78       48.28         55.32         52.24        5.38
4                60.02      35.52         84.82         70.3         56.5
5                60.97      39.66         82.55         69.70        57.47
6                58.28      30.00         86.9          69.88        55.09
7                57.42      27.59         87.6          69.26        54.45
8                57.33      27.59         87.43         68.97        57.40
9                57.4       48.66         43.56         60.49        58.3
10               60.0       25.52         95.           84.09        55.78
11               62.45      37.93         87.26         75.09        58.4
12               49.52      49.4          49.9          49.83        49.23

Table 6. Pitch Perturbation: Transmitted database

Feature number   Accuracy   Sensitivity   Specificity   Pos. Pred.   Neg. Pred.
1                47.79      43.28         52.36         47.90        47.69
2                63.66      57.93         69.46         65.75        6.99
3                50.9       43.0          58.8          5.44         50.52
4                62.36      4.03          83.94         72.2         58.49
5                6.0        42.24         82.02         70.40        58.39
6                6.75       37.07         86.74         73.88        58.49
7                60.54      34.48         86.9          72.73        57.72
8                60.36      34.32         86.56         72.20        56.62
9                58.37      35.00         82.02         66.34        55.49
10               63.3       32.76         94.24         85.20        58.6
11               58.        36.90         79.58         64.65        53.47
12               49.35      36.38         62.48         49.53        49.24

Table 7. Amplitude Perturbation: Clean 10kHz database

Feature number   Accuracy   Sensitivity   Specificity   Pos. Pred.   Neg. Pred.
1                49.87      52.59         47.2          50.6         49.54
2                5.7        53.28         49.04         5.4          50.9
3                50.04      52.24         47.82         50.33        49.73
4                56.63      70.7          42.93         55.45        58.7
5                72.94      92.76         52.88         66.58        87.83
6                67.82      88.97         46.42         62.70        80.6
7                7.2        94.66         47.47         64.59        89.77
8                67.65      88.62         46.42         62.6         80.2
9                66.6       87.76         45.20         6.85         78.48
10               62.36      76.2          48.34         59.89        66.75
11               56.55      7.72          4.9           55.25        59.00
12               67.04      75.52         58.46         64.79        70.23

Table 8. Amplitude Perturbation: Transmitted database

Feature number   Accuracy   Sensitivity   Specificity   Pos. Pred.   Neg. Pred.
1                5.7        53.0          49.2          5.42         50.90
2                56.63      57.24         56.02         56.85        56.4
3                47.70      49.66         45.72         48.08        47.29
4                63.4       73.63         52.53         6.09         66.30
5                72.68      89.72         52.36         66.34        87.72
6                63.57      82.76         44.5          60.00        7.67
7                5.08       .72           90.92         56.67        50.44
8                63.66      82.93         44.5          60.05        7.88
9                62.97      8.55          44.5          59.65        70.28
10               58.8       68.45         45.03         55.76        58.50
11               56.90      72.4          4.9           55.48        59.60
12               6.23       68.62         53.75         60.03        62.86

Table 9. HNR Bands: Clean 10kHz database

Band number      Accuracy   Sensitivity   Specificity   Pos. Pred.   Neg. Pred.
1                76.25      79.83         78.8          78.74        75.00
2                75.55      83.79         78.58         80.07        82.78
3                79.88      85.00         74.69         77.27        83.
4                77.62      83.28         7.90          75.00        80.94
5                65.74      77.59         53.75         62.94        70.32
6                47.6       52.59         42.58         48.          47.0
7                70.86      83.79         57.77         66.76        77.88
8                50.22      34.66         65.97         50.76        49.93
9                75.20      73.28         77.4          76.44        74.04
10               79.63      78.79         82.4          82.9         79.40
11               75.46      84.4          66.67         7.87         80.59

Table 10. HNR Bands: Transmitted database (values in %)

Band number      Accuracy   Sensitivity   Specificity   Pos. Pred.   Neg. Pred.
1                53.77      72.4          28.0          52.70        57.09
2                44.4       42.76         46.07         44.52        44.30
3                45.9       48.45         4.88          45.77        44.53
4                49.78      4.83          95.29         50.9         49.73
5                53.86      67.24         40.3          53.5         58.86
6                NA         NA            NA            NA           NA
7                5.08       6.55          86.04         54.55        50.46
8                50.65      3.03          70.5          5.58         50.25
9                58.28      24.48         92.50         76.76        54.75
10               57.5       52.4          74.00         67.          60.5
11               NA         NA            NA            NA           NA

For the telephone database bands 6 and 11 were omitted. However, due to channel variability, bands 1-5 were measured, as above, from 0 Hz.

The various feature groups provide independent and complementary classification information. By combining the feature groups it was anticipated that the overall classification performance would be improved. Classification results were obtained for the combination of these features, as shown in Table 11.

Table 11. Classification results based on the combination of feature sets (values in %)

Test Corpus      Accuracy   Sensitivity   Specificity   Pos. Pred.   Neg. Pred.
Clean 10kHz      89.0       93.26         85.4          87.63        86.25
Telephone        74.5       75.69         72.60         73.66        74.69

IV. DISCUSSION

Twelve pitch and twelve amplitude perturbation measures were extracted from the pitch and amplitude contours respectively. The pitch and amplitude perturbation measures detect short-term changes in the pitch and amplitude contours. It was hypothesized that such measures, when extracted from normal and pathologic subjects, would be statistically different and therefore allow a classifier to distinguish between the two groups. Normal speech is known to have certain levels of jitter and shimmer, whereas pathologic speech should exhibit larger perturbations in both the pitch and the amplitude contours. However, the difficulty in accurately tracking the pitch contour, especially in speech, could severely limit the ability of the perturbation measures to separate normal and pathologic voice.

The classification performances of the pitch perturbation measures to differentiate between normal and pathologic voice are presented in Tables 5 and 6. Using just one feature, best test set accuracies of 65.75% and 63.66% were achieved on the 10kHz and telephone databases respectively.
The classification performances of the amplitude perturbation measures to differentiate between normal and pathologic voice are presented in Tables 7 and 8. Best test set accuracies of 72.94% and 63.66% were achieved for just one feature (shimmer) on the 10kHz and telephone databases respectively.

The Harmonics-to-Noise ratio (HNR) measures were extracted for the eleven frequency bands described in Table 3. The HNR method transforms the speech signal to the cepstral domain, removes the rahmonic information from the cepstrum and applies the DFT to this signal, which is defined as the spectrum of the estimated noise signal. The classification performances of the harmonic to noise ratio measures to differentiate between normal and pathologic voice are presented in Tables 9 and 10. Using just one feature, best test set accuracies of 79.88% and 58.28% were achieved on the 10kHz and telephone databases respectively.

Pathologic subjects should theoretically have increased levels of jitter, shimmer and additive noise. Much research exists that demonstrates that the signal-to-noise measure does successfully distinguish between the normal and pathologic groups. The results in Table 10 indicate that telephone transmission of the audio files does not preserve the HNR information needed to separate pathologic from normal voice to the extent that the 10kHz database does.

By taking combinations of features, the ability of the features to separate normal and pathologic subjects on the two databases, 89.0% and 74.5%, is a definite improvement over using the individual feature groups. More research is needed on the specific combinations of features for these databases.

A number of research groups [14], [15], [16] have reported detection rates for voice pathologies of 94.87%, 76% and 96.30% respectively. In [14] the Disordered Voice Database Model 4337 sampled at 25kHz was employed and their results may be compared with the results obtained in this study, although they used the higher quality audio data.
In study [15] different databases were used and a direct comparison of results cannot be made. The database used in the present study provides a large number of pathologic subjects that might not fairly represent the pathologies present in other studies conducted in this area or those encountered by the medical profession on a day-to-day basis. The predictive ability of this model could be confirmed through external validation.

As mentioned in Section II, the Disordered Voice Database Model 4337 is accompanied by a diagnostic description for each subject. The diagnostic description provided with the database is very detailed. These detailed diagnostic descriptions were grouped into several diagnostic categories by our medical consultant. It was observed from the distributions of the diagnoses for each subject that a patient's diagnosis is mutually exclusive only at the highest level (i.e. either normal or pathologic). As the level of categorisation proceeds to sublevels, the patient's diagnoses are no longer mutually exclusive. Thus each subject may be diagnosed into more than one category, i.e. they may have a pathology that is both physical and neuromuscular. This has a significant effect on the potential for an automatic classification system to differentiate between the categorisation types.

One could categorise the database to allow for a vocal quality classification scheme. In this way, a speech recording may be categorised based on its vocal quality: breathy, strained or noisy. Future work in this area, based on the methods developed in this study, would allow investigation into the differentiation, for example, between normal subjects and subjects with nodules.

Further voice samples are being gathered in conjunction with the Speech and Language Therapy Department at Tallaght Hospital, Dublin. Audio samples are recorded in the clinic at 44kHz and simultaneously using a VoiceXML based telephone application. This allows the remote classification system to perform analysis on both telephony quality and high quality audio. The classification of the audio data is performed automatically on receipt of the audio data, with results posted to a web interface. It is hoped that as the performance of the telephony system improves with increased training samples, the system could provide the clinical staff with a useful pre-screening service for voice pathology.

V. CONCLUSION

The results of the project suggest that by combining VoiceXML as a telephony interface and server side speech processing, an automatic classification system to differentiate between normal and pathologic voice can be achieved. This study highlights the real possibility for remote diagnosis of voice pathology.

ACKNOWLEDGEMENT

The support of the Informatics Research Initiative of Enterprise Ireland and Voxpilot Ltd. is gratefully acknowledged. The authors would also like to acknowledge the assistance in data collection of the Speech and Language Therapy Department at Tallaght Hospital, Dublin.

REFERENCES

[1] W. Becker, H.H. Naumann, C.R. Faltz, Ear, Nose and Throat Diseases, Thieme Medical Publishers, 2nd Edition, 1994.
[2] B. Schneider, J. Wendler and W. Seidner, "The relevance of stroboscopy in functional dysphonias", Folia Phon., Vol. 54, No. 1, pp 44-54, 2002.
[3] P. Lieberman, "Perturbations in vocal pitch", J. Acoust. Soc. Am., Vol. 33, No. 5, 1961.
[4] I.R. Titze, Workshop on Acoustic Voice Analysis, National Centre for Voice and Speech, America, 1994.
[5] G. de Krom, "Some spectral correlates of pathological breathy and rough voice quality for different types of vowel fragments", J. Speech. Hear. Res., Vol. 38, pp 794-811, 1995.
[6] D. Michaelis, M. Frohlich, H.W. Strube, "Selection and combination of acoustic features for the description of pathologic voices", J. Acoust. Soc. Am., Vol. 103, No. 3, pp 1628-1639, 1998.
[7] C. Maguire, P. de Chazal, R.B. Reilly, P. Lacy, "Automatic Classification of voice pathology using speech analysis", World Congress on Biomedical Engineering and Medical Physics, Sydney, August 2003.
[8] C. Maguire, P. de Chazal, R.B. Reilly, P. Lacy, "Identification of Voice Pathology using Automated Speech Analysis", Proc. of the 3rd International Workshop on Models and Analysis of Vocal Emission for Biomedical Applications, Florence, December 2003.
[9] Disordered Voice Database Model 4337, Massachusetts Eye and Ear Infirmary Voice and Speech Lab, Boston, MA, Jan. 1994. Kay Elemetrics Corporation.
[10] Voxpilot Ltd., Dublin. www.voxpilot.com.
[11] J.P. Campbell, "Speaker Identification: A tutorial", Proc. of the IEEE, Vol. 85, No. 9, pp 1437-1462, 1997.
[12] R. O. Duda, P. E. Hart, and H. G. Stork, Pattern Classification, Wiley-Interscience, New York, NY, 2000.
[13] R. Kohavi, "A study of cross validation and bootstrap for accuracy estimation and model selection", Proc. 14th Int. Conf. on Art. Intel., pp. 1137-1143, 1995.
[14] G. Llorente, S. Navarro et al, "On The Selection of Meaningful Speech Parameters Used Pathologic/Nonpathologic Voice Register Classifier", Eurospeech '99, Volume 1, Page 563-566, 1997.
[15] D. G. Childers, "Detection of Laryngeal Function using Speech and Electrographic Data", IEEE Transactions on Biomedical Engineering, Vol. 39, No. 1, pp 19-25, Jan 1992.
[16] M. E. Cesar, R. L. Hugo, "Acoustic Analysis of Speech for Detection of Laryngeal Pathologies", Proc. 22nd Annual EMBS Int. Conf., pp 2369-2372, July 2000.