Comparative study of the commercial software for sound quality analysis

TECHNICAL REPORT
© 2008 The Acoustical Society of Japan

Sung-Hwan Shin (e-mail: soulshin@gmail.com)
Department of Electrical and Mechanical Engineering, Seikei University, 3-3-1 Kichijoji-kitamachi, Musashino, 180-8633 Japan

(Received 14 September 2007, Accepted for publication 28 December 2007)

Abstract: Sound quality (SQ) is a perceptual, subjective reaction to a sound, and it has become one of the important factors that improve the competitiveness of a product. Through various SQ studies, psychoacoustic researchers have proposed models of objective measures, called SQ metrics, that take human auditory characteristics into account and can substitute for subjective evaluation. Representative SQ metrics are loudness, sharpness, roughness, and fluctuation strength. Except for loudness, however, the calculation algorithms of these metrics have not yet been standardized. The purpose of this study is to investigate whether commercial software packages differ in their calculation of SQ metrics and, if so, by how much. To this end, three popular commercial software packages and one self-coded program were applied to a set of sample sounds, and the four representative SQ metrics were calculated and compared. The results confirm that there are considerable differences among the calculated SQ metrics, including loudness. This means that SQ metrics need to be standardized as soon as possible and that, in addition, the SQ software used should be stated whenever an index for predicting SQ is developed or an SQ database for a product is created.

Keywords: Sound quality, Software comparison, Sound quality metrics
PACS number: 43.50.Ba, 43.15, 43.50 [doi:10.1250/ast.29.221]

1. INTRODUCTION

As people take more interest in the sound quality (SQ) of the noise emitted by a product, SQ-based noise control has become important for improving the competitiveness and preference of the product. Studies of this kind have mainly been carried out in the fields of vehicles and home appliances. Usually, the SQ of product noise has been evaluated by professional testers. This method has problems, however: training professional testers takes considerable time and money, and their judgments sometimes depend on personal preference or condition in spite of the training. In addition, various industrial fields need SQ databases expressed objectively rather than subjectively, so that the SQ of a new product can be compared or estimated. Through various studies on SQ, psychoacoustic researchers have proposed several measures that estimate the degree of SQ objectively, called SQ metrics, which are basically founded on human auditory characteristics.

Although several commercial software packages have been developed and sold by Head Acoustics, Brüel & Kjær, MTS, LMS, and others, SQ metrics other than the loudness of steady sounds [1] have not yet been standardized. The results calculated for the same sound may therefore differ from one software package to another. The purpose of this study is to investigate whether commercial software packages differ in their calculation of SQ metrics and, if so, by how much. To this end, three popular commercial software packages (S/W A, B, and C) and one self-coded program (S/W D) were used to calculate four representative SQ metrics: loudness, sharpness, roughness, and fluctuation strength.
For each SQ metric, a basic reliability test is conducted with the reference signal, and spectral and temporal patterns are then compared using broadband noise (BBN) such as pink noise, amplitude-modulated (AM) and frequency-modulated (FM) sounds, and so on [2,3].

2. LOUDNESS

Loudness is the most important of the representative SQ metrics. It belongs to the category of intensity sensations [4] and has a linear relation with the sensation of how loud a sound is. Models for calculating the loudness of steady sounds were systematized by Stevens and by Zwicker and have been standardized [1], and the sone, defined by Stevens [5], is used as the unit of loudness. One sone is the loudness of a 1 kHz pure tone at a level of 40 dB. More recently, improved loudness models based on Zwicker's model have been proposed by Moore [6], Jeong [7], and others. A loudness model considers various human auditory characteristics [8-10], including equal-loudness contours, sound attenuation between the outer and middle ear, critical bandwidth, and spectral and temporal masking effects.

There are two kinds of loudness. One is the total loudness, given in sone and usually called simply loudness, which corresponds to the overall sensation level of a sound. The other is the specific loudness, given in sone/Bark, which is the loudness density along the critical band rate. Specific loudness provides information that is very useful for describing other SQ metrics such as sharpness, roughness, and fluctuation strength. Loudness ($N$) is the integral of specific loudness ($N'$) over critical band rate ($z$):

$$N = \int_{0\,\mathrm{Bark}}^{24\,\mathrm{Bark}} N'(z)\,\mathrm{d}z \tag{1}$$

First, the reference sound producing 1 sone, a 1 kHz pure tone at a level of 40 dB [8], was applied to the four SQ software packages as a basic reliability test. All four were based on ISO 532B; the only difference was that the self-coded program, S/W D, employed 47 critical-band filters instead of 24 in order to avoid the calculation error that occurs when a pure tone lies at the boundary between neighboring filters [7]. Table 1 compares the loudness of the reference sound calculated by each SQ software. All programs indicated 1 sone, although S/W A and D showed a small error of within about 3%. Figure 1 shows the distributions of the specific loudness of the reference sound according to the SQ software.
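As a minimal sketch of Eq. (1), assuming the specific loudness is available as samples on a uniform critical-band-rate grid, the total loudness can be approximated by trapezoidal integration. The function names, the 0.1 Bark grid, and the flat test pattern are illustrative assumptions and do not correspond to any of the four programs compared here. The Hz-to-Bark conversion uses the analytic approximation of Zwicker and Terhardt, which is not taken from this report and only approximates the tabulated Hz/Bark pairs quoted in the text.

```python
import math


def hz_to_bark(f_hz):
    """Approximate critical band rate (Bark) for a frequency in Hz.

    Analytic approximation by Zwicker and Terhardt (an assumption here;
    the report itself only quotes individual Hz/Bark pairs).
    """
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def total_loudness(specific_loudness, dz):
    """Trapezoidal approximation of Eq. (1): N = integral of N'(z) dz.

    specific_loudness: samples of N'(z) in sone/Bark on a uniform grid
    dz: grid spacing in Bark
    """
    n = specific_loudness
    if len(n) < 2:
        return 0.0
    return dz * (sum(n) - 0.5 * (n[0] + n[-1]))


# Hypothetical pattern: 0.5 sone/Bark everywhere from 0 to 24 Bark.
dz = 0.1
pattern = [0.5] * 241                   # grid points 0.0, 0.1, ..., 24.0 Bark
loudness = total_loudness(pattern, dz)  # about 12 sone
bark_1khz = hz_to_bark(1000.0)          # about 8.5 Bark
```

A real specific-loudness pattern would come from the excitation-pattern stage of a loudness model; only the final integration step is sketched here.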
The specific loudness of the reference sound had its maximum at 8.5 Bark, which corresponds to 1 kHz [8], and the distributions were quite similar in pattern and value as a function of critical band rate. This is a reasonable result because the signal is the reference sound for loudness.

Table 1 Loudness (N) of the reference sound producing 1 sone according to the SQ software.

S/W        A      B      C      D
N (sone)   0.97   1.00   1.00   0.98

Fig. 1 Comparison of the distributions of specific loudness of the reference sound producing 1 sone according to the SQ software.

Second, in order to compare the loudness of BBN, pink noises with different sound pressure levels (SPL) were employed. Table 2 summarizes the results calculated by each SQ software. As a whole, the loudness values calculated by S/W A and D are larger than those calculated by S/W B and C, with relative differences between 9 and 30% depending on the sound.

Table 2 Loudness (sone) of pink noises as a function of SPL according to the SQ software.

SPL (dB)   S/W A   S/W B   S/W C   S/W D
30          0.80    0.58    0.60    0.79
40          2.62    2.21    2.20    2.62
50          6.15    5.44    5.40    6.12
60         12.60   11.35   11.20   12.50
70         24.10   21.97   21.60   23.92
80         44.70   40.93   40.30   44.53

Figures 2(a) and 2(b) show the distributions of specific loudness of the pink noises at 40 and 70 dB, respectively. In these two figures, the results obtained from S/W A and D are very similar to each other in pattern and value. The specific loudness obtained from S/W B is also similar to that from S/W A in the frequency range below 1,170 Hz, which corresponds to 9.5 Bark, but it is smaller than the others above 1,170 Hz. In addition, the specific loudness obtained from S/W C is smaller than that from S/W A over the whole frequency range, although its pattern is similar. These differences in the BBN comparison were unexpected because the three commercial software packages employ the same standard and their 1/3-octave band levels were rather similar to one another.
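The spread among the four programs in Table 2 can be checked with a few lines of arithmetic. The sketch below assumes one possible definition of relative difference, (max - min) / max at each SPL; the report does not state which definition it used, so the numbers are indicative only.

```python
# Loudness values (sone) transcribed from Table 2, keyed by SPL in dB.
TABLE_2 = {
    30: {"A": 0.80, "B": 0.58, "C": 0.60, "D": 0.79},
    40: {"A": 2.62, "B": 2.21, "C": 2.20, "D": 2.62},
    50: {"A": 6.15, "B": 5.44, "C": 5.40, "D": 6.12},
    60: {"A": 12.60, "B": 11.35, "C": 11.20, "D": 12.50},
    70: {"A": 24.10, "B": 21.97, "C": 21.60, "D": 23.92},
    80: {"A": 44.70, "B": 40.93, "C": 40.30, "D": 44.53},
}


def relative_spread(row):
    """Assumed definition: (largest - smallest) / largest at one SPL."""
    hi = max(row.values())
    lo = min(row.values())
    return (hi - lo) / hi


# Spread in percent for each SPL, rounded to one decimal place.
spread_percent = {
    spl: round(100.0 * relative_spread(row), 1) for spl, row in TABLE_2.items()
}
```

Under this assumed definition the spread shrinks steadily as the SPL rises, from roughly 28% at 30 dB to roughly 10% at 80 dB, which is of the same order as the range quoted in the text.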
In the equation [8] that converts excitation level into specific loudness, the threshold level as a function of frequency is used as a factor. However, several different thresholds in quiet exist, depending on the test method or condition [9]. If the SQ software packages use different threshold levels, this can be one cause of the differences in the loudness calculation.

Fig. 2 Comparisons of the distributions of specific loudness of pink noises with SPL of (a) 40 and (b) 70 dB, respectively, according to the SQ software.

Finally, in order to compare the loudness of time-varying sounds, frequency-modulated (FM) tones were employed [2]. The FM tones have the same center frequency (1.5 kHz), frequency variation (800 Hz), and SPL (60 dB), but different modulation frequencies. For the calculation of the loudness of time-varying sounds, Zwicker [11,12] proposed a loudness model for transient sounds based on ISO 532B, and, building on that model, Zollner [13] and Blommer [14] proposed further loudness models for transient sounds. Although these models are used in commercial SQ software, the concrete algorithms have not been disclosed.

Figure 3 shows the loudness as a function of modulation frequency. The calculated loudness values show a similar tendency, increasing with modulation frequency. In addition, except for the result obtained from S/W B, the calculated results are within 6% relative error at each modulation frequency.

Fig. 3 Comparison of the loudness of FM tones having 1.5 kHz center frequency, 800 Hz frequency variation, and 60 dB SPL as a function of modulation frequency.

Figure 4 compares loudness vs. time for FM tones with modulation frequencies of 10 Hz and 200 Hz. S/W C was excluded from the comparison because it does not provide a function for calculating loudness vs. time. In Fig. 4(a), the loudness vs. time curves obtained from S/W A and D fluctuate at the modulation frequency of the FM tone, 10 Hz. The loudness is not constant because the sensitivity of the human auditory system varies with frequency. Comparison with Fig. 4(b) shows that the extent of the loudness variation is reduced when the modulation frequency increases. This indicates that the post-masking effect [8,10] is taken into account.
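For readers who want to reproduce test signals of this kind, a sinusoidally frequency-modulated tone can be synthesized as sketched below. This is not the author's signal generator. Two assumptions are made and should be checked against the intended stimulus: the "frequency variation" of 800 Hz is treated as the peak frequency deviation, and the 60 dB SPL is converted to sound pressure with the usual 20 µPa reference. The function names, sampling rate, and duration are illustrative.

```python
import math

P_REF = 20e-6  # reference sound pressure in Pa (20 micropascal)


def fm_tone(fc, deviation, fmod, spl_db, duration=1.0, fs=48000):
    """Samples (in Pa) of a sinusoidally frequency-modulated tone.

    Instantaneous frequency: fc + deviation * cos(2*pi*fmod*t).
    'deviation' is assumed to be the peak frequency deviation in Hz.
    """
    amplitude = math.sqrt(2.0) * P_REF * 10.0 ** (spl_db / 20.0)
    beta = deviation / fmod  # modulation index
    samples = []
    for n in range(int(duration * fs)):
        t = n / fs
        phase = 2.0 * math.pi * fc * t + beta * math.sin(2.0 * math.pi * fmod * t)
        samples.append(amplitude * math.sin(phase))
    return samples


def rms(samples):
    """Root-mean-square value of a sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# FM tone with the parameters quoted in the text and a 10 Hz modulation frequency.
tone = fm_tone(fc=1500.0, deviation=800.0, fmod=10.0, spl_db=60.0)
level_db = 20.0 * math.log10(rms(tone) / P_REF)  # close to 60 dB
```

Because the amplitude of an FM tone is constant, its overall level does not depend on the modulation frequency, which makes it convenient for comparing loudness as a function of modulation frequency at a fixed SPL.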
Furthermore, the rising part of the loudness in the initial interval between 0 and 0.2 seconds indicates that S/W A and D consider the temporal integration effect [8]. The middle panel of Fig. 4(a) shows that S/W B cannot follow the change of the sound exactly. The simplest explanation is the difference in the time resolution of the calculation. For reference, S/W A and D use time resolutions of 2.9 ms and 2 ms for loudness vs. time, respectively, whereas S/W B uses 20 ms. In addition, discontinuities were observed to occur repeatedly in the loudness vs. time curve from S/W B in Fig. 4(b).

Fig. 4 Comparisons of the loudness vs. time of FM tones with modulation frequencies of (a) 10 and (b) 200 Hz according to the SQ software.

3. SHARPNESS

Even when two sounds have the same loudness, they produce different sensations depending on the distribution of specific loudness. When the distribution of specific loudness is weighted toward the high frequency range, the sound is sharper and shriller. Sharpness is a sensation quantity expressing how sharp or shrill a sound is, and its unit is the acum. Models for the calculation of sharpness ($S$) were proposed by Bismarck [15] and Aures [16,17]. In this study, Aures's model was used; it is defined as follows:

$$S = c\,\frac{\displaystyle\int_{0\,\mathrm{Bark}}^{24\,\mathrm{Bark}} N'(z)\,g_s(z)\,\mathrm{d}z}{\ln\!\left(\dfrac{N+20}{20}\right)} \tag{2}$$

where $c$ is a correction factor and $g_s(z)$ is a weighting factor for sharpness. For reference, a noise with a continuous spectrum has the same sharpness as a sound composed of many tonal components if the spectral envelopes of the two sounds, obtained from the critical band levels, are identical. The variation of sharpness with level can be ignored if the level difference is not very large, because the sharpness of a sound at 90 dB is only about twice that of the same sound at 30 dB [8].

First, the reference sound producing 1 acum, a narrow-band noise one critical band wide at a center frequency of 1 kHz and a level of 60 dB [8], was applied to the four SQ software packages as a basic reliability test. Table 3 summarizes the results calculated by each SQ software. All programs indicated 1 acum for the reference sound, although S/W B had a relative error of 7%.
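The structure of Eq. (2), a weighted integral of specific loudness divided by ln((N + 20) / 20), can be sketched numerically as follows. The report does not give the correction factor c or the weighting g_s(z), so both are left as caller-supplied parameters; the linear weighting used in the example is a hypothetical placeholder chosen only to show the effect of shifting loudness toward higher critical bands, and it is not Aures's weighting function.

```python
import math


def trapz(values, dz):
    """Trapezoidal integral of uniformly spaced samples."""
    return dz * (sum(values) - 0.5 * (values[0] + values[-1]))


def sharpness(specific_loudness, dz, weight, c=1.0):
    """Eq. (2): S = c * integral(N'(z) g_s(z) dz) / ln((N + 20) / 20).

    specific_loudness: samples of N'(z) in sone/Bark starting at 0 Bark
    weight: callable g_s(z); must be supplied by the caller (not given in the report)
    c: correction factor; 1.0 is a placeholder, not a calibrated value
    """
    total = trapz(specific_loudness, dz)  # total loudness N from Eq. (1)
    if total <= 0.0:
        raise ValueError("total loudness must be positive")
    weighted = [n * weight(i * dz) for i, n in enumerate(specific_loudness)]
    return c * trapz(weighted, dz) / math.log((total + 20.0) / 20.0)


def band_pattern(center_index, half_width=5, size=241):
    """Hypothetical pattern: 1 sone/Bark in a narrow band, 0 elsewhere."""
    return [1.0 if abs(i - center_index) <= half_width else 0.0 for i in range(size)]


dz = 0.1
placeholder_weight = lambda z: z  # hypothetical, NOT Aures's g_s(z)

s_low = sharpness(band_pattern(85), dz, placeholder_weight)    # band around 8.5 Bark
s_high = sharpness(band_pattern(175), dz, placeholder_weight)  # band around 17.5 Bark
```

Both patterns have the same total loudness, so the denominator of Eq. (2) is identical and the difference between `s_low` and `s_high` comes entirely from the weighted integral, which is the mechanism by which equal-loudness sounds can differ in sharpness.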
Specific sharpness was not compared because none of the SQ software except S/W D supports its calculation.

Table 3 Sharpness (S) of the reference sound producing 1 acum according to the SQ software.

S/W        A      B      C      D
S (acum)   1.02   1.07   1.00   0.97

Next, five narrow-band noises, each one critical band wide, at five different center frequencies (4.5, 8.5, 13.5, 17.5, and 21.5 Bark) were used for the calculation of sharpness; their SPLs were all 60 dB. Figure 5 shows the calculated results. For each SQ software, the sharpness increased as the center frequency of the critical band increased. This is reasonable because high-frequency components are usually sharper than low-frequency ones. As a whole, as in the loudness comparison, the sharpness values obtained from S/W A and D are rather similar to each other. In the high frequency range above 13.5 Bark, which corresponds to 2.15 kHz, however, the sharpness values obtained from S/W A, B, and C differ considerably from one another. In particular, at a center frequency of 13.5 Bark there was a relative difference of 35% between the sharpness values from S/W A and B, and at a center frequency of 17.5 Bark, which corresponds to 4 kHz, there was a relative difference of 63% between the values from S/W A and C. It is inferred that these differences are mainly due to the differences in specific loudness among the SQ software packages.

Fig. 5 Comparison of the sharpness of narrow-band noises one critical band wide at 5 different center frequencies having 60 dB SPL.

4. ROUGHNESS

People readily perceive a rough sensation in sounds that change rapidly, with modulation frequencies higher than 20 Hz. The SQ metric expressing the extent of this sensation is roughness, and its unit is the asper. Examples of such sounds are AM and FM sounds with modulation frequencies above 20 Hz; most narrow-band noises also produce a rough sensation.
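An amplitude-modulated tone of the kind used as a test signal for roughness (and, at lower modulation frequencies, for fluctuation strength) can be synthesized as in the following sketch. It assumes that the stated SPL refers to the unmodulated carrier and uses the usual 20 µPa reference pressure; neither convention is spelled out in this report, and the function names, sampling rate, and duration are illustrative.

```python
import math

P_REF = 20e-6  # reference sound pressure in Pa (20 micropascal)


def am_tone(fc, fmod, depth, carrier_spl_db, duration=1.0, fs=48000):
    """Samples (in Pa) of p(t) = A * (1 + depth*cos(2*pi*fmod*t)) * sin(2*pi*fc*t).

    depth: modulation depth between 0 and 1 (1.0 means 100% modulation)
    carrier_spl_db: level of the unmodulated carrier (an assumed convention)
    """
    amplitude = math.sqrt(2.0) * P_REF * 10.0 ** (carrier_spl_db / 20.0)
    samples = []
    for n in range(int(duration * fs)):
        t = n / fs
        envelope = 1.0 + depth * math.cos(2.0 * math.pi * fmod * t)
        samples.append(amplitude * envelope * math.sin(2.0 * math.pi * fc * t))
    return samples


def rms(samples):
    """Root-mean-square value of a sample sequence."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# 1 kHz tone, 100% amplitude-modulated at 70 Hz, carrier at 60 dB.
rough_tone = am_tone(fc=1000.0, fmod=70.0, depth=1.0, carrier_spl_db=60.0)
# Same carrier modulated at 4 Hz, the range associated with fluctuation strength.
slow_tone = am_tone(fc=1000.0, fmod=4.0, depth=1.0, carrier_spl_db=60.0)
```

Note that full modulation raises the overall RMS pressure by a factor of sqrt(1.5) relative to the carrier alone, so whether "60 dB" denotes the carrier or the modulated signal changes the stimulus by about 1.8 dB.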
Since empirical data on the roughness of various sounds were reported by Terhardt [18,19], Kemp [20], and others, several models for the calculation of roughness have been proposed by Aures [21], Daniel [22], Widmann [23], and others. In the proposed models, roughness is mainly affected by the modulation frequency and by the amplitude or frequency modulation index. Unfortunately, no roughness model has been standardized yet, and the concrete roughness model used in each commercial SQ software is not known.

First, the reference sound producing 1 asper, a 60 dB, 1 kHz tone that is 100% amplitude-modulated at a modulation frequency of 70 Hz [8], was applied to the four SQ software packages as a basic reliability test. Here, S/W D employed an algorithm [7] that is based on Aures's roughness model and complemented by Daniel's roughness model. Table 4 summarizes the results calculated by each SQ software. Except for S/W B, the SQ software indicated 1 asper for the reference sound within 6% relative error, which is smaller than the just-noticeable difference for roughness, 17% [8].

Table 4 Roughness (R) of the reference sound producing 1 asper according to the SQ software.

S/W         A      B      C      D
R (asper)   0.96   0.22   1.06   1.06

Fig. 6 Comparison of the distributions of specific roughness of the reference sound producing 1 asper according to the SQ software.

In Fig. 6, however, the distributions of specific roughness of the reference sound obtained from the SQ software are completely different, except that the maximum specific roughness is located near 8.5 Bark, where the modulation occurs. This difference seems to be due to the use of distinct roughness models. In addition, Fig. 6 shows that the specific roughness obtained from S/W D has two local maxima, at 8.5 and 10.5 Bark, although the signal has no components near 10.5 Bark, which corresponds to 1,370 Hz. The unexpected peak at 10.5 Bark is due to the auditory filter shape [21,22] used for the roughness calculation in S/W D. Considering the procedure for deriving the shape of the excitation pattern from the concept of the auditory filter [9], some critical bands above 8.5 Bark can have excitation levels that contribute to the specific roughness of those bands.

Fig. 7 Comparison of calculated roughness values and Zwicker's empirical data for the AM tones having 1 kHz center frequency, 70 Hz modulation frequency, and 60 dB SPL as a function of modulation depth.
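The reference check of Table 4 amounts to comparing each relative error with the just-noticeable difference (JND) of 17% quoted from [8]. A short sketch of that arithmetic, using only values given in the text (the variable and function names are illustrative):

```python
JND_ROUGHNESS = 0.17   # just-noticeable difference for roughness, from [8]
REFERENCE_ASPER = 1.0  # roughness of the reference sound by definition

# Roughness values (asper) transcribed from Table 4.
TABLE_4 = {"A": 0.96, "B": 0.22, "C": 1.06, "D": 1.06}


def relative_error(value, reference):
    """Absolute deviation from the reference, relative to the reference."""
    return abs(value - reference) / reference


# Relative error per software, and whether it stays within the JND.
errors = {sw: relative_error(r, REFERENCE_ASPER) for sw, r in TABLE_4.items()}
within_jnd = {sw: err <= JND_ROUGHNESS for sw, err in errors.items()}
```

With the tabulated values, S/W A, C, and D stay within the JND (errors of 4%, 6%, and 6%), whereas S/W B deviates by 78%.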
Second, AM tones were employed in order to compare the roughness of sounds with different amplitude modulation indices, also called modulation depths. The AM tones have the same center frequency (1 kHz), modulation frequency (70 Hz), and SPL (60 dB). Figure 7 shows the results calculated by all SQ software together with Zwicker's empirical data for the AM tones [8]. The roughness obtained from the SQ software, except S/W B, increased with modulation depth in a pattern similar to Zwicker's empirical data. In terms of absolute value, however, there were cases in which the relative error between the calculated roughness and the empirical data was larger than the just-noticeable difference for roughness, 17%. For example, at a modulation depth of 0.65, the relative errors between the roughness from S/W A, C, and D and the empirical data were 18.2%, 25.0%, and 20.7%, respectively.

Finally, the variation of the roughness of AM tones with modulation frequency was compared. The AM tones used for the comparison have the same center frequency (1 kHz), SPL (60 dB), and modulation depth (100%). Figure 8 shows the results calculated by all SQ software together with Zwicker's empirical data for the AM tones. As in Fig. 7, the roughness obtained from the SQ software, except S/W B, had a pattern similar to Zwicker's empirical data, whereas there were considerable relative errors between the two sets of data. Figure 9 also shows that the distributions of specific roughness differ completely from one another.

Fig. 8 Comparison of calculated roughness values and Zwicker's empirical data for the AM tones having 1 kHz center frequency, 60 dB SPL, and 100% modulation depth as a function of modulation frequency.

Fig. 9 Comparisons of the distribution of specific roughness of the AM tone with modulation frequency of (a) 50 and (b) 200 Hz according to the SQ software.

S/W B gave incorrect results in all roughness calculations. It failed the reference check and did not agree with generally known experimental data. Based on these results, the roughness calculation algorithm of S/W B appears to need checking.

5. FLUCTUATION STRENGTH

People can easily perceive the amplitude or frequency variation of a sound with a low modulation frequency, below about 20 Hz. Fluctuation strength is a sensation quantity related to the extent to which a sound fluctuates, and its unit is the vacil. Since a model for the calculation of fluctuation strength was proposed by Fastl [8,24], research on fluctuation strength has progressed. Like roughness, fluctuation strength is heavily influenced by modulation frequency and modulation index, and it reaches its maximum for a sound with a modulation frequency of 4 Hz.

First, the reference sound producing 1 vacil, a 60 dB, 1 kHz tone that is 100% amplitude-modulated at a modulation frequency of 4 Hz [8], was applied to the four SQ software packages as a basic reliability test. Here, S/W D employed an algorithm in which its roughness model is simply modified for the calculation of fluctuation strength [7]. Table 5 summarizes the results calculated by each SQ software. The fluctuation strength from S/W A is too small, whereas the values from S/W B and D are too large compared with 1 vacil for the reference sound. As a whole, the errors are larger than those found for the other SQ metrics.

Table 5 Fluctuation strength (FL) of the reference sound producing 1 vacil according to the SQ software.

S/W          A      B      C      D
FL (vacil)   0.78   1.15   1.03   1.08

Fig. 10 Comparison of the distributions of specific fluctuation strength of the reference sound producing 1 vacil according to the SQ software.

Figure 10 shows that the distributions of specific fluctuation strength of the reference sound differ from one SQ software to another. This result suggests that each SQ software uses a distinct model or a different weighting function for fluctuation strength as a function of critical band rate, although this cannot be verified because the concrete models used in the commercial SQ software are unknown.

Second, in order to compare the fluctuation strength of sounds with different modulation frequencies, AM tones were again employed. The AM tones have the same center frequency (1 kHz), SPL (70 dB), and modulation depth (98%). Figure 11 compares the results calculated by all SQ software with Zwicker's empirical data [8] for the AM tones. The fluctuation strength obtained from all SQ software had a pattern similar to Zwicker's empirical data; in particular, the results agree with the experimental finding that fluctuation strength has its maximum at a modulation frequency of 4 Hz. In terms of absolute value, however, there were considerable differences between the calculated fluctuation strength and the empirical data. In addition, the fluctuation strength obtained from S/W A was lower than the others at all modulation frequencies.

Fig. 11 Comparison of calculated fluctuation strength and Zwicker's empirical data for the AM tones having 1 kHz center frequency, 60 dB SPL, and 98% modulation depth as a function of modulation frequency.

Finally, in order to compare the fluctuation strength of BBN with different modulation depths, AM pink noises were employed. The AM pink noises have the same SPL (60 dB) and modulation frequency (4 Hz). Figure 12 compares the calculated fluctuation strength with Zwicker's empirical data for the AM-BBN. The figure shows that the fluctuation strength values for BBN clearly differ. Compared with the empirical data, the results obtained from S/W B and D are too large, whereas those from S/W A and C are too small. Differences of this kind were rarely found in the comparisons of the other SQ metrics. The distributions of specific fluctuation strength according to the SQ software were also totally different in pattern and value, as shown in Fig. 13.

Fig. 12 Comparison of calculated fluctuation strength and Zwicker's empirical data for the AM-BBN having 60 dB SPL and 4 Hz modulation frequency.

Fig. 13 Comparisons of the distributions of specific fluctuation strength of AM-BBN with modulation depth of (a) 0.5 and (b) 1.0 according to the SQ software.

6. CONCLUSIONS

The purpose of this study was to compare the results calculated for specific sounds by different SQ software, not to identify the SQ software with the best performance. The following findings were obtained:

(1) There were differences among the results calculated by the SQ software in all comparisons except the one involving the reference sound for loudness.
(2) The distributions of the SQ metrics as a function of critical band rate, such as specific loudness, also differed completely from one SQ software to another. The reason seems to be that each SQ software uses a different algorithm and weighting function.

(3) The differences were more pronounced in the comparisons of roughness and fluctuation strength than in those of loudness and sharpness. In particular, the roughness algorithm of S/W B appeared to be incorrect.

In spite of these differences, it is very difficult to say which SQ software is correct, because the calculation methods of SQ metrics other than the loudness of steady signals have not yet been standardized. It is therefore necessary, before anything else, to standardize the algorithms for SQ metrics as soon as possible. Fortunately, new standards for the loudness of instantaneous sounds and for sharpness will be published by the German Institute for Standardization (DIN) [25]. In addition, in the current situation, the SQ software employed should be stated accurately whenever an index that predicts or evaluates the SQ of product noise is developed or an SQ database for a kind of product is created.

ACKNOWLEDGEMENTS

S.-H. Shin would like to thank Prof. Hashimoto of Seikei Univ. and Prof. Ih of KAIST for their helpful comments. This research was financially supported by the Promotion of Science for Private Schools of Japan through the High-tech Research Center Project.

REFERENCES

[1] ISO 532, Acoustics - Method for calculating loudness level (1975).
[2] H. Fastl, Calibration signal for meters of loudness, sharpness, fluctuation strength, and roughness, Proc. InterNoise 93, pp. 1257-1260 (1993).
[3] H. Fastl and W. Schmid, Comparison of loudness analysis systems, Proc. InterNoise 97, pp. 981-986 (1997).
[4] H. Fastl, Psychoacoustics and Sound Quality Metrics, Proc. Sound Quality Symp. 98, pp. 3-10 (1998).
[5] S. S. Stevens, Perceived level of noise by mark VII and decibels, J. Acoust. Soc. Am., 51, 575-601 (1971).
[6] B. C. J. Moore and B. R. Glasberg, A revision of Zwicker's loudness model, Acustica, 82, 335-345 (1996).
[7] H. Jeong, Sound Quality Analysis of Non-Stationary Acoustic Signal, Ph.D. Thesis, Department of Mechanical Engineering, KAIST, Korea (1999) (in Korean).
[8] E. Zwicker and H. Fastl, Psychoacoustics, Facts and Models (Springer, New York, 1999).
[9] B. C. J. Moore, An Introduction to the Psychology of Hearing (Academic Press, New York, 2003).
[10] U. Widmann, R. Lippold and H. Fastl, A computer program simulating post-masking for applications in sound analysis systems, Proc. Noise-Con 98, pp. 451-456 (1998).
[11] E. Zwicker and H. Fastl, A portable loudness meters based on ISO 532B, Proc. 11th ICA, Vol. 8, pp. 135-137 (1983).
[12] E. Zwicker, K. Deuter and W. Peisl, Loudness meters based on ISO532B with large dynamic range, Proc. InterNoise 85, Vol. I, pp. 1119-1122 (1985).
[13] Anon., Noise Analysis and Sound Processing Introduction Manual (Neutrik Cortex Bertriebs GmbH, 1995).
[14] M. Blommer, N. Otto, G. Wakefield, B. J. Feng and C. Jones, Calculating the loudness of impulsive sounds, SAE Noise and Vibration Conf., 951311 (1995).
[15] v. Bismarck, Sharpness as an attribute of the timbre of steady sounds, Acustica, 30, 159-172 (1974).
[16] W. Aures, The sensory euphony as a function of auditory sensation, Acustica, 58, 282-290 (1985).
[17] W. Aures, A model for calculating the sensory euphony of various sounds, Acustica, 59, 130-141 (1985).
[18] E. Terhardt, On the acoustic roughness and fluctuation strength, Acustica, 20, 215-224 (1968).
[19] E. Terhardt, On the perception of periodic sound fluctuation (roughness), Acustica, 30, 201-213 (1974).
[20] S. Kemp, Roughness of frequency-modulated tones, Acustica, 50, 126-133 (1982).
[21] W. Aures, A procedure for calculating auditory roughness, Acustica, 58, 268-281 (1985).
[22] P. Daniel and R. Weber, Psychoacoustic roughness: Implementation of an optimized model, Acustica, 83, 113-123 (1997).
[23] U. Widmann and H. Fastl, Calculating roughness using time-varying specific loudness spectra, Proc. Sound Quality Symp. 98, pp. 55-60 (1998).
[24] H. Fastl, Fluctuation strength and temporal masking pattern of amplitude-modulated broad-band noise, Hear. Res., 48, 59-69 (1982).
[25] H. Fastl, Psychoacoustics, sound quality and music, InterNoise 07, No. 003 (2007).

Sung-Hwan Shin received the M.S. and Ph.D. degrees in mechanical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 1999 and 2004, respectively. He has worked as a postdoctoral fellow at Seikei University, Tokyo, Japan. His research interests include sound quality analysis, psychoacoustics, signal processing and pattern recognition, sound and vibration control, and room acoustics. Dr. Shin is a member of the Acoustical Society of Korea (ASK), the Korean Society of Noise and Vibration Engineering (KSNVE), the Acoustical Society of Japan (ASJ), and the Society of Automotive Engineers of Japan (JSAE).