Automatic Long-Term Loudness and Dynamics Matching

Earl Vickers
Creative Advanced Technology Center
Scotts Valley, CA, USA
earlv@atc.creative.com

ABSTRACT

Traditional audio level control devices, such as automatic gain controls (AGCs) and compressors, generally have little or no advance knowledge of the dynamic characteristics of the remainder of the current audio program. If such advance knowledge is available (i.e., if audio files can be pre-analyzed), it becomes possible to match desired values of overall loudness and dynamics. We introduce two new measures, long-term loudness matching level and dynamic spread, and present new methods for long-term loudness and dynamics matching.

INTRODUCTION

Loudness is a subjective measure relating to the physical sound pressure level (SPL) as perceived by the human ear. A number of devices have been created for controlling audio levels to modify either a signal's loudness or its dynamic change in loudness. Automatic Gain Controls (AGCs) are typically used to minimize loudness differences between audio programs (for example, between one song and the next). Compressors are similar to AGCs but operate on a faster time scale; they are primarily intended to minimize the loudness changes within a single song or audio program [1, 2].

Compressors have a number of uses, including increasing the loudness of the softer parts of an audio program so they can be heard above the noise floor (e.g., for automotive listening), decreasing the loudness of the loudest segments (for example, to avoid disturbing neighbors during late-night listening), and keeping signal levels within technical limits required for radio broadcast.

Compressors and AGCs typically operate in real time with little or no advance knowledge of the contents of the remainder of the current audio program. It seems likely that if we had additional information about the dynamic characteristics of the audio program as a whole, we could do a better job of matching a desired loudness or dynamic behavior.
Since music data is often stored in sound files on computer hard drives, we are in a position to generate and use loudness metadata in order to improve performance and reduce artifacts. In this paper, we present a method for matching the loudness of an entire song or sound file to a desired level using a novel measure, the long-term loudness matching level. In addition, we present a compressor that analyzes the dynamic characteristics of a sound file and matches the output to a desired statistical behavior, using a new measure called dynamic spread. This prevents over-compressing audio that already has limited dynamics.

One side effect of dynamic compression is that it can alter the overall loudness in a way that may vary from one recording to the next, making it difficult to perform post-compressor loudness matching if the compression is done in real time. Therefore we present a method for estimating the effect of any given compressor settings on a particular sound file, so we can automatically compensate by scaling the gain accordingly.

1 LONG-TERM LOUDNESS MATCHING

Normalization is a way of matching the levels of multiple sound files by scaling each one to the maximum extent possible without clipping. Unlike traditional compressors and AGCs, which operate in real time with minimal look-ahead capability, normalization operates on a sound file as a whole, applying a single gain to the overall signal. By examining the entire sound file in advance, the normalizer is able to scale the audio without making any (possibly unwanted) gain adjustments during playback.

Unfortunately, there is no guarantee that two normalized sound files will sound equally loud. The peak amplitude of a song is not a very robust measure of its loudness. What we actually want is to normalize the perceived loudness, not the peak amplitude. While a number of attempts have been made to define and quantify the loudness of a single, short-duration tone [3-5], there is little agreement as to how to combine a series of short-term loudness values to define the loudness of an extended, dynamically changing signal such as an entire song.

I. Allen, in an analysis of the loudness of movie soundtracks [6], determined that the equivalent loudness,

$$L_{eq,m} = 10\log_{10}\left[\frac{1}{T}\int_{0}^{T}\left(\frac{P(t)}{P_0}\right)^{2}dt\right]$$

(where P(t) is the sound pressure, P_0 is the reference pressure, and the subscript m refers to the type of frequency weighting, for example, the A equal-loudness curve), yields a good match to the relative subjective loudness of various soundtracks. Allen concluded that L_eq is better than a (C-weighted, fast) peak level measurement for determining subjective loudness.
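For sampled data, Allen's integral reduces to a mean of squared, reference-normalized pressure samples. The following is a minimal Python sketch under stated assumptions: the samples are already frequency-weighted (the subscript m), are expressed in pascals, and use the conventional 20 µPa reference for P_0. The function names `leq_db` and `peak_db` are hypothetical. For a steady sine tone the peak level reads about 3 dB above L_eq, a first hint that the two measures can disagree.

```python
import math

P_REF = 20e-6  # assumed reference pressure P_0 in pascals (conventional 20 uPa)

def leq_db(pressure, p_ref=P_REF):
    """Discrete approximation of L_eq = 10 log10[(1/T) * integral (P/P0)^2 dt].

    `pressure` holds frequency-weighted pressure samples (Pa) taken at a
    uniform rate, so (1/T) times the integral becomes a plain sample mean.
    """
    mean_sq = sum((p / p_ref) ** 2 for p in pressure) / len(pressure)
    return 10.0 * math.log10(mean_sq)

def peak_db(pressure, p_ref=P_REF):
    """Peak level re P_0: the quantity that peak normalization relies on."""
    return 20.0 * math.log10(max(abs(p) for p in pressure) / p_ref)

# One second of a 1 kHz tone at 48 kHz with an RMS pressure of 0.02 Pa.
fs = 48000
amp = 0.02 * math.sqrt(2)
tone = [amp * math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
print(round(leq_db(tone), 2))    # about 60.0
print(round(peak_db(tone), 2))   # about 63.01
```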
The software program Sound Forge [7] includes a similar loudness definition as an optional part of its normalization process, using the average RMS (effective) amplitude instead of the average sound pressure level.

1.1 Emphasizing Louder Frames

Zwicker and Fastl proposed performing a critical-band analysis and calculating the percentage of time for which a given loudness is reached or exceeded [8]. They presented evidence suggesting that the loudness of a dynamically changing sound can be well characterized by its N5 loudness; i.e., the level which only 5% of audio frames will reach.

Neoran and Shashoua proposed a filterbank approach to loudness-dependent normalization [9]. After creating a histogram from a sequence of loudness level estimates, they determined a single loudness number by taking the integral of the P% highest histogram levels, where P may be set to around 2%. The implicit assumption seems to be that the louder segments will more heavily influence human judgment of long-term loudness. (Note that we speak of judging the loudness, not the softness, of a song. A bias toward the level of the louder segments may be reflected in our language.)

Personal experience suggests that if we were to try to match the loudness of two songs having very different dynamic ranges, sole reliance upon either the loudest or the average frame levels might result in a mismatch of perceived loudness. For example, in Figure 1, column 1 represents a signal with a wide dynamic range of 80 dB (from -80 to 0 dB), while the other columns have only a 20 dB range. Signals 1 and 2 may have the same average frame loudness, but the song represented by column 1 will probably be perceived as louder, since louder segments appear to have a stronger influence on our perception of overall loudness. On the other hand, signals 1 and 3 have the same maximum loudness, but the audio of column 3 may be perceived as louder because the signal is generally stronger than that represented by column 1.

Figure 1. Long-term Loudness Matching by Mean vs. Maximum Level (level range in dB relative to full scale for signals 1, 2, and 3). Signal 2 attempts to match the overall loudness of signal 1 using the average frame level. Signal 3 tries to match signal 1 using the maximum frame level. Neither achieves an optimal perceptual match.

2 LONG-TERM LOUDNESS MATCHING LEVEL

We propose comparing the overall loudness of extended sound files using a measure we will refer to as the long-term loudness matching level, or LLML. The LLML defines a method for combining a series of individual (per-frame) level estimates, obtained using any of a variety of methods [5-9]. Our definition does not attempt to be compatible with standard definitions of loudness in sones or loudness level in phons; our primary interest is in creating a measure that is easily computed and manipulated and that corresponds reasonably well with human judgments of the relative loudness of extended audio signals.

2.1 Desired Conditions

We would like our definition of the LLML to satisfy the following conditions:
1. The individual (per-frame) level estimates should reflect the fact that human hearing is more sensitive to certain frequencies than to others.
2. All non-silent frames should contribute to the value of the LLML.
3. Louder frames should optionally be weighted more heavily than softer frames, since the louder segments may more heavily influence our judgment of loudness.
4. A single emphasis parameter should control the amount of additional weighting given to louder frames.
5. If the optional loudness weighting of condition 3 is used, any number of additional silent frames should have no influence on the result.
6. The LLML of a sound file with a constant per-frame level should be that level.
7. The LLML of a full-scale square wave should be 0 dB.
8. Scaling the entire sound file by N dB should result in an N dB change in the LLML.

2.2 Hearing Curve Pre-emphasis

To satisfy condition 1, we need to take into account variations in the sensitivity of the ear at different frequencies. One efficient way of performing this frequency weighting would be a simple pre-emphasis of the signal using an inverted equal-loudness curve; for example, the B weighting curve (see Figure 2), which applies to sound playback at moderate levels such as those encountered in home listening situations. This pre-emphasis has the additional advantage of making the result relatively insensitive to DC offsets.

Figure 2. B frequency weighting curve (relative response in dB vs. frequency in Hz).

The B curve pre-emphasis is only a rough approximation, not only because we may not know the actual playback level, but also because the equal-loudness contours constitute a family of curves and cannot be duplicated using a single linear filter. Nevertheless, this approximation can be useful.

In situations where computational power is quite limited, we could pre-filter the signal using a simplified approximation of the low end of the B weighting curve. Since most musical signals have much more energy in the bass and midrange than in the high frequencies, the highs will tend to have minimal impact on the final loudness estimate. For example, for a 48 kHz signal, we could use a 200 Hz first-order Butterworth high-pass filter,

$$x_{filt}(n) = 0.9871\,x(n) - 0.9871\,x(n-1) + 0.9742\,x_{filt}(n-1).$$

2.3 Level Extraction

We then extract a smoothed level from the pre-emphasized signal, using a level detector such as root-mean-square (RMS) [10]. The RMS level detector is preferred over the often-used smoothed full-wave rectifier because it reduces the number of higher-order harmonics and the possibility of alias components folding back into the audible range [11, 12].
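The quoted coefficients are consistent with a bilinear-transform design of a first-order Butterworth high-pass filter; that design route is our assumption, since the text gives only the resulting numbers. The sketch below (hypothetical helper names `butterworth_hp1` and `highpass`) recomputes the coefficients for a 200 Hz cutoff at 48 kHz and runs the recursion exactly as written in Section 2.2.

```python
import math

def butterworth_hp1(fc, fs):
    """First-order Butterworth high-pass via the bilinear transform.

    Returns (b0, a1) for x_filt(n) = b0*x(n) - b0*x(n-1) + a1*x_filt(n-1).
    """
    k = math.tan(math.pi * fc / fs)   # prewarped analog cutoff
    b0 = 1.0 / (1.0 + k)
    a1 = (1.0 - k) / (1.0 + k)
    return b0, a1

def highpass(x, b0, a1):
    """Apply the recursion sample by sample, starting from zero state."""
    y, x_prev, y_prev = [], 0.0, 0.0
    for xn in x:
        yn = b0 * xn - b0 * x_prev + a1 * y_prev
        y.append(yn)
        x_prev, y_prev = xn, yn
    return y

b0, a1 = butterworth_hp1(200.0, 48000.0)
print(round(b0, 4), round(a1, 4))   # 0.9871 0.9742
```

Because the filter has a zero at DC, a constant offset decays to zero (the DC insensitivity mentioned above), while a signal at the Nyquist frequency passes with a gain of exactly 1.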
In a discrete-time system, the subsampled effective amplitude rms can be calculated using a running-average filter, as follows:

$$\mathrm{rms}(i) = \sqrt{\mathrm{ms}(Ni + N - 1)},$$

where

$$\mathrm{ms}(n) = c\,\mathrm{ms}(n-1) + (1 - c)\,x_{filt}^{2}(n), \qquad c = e^{-1/(\tau F_s)},$$

ms is a running average of the power of the equal-loudness-filtered signal, N is the number of samples per frame, τ is the RMS time constant (for example, 35 ms), F_s is the sampling frequency, and c is the smoothing coefficient.

To satisfy condition 7, dB is defined as a ratio to the RMS amplitude of a full-scale square wave (i.e., unity), converted to dB:

$$\mathrm{dB}(i) = 20\log_{10}\bigl(\mathrm{rms}(i)\bigr).$$

The derivation of dB could be replaced with any reasonable method of obtaining individual per-frame level estimates. For example, the output of a smoothed Hilbert envelope [13] could be used in place of rms to further reduce unwanted ripple and higher-order harmonics.

2.4 Weighted Average Calculation

The LLML is obtained by taking a weighted average of the individual level estimates:

$$L = \sum_{i=0}^{M-1} w(i)\,\mathrm{dB}(i),$$

where L is our long-term loudness matching level in dB and M is the number of frames in the file. To satisfy conditions 2 through 5, we define the weighting function w as follows:

$$u(i) = k^{-\mathrm{dB}(i)}, \quad 0 < k \le 1; \qquad w(i) = \frac{u(i)}{\sum_{j=0}^{M-1} u(j)}.$$

Thus, the weight applied to each individual dB measurement is an emphasis parameter k raised to the negative dB(i) power, normalized so that the sum of the weights is unity. Because louder frames have smaller values of -dB(i), they receive larger weights whenever k < 1. If k = 1, the LLML becomes a simple average of the individual dB measurements. As k approaches 0, the LLML approaches the level of the loudest single frame. At intermediate values, for example, k = 0.85, the LLML gives a somewhat greater emphasis to the louder frames, as desired. As long as k < 1, a silent frame (dB(i) = -∞) receives weight k^∞ = 0, so any number of additional silent frames will have no effect on the result (in compliance with condition 5), but all non-silent frames will be represented (as required by condition 2). Preliminary tests have shown this definition of LLML to be useful for loudness matching of songs.
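As desired, scaling the entire sound file yields the expected overall change in dB: scaling by N dB multiplies every u(i) by the same factor k^{-N}, which cancels in the normalization, so the weights are unchanged and L shifts by exactly N dB (condition 8). Additional listening tests may reveal the optimal value of k to best model the way humans judge loudness differences between extended sound recordings.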
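Sections 2.3 and 2.4 can be combined into a short analysis pass. The Python sketch below is illustrative only: the helper names are hypothetical, the input is assumed to be already pre-emphasized, and the 35 ms time constant and k = 0.85 are simply the example values from the text. The clip-limited gain of Section 2.5 is included because it is normally computed directly from the LLML. One implementation detail is our own choice: the exponent in u(i) is offset by the loudest frame's level, which leaves w(i) unchanged (the common factor cancels) but avoids floating-point underflow for very small k.

```python
import math

def frame_levels_db(x_filt, fs, frame_len, tau=0.035):
    """Per-frame levels dB(i) = 20*log10(rms(i)) from a pre-emphasized signal.

    ms(n) = c*ms(n-1) + (1 - c)*x_filt(n)**2 with c = exp(-1/(tau*fs)), and
    rms(i) = sqrt(ms(N*i + N - 1)): the detector is read at the last sample
    of each N-sample frame. A zero-power frame is reported as -inf (silent).
    """
    c = math.exp(-1.0 / (tau * fs))
    ms, levels = 0.0, []
    for n, xn in enumerate(x_filt):
        ms = c * ms + (1.0 - c) * xn * xn
        if n % frame_len == frame_len - 1:
            levels.append(20.0 * math.log10(math.sqrt(ms)) if ms > 0.0 else -math.inf)
    return levels

def llml(levels_db, k=0.85):
    """Long-term loudness matching level L = sum_i w(i) * dB(i).

    u(i) = k**(-dB(i)) is evaluated as k**(loudest - dB(i)); the common
    factor cancels in w(i) = u(i) / sum_j u(j). With k < 1, silent frames
    (-inf) get zero weight; with k = 1 they drag the plain mean to -inf.
    """
    if not 0.0 < k <= 1.0:
        raise ValueError("k must satisfy 0 < k <= 1")
    finite = [d for d in levels_db if d != -math.inf]
    if not finite:
        raise ValueError("no non-silent frames")
    loudest = max(finite)
    num = den = 0.0
    for d in levels_db:
        u = k ** (loudest - d)
        if u > 0.0:
            num += u * d
            den += u
    return num / den

def matching_gain(target_llml, analyzed_llml, peak=None, full_scale=1.0):
    """g = 10**((L_t - L_a)/20), optionally capped at full_scale/peak."""
    g = 10.0 ** ((target_llml - analyzed_llml) / 20.0)
    if peak is not None and peak > 0.0:
        g = min(g, full_scale / peak)
    return g

fs, N = 48000, 1024
square = [1.0 if (n // 24) % 2 == 0 else -1.0 for n in range(fs)]
levels = frame_levels_db(square, fs, N)        # settles at about 0 dB (condition 7)
print(llml([-10.0, -30.0], k=1.0))             # -20.0, the plain mean
print(llml([-10.0, -30.0], k=0.85))            # above -20 dB: the -10 dB frame dominates
print(matching_gain(-20.0, -26.0, peak=0.8))   # 1.25: the 6 dB boost is capped at 1/0.8
```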

2.5 Gain Calculation

To perform loudness matching, we determine the amount of gain needed to convert the analyzed LLML to the target level. If no compression is applied, the desired gain is simply

$$g = 10^{(L_t - L_a)/20},$$

where L_t is the target LLML and L_a is the analyzed LLML. If L_t is set to a high level, a peak limiter may be helpful to avoid the possibility of clipping. Alternatively, we can prevent clipping by constraining the gain such that

$$g \le \frac{\mathrm{max}}{\mathrm{peak}},$$

where max is the full-scale amplitude and peak is the instantaneous peak amplitude of the signal.

3 COMPRESSION

While normalizers are used to adjust the overall loudness of an entire song or sound file, compressors are used to reduce the amount of loudness variation within a song. Figure 3 shows a compressor block diagram [2], differing from the typical compressor only in the addition of an equal-loudness hearing curve filter. This filter is useful for ensuring that the compressor does not overreact to the beats of music containing heavy bass content, causing pumping of the midrange vocals or excessive attenuation of the bass.

Figure 3. Compressor Block Diagram (blocks: z^-n delay, hearing curve filter, RMS detector, log, transfer function, inverse log, gain smoothing, and multipliers for the compression gain and post-gain).

Figure 4 depicts a typical compressor transfer function (or characteristic). The input signal level (along the x-axis) maps to an output level (along the y-axis). In the example shown, low signal levels will be unchanged due to the 45° line segment, while signal levels above the breakpoint will be attenuated.

3.1 Three Problems with Traditional Compressors

Figure 4 also illustrates some problems inherent in compressors lacking advance knowledge of the signals they are about to process. First, if an input signal has been normalized, a great deal of compression will be applied, whereas if the same signal is attenuated in advance of the compressor, it may receive no compression at all. We would like the dynamic range at the output of the compressor to be independent of whatever scaling may previously have been applied.

Figure 4. Compressor Transfer Function (output dB vs. input dB), illustrating three problem areas: (1) depending on prior scaling, either no compression or a lot of compression will be applied; (2) a fixed compression ratio is used regardless of whether the signal is already squashed; (3) a fixed post-gain is used regardless of signal loudness.

Secondly, without human intervention, traditional compressors use the same compression curve for each song, even if that song has already been squashed like a road-kill possum. This, of course, is because the compressor has no advance knowledge of the song's overall statistics.

Thirdly, traditional compressors use a fixed (and often incorrect) post-gain in an attempt to compensate for the attenuation due to the transfer function. The correct post-gain depends not only on the compression curve, but also on the loudness of the input signal and the details of how its dynamics line up with the compression curve.

4 DYNAMIC SPREAD

Compressors are popularly said to reduce the dynamic range of an audio signal, though the term dynamic range is also used to refer to the difference between the peak signal level and the noise floor or minimum signal level. Even if we calculate the dynamic range based on a per-frame dB evaluation (for example, the difference between the dB values of the loudest and softest frames) instead of per sample, the result still tells us little about the distribution of dynamics. The literature is overdue for a term that better describes what compressors are intended to reduce. In this paper, we will use the term dynamic spread.

Range is one measure of the spread of a data set and is defined by the distance between the largest and smallest measurements. Because it is based on only two measurements, range is not always the most useful or robust measure of spread.

4.1 Desired Conditions

We would like our definition of dynamic spread to satisfy the following conditions:
1. It should be unaffected by a simple gain scaling.
2. If we were to scale all the distances from the mean by the same amount (essentially, compressing or expanding the dynamics around a central loudness), the spread should be scaled by the same factor.
3. For robustness, the dynamic spread should be based on substantially all of the per-frame dB values.

One way to satisfy these conditions would be to derive the dynamic spread from the generalized deviation of the per-frame dB values:

$$\sigma = \left[\frac{1}{M}\sum_{i=0}^{M-1}\bigl|\mathrm{dB}(i) - \mu\bigr|^{p}\right]^{1/p},$$

where σ is the dynamic spread and μ is either the mean or the median of dB. The power p provides control over the relative emphasis given to outliers. While μ represents the central per-frame level, the dynamic spread relates to how closely the values are clustered about that center; i.e., how squashed the audio is, either naturally (as in the case of a solo bassoon, with a dynamic range of perhaps 10 dB) or as a result of earlier compression.

If p = 2 and μ is the mean of dB, we obtain the standard deviation (also called the root-mean-square deviation). However, because this equation squares the distance from the mean, it tends to over-emphasize the extremes of the dB array. Our preferred definition of dynamic spread sets p = 1, which yields the mean absolute deviation,

$$\sigma = \frac{1}{M}\sum_{i=0}^{M-1}\bigl|\mathrm{dB}(i) - \mu\bigr|.$$

This simply computes the average distance from the central per-frame level. Because it does not square the distance, this equation is less sensitive to outliers such as the level of the noise floor. We do not want a small amount of tape hiss to have a large effect on the dynamic spread.
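The generalized deviation translates directly into code. This is a minimal sketch, assuming silent frames (dB = -∞) have already been excluded so that the mean is finite; `dynamic_spread` and its `center` argument are hypothetical names. Setting p = 2 with the mean reproduces the population standard deviation, and the default p = 1 gives the preferred mean absolute deviation. The example adds 5% tape hiss to otherwise steady material to show how much more strongly p = 2 reacts to it.

```python
import statistics

def dynamic_spread(levels_db, p=1.0, center="mean"):
    """Generalized deviation sigma = [(1/M) * sum_i |dB(i) - mu|**p] ** (1/p).

    `center` selects mu: the mean or the median of the per-frame levels.
    Levels are assumed finite (silent frames removed beforehand).
    """
    if center == "mean":
        mu = statistics.mean(levels_db)
    elif center == "median":
        mu = statistics.median(levels_db)
    else:
        raise ValueError("center must be 'mean' or 'median'")
    m = len(levels_db)
    return (sum(abs(d - mu) ** p for d in levels_db) / m) ** (1.0 / p)

# Mostly steady material at -20 dB plus 5% tape hiss at -90 dB.
with_hiss = [-20.0] * 95 + [-90.0] * 5
print(round(dynamic_spread(with_hiss), 2))        # 6.65 (p = 1)
print(round(dynamic_spread(with_hiss, p=2), 2))   # about 15.26 (standard deviation)
```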
The perceptual correlate to a signal's dynamic spread could be referred to as its loudness spread; this is the quantity the compressor is ultimately intended to reduce.

4.2 dB Histogram

Both the LLML and the dynamic spread can be approximated from a dB histogram (or statistical frequency function), such as the ones illustrated in the first column of Figure 5. The advantage is a large reduction in the amount of metadata required. If dB is quantized to, say, 1 dB increments, an array containing the number of frames at each useful dB level would only require about 100 values, regardless of the length of the song. In Matlab [14], the code for the dB histogram algorithm would be as shown in Listing 1 (using positive array indices to represent negative dB histogram bins):

    Hist = zeros(1, 100);            % Allocate array
    for i = 1:numel(db)              % for each frame (db holds the per-frame levels)
        Bin = -round(db(i));
        if Bin < 1, Bin = 1; end
        if Bin > 100, Bin = 100; end
        Hist(Bin) = Hist(Bin) + 1;
    end

Listing 1. Creating dB histogram.

5 DYNAMIC SPREAD MATCHING

Dynamic spread matching is a way of ensuring that compression results in audio files with similar distributions of dynamics. We want uniformity of the results, not of the process. With traditional compressors [1, 2], once the parameter settings are selected, the same compression curve is applied to every song regardless of its original dynamics and scaling. This, again, is because the compressor does not know the original dynamics distribution in advance.

If we can perform a pre-analysis of the audio data, this will no longer be the case. Instead of blindly applying the same amount of compression to every song, we can match each song to a desired dynamic spread. If a song is already heavily compressed, it would be ridiculous to compress it further. However, if the next song is, say, a piece of classical music with a wide dynamic range, we may want to apply a suitable amount of compression. By calculating the original dynamic spread and comparing it to a desired dynamic spread, we can intelligently apply whatever compression is needed.
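Section 4.2's observation that both measures can be recovered from roughly 100 histogram counts can be checked with a small sketch. The Python below is an illustrative counterpart to the Listing 1 idea rather than a transcription of it: bin b (0 to 100) holds frames rounded to -b dB, and the names `db_histogram`, `histogram_llml`, and `histogram_spread` are hypothetical. With integer-valued frame levels the histogram results agree with the per-frame formulas up to floating-point rounding; with real data they differ only by the 1 dB quantization.

```python
def db_histogram(levels_db, n_bins=101):
    """Count frames per 1 dB bin; bin b holds frames at (rounded) -b dB.

    Levels at or above 0 dB go to bin 0; levels at or below -(n_bins - 1) dB,
    including silent frames (-inf), go to the last bin. Python's round()
    rounds exact halves to even, which differs slightly from MATLAB's round.
    """
    hist = [0] * n_bins
    for d in levels_db:
        if d <= -(n_bins - 1):
            b = n_bins - 1
        elif d >= 0.0:
            b = 0
        else:
            b = -round(d)
        hist[b] += 1
    return hist

def histogram_llml(hist, k=0.85):
    """LLML from a non-empty histogram: each bin weighted by count * k**(-dB)."""
    loudest = next(b for b, c in enumerate(hist) if c)   # smallest b = loudest bin
    num = den = 0.0
    for b, c in enumerate(hist):
        if c:
            u = c * k ** (b - loudest)    # k**(-dB) up to a common factor
            num += u * -b
            den += u
    return num / den

def histogram_spread(hist):
    """Mean absolute deviation of the binned levels about their mean."""
    total = sum(hist)
    mu = sum(c * -b for b, c in enumerate(hist)) / total
    return sum(c * abs(-b - mu) for b, c in enumerate(hist)) / total

levels = [-3.0, -3.0, -10.0, -42.0, -42.0, -42.0, -77.0]
hist = db_histogram(levels)
print(len(hist), histogram_llml(hist), histogram_spread(hist))
```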
5.1 Using a Single Line Segment

The simplest way of achieving the desired result is to use a single line segment as our compressor's transfer function, as shown in Figure 6, with a slope determined by

$$S = \frac{\sigma_d}{\sigma_a},$$

where S is the slope, σ_a is the analyzed dynamic spread of our sound file, and σ_d is our desired dynamic spread. (Note that in some articles [1, 2], the term slope is used to refer to the negative slope of the gain curve, or 1 - 1/R, where R is the compression ratio. In this paper, slope simply means the inverse of the compression ratio; in other words, the slope of the transfer function line segment.) Slopes in the range 0 < S < 1 produce compression. If the desired dynamic spread is wider than that of the original, we may prefer to leave the original as is, rather than to apply expansion. If so, S can be limited to a maximum value of 1.
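A sketch of the single-segment rule follows. It assumes the line pivots at 0 dB; the pivot does not change the resulting spread, so this is only an illustrative choice, and all helper names are hypothetical. Because the mean absolute deviation scales linearly with the slope, applying S = σ_d/σ_a yields exactly σ_d, and the printed gains show the low-level gain growth that the following paragraphs criticize.

```python
def mean_abs_dev(values):
    """Dynamic spread with p = 1, centered on the mean."""
    mu = sum(values) / len(values)
    return sum(abs(v - mu) for v in values) / len(values)

def single_segment_slope(sigma_a, sigma_d, allow_expansion=False):
    """S = sigma_d / sigma_a, limited to 1 unless expansion is allowed."""
    if sigma_a <= 0.0:
        raise ValueError("analyzed spread must be positive")
    s = sigma_d / sigma_a
    return s if allow_expansion else min(s, 1.0)

def single_segment_transfer(x_db, slope):
    """Line through 0 dB (assumed pivot): y = S*x, so the gain is (S - 1)*x."""
    return slope * x_db

levels = [-10.0, -25.0, -40.0, -70.0]
sigma_a = mean_abs_dev(levels)
s = single_segment_slope(sigma_a, 0.5 * sigma_a)   # ask for half the spread
out = [single_segment_transfer(x, s) for x in levels]
gains = [y - x for x, y in zip(levels, out)]
print(s, gains)   # 0.5 [5.0, 12.5, 20.0, 35.0]: gain grows as the input falls
```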

Figure 5. dB Histograms and Dynamics Profiles for recordings by Mozart, Tull, and Hole. Left column: dB histograms (# frames vs. dB; the Tull histogram includes a cluster of tape hiss, circa 1973). Right column: dynamics profiles (dB vs. percentile). Note how differently a -20 dB compressor threshold would affect the first and third recordings.

Figure 6. Compressor transfer function (output Y in dB vs. input X in dB) and compressor gain (dB vs. input X in dB), single line segment. Note that the compressor gain keeps increasing as the input signal level decreases.

While the use of a single line segment yields the desired dynamic spread, it has an unfortunate effect on the signal-to-noise ratio. As we see from Figure 6, as the input signal level decreases, the compressor gain keeps increasing. The result is a very noisy compressor, because we are applying a large amount of gain to those frames that already have the worst signal-to-noise ratio.

5.2 Using Multiple Line Segments

Therefore, we will probably want to use a traditional multisegment compressor transfer function, which typically has a constant-gain region below the compressor threshold (as seen in Figure 4).
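A minimal sketch of such a multisegment characteristic, assuming a unity (45°) middle region, a compressor segment above the compressor threshold, and an optional downward-expansion segment below a noise gate threshold (the expander mentioned in Section 5.2.1). The function name and the default expander slope of 2 are hypothetical. Unlike the single line segment, the gain stays at 0 dB throughout the middle region instead of growing as the input falls.

```python
def multisegment_transfer(x_db, comp_threshold, comp_slope,
                          gate_threshold=None, gate_slope=2.0):
    """Piecewise-linear static curve mapping input dB to output dB.

    - above comp_threshold: compressor segment with slope comp_slope (< 1)
    - between the thresholds: 45-degree segment (constant 0 dB gain)
    - below gate_threshold (optional): expander segment, slope gate_slope (> 1)
    The segments meet at the thresholds, so the curve is continuous.
    """
    if x_db > comp_threshold:
        return comp_threshold + comp_slope * (x_db - comp_threshold)
    if gate_threshold is not None and x_db < gate_threshold:
        return gate_threshold + gate_slope * (x_db - gate_threshold)
    return x_db

for x in (-5.0, -20.0, -50.0, -80.0):
    y = multisegment_transfer(x, -20.0, 0.5, gate_threshold=-70.0)
    print(x, y, y - x)   # gains: -7.5, 0.0, 0.0, -10.0 dB
```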

5.2.1 Specifying the Thresholds

If our characteristic uses two or more piecewise-linear segments, we will want to specify certain threshold levels in advance. For example, we will want to specify the location of the compressor threshold, and we may want an expander segment below a noise gate threshold to suppress low-level signals and minimize noise.

This poses a subtle yet serious problem: how can we specify a threshold so that it will behave similarly with any sound file? We have seen that if we specify the threshold as an absolute position (for example, a certain number of dB below full scale), the resulting transfer function will affect sound files differently depending on how they have been scaled. Since we want our compressor to achieve similar results regardless of scaling, this is clearly not the desired effect.

A common approach (in radio stations, for example) is to precede the compressor with an automatic gain control. This requires increased computational expense, and the additional layer of compression may increase distortion and compound transient problems such as overshoots. Another solution would be to perform a pre-compressor (x-axis) loudness normalization step, possibly including a limiter, in addition to our post-compressor (y-axis) normalization. This is cumbersome and inefficient.

A much better solution would be to specify the threshold locations in a signal-independent way and then translate those specifications into signal-dependent breakpoints. For example, one might specify the thresholds in terms of multiples of the dynamic spread σ above or below the mean value μ of dB. We may want our noise gate threshold to be located at μ - 1.5σ, for instance, while our compressor threshold is at μ + 1.1σ. Unfortunately, if our dynamics distribution is significantly skewed (for example, by a large amount of tape hiss), the resulting breakpoint might end up outside the range of the individual dB measurements. A more robust method would be to specify the compressor threshold as a percentile.
We can do this using the dynamics profile.

5.2.2 Dynamics Profile

The dynamics profile, illustrated on the right side of Figure 5, is a cumulative relative frequency plot with the x- and y-axes switched. This provides an overview of the statistical (but not temporal) distribution of dynamics within a song. For a given percentile value P on the x-axis, the dynamics profile gives us a dB value on the y-axis such that P% of the frames in the song are softer than or equal in loudness to that value.

The conceptually simplest way to view the dynamics profile is to sort the original dB array in order of increasing level, then relabel the x-axis to display a range from the 0th to the 100th percentile. A more computationally efficient method of calculating the dynamics profile, not requiring a large sort operation, is to derive it directly from the dB histogram. In Matlab, this could be done as shown in Listing 2:

    % Allocate arrays
    dBsPercent = zeros(1, 100);
    relfreqsdb = Hist / sum(Hist);   % relative frequency of each -dB bin (from Listing 1)
    indx = 101; cumfreq = 0;         % Start at -100 dB
    for i = 1:100                    % for each percentile
        % Find the softest dB level at which the
        % cumulative relative frequency reaches
        % this percentile (small tolerance for round-off).
        while cumfreq < .01*i - 1e-12
            indx = indx - 1;
            if indx < 1
                indx = 1; break;
            end
            cumfreq = cumfreq + relfreqsdb(indx);
        end
        dBsPercent(i) = -indx;
    end

Listing 2. Deriving dynamics profile from the dB histogram.

5.2.3 Using the Dynamics Profile

Figure 5 shows the dB histograms and dynamics profiles for three different sound files and helps illustrate the problem with using a fixed compressor threshold. If we were to apply the same compression settings to each of these sound recordings, using a fixed threshold at -20 dB, the compressor would have almost no effect on the first signal (which has the most need for compression of any of the three). The third signal, which has relatively little loudness variation to begin with, would receive a great deal of compression. To solve this problem, we propose specifying the compressor threshold as a percentile on the dynamics profile.
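The histogram-to-profile walk can also be written with integer counts, which sidesteps the round-off that relative frequencies introduce at the 100th percentile. This Python sketch is an illustrative counterpart to Listing 2 rather than a transcription: bin b holds frames at -b dB (so index 0 is 0 dB), and the names `dynamics_profile` and `percentile_threshold` are hypothetical. The last lines show the property the percentile approach relies on: a percentile threshold moves with the file's scaling.

```python
def dynamics_profile(hist):
    """Percentile -> dB table (P = 1..100) from a 1 dB histogram.

    hist[b] counts frames at -b dB. For each percentile P, walk upward from
    the softest bin until at least P% of all frames have been accumulated.
    Integer comparisons (100*cum vs. P*total) avoid floating-point round-off.
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    profile, b, cum = [], len(hist), 0
    for p in range(1, 101):
        while 100 * cum < p * total:
            b -= 1
            cum += hist[b]
        profile.append(-b)
    return profile

def percentile_threshold(profile, p):
    """dB level at or below which p% of the frames lie (1 <= p <= 100)."""
    return profile[p - 1]

# 50 frames at -60 dB and 50 frames at -20 dB.
hist = [0] * 101
hist[60], hist[20] = 50, 50
profile = dynamics_profile(hist)
print(percentile_threshold(profile, 5), percentile_threshold(profile, 60))   # -60 -20

# The same material 6 dB louder: the percentile threshold follows it.
louder = [0] * 101
louder[54], louder[14] = 50, 50
print(percentile_threshold(dynamics_profile(louder), 60))   # -14
```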
For example, we may want to put the noise gate threshold at the 5th percentile, while the compressor threshold is placed at the 60th percentile. This method guarantees that our breakpoints will never be outside the range of the per-frame dB data, while automatically adapting the breakpoint locations to the dynamics of the audio data, regardless of how the audio may have been scaled or compressed.

In Figure 7, we can see how our use of the percentile domain helps to normalize differences in loudness and dynamic spread between sound files. The dB histograms (frames vs. dB) and dynamics profiles (dB vs. percentile) from Figure 5 have been combined into a single plot of frames vs. percentile, with all three data sets superimposed. Note the greatly improved similarity between the histograms in Figure 7 compared to their counterparts in Figure 5. By specifying the thresholds as percentiles, we achieve substantial independence from song-to-song variations in scaling and dynamics.

Figure 7. Normalizing histograms using percentiles (# frames, normalized, vs. percentile; Mozart, Tull, and Hole superimposed).

5.2.4 Determining the Slope

Once we specify the x-axis breakpoint locations, the next step is to determine the line segment slopes needed to yield the desired dynamic spread. A brute-force method would be simply to choose a slope, perform the actual compression on the entire song, measure the resulting dynamic spread, and adjust the slope as needed. This is inefficient. A better method is to predict the statistical results of the compression process.

Assume for the moment that our characteristic has two line segments: a stationary 45° segment below the threshold and a compressor segment whose slope is to be determined. Figure 8 illustrates how we can estimate the effect of an arbitrary compressor characteristic on the dynamics profile of an arbitrary sound file, simply by applying the static transfer function directly to the dynamics profile to create a new dynamics profile. In Matlab, this is done as shown in Listing 3.

Figure 8. Application of the transfer function (panels: original dynamics profile, dB vs. percentile; compressor transfer function, output dB vs. input dB; resultant dynamics profile, dB vs. percentile). The compressor's static transfer function is applied to the dynamics profile of the original sound file to yield the approximate dynamics profile of the resultant sound file.

    % Allocate array
    newrelfreqsdb = zeros(1, 100);
    % The indices of xferfcn represent
    % input levels in negative dB; the
    % array contents represent the
    % corresponding output levels.
    for i = 1:100                    % for each orig. -dB
        newdblevel = round(xferfcn(i));
        newbin = min(max(-newdblevel, 1), 100);   % keep the index within 1..100
        newrelfreqsdb(newbin) = ...
            newrelfreqsdb(newbin) ...
            + relfreqsdb(i);
    end

Listing 3. Applying transfer function to dynamics profile.

If we initially set the slope of the compressor segment to 0° (horizontal), we can apply this static transfer function to the original dynamics profile and obtain an approximation of the dynamics profile that would result from this extreme compression (essentially, limiting). By analyzing the resulting dynamics profile, we can obtain an estimate of the dynamic spread that would be obtained if we were to perform the actual compression.
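A Python sketch of the same idea, in the spirit of Listing 3: histogram counts are moved through a static curve, and the spread of the resulting histogram is the prediction, with attack and release ignored. The two-segment helper, the bin convention (bin b holds frames at -b dB), and all names are assumptions for illustration.

```python
def two_segment(threshold, slope):
    """45-degree segment up to threshold, compressor segment above it."""
    return lambda x: x if x <= threshold else threshold + slope * (x - threshold)

def apply_transfer_to_histogram(hist, transfer):
    """Move each bin's count to the bin nearest transfer(input level).

    Bin b represents -b dB; output bins are clamped to the histogram range.
    This is a static mapping: attack and release behavior is ignored.
    """
    n = len(hist)
    out = [0] * n
    for b, count in enumerate(hist):
        if count:
            nb = -round(transfer(-b))
            out[min(max(nb, 0), n - 1)] += count
    return out

def histogram_spread(hist):
    """Mean absolute deviation of the binned levels about their mean."""
    total = sum(hist)
    mu = sum(c * -b for b, c in enumerate(hist)) / total
    return sum(c * abs(-b - mu) for b, c in enumerate(hist)) / total

hist = [0] * 101
hist[10], hist[40], hist[70] = 30, 40, 30
print(histogram_spread(hist))                                   # 18.0
limited = apply_transfer_to_histogram(hist, two_segment(-20.0, 0.0))
print(histogram_spread(limited))                                # 16.2, spread at slope 0
```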

Next, we interpolate between this estimated dynamic spread and the dynamic spread of the original signal (which can be viewed as being compressed with unity slope; i.e., unchanged) to estimate the slope that will yield the desired dynamic spread. Assuming an approximately linear relationship between changes in slope and changes in dynamic spread, we find:

$$\frac{S_d - S_{min}}{S_{max} - S_{min}} = \frac{\sigma_d - \sigma_{min}}{\sigma_{max} - \sigma_{min}},$$

where S_d is the desired slope, S_min is the minimum slope (here, 0), S_max is the maximum slope (here, 1), σ_d is the desired dynamic spread, σ_max is the original dynamic spread (at unity slope), and σ_min is the dynamic spread obtained from applying compression with the compressor segment at minimum slope. (A minimum slope greater than zero might be desired in order to minimize detrimental sonic effects from extreme compression ratios.) Solving for S_d with S_max = 1, we obtain:

$$S_d = S_{min} + (1 - S_{min})\,\frac{\sigma_d - \sigma_{min}}{\sigma_{max} - \sigma_{min}}.$$

If, after applying our new compressor curve to the original dynamics profile, the dynamic spread of the resulting dynamics profile is not sufficiently close to the desired value, we can iterate the interpolation process until we reach the desired precision. Similar processes can be devised in case there are additional line segments.

5.3 Temporal Behavior

The process of applying the static compressor curve directly to the dynamics profile does not take into account the compressor's temporal attack and release characteristics. Note that if our compressor were to use instantaneous attack and release times, relying solely upon the level detector for its smoothing, the estimated dynamics profile should match the actual result of the compression. Given sufficiently fast attack and release times (several hundred ms or less), the use of the static compressor curve in obtaining our dynamics profile estimate does not appear to cause significant skewing of the estimate over the course of an entire sound file. This might pose a larger problem for automatic gain controls, due to their slower time constants.
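The interpolation of Section 5.2.4, together with the optional iteration, can be sketched as a false-position (regula falsi) search over the slope. This is our reading of the procedure, not a transcription: it assumes a two-segment curve, assumes the predicted spread increases with slope, and applies the curve to dynamics-profile values directly, as Figure 8 suggests. The first pass is exactly the closed-form S_d; later passes interpolate between the closest bracketing estimates. All names and the tolerance are hypothetical.

```python
def mad(values):
    """Dynamic spread with p = 1 (mean absolute deviation about the mean)."""
    mu = sum(values) / len(values)
    return sum(abs(v - mu) for v in values) / len(values)

def predicted_spread(profile, threshold, slope):
    """Spread after applying a two-segment static curve to profile values."""
    out = [x if x <= threshold else threshold + slope * (x - threshold)
           for x in profile]
    return mad(out)

def slope_for_spread(profile, threshold, sigma_d, s_min=0.0, tol=0.01, max_iter=50):
    """Compressor-segment slope that yields dynamic spread sigma_d (in dB)."""
    lo_s, lo_sig = s_min, predicted_spread(profile, threshold, s_min)
    hi_s, hi_sig = 1.0, mad(profile)     # unity slope leaves the profile unchanged
    if sigma_d >= hi_sig:
        return 1.0                       # never expand
    if sigma_d <= lo_sig:
        return s_min
    s = lo_s
    for _ in range(max_iter):
        # Linear interpolation between the bracketing (slope, spread) pairs.
        s = lo_s + (hi_s - lo_s) * (sigma_d - lo_sig) / (hi_sig - lo_sig)
        sig = predicted_spread(profile, threshold, s)
        if abs(sig - sigma_d) <= tol:
            break
        if sig < sigma_d:
            lo_s, lo_sig = s, sig
        else:
            hi_s, hi_sig = s, sig
    return s

prof = [float(v) for v in range(-60, 0)]   # toy profile, evenly spread levels
s = slope_for_spread(prof, -30.0, 10.0)
print(round(s, 3), round(predicted_spread(prof, -30.0, s), 3))
```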
6 POST-COMPRESSOR LOUDNESS COMPENSATION

Dynamic compression changes the overall loudness of a sound file in a signal-dependent way. Traditional compressors try to compensate for their attenuation by applying a fixed post-compressor gain, but this is often too much or too little, depending on the song. Without performing some sort of statistical analysis, we do not know in advance exactly how an arbitrary compression curve will affect the overall loudness of an arbitrary sound file, even if we know its original loudness, because the result depends on the exact distribution of the sound file's energy and how that lines up against the compression curve.

If we want to apply loudness matching at the output of the compressor in real time, we need a way of estimating the compressor's effect on a particular sound file. We do this by again using the technique illustrated in Figure 8. By applying the compressor's static transfer function directly to the original dynamics profile, we obtain an estimate of the resulting dynamics profile. We then apply our LLML analysis process to the new dynamics profile to predict the LLML of the song after the compressor. This in turn reveals the amount of post-gain needed for loudness matching. The loudness normalization is calculated immediately before playback and then applied in the post-compression gain block shown in Figure 3. The equal-loudness filter used in the compressor should match the one used to generate the dB data during the song analysis phase.

7 SYSTEM OVERVIEW

A block diagram of the overall system, divided into song analysis, pre-playback, and playback phases, is shown in Figure 9. Figure 10 gives an overview of the song analysis (metadata generation) phase. A block diagram of the pre-playback (compressor parameter generation) phase is shown in Figure 11.

8 CONCLUSION

We have presented a method for normalizing the loudness of a sound file by comparing its long-term loudness matching level (LLML) to a desired target value.
The LLML might provide a useful alternative to having a human operator attempt to match soundtrack levels for motion pictures; for example, it might provide an automatic way of generating the dialnorm metadata for normalizing dialogue levels in the AC-3 format [15]. The LLML might also be a useful basis for a standard to normalize the loudness of video games and other computer applications. Developers could measure the LLML of a nearly complete game and use that to set the overall loudness to an industry standard. An advantage of the LLML in this context is its ability to deemphasize the weighting of periods of relative quiet.

In addition, we have presented a method for normalizing the dynamic spread of sound files, so that the desired compression is obtained without over-compressing audio that is already dynamically challenged. This technique could possibly be extended to incorporate other common compressor features, such as multi-band compression.

Finally, we have shown how to determine the correct post-gain to match a compressor's output to a desired loudness, even though the compressed signal is not yet available, by estimating the compressor's effect on the dynamics of the original signal.

The techniques presented here make use of an analysis of the original audio data. This analysis phase could take place while audio data is being ripped from compact discs, during download from a network, or as a background process. Since we are performing a statistical analysis, we have found that it generally suffices to analyze 5 or fewer frames, somewhat randomly chosen, independent of the length of the sound file. If the dynamics profile is quantized to one-percentile increments, the song analysis process results in a very small amount of data, on the order of a hundred bytes per sound file. This data could easily be stored as metadata on CDs or DVDs, as sidestream data in streaming audio formats, in playlist tables, etc.
This metadata can be generated without human intervention and does not force the playback system to use predetermined compressor breakpoints or time constants. It is our hope that creators of new audio formats and standards will give strong consideration to including such data as part of their format definitions.
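To illustrate why the stored metadata is so small, the sketch below analyzes a random subset of frames and packs a dynamics profile (level in dB at each percentile from 0 to 100) into one byte per percentile, giving 101 bytes. It is a hypothetical sketch, not the paper's analysis: the function names, the per-frame RMS level, the 1 dB quantization step, and the -100 dB floor are all assumptions, and it omits the hearing (equal-loudness) filter and the intermediate dB histogram that the song analysis phase uses.

```python
import math
import random
from typing import List, Sequence


def frame_level_db(frame: Sequence[float], floor_db: float = -100.0) -> float:
    """RMS level of one frame in dB relative to full scale (samples in [-1, 1]),
    clamped at floor_db so silent frames stay finite."""
    mean_square = sum(x * x for x in frame) / len(frame)
    if mean_square <= 0.0:
        return floor_db
    return max(floor_db, 10.0 * math.log10(mean_square))


def dynamics_profile(levels_db: Sequence[float]) -> List[float]:
    """Level in dB at each percentile 0, 1, ..., 100 (101 values), using
    linear interpolation between sorted frame levels."""
    xs = sorted(levels_db)
    last = len(xs) - 1
    profile = []
    for p in range(101):
        pos = p / 100.0 * last
        i = int(pos)
        j = min(i + 1, last)
        profile.append(xs[i] + (pos - i) * (xs[j] - xs[i]))
    return profile


def pack_profile(profile_db: Sequence[float], floor_db: float = -100.0) -> bytes:
    """Quantize each percentile's level to 1 dB above floor_db, one byte each."""
    return bytes(min(255, max(0, round(v - floor_db))) for v in profile_db)


def analyze(frames: Sequence[Sequence[float]], max_frames: int, seed: int = 0) -> bytes:
    """Analyze a random subset of at most max_frames frames; return the packed profile."""
    rng = random.Random(seed)
    chosen = rng.sample(list(frames), min(max_frames, len(frames)))
    levels = [frame_level_db(f) for f in chosen]
    return pack_profile(dynamics_profile(levels))
```

Since the packed profile has a fixed length regardless of how many frames were analyzed, the metadata size is independent of the sound file's duration, consistent with the size estimate above.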

9 ACKNOWLEDGEMENT
Dr. Jean-Marc Jot made valuable remarks and asked intriguing questions which prompted the current research. He also read earlier versions of the manuscript and offered useful suggestions.

10 PATENT NOTICE
Some of the methods described in this paper are the subject of a patent application.

[Block diagram in three phases. Song analysis (metadata generation) phase: audio data → dB histogram and original dynamic spread. Pre-playback compressor parameter generation, immediately prior to playback: dB histogram, original dynamic spread, desired compressor threshold (as percentile), desired dynamic spread, desired attack, release, etc., and desired LLML → transfer function, attack/release coefficients, and post-gain. During playback: input audio signal → compressor → output audio signal.]
Figure 9. Block diagram of the overall system. The song analysis phase is shown in Figure 10, the parameter generation phase in Figure 11, and the compressor in Figure 3.

[Audio data → hearing filter → audio to dB histogram → dB histogram → calculate dynamic spread → original dynamic spread.]
Figure 10. Block diagram of the Song Analysis (metadata generation) phase.

[dB histogram → calculate dynamics profile (dB vs. percentile). Desired compressor threshold (as percentile) → convert percentile to dB → breakpoint (dB). Breakpoint and slope (initial slope, then new slope) → create transfer function → apply transfer function to dB histogram → new dB histogram → calculate new dynamic spread; compare with original and desired dynamic spread, interpolate if needed for new slope, and iterate if needed. New dB histogram → calculate estimated LLML → with desired LLML, calculate post-gain → post-gain.]
Figure 11. Block diagram of the Compressor Parameter Generation. This phase of processing typically occurs immediately prior to playback.

11 REFERENCES
[1] G. W. McNally, "Dynamic Range Control of Digital Audio Signals," J. Audio Eng. Soc., Vol. 32, No. 5, May 1984, pp. 316-327.
[2] U. Zölzer, Digital Audio Signal Processing, John Wiley & Sons Ltd., 1997, pp. 207-219.
[3] E. Zwicker, G. Flottorp, and S. S. Stevens, "Critical Band Width in Loudness Summation," J. Acoust. Soc. Am., Vol. 29, No. 5, 1957, pp. 548-557.
[4] B. C. J. Moore, B. Glasberg, and T. Baer, "A Model for the Prediction of Thresholds, Loudness, and Partial Loudness," J. Audio Eng. Soc., Vol. 45, No. 4, 1997, pp. 224-240.
[5] B. C. J. Moore and B. R. Glasberg, "A Revision of Zwicker's Loudness Model," Acustica/Acta Acustica, Vol. 82, 1996, pp. 335-345.
[6] I. Allen, "Are Movies Too Loud?" SMPTE Journal, Vol. 107, p. 3, Jan. 1998, <http://www.dolby.com/tech/tooloup.pdf>.
[7] Sonic Foundry, Inc., Sound Forge (software).
[8] E. Zwicker and H. Fastl, Psychoacoustics, 2nd Edition, Springer-Verlag, 1999.
[9] I. Neoran and M. Shashoua, "A Perceptive Loudness-Sensitive Leveler for Audio Broadcasting and Mastering," 105th Audio Eng. Soc. Convention, Preprint No. 4852, 1998.
[10] F. Floru, "Attack and Release Time Constants in RMS-Based Feedback Compressors," J. Audio Eng. Soc., Vol. 47, No. 10, Oct. 1999, pp. 788-803.
[11] A. Bateman and W. Yates, Digital Signal Processing Design, Computer Science Press, 1989, pp. 307-311.
[12] P. Kraght, "Aliasing in Digital Clippers and Compressors," J. Audio Eng. Soc., Vol. 48, No. 11, Nov. 2000, pp. 1060-1065.
[13] P. Dutilleux, "Filters, Delays, Modulations and Demodulations: A Tutorial," First COST-G6 Workshop on Digital Audio Effects (DAFX98), November 19-21, 1998.
[14] The MathWorks, Inc., Matlab (software).
[15] Dolby Laboratories, Inc., Dolby Digital Broadcast Implementation Guidelines, 1998, <http://www.dolby.com/tech/bigsc.pdf>.