Practical Design of Filter Banks for Automatic Music Transcription



Similar documents
This document is downloaded from DR-NTU, Nanyang Technological University Library, Singapore.

The Calculation of G rms

Analysis/resynthesis with the short time Fourier transform

The Phase Modulator In NBFM Voice Communication Systems

ANALYZER BASICS WHAT IS AN FFT SPECTRUM ANALYZER? 2-1

B3. Short Time Fourier Transform (STFT)

Onset Detection and Music Transcription for the Irish Tin Whistle

Simulation of Frequency Response Masking Approach for FIR Filter design

Auto-Tuning Using Fourier Coefficients

Electronic Communications Committee (ECC) within the European Conference of Postal and Telecommunications Administrations (CEPT)

SR2000 FREQUENCY MONITOR

Speech Signal Processing: An Overview

RECOMMENDATION ITU-R SM Measuring sideband emissions of T-DAB and DVB-T transmitters for monitoring purposes

Understanding CIC Compensation Filters

Introduction to Digital Audio

Matlab GUI for WFB spectral analysis

Filter Comparison. Match #1: Analog vs. Digital Filters

CTCSS REJECT HIGH PASS FILTERS IN FM RADIO COMMUNICATIONS AN EVALUATION. Virgil Leenerts WØINK 8 June 2008

CIRCUITS LABORATORY EXPERIMENT 3. AC Circuit Analysis

CHAPTER 6 Frequency Response, Bode Plots, and Resonance

SOFTWARE FOR GENERATION OF SPECTRUM COMPATIBLE TIME HISTORY

MATRIX TECHNICAL NOTES

Lock - in Amplifier and Applications

RANDOM VIBRATION AN OVERVIEW by Barry Controls, Hopkinton, MA

Aircraft cabin noise synthesis for noise subjective analysis

GSM/EDGE Output RF Spectrum on the V93000 Joe Kelly and Max Seminario, Verigy

The Fundamentals of FFT-Based Audio Measurements in SmaartLive

Agilent Creating Multi-tone Signals With the N7509A Waveform Generation Toolbox. Application Note

Frequency Response of Filters

TCOM 370 NOTES 99-4 BANDWIDTH, FREQUENCY RESPONSE, AND CAPACITY OF COMMUNICATION LINKS

DIGITAL SIGNAL PROCESSING - APPLICATIONS IN MEDICINE

AN-837 APPLICATION NOTE

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song

CONDUCTED EMISSION MEASUREMENT OF A CELL PHONE PROCESSOR MODULE

Spectrum Level and Band Level

LAB 7 MOSFET CHARACTERISTICS AND APPLICATIONS

The Effective Number of Bits (ENOB) of my R&S Digital Oscilloscope Technical Paper

A Low-Cost, Single Coupling Capacitor Configuration for Stereo Headphone Amplifiers

Short-time FFT, Multi-taper analysis & Filtering in SPM12

Direct and Reflected: Understanding the Truth with Y-S 3

25. AM radio receiver

Optimizing IP3 and ACPR Measurements

Separation and Classification of Harmonic Sounds for Singing Voice Detection

MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music

Experiment 3: Double Sideband Modulation (DSB)

AM Receiver. Prelab. baseband

L9: Cepstral analysis

Harmonics and Noise in Photovoltaic (PV) Inverter and the Mitigation Strategies

The front end of the receiver performs the frequency translation, channel selection and amplification of the signal.

FFT Spectrum Analyzers

6.025J Medical Device Design Lecture 3: Analog-to-Digital Conversion Prof. Joel L. Dawson

FUNDAMENTALS OF MODERN SPECTRAL ANALYSIS. Matthew T. Hunter, Ph.D.

Audio Tone Control Using The TLC074 Operational Amplifier

Loop Bandwidth and Clock Data Recovery (CDR) in Oscilloscope Measurements. Application Note

Design of FIR Filters

Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones

Reconfigurable Low Area Complexity Filter Bank Architecture for Software Defined Radio

Experiment 7: Familiarization with the Network Analyzer

RF System Design and Analysis Software Enhances RF Architectural Planning

CLOCK AND SYNCHRONIZATION IN SYSTEM 6000

Basic Acoustics and Acoustic Filters

Active Vibration Isolation of an Unbalanced Machine Spindle

Jeff Thomas Tom Holmes Terri Hightower. Learn RF Spectrum Analysis Basics

Dithering in Analog-to-digital Conversion

7. Beats. sin( + λ) + sin( λ) = 2 cos(λ) sin( )

DIGITAL-TO-ANALOGUE AND ANALOGUE-TO-DIGITAL CONVERSION

AN Application Note: FCC Regulations for ISM Band Devices: MHz. FCC Regulations for ISM Band Devices: MHz

How to Design 10 khz filter. (Using Butterworth filter design) Application notes. By Vadim Kim

USB 3.0 CDR Model White Paper Revision 0.5

Frequency Response of FIR Filters

Timing Errors and Jitter

First, we show how to use known design specifications to determine filter order and 3dB cut-off

MSB MODULATION DOUBLES CABLE TV CAPACITY Harold R. Walker and Bohdan Stryzak Pegasus Data Systems ( 5/12/06) pegasusdat@aol.com

Generic - Hearing Loop - (AFILS) U.S. System Specification

Agilent AN 1316 Optimizing Spectrum Analyzer Amplitude Accuracy

Sampling Theorem Notes. Recall: That a time sampled signal is like taking a snap shot or picture of signal periodically.

Frequency response. Chapter Introduction

HD Radio FM Transmission System Specifications Rev. F August 24, 2011

DeNoiser Plug-In. for USER S MANUAL

K2 CW Filter Alignment Procedures Using Spectrogram 1 ver. 5 01/17/2002

Adding Sinusoids of the Same Frequency. Additive Synthesis. Spectrum. Music 270a: Modulation

LAB 12: ACTIVE FILTERS

Jitter Measurements in Serial Data Signals

Convention Paper Presented at the 112th Convention 2002 May Munich, Germany

INTRODUCTION. Please read this manual carefully for a through explanation of the Decimator ProRackG and its functions.

Agilent AN 1315 Optimizing RF and Microwave Spectrum Analyzer Dynamic Range. Application Note

RECOMMENDATION ITU-R BS.704 *, ** Characteristics of FM sound broadcasting reference receivers for planning purposes

Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Electronic Circuits Spring 2007

DESIGN AND SIMULATION OF TWO CHANNEL QMF FILTER BANK FOR ALMOST PERFECT RECONSTRUCTION

Frequency response: Resonance, Bandwidth, Q factor

POWER SYSTEM HARMONICS. A Reference Guide to Causes, Effects and Corrective Measures AN ALLEN-BRADLEY SERIES OF ISSUES AND ANSWERS

RECOMMENDATION ITU-R BS *,** Audio quality parameters for the performance of a high-quality sound-programme transmission chain

Jitter Transfer Functions in Minutes

Trigonometric functions and sound

RightMark Audio Analyzer 6.0. User s Guide

Audio processing and ALC in the FT-897D

The Fundamentals of Signal Analysis. Application Note 243

Lecture 1-6: Noise and Filters

Transcription:

Practical Design of Filter Banks for Automatic Music Transcription Filipe C. da C. B. Diniz, Luiz W. P. Biscainho, and Sergio L. Netto Federal University of Rio de Janeiro PEE-COPPE & DEL-Poli, POBox 6854, Rio de Janeiro, RJ, Brazil, 2945-972 {filiped, wagner, sergioln}@lps.ufrj.br Abstract This paper aims at exploring practical issues concerning the design of a bounded-q fast filter bank (BQFFB) for automatic music transcription (AMT). Great effort is spent on avoiding hazardous effects in the spectral analysis stage of the AMT application. The result is a complete procedure for effectively designing the BQFFB tool. The analysis tool is then validated through computer experiments.. Introduction Music transcription consists of writing down the score of a musical piece based on its execution. To achieve an effective result, the whole procedure is commonly performed by a well-trained music expert, most probably an experienced musician. Nowadays, great effort is being spent on attempting to automate such a task, forming the research field of automatic music transcription (AMT) [6]. In very general lines, the AMT can be divided into the following stages []:. Time-frequency analysis of the music signal; 2. Note identification; 3. Note onset (beginning) and offset (end) detection; 4. Timbre recognition; 5. Evaluation of results coherence. The foundation of such a system is the spectral analysis stage. One way of implementing it is based on filter banks [9], which separate the signal into narrow frequency bands. For AMT systems, an attractive category of filter bank is the bounded-q fast filter bank (BQFFB) [2]. This tool presents very selective frequency bins, with low interchannel interference, and a non-uniform frequency distribution suitable for the organization of the Western musical scale. The present paper explores the practical issues involved in the implementation of the BQFFB in order to ensure its effectiveness in AMT systems. The text is organized in the following way: Section 2 discusses the desired characteristics of spectral analysis tools for the envisaged application. Emphasis is given to the BQFFB tool, which combines the positive aspects of high channel selectivity, efficient channel distribution in the frequency domain, and reduced implementation cost. Section 3 describes some practical issues of the BQFFB design in an AMT environment, including a channel mapping to properly represent the equal tempered scale. Finally, Section 4 includes some numerical experiments that validate the main contributions of the paper. 2. Spectral Analysis The most basic filter bank for spectral analysis is the sliding fast Fourier transform (sfft), which consists of an FFT applied to consecutive windows of the input signal. The sfft requires only one complex multiplication per output sample [4]. The main drawback of this tool is its inherent low selectivity. Their filters, transformed from low-order kernel filters [8], provide an attenuation of only 3 db between adjacent channels. In an attempt to surpass the FFT selectivity, the so-called fast filter bank (FFB) was proposed in [8]. In the FFB, higher-order kernel filters are employed in an FFT-like tree structure. As in the frequency-response masking (FRM) [7] technique, computational cost is kept low by the use of interpolated versions of the kernel filters, with most of their coefficients null. The overall result is a moderately complex but highly selective filter bank, with an attenuation of 56 db between adjacent channels. This allows an improved distinction between components of the music signal, leading to a better note identification. Frequency spacing of both sfft and FFB is constant. This characteristic makes them somewhat inefficient in an AMT context. In fact, in the equal tempered scale, widely employed in Western music [], the spacing between the fundamental frequencies of musical notes increases geometrically. As a consequence, linearly distributed channels capable of properly discriminate between contiguous notes in the low-frequency range of the spectrum tend to be too numerous in high frequencies. Conversely, a well matched high-frequency resolution corresponds to an insufficient resolution in the low-frequency range. In an attempt to take advantage of the equal tempered scale, Brown [] proposed the use of geometrically spaced channels in AMT systems. The result was the so-called constant-q transform (CQT), which presents a high computational cost, since it is not able to take advantage of the FFT tree-like algorithm. The combination of the CQT with the

FFB resulted in the constant-q fast filter bank (CQFFB) [3]. Another form of channel distribution uses a geometric spacing between different groups of channels (which may correspond to musical octaves) and a linear distribution inside each group. This approach is referred to as the bounded-q transform (BQT), and was proposed in [5]. The BQT employs the FFT algorithm within each octave, greatly reducing the associated implementation cost. 2.. Bounded-Q Fast Filter Bank The high-selectivity counterpart of the BQT is denominated bounded-q fast filter bank (BQFFB) [2]. The BQFFB implementation in the AMT context consists of submitting the input signal, at first, to a CQFFB presenting channels geometrically spaced by a factor of 2. These channels can be associated to the octaves perceivable by the human auditory system, between 2 Hz and 2 khz. This first separation step presents high selectivity, but low computational cost due to the small number of channels. Each octave is then decimated by a proper factor, and the higher half of its complex spectrum (between π and 2π) is fed into 32 channels of an FFB, yielding a linear channel separation. The resulting resolution is enough to distinguish frequencies half-tone apart, which is generally the minimum spacing between notes in the equal tempered music scale. This entire procedure is represented in Figure. 3.. End-of-Octave Gap After the lower part (between and π) of each octave spectrum is eliminated, it must be fed into the appropriate channels of the second-step FFB. However, picking one half of the channels creates a small gap at the end the octave. To illustrate this issue in a simple way, an N-channel FFB, with N =8, is depicted in Figure 2. From this figure, one clearly observes that the normalized frequency range near 2π is in fact associated to the first channel. Therefore, the higher N/2 FFB channels do not entirely fill the desired π to 2π range. 2 4 6 8 2 4 6 8 2 4 6 8 2 4 6 8.2.4.6.8.2.4.6.8 2.2.4.6.8.2.4.6.8 2 (c).2.4.6.8.2.4.6.8 2 (d).2.4.6.8.2.4.6.8 2 Frequency ( π rad/sample) Figure 2. responses of an 8- channel FFB: Complete bank; Higher N/2 channels; (c) Lower N/2 channels; (d) First channel along with higher N/2 channels. Figure. BQFFB implementation based on a CQFFB followed by an FFB. Hence, the BQFFB combines the FFB high selectivity with the quasi-geometric spacing of the BQT. A comparative table with the properties of each spectral-analysis tool can be found in [2], along with detailed comparisons between their respective computational costs. 3. Practical Issues on the BQFFB Implementation This section discusses a few points that should be taken into account in the BQFFB implementation to avoid artifacts in the time-frequency analysis. Hence, in the BQFFB implementation, one must also take into consideration the first FFB channel, as depicted in Figure 2(d), to avoid the end-of-octave gap. The lower portion of this first channel does not impose any practical problem, since the CQFFB eliminates all components within this range in the first step of the BQFFB procedure, as seen in Figure. 3.2. Octave Filter Compensation Obviously, the CQFFB that separates the spectrum in octaves is not ideal as depicted in Figure. The true magnitude responses of the individual filters can be observed in Figure 3. This behavior can be compensated by incorporating a gain factor to each inner channel, thus equalizing the octave-separation procedure. The output of each FFB channel c inside a given octave o is multiplied by a gain given by G c = H o (Ω i ), () where H o (Ω) is the frequency response of that octave filter and Ω i is the center frequency of channel c. Naturally, within the stopband of H o (Ω), the gains are set to zero.

2.5 2 2.5 3.5 4 5 6 2.5.5.5 2 Frequency ( π rad/sample) 7 2 8 9.5..2.3.4.5.6.7.8.9 Frequency ( π rad/sample) Figure 3. CQFFB presenting channels geometrically spaced by a factor of 2. These channels correspond approximately to the octaves perceived by the human auditory system, between 2 Hz and 2 khz..5.5.5 2 Frequency ( π rad/sample).3.25.2.5. This simple procedure is illustrated in Figure 4. Figure 4 shows the individual channel magnitudes within the 7 th octave without gain compensation, whereas Figure 4 shows the compensating gain in each channel, yielding the flat result depicted in Figure 4(c). 3.3. Tempered-Scale Mapping The previous steps allow one to separate the entire signal spectrum into octaves. These are defined as a function of the normalized sampling frequency 2π in the following way: The highest octave of interest lies within π/2 and π; the second highest octave lies between π/4 and π/2, and so on. After that, each of the M octaves is divided into N equally spaced channels, taking advantage of the FFT-like efficient algorithm. A simple strategy can be devised to map the BQFFB channels onto candidates to fundamental frequencies of the equal tempered scale:. Starting from a reference note (e.g. A4 = 44 Hz), compute the theoretical fundamental frequencies of musical notes spanning the complete spectrum, and define a half-tone band t l around each note l. 2. Starting from the sampling frequency (e.g. F s = 44. khz), once the number of octaves to be analyzed by the BQFFB (e.g. M =) and the number of channels within each octave (e.g. N =32) have been chosen, define the lower and higher limits of each individual BQFFB channel. 3. Compose each candidate fundamental l by adding up the BQFFB channels contained in t l weighted by their percentage of intersection. For the values suggested above, one would consider the range associated to A4 from 427.5 Hz to 452.9 Hz. On the.5.5.5 2 Frequency ( 2π rad/s) (c) Figure 4. Example of gain compensation after the octave separation stage: response without gain compensation; Gain compensation; (c) response with gain compensation. other hand, the three associated BQFFB channels should be: A from 425.2 Hz to 436. Hz; B from 436. Hz to 446.8 Hz; C from 446.8 Hz to 457.6 Hz. The weighted sum would then take 79% of channel A, % of channel B, and 56% of channel C to form the corresponding A4 channel. This process is illustrated in Figure 5. A 425.2 436. A4 427.5 452.9 B 446.8 C 457.6 Figure 5. Mapping example between BQFFB channels and note channels. 4. Experiments In order to illustrate the behavior of the BQFFB system described in this paper, two computer experiments are de-

scribed. In the first one, one performs the analysis of a sinusoidal signal whose frequency varies according to an ascending chromatic scale, i.e. where each note is one half-tone above its predecessor. This test verifies the BQFFB effectiveness along the frequency range, and shows the effects of the octave filter compensation and tone-mapping procedure described previously. The second experiment tests the system under a practical AMT scenario. A real recording is submitted to the system to verify the percentage of notes correctly detected. 4.. Experiment : Chromatic Scale The chromatic scale used in this experiment extends from C5 (523 Hz) to E8 (5274 Hz). The respective input signal was submitted to a BQFFB with M =, N =32, without any compensation procedure, generating the output seen in Figure 6. Figure 7. Experiment : BQFFB output for the signal containing the chromatic scale after the tempered scale mapping..4 Amplitude.2.2.4 5 5 2 25.4 Amplitude.2.2 Figure 6. Experiment : BQFFB output for the signal containing the chromatic scale. By applying the tone-mapping procedure, the BQFFB octaves with 32 filter channels are mapped onto musical octaves with 2 note channels each. The result is seen in Figure 7, where all the tones are properly distinguishable in frequency and time. From the gray scale in Figure 7, one notices that distinct tones have been detected with different intensities, an effect of the non-ideal octave separation. The experiment was then repeated with the gain-compensating technique. Figure 8 shows the direct sum of the filter outputs in both cases without and with the gain compensation. Transients were purposely kept to serve as markers of note transitions. From the plots, one verifies that the gain compensation succeeded in equalizing the response of octave-separation filters. 4.2. Experiment 2: Real Recording This experiment aims at evaluating the system performance over a real signal. This signal contains an execu-.4 5 5 2 25 Time(s) Figure 8. Experiment : BQFFB output for the signal containing the chromatic scale. Before the octave filter compensation. After the octave filter compensation. tion of the piece Flight of the Bumblebee, by Rimski- Korsakoff, arranged for piano by Rachmaninoff. The main information to be retrieved concerns the detected notes. At this point, it is not important to estimate onsets and offsets. The task is to verify if the notes present in the musical score are indicated by the BQFFB output, which is shown in Figure 9. Note detection was based on the simple search for local maxima, at each time instant, at the outputs of the BQFFB channels. The results are shown in Figure. Comparing the notes from the BQFFB output and from the musical score, one concludes that 8% of the notes were detected. It shows that the BQFFB can generate an output that can be presented to the next block in an AMT system in order to detect musical notes.

Figure 9. BQFFB output for the signal containing the Flight of the Bumblebee. This paper explored practical issues concerning the bounded-q fast filter bank (BQFFB) implementation for the automatic music transcription. The impact of some solutions were illustrated over some computer experiments using artificial and real signals. The results are quite promising when they are put into an AMT scenario, since it was possible to reach near 8% of detection rate after an extremely simple heuristics. References Figure. Simple note detection, of the Flight of the Bumblebee extract, based on the BQFFB output after gain-compensation and channel-mapping procedures. 5. Conclusions [] J. C. Brown. Calculation of a constant Q spectral transform. Journal of the Acoustical Society of America, 89():425 434, 99. [2] F.C.C.B.Diniz,I.Kothe,S.L.Netto,andL.W.P.Biscainho. High-selectivity filter banks for spectral analysis of music signals. EURASIP Journal Advances on Signal Processing, 27(PAPER ID 9474): 2, 27. [3] C. N. dos Santos, S. L. Netto, L. W. P. Biscainho, and D. B. Graziosi. A modified constant-q transform for audio signals. In Proc. of ICASSP 24 - Int. Conf. on Audio, Speech, and Signal Processing, volume 2, pages 469 472, Montreal, Canada, May 24. IEEE. [4] B. Farhang-Boroujeny and Y. C. Lim. A comment on the computational complexity of sliding FFT. IEEE Transactions on Circuits and Systems - II: Analog and Digital Signal Processing, 39(2):875 876, 992. [5] K. L. Kashima and B. Mont-Reynaud. The bounded-q approach to time-varying spectral analysis. Technical Report 28, Stanford University, Dep. of Music, Stanford, CA, USA, 985. [6] A. Klapuri and M. Davy. Signal Processing Methods for Music Transcription. Springer-Verlag, New York, USA, 26. [7] Y. C. Lim. Frequency-response masking approach for the synthesis of sharp linear phase digital filters. IEEE Transactions on Circuits and Systems, 33(4):357 364, 986. [8] Y. C. Lim and B. Farhang-Boroujeny. Fast filter bank (FFB). IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, 39(5):36 38, 992. [9] P. P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice-Hall, Upper Saddle River, NJ, USA, 992. [] D. William and E. Brown. Theoretical Foundations of Music. Wadsworth, Belmont, CA, USA, 978.