Automatic Transcription: An Enabling Technology for Music Analysis

Automatic Transcription: An Enabling Technology for Music Analysis
Simon Dixon (simon.dixon@eecs.qmul.ac.uk)
Centre for Digital Music, School of Electronic Engineering and Computer Science, Queen Mary University of London

Semantic Audio
Extraction of symbolic meaning from audio signals: how would a person describe the audio?
- Solo violin, D minor, slow tempo, Baroque period
- Bar 12 starts at 0:21.961
- 2nd verse starts at 1:43.388
- Quarter comma meantone temperament
Annotation of audio:
- Onset detection
- Beat tracking
- Segmentation (structural, ...)
- Instrument identification
- Temperament and inharmonicity analysis
- Automatic transcription
Automatically generated annotations are stored as metadata, matched against content-based queries, and also useful for navigation within a document.

Automatic Music Transcription (AMT)
Audio recording → score.
Applications:
- Music information retrieval
- Interactive music systems
- Musicological analysis
Subtasks:
- Pitch estimation
- Onset/offset detection
- Instrument identification
- Rhythmic parsing
- Pitch spelling
- Typesetting
AMT still remains an open problem.

Background (1)
Core problem: multiple-F0 estimation.
Common time-frequency representations (see the sketch below):
- Short-Time Fourier Transform
- Constant-Q Transform
- ERB Gammatone filterbank
Common estimation techniques:
- Signal processing methods
- Statistical modelling methods
- NMF/PLCA
- Sparse coding
- Machine learning algorithms
HMMs commonly used for note tracking.
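As an illustration of the first two representations above, here is a minimal sketch using the librosa library; the file name and analysis parameters are illustrative assumptions, not values from the talk:

```python
# Minimal sketch: STFT and constant-Q spectrograms with librosa.
# File name and parameters below are illustrative only.
import numpy as np
import librosa

y, sr = librosa.load("excerpt.wav", sr=44100, mono=True)

# Short-Time Fourier Transform: linearly spaced frequency bins
stft_mag = np.abs(librosa.stft(y, n_fft=4096, hop_length=512))

# Constant-Q Transform: log-spaced bins, better matched to musical pitch
cqt_mag = np.abs(librosa.cqt(y, sr=sr, hop_length=512,
                             fmin=librosa.note_to_hz("A0"),
                             n_bins=88 * 4, bins_per_octave=48))
```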

Background (2)
Evaluation: best results achieved by signal processing methods.
Multiple-F0 estimation approaches rely on assumptions:
- Harmonicity
- Power summation
- Spectral smoothness
- Constant spectral template
Figure: two example magnitude spectra (dB) plotted against frequency (1-5 kHz).

Background (3)
Design considerations:
- Time-frequency representation
- Iterative/joint estimation method
- Sequential/joint pitch tracking
Figure: STFT magnitude vs. frequency (kHz) compared with RTFI magnitude vs. RTFI index.

Signal Processing Approach (ICASSP 2011)
- RTFI time-frequency representation
- Preprocessing: noise suppression, spectral whitening
- Onset detection: late fusion of spectral flux and pitch salience (spectral flux sketched below)
- Pitch candidate combinations are evaluated, modelling tuning, inharmonicity, overlapping partials and temporal evolution
- HMM-based offset detection
Figure: (a) ground truth and (b) transcription output for a guitar excerpt (RWC MDB-J-2001 No. 9), MIDI pitch vs. time.
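The spectral-flux half of the onset detector can be sketched with plain numpy as below; the peak-picking rule and thresholds are illustrative assumptions, not the settings of the ICASSP 2011 system:

```python
# Minimal sketch of spectral-flux onset detection over a magnitude spectrogram.
import numpy as np

def spectral_flux_onsets(mag_spec, threshold=1.5, w=3):
    """mag_spec: magnitude spectrogram, shape (n_bins, n_frames)."""
    # Half-wave rectified frame-to-frame magnitude increase, summed over bins
    diff = np.diff(mag_spec, axis=1)
    flux = np.maximum(diff, 0.0).sum(axis=0)
    flux /= flux.max() + 1e-9                   # normalise to [0, 1]

    # Keep local maxima that exceed a scaled local mean
    onsets = []
    for n in range(w, len(flux) - w):
        window = flux[n - w:n + w + 1]
        if flux[n] == window.max() and flux[n] > threshold * window.mean():
            onsets.append(n + 1)                # +1 because diff drops one frame
    return onsets
```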

Background: Non-negative Matrix Factorisation (NMF)
X = WH, where:
- X is the power (or magnitude) spectrogram
- W is the spectral basis matrix (dictionary of atoms)
- H is the atom activity matrix
Solved by successive approximation:
- Minimise ||X - WH||
- Subject to constraints (e.g. non-negativity, sparsity, harmonicity)
An elegant model, but:
- Not guaranteed to converge to a meaningful result
- Difficult to extend/generalise
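A minimal numpy sketch of NMF with the classic multiplicative updates for the Euclidean cost ||X - WH||; the rank and iteration count are arbitrary, and no sparsity or harmonicity constraints are included:

```python
# Minimal NMF sketch: Lee-Seung multiplicative updates minimising ||X - WH||.
import numpy as np

def nmf(X, n_atoms=88, n_iter=200, eps=1e-9):
    """X: non-negative magnitude/power spectrogram, shape (n_bins, n_frames)."""
    n_bins, n_frames = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((n_bins, n_atoms)) + eps     # spectral basis (dictionary)
    H = rng.random((n_atoms, n_frames)) + eps   # atom activations

    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative by construction
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```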

PLCA-based Transcription (Ref: Benetos and Dixon, SMC 2011)
Probabilistic Latent Component Analysis (PLCA): a probabilistic framework that is easy to generalise and interpret.
PLCA algorithms have been used in the past for music transcription.
Shift-invariant PLCA-based AMT system with multiple templates:

    P(ω, t) = P(t) Σ_{p,s} P(ω|s, p) ∗_ω P(f|p, t) P(s|p, t) P(p|t)

where P(ω, t) is the input log-frequency spectrogram, P(t) the signal energy, P(ω|s, p) the spectral template for instrument s and pitch p, P(f|p, t) the pitch impulse distribution, P(s|p, t) the instrument contribution for each pitch, P(p|t) the piano-roll transcription, and ∗_ω denotes convolution along the log-frequency axis.
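To make the factorisation concrete, here is a heavily simplified PLCA sketch: it is not shift-invariant, holds the spectral templates fixed (collapsed to one template per pitch), and estimates only the pitch activations P(p|t) by EM. The full SMC 2011 model additionally estimates P(f|p, t) and P(s|p, t); everything below is an illustrative assumption.

```python
# Simplified PLCA sketch: fixed per-pitch templates, EM estimation of P(p|t).
import numpy as np

def plca_activations(X, templates, n_iter=100, eps=1e-12):
    """X: spectrogram (n_bins, n_frames);
    templates: P(w|p), shape (n_bins, n_pitches), each column sums to one.
    Returns P(p|t), shape (n_pitches, n_frames)."""
    n_pitches, n_frames = templates.shape[1], X.shape[1]
    Ppt = np.full((n_pitches, n_frames), 1.0 / n_pitches)   # uniform init

    for _ in range(n_iter):
        # Model reconstruction P(w,t) = sum_p P(w|p) P(p|t)
        recon = templates @ Ppt + eps
        # EM update: reassign observed energy X(w,t) to pitches, then renormalise
        Ppt *= templates.T @ (X / recon)
        Ppt /= Ppt.sum(axis=0, keepdims=True) + eps
    return Ppt          # piano-roll-like pitch activity over time
```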

PLCA-based Transcription (2)
The system is able to utilise sets of extracted pitch templates from various instruments.
Figure: violin templates, frequency vs. MIDI pitch number (35-80).

PLCA-based Transcription (3)
Figure: transcription of a Cretan lyra excerpt (original and transcription), frequency vs. time in frames.

Application 1: Characterisation of Musical Style (Ref: Mearns, Benetos and Dixon, SMC 2011)
- Bach chorales: audio and MIDI versions
- Audio transcribed and segmented by beats
- Each vertical slice is matched to chord templates based on Schoenberg's definition of diatonic triads and 7ths for the major and minor modes
- An HMM detects key and modulations (Viterbi decoding sketched below):
  - Observations are chord symbols; hidden states are keys
  - The observation matrix models relationships between chords and keys
  - The state transition matrix models modulation behaviour
- Various models based on Schoenberg and Krumhansl: a comparison of models from music theory and music psychology

Results (accuracy):
                 Transcription   Chords   Key
    Audio             33%          56%    64%
    MIDI            (100%)         85%    65%
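A minimal Viterbi sketch for the key-tracking HMM described above: hidden states are keys, observations are beat-synchronous chord symbols. The initial, transition and emission matrices are placeholders here; the actual models were derived from Schoenberg's and Krumhansl's theories.

```python
# Minimal Viterbi decoding sketch for an HMM with keys as hidden states
# and chord symbols as observations (all model matrices are placeholders).
import numpy as np

def viterbi(obs, log_init, log_trans, log_emit):
    """obs: chord indices; log_init: (n_keys,);
    log_trans: (n_keys, n_keys); log_emit: (n_keys, n_chords)."""
    T, n_keys = len(obs), len(log_init)
    delta = np.zeros((T, n_keys))               # best log-probability so far
    psi = np.zeros((T, n_keys), dtype=int)      # backpointers

    delta[0] = log_init + log_emit[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # (previous key, current key)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[:, obs[t]]

    # Backtrack the most likely key sequence; key changes mark modulations
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path
```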

Application 2: Analysis of Harpsichord Temperament (Ref: Tidhar et al., JNMR 2010; Dixon et al., JASA 2011, to appear)
- For fixed-pitch instruments, tuning systems are chosen which aim to maximise the consonance of common intervals
- Temperament is a compromise arising from the impossibility of satisfying all tuning constraints simultaneously
- We measure temperament to aid the study of historical performance
Method:
- Perform conservative transcription
- Calculate precise frequencies of partials (CQIFFT)
- Estimate the fundamental and inharmonicity of each note (see the sketch below)
- Classify the temperament

Classification accuracy:
                                Synthesised   Acoustic (real)
    Spectral peaks                  96%            79%
    Conservative transcription     100%            92%
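For the inharmonicity step, one common model for plucked or struck strings places partial k of a note at f_k = k * f0 * sqrt(1 + B * k^2). Given precisely measured partial frequencies, f0 and the inharmonicity coefficient B can then be fitted by linear regression on (f_k / k)^2 = f0^2 + f0^2 * B * k^2, as in the sketch below; this is an assumed illustration, not the exact estimator of the JASA paper.

```python
# Minimal sketch: fit fundamental f0 and inharmonicity coefficient B
# from measured partial frequencies of one note.
import numpy as np

def fit_inharmonicity(partial_freqs, partial_indices):
    """partial_freqs: measured partial frequencies in Hz;
    partial_indices: their partial numbers k (1, 2, 3, ...)."""
    k = np.asarray(partial_indices, dtype=float)
    y = (np.asarray(partial_freqs, dtype=float) / k) ** 2
    slope, intercept = np.polyfit(k ** 2, y, 1)   # y = intercept + slope * k^2
    f0 = np.sqrt(intercept)                        # intercept = f0^2
    B = slope / intercept                          # slope = f0^2 * B
    return f0, B
```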

Application 3: Analysis of Cello Timbre (Ref: Chudy, Benetos and Dixon, work in progress)
- Expert human listeners can recognise the sound of a performer (not just the recording, not just the instrument)
- Can this be captured in standard timbral features (two examples sketched below)?
  - Temporal: attack time, decay time, temporal centroid
  - Spectral: spectral centroid, spectral deviation, spectral spread, irregularity, tristimuli, odd/even ratio, envelope, MFCCs
  - Spectrotemporal: spectral flux, roughness
- In order to estimate these features, it is necessary to identify the notes (onset, offset, pitch, etc.); as a manual task, this is very time-consuming
- Conservative transcription solves the problem: high precision, low recall
- The system is trained on cello samples from the RWC database
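Two of the spectral features listed above can be sketched with numpy as follows; these are frame-wise values to be aggregated over the note segments returned by the conservative transcription, and the names and details are illustrative:

```python
# Minimal sketches of two timbral features computed from a magnitude spectrogram.
import numpy as np

def spectral_centroid(mag_spec, freqs):
    """Brightness proxy: magnitude-weighted mean frequency per frame.
    mag_spec: (n_bins, n_frames); freqs: centre frequency of each bin, (n_bins,)."""
    return (freqs[:, None] * mag_spec).sum(axis=0) / (mag_spec.sum(axis=0) + 1e-9)

def spectral_flux(mag_spec):
    """Spectrotemporal change: Euclidean distance between successive frames."""
    return np.sqrt((np.diff(mag_spec, axis=1) ** 2).sum(axis=0))
```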

Take-home Messages
- Automatic transcription enables new musicological questions to be investigated
- Many other applications: music education, practice, MIR
- The technology is not perfect, but music is highly redundant
- Shift-invariant PLCA is ideal for instrument-specific polyphonic transcription
Proposed procedure for folk music analysis:
1. Extract templates for the desired instruments (e.g. using NMF)
2. Perform shift-invariant PLCA transcription
3. Estimate features of interest
4. Interpret the results according to the estimates obtained

Any Questions?
Acknowledgements:
- Emmanouil Benetos, Lesley Mearns, Magdalena Chudy: the PhD students who do all the work
- Dan Tidhar, Matthias Mauch: collaboration on the harpsichord project
- Aggelos, Christina and EURASIP for the invitation