Feature Extraction and Enriched Access Modules for Musical Audio Data


Internal Note

Feature Extraction and Enriched Access Modules for Musical Audio Data

Version 1.0 Draft
Date: 15 February 2007
Editor: QMUL
Contributors: Christian Landone, Dan Barry, Ivan Damnjanovic

Introduction

This document enumerates the modules for the extraction of musical features from recorded audio files to be integrated within the EASAIER framework. Additional modules dedicated to the implementation of Enriched Access features are also described here.

1. Architectural Notes

The EASAIER framework requires applications with DSP capability both on the content provider side (Archiver) and on the end user side (Browser/Navigator). Initially the project envisaged a complete separation between feature extraction and enriched access tools, the former being exclusively assigned to the server-side application and the latter to the client-side application. In practice, following the initial systems architecture meeting ( ), it was found that both sides of the system might benefit from a certain degree of interoperability between the two sets of tools. The following sections describe the generic feature extraction and audio processing functionality of the server- and client-side applications. (Note: these two sections are mostly generated by brainstorming and guessing, so do modify/add/complain as you see fit.)

1.1. Server Side Archiving Software

The server-side archiving application is a tool that allows content providers to manually enter and/or automatically extract meta-data from musical audio/video assets and archive them within the EASAIER system.

Audio analysis

The musical audio asset is submitted to the application and, whenever necessary, undergoes restoration. A simplified system diagram is proposed in figure 1. A compressed version of the audio asset is also generated and submitted, along with the original, to the audio files repository: this lower-quality copy can then be used by the EASAIER server to stream audio to the end user without using excessive amounts of bandwidth.
The process of sound source separation may also be performed at this stage, although there is limited confidence that it will significantly improve the performance of the musical feature extractors. However, including this algorithm within the Archiver would allow an expert operator to choose an optimal set of separation parameters uniquely associated with the audio file, which can be transmitted to the enriched access tools on the client-side application as default settings. Following restoration and source separation, the audio data goes through a number of modules for the extraction of mid- and high-level musical features that will be included in the meta-data associated with the audio file under analysis, for classification and search purposes. The modules have been divided into two categories: mid-level extractors and high-level extractors. Broadly speaking, mid-level extractors return time-synchronous (frame-based) information, such as harmonic and timbre profiles, chord sequences or the position of beats, and are particularly suitable for spawning transcriptions and performing similarity-based searches within the EASAIER archives. High-level extractors, on the other hand, aim to describe global, and mostly single-valued, information regarding a piece of music, such as the tempo, meter, global key, mode or the presence of a particular instrument within the audio file. These descriptors can be employed to perform a parameter-based search such as: find an audio file exhibiting a tempo of 120 bpm in 4/4 time and containing the instrument conga.

The mid-level features are extracted by the relevant algorithm (see section 2 for a description) and stored in a suitable format (TBD) in a repository within the EASAIER system. As well as being utilised by the server for search purposes, these features can also be used by the client-side navigation and playback tool to provide specialised visualisations of the music under analysis (e.g. an intensity envelope) and markers on points of interest within the waveform (e.g. position of beats, verse/chorus boundary, etc.). Mid-level descriptors are also used within the archiving application by a second level of software modules for the generation of high-level features. Unfortunately, high-level feature extractors are not robust enough at this stage of development to guarantee absolute consistency; hence we envisage the use of a reliability metric that can prompt the operator to double-check the results and, if necessary, to manually populate the relevant high-level tags.

[Figure 1: server side musical audio archiving. The input PCM audio file is compressed and both versions are sent to the audio assets repository; de-noising/restoration and source separation (with optimal parameters recorded) feed the mid-level feature extractors and transcription, which in turn feed the high-level feature extractors; manual tags, the reliability metric and the extracted features are sent to the metadata repository.]

Video analysis

The video asset is submitted to the EASAIER server and undergoes the necessary transcoding process. A compressed version of the video asset is generated and submitted, along with the original, to the video files repository; this lower-quality copy can then be used by the EASAIER server to stream video to the end user without using excessive amounts of bandwidth. In this process, the audio stream is extracted from the video for the purpose of the audio analysis given in figure 1. The video stream then undergoes automatic analysis as shown in figure 2. All these processes on the video/audio assets will be accomplished using open source software, such as ffmpeg [FFMPEG]. ffmpeg is known as one of the fastest and most reliable open source transcoding tools, integrating the majority of popular audio/video codecs.

[Figure 2: server side video archiving. The input video file is compressed into a streaming version (e.g. MPEG-4) and stored in the multimedia assets repository; the extracted PCM audio stream is analysed as in figure 1; video segmentation and keyframe extraction produce keyframes and temporal data; keyframe analysis and manual annotation produce the extracted features and video segment metadata sent to the metadata repository.]

QMUL will also provide video segmentation and keyframe extraction modules. The modules take as input video in MPEG-2 format and output temporal information about the start and duration of video segments, as well as keyframe images and their positions within the video file. The modules are already available as Linux binaries, and cross-platform versions are in development. In the current implementation, only one feature is extracted for each video frame, the ColorLayout. ColorLayout is a simple representation of the layout of colour within a frame, using a DCT to represent the feature. One DCT is created for each colour component (one luminance and two chrominance components in the case of a video frame). A difference metric for each component involves taking the weighted Euclidean distance between each DCT value in each colour component. This leads to fast matching, and scalability can be improved by using fewer DCT values and sacrificing accuracy. The resulting feature vector can be used for a variety of applications.
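As a rough numpy sketch of this kind of descriptor and its difference metric (not the MPEG-7 reference implementation; the coefficient count, weights and cut threshold below are illustrative assumptions):

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II built from an explicit DCT matrix (numpy only)."""
    n = block.shape[0]
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m @ block @ m.T

def color_layout(channel, ncoeff=6):
    """Average-downsample one colour component to an 8x8 grid, DCT it and
    keep the first `ncoeff` low-frequency coefficients (raster order is
    used here for brevity instead of the usual zig-zag scan)."""
    h, w = channel.shape
    grid = channel[: h - h % 8, : w - w % 8]
    grid = grid.reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3))
    return dct2(grid).ravel()[:ncoeff]

def frame_distance(f1, f2, weights):
    """Weighted Euclidean distance between two ColorLayout-style vectors."""
    return float(np.sqrt(np.sum(weights * (f1 - f2) ** 2)))

def detect_cuts(distances, threshold=2.0):
    """Flag abrupt shot changes as peaks in the frame-to-frame distance:
    a frame is a cut candidate when its distance to the previous frame is
    a local maximum and exceeds `threshold` times the mean distance."""
    cuts = []
    for i in range(1, len(distances) - 1):
        if (distances[i] > distances[i - 1]
                and distances[i] >= distances[i + 1]
                and distances[i] > threshold * np.mean(distances)):
            cuts.append(i)
    return cuts
```

In a real matcher the per-coefficient weights would emphasise the low-frequency DCT values, and truncating `ncoeff` trades accuracy for speed exactly as described above.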
Simple shot cuts can be detected by looking for peaks in the rate of change of the feature between consecutive frames, which produces a robust method for detecting abrupt shot changes that is reasonably accurate even in sequences with high visual activity. At present, we are working on expanding the feature set used for cut detection and keyframe extraction, and on more sophisticated difference metrics, such as N-Cut (Normalized Cut). The extracted keyframes are further processed in order to extract a set of MPEG-7 low-level descriptors [MPEG7], which will be used in the EASAIER cross-retrieval engine, in addition to audio similarity searches, to expand searches to non-audio assets. For this purpose, the MPEG-7 eXperimentation Model (XM) software [MP7XM] will be used. This is the standard reference software of the MPEG standardisation body; it is open source, and both Linux and Windows versions exist and have been tested and used at QMUL. The starting set of features that will be extracted for the purposes of EASAIER is defined in the EASAIER metadata document [EMD2006] and Deliverable 3.1 [ED312006], but is still to be refined during the implementation and testing phases of the EASAIER project.

1.2. Client Side Search and Browsing Software

The end user will be able to access the content of the EASAIER archive by means of an application (figure 3) that can retrieve an audio asset and its associated meta-data using a variety of non-mutually-exclusive query methodologies, such as:

- Queries based on general tags: i.e. find material by author/title, genre and year.
- Musical parameter-based queries: i.e. find songs by key, orchestration, tempo range.
- Similarity-based queries: i.e. once a musical audio asset has been retrieved, find other assets that exhibit some degree of similarity in terms of macroscopic structure, timbre and harmonic profile.

The audio is delivered by the server (either by streaming or by download of the entire compressed file) to the client application, then buffered and converted to a suitable format for further processing and visualisation of its time-domain waveform. Following the decoding stage, a suite of real-time audio processing modules allows restoration, source separation and enhancement of the incoming audio stream. The associated meta-data retrieved from the server contains a set of default parameters for both the source separation and restoration algorithms; alternatively, the user can override these parameters manually through an advanced menu/interface on the client application (enriched access UI). The default source separation parameters can be associated with the tags generated by the instrument recognition algorithms to provide a click-and-play list of the various orchestral components of the musical audio asset.
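To make the parameter-based query concrete, here is a minimal sketch of filtering assets on high-level descriptors. The record fields and function name are illustrative only and do not reflect the actual EASAIER metadata schema or query engine:

```python
# Hypothetical metadata records; field names are illustrative only.
assets = [
    {"id": "qmul-0001", "tempo": 118.7, "time_signature": "4/4",
     "global_key": "C major", "instruments": {"conga", "piano", "bass"}},
    {"id": "qmul-0002", "tempo": 96.0, "time_signature": "3/4",
     "global_key": "A minor", "instruments": {"violin", "cello"}},
]

def parametric_search(assets, tempo=None, tolerance=5.0,
                      time_signature=None, instrument=None):
    """Filter assets on high-level descriptors: a tempo within +/- tolerance
    bpm, an exact time signature, and the presence of an instrument tag."""
    hits = []
    for a in assets:
        if tempo is not None and abs(a["tempo"] - tempo) > tolerance:
            continue
        if time_signature is not None and a["time_signature"] != time_signature:
            continue
        if instrument is not None and instrument not in a["instruments"]:
            continue
        hits.append(a["id"])
    return hits

# The example query from section 1.1: ~120 bpm, 4/4, containing a conga.
print(parametric_search(assets, tempo=120, time_signature="4/4",
                        instrument="conga"))  # → ['qmul-0001']
```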
A time-scale modification algorithm that can be operated in real time by the user is included in the enriched access set of tools, allowing the user to slow down or speed up the audio playback without affecting the pitch content. As well as providing default operational parameters to the enriched access tool set, the meta-data also contains:

1) General and music-specific tags providing comprehensive information regarding the audio asset under analysis (displayed in the Browsing and Searching UI).

2) Mid-level features that can be used to deliver technical visualisations of the audio asset, as well as markers for advanced playback and looping functionalities (displayed in the Looping and Visualisation UI).

Although high- and mid-level musical descriptors are generated by the archiving application on the server side, an enhancement to the functionality offered by the EASAIER system can be identified in the ability to provide similarity-based searches using audio files residing on the client's hard drive.

As shown at the bottom of figure 3, this functionality will require the deployment of a scaled-down version of the archiving application, allowing the generation of data that can be used to search the contents of the EASAIER server.

[Figure 3: client side musical audio browser/navigator. The EASAIER server streams audio to the client, where it is buffered and decoded before passing through source separation, de-noising/restoration, equalisation and time/pitch-scale modification on the way to the audio output; the meta-data supplies default enriched access parameters, high-level features, textual/general tags and mid-level features to the Browsing & Searching, Looping & Visualisation and Enriched Access UIs; the query engine passes search methods and parameters to the server; a Mini Archiver applies the mid- and high-level feature extractors and transcription to local audio files for query purposes.]

2. Software Modules

The software modules described in this section are included in the following EASAIER work packages:

1) WP4 Sound Object Representation: This work package deals with the identification of features within the archived audio assets. As far as musical audio is concerned, the tools will enable the extraction of high- and mid-level descriptors for classification and search purposes, as well as modules capable of providing information regarding the musical structure of the audio asset for visualisation and looping purposes.

2) WP5 Enriched Access Tools: Tools developed within this work package will allow the user to apply useful modifications to the audio content at access time and in real time, enabling an enriched exploration of the musical audio asset.

Enriched access

Time-scale Modification / Pitch-scale Modification

Provided by DIT: The TSM algorithm will allow the user to vary the playback rate of the audio in real time without affecting the local pitch content. The module will use both time-domain and frequency-domain algorithms. The appropriate algorithm will be chosen automatically depending on metadata provided with the audio content. The user should also be able to choose the algorithm manually. Pitch-scale modification independent of the time base is achievable in a similar manner.

Provided by QMUL: An alternative TSM algorithm based on a phase vocoder implementation. The algorithm allows for excellent transient preservation and robust stereo performance, but requires a priori knowledge of transients within the audio file; this can be provided by the extracted mid-level features.

Sound Source Separation

Provided by DIT: A real-time separation algorithm which is capable of separating multiple sources from 2-channel mixtures. At present this tool requires the user to set some parameters based on visual and audio feedback from the GUI in order to achieve meaningful separations. This version of the algorithm will be deployed as an enriched access tool for WP5. An automated version of this algorithm may also be provided as a pre-processor for transcription and instrument recognition in WP4.
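The document does not specify DIT's separation method; purely as a toy illustration of the general idea behind separating sources from a 2-channel mixture by their stereo position, the following sketch masks STFT bins by their inter-channel intensity ratio (all names and parameter values are illustrative, not the actual tool's API):

```python
import numpy as np

def pan_mask_separate(left, right, target_ratio, width=0.1, nfft=1024, hop=512):
    """Toy pan-based separation: in each STFT frame, keep only the bins
    whose inter-channel magnitude ratio |L| / (|L| + |R|) is close to the
    target ratio (0 = hard right, 0.5 = centre, 1 = hard left).
    An illustration of azimuth-style masking, not DIT's algorithm."""
    window = np.hanning(nfft)
    out = np.zeros(len(left))
    for start in range(0, len(left) - nfft + 1, hop):
        l = np.fft.rfft(window * left[start:start + nfft])
        r = np.fft.rfft(window * right[start:start + nfft])
        mag_l, mag_r = np.abs(l), np.abs(r)
        ratio = mag_l / (mag_l + mag_r + 1e-12)
        mask = (np.abs(ratio - target_ratio) < width).astype(float)
        # Resynthesise only the masked bins of the mono sum.
        mono = 0.5 * (l + r) * mask
        out[start:start + nfft] += window * np.fft.irfft(mono, nfft)
    return out
```

The user-adjustable parameters mentioned above correspond, in this toy version, to the target azimuth and mask width; real algorithms refine this with gain-scaling cancellation and overlap handling between sources.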
Some other work on single-channel separation is ongoing within the group at DIT.

Equalisation and Noise Reduction

Provided by DIT: DIT may also be able to provide some rudimentary real-time noise reduction and equalisation tools for the purposes of audio enhancement. QMUL will provide support in the generation of libraries for these tools.

Sound Object Representations

Segmentation

Provided by DIT: Some segmentation routines, such as a Novel Event Detector, which may be incorporated if desired.

Provided by QMUL: A module for the segmentation and thumbnailing of recorded musical audio using a hierarchical timbre model (SoundBite) is available.

Mid-Level Features and Music Transcription

Provided by DIT: The transcription algorithm will perform a non-real-time analysis which will result in a musical transcription of the audio content. Harmony features may also be extracted during this analysis. It is also intended that some time-aligned visual indication of harmony be provided. Alternative representations of transcribed audio will also be provided, such as melodic contours for the purposes of melodic similarity queries. This tool will be deployed at the server side and will provide metadata for the purposes of indexing. The tool may also be deployed at the client side for the case where the user wishes to query by example, where the example audio comes from outside the database.

Provided by QMUL: The Centre for Digital Music can provide the following Feature Extraction Modules:

Detection Function: A module for the generation of a function describing the local structure of an audio signal.

Peak Picking: A module for the estimation of onsets from the detection function. Also contains a class for detection function processing.

Onset Detection: A module for estimating onsets from audio files, incorporating the detection function and peak picking classes.

Multi-Band Onset Detection: (Released after ) A module for estimating tonal and percussive onsets from audio files.

Chroma Class: A module for logarithmic frequency analysis.

Beat Tracker: A module for beat tracking of musical audio.

Harmonic Change Detection Function (HCDF): A module for the detection of harmonic change in musical audio files.

Chord Estimation: (Ongoing Research) A module for the estimation of musical chords from audio files.

Harmonic Content Estimation: The module is intended to provide a mid-level representation of the harmonic and rhythmic information from audio files.
The algorithm returns a robust description of musical attributes that is intended to be used for similarity matching rather than for transcription and information retrieval.

Key Estimation: (Ongoing Research) A module for the estimation of the key in a musical file (frame-based).

High-Level Features

Tempo Estimator: (Ongoing Research) The module estimates tempo from a musical audio file using information returned by the beat tracking algorithm.

Meter Estimator: (Ongoing Research) The module estimates the time signature from a musical audio file using information returned by the beat tracking algorithm.

Global Key Estimator: (Ongoing Research) A module for the estimation of the predominant key in a musical file using information returned by the frame-based key estimation algorithm.

Musical Instrument Recognition

Provided by DIT: DIT has very recently begun work in this field. We expect to be able to integrate this work into EASAIER at a later stage. QMUL will provide legacy code (Instrument Identification Libraries) and knowledge gained from previous research carried out at the Centre for Digital Music.
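As a compact sketch of how the Detection Function / Peak Picking / Onset Detection chain listed above fits together, using spectral flux as the detection function (one of the techniques surveyed in [JPB2005]); the thresholds and function names below are illustrative assumptions, not the modules' actual API:

```python
import numpy as np

def detection_function(audio, nfft=1024, hop=512):
    """Spectral-flux detection function: the sum of positive magnitude
    increases between consecutive STFT frames."""
    window = np.hanning(nfft)
    frames = [np.abs(np.fft.rfft(window * audio[i:i + nfft]))
              for i in range(0, len(audio) - nfft + 1, hop)]
    df = np.zeros(len(frames))
    for n in range(1, len(frames)):
        diff = frames[n] - frames[n - 1]
        df[n] = np.sum(diff[diff > 0])
    return df

def pick_peaks(df, median_span=7, delta=0.1):
    """Peak picking with a moving-median adaptive threshold: a frame is an
    onset when it is a local maximum and exceeds the local median by a
    fraction of the global maximum (the module's quadratic-fit refinement
    is omitted here)."""
    onsets = []
    for n in range(1, len(df) - 1):
        lo, hi = max(0, n - median_span), min(len(df), n + median_span + 1)
        threshold = np.median(df[lo:hi]) + delta * np.max(df)
        if df[n] > df[n - 1] and df[n] >= df[n + 1] and df[n] > threshold:
            onsets.append(n)
    return onsets

def detect_onsets(audio, sr, nfft=1024, hop=512):
    """Complete onset estimator: onset times in seconds, with the time
    base relative to the original audio file, as in the Onset Detection
    module above."""
    df = detection_function(audio, nfft, hop)
    return [n * hop / sr for n in pick_peaks(df)]
```

A multi-band variant would run the same chain on each output of a constant-Q filterbank, discriminating tonal from percussive onsets by which sub-bands fire.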

3. Current Status of Software Modules

Each entry below lists the module's scope (type of feature), underlying technology with references, input/output, implementation language where known, and current development status.

Detection Function (Low Level)
- Technology: a number of techniques are covered [JF2000][JPB2005].
- Input/Output: input is a dense frequency-domain frame; output is a single value per input frame.
- Status: completed and deployed; a VAMP plugin is available.

Peak Picking (Low Level)
- Technology: the detection function undergoes DC removal, smoothing and median filtering [IK2002]; peak selection is based on a quadratic fit [reference needed].
- Input/Output: input is a detection function; output is a vector indicating the location of estimated onsets. The time base is relative to the detection function.
- Status: completed and deployed.

Onset Detection (Low Level)
- Technology: the module links the detection function and peak-picking classes to provide a complete onset estimator.
- Input/Output: input is a pointer to a location containing samples of the audio file under analysis; output is a vector indicating the location of estimated onsets. The time base is relative to the original audio file.
- Status: completed and deployed; a VAMP plugin is available.

Multi-Band Onset Detection (Low Level)
- Technology: the module splits the signal into four sub-bands using a constant-Q filterbank prior to onset detection [CD2004]; tonal and percussive components are discriminated on the basis of the presence of onsets in the different sub-bands [ER2005].
- Input/Output: outputs are vectors indicating the location of estimated tonal and percussive onsets. The time base is relative to the original audio file.
- Status: a version is available.

Chroma (Low Level)
- Technology: based on an FFT, utilises a sparse kernel approach for the calculation of a constant-Q transform; the chromagram (HPCP) is then calculated from the constant-Q data [JB1991][JB1992][CH2005].
- Input/Output: output is a dense matrix containing the chromagram bins of the file under analysis. The time base depends on the resolution of the constant-Q transform.
- Status: complete; a version is deployed but needs revision. A VAMP plugin is available.

Beat Tracker (Mid Level)
- Technology: beat times are recovered by passing the output of an onset detection function through comb filterbank matrices to identify the beat period and alignment; the module uses a two-state model for tracking tempo changes and for maintaining continuity within a single tempo hypothesis [MD2004][MD2005].
- Input/Output: output is either a sparse vector with the non-zero elements denoting an estimated beat, or a vector containing the temporal location of the identified beats. The time base is relative to the original audio file.
- Status: a version is deployed; a VAMP plugin is available.

Harmonic Change Detection Function (HCDF) (Low/Mid Level)
- Technology: a 12-bin chromagram is mapped to a 6-D space using a tonal centroid transform and smoothed using a Gaussian window; the HCDF is defined as the rate of change of the smoothed tonal centroid signal. Transition times between harmonically stable regions can be obtained by peak-picking the HCDF.
- Input/Output: output is a dense vector representing the peak change between tonal centroid frames.
- Status: complete; a version is deployed. A VAMP plugin is available.

Chord Estimation (Mid Level)
- Technology: the algorithm relies on a 36-bin tuned chromagram obtained from a constant-Q transform; the identification is performed using chord templates [CH2005][MC2004][BP2002].
- Input/Output: standard I/O; output is a sequence of estimated chord symbols.
- Status: a complete C implementation is not currently available.

Key Estimation (Mid/High Level)
- Technology: the key space is modelled by a 24-state HMM; each state represents one of the 24 major or minor chords, and each observation represents a chord transition [KN2006].
- Input/Output: standard I/O; input is a sequence of estimated chord symbols; output is the estimated key, either on a frame or a per-track basis.
- Status: a complete C implementation is not currently available.

Harmonic Content Estimation (Mid Level, Similarity Retrieval)
- Technology: a 36-bin tuned chromagram is averaged between detected beats; the resulting averaged chromagram is further reduced to 12 bins by summing all three bins for each pitch class. The state transition matrix, mean vector and covariance matrix of an HMM are initialised using musical knowledge and selectively trained using the 12-bin chromagram; the chord sequence is then inferred from the HMM using Viterbi decoding [JPB2005A].
- Input/Output: standard I/O; the output represents a sequence of major and minor triads. The time base consists of detected beats (tactus).
- Status: a complete C implementation is not currently available.

Instrument Identification Libraries (High Level)
- Technology: the instrument identifier relies on a mono-feature timbre modelling approach, using Line Spectrum Frequencies (LSF) as the unique identifier; various classifiers are implemented, in particular k-means, Gaussian mixture models and Support Vector Machines [NC2005][NC2006][FI1975][PK1986].
- Input/Output: undetermined.
- Language: C.
- Status: unknown; code is allegedly in the DSPMac repository.

SoundBite (Similarity Retrieval, Enriched Access)
- Technology: for a given track, the space of possible timbres is divided into N timbre types, each of which generates timbre features according to a Gaussian distribution; the sequence of timbre features through the track is modelled by an N-state Hidden Markov Model whose hidden states correspond to the N timbre types. The most likely sequence of timbre types to have generated the features is Viterbi-decoded from the HMM, and the most likely segmentation is found by clustering histograms of the timbre types. The feature vector consists of the first 20 PCA components extracted from the normalised constant-Q spectrum of the audio under analysis, along with the normalised envelope; the analysis hop size is chosen as the estimated beat length of the audio under analysis [ML2006A][ML2006B].
- Input/Output: output is a sequence of labelled segments.
- Language: C.
- Status: a C demonstrator is available for Mac OS X; a VAMP plugin is available.

Tempo & Meter (High Level)
- Technology: the algorithm is based on the beat tracker described above; meter estimation is currently limited to 4/4 and 3/4. The tempo value is estimated by analysing the beat histogram generated using tempo tracking across the audio file, and a measure of reliability can be inferred from the distribution of bins in the histogram.
- Input/Output: outputs are a histogram of detected tempos, the estimated main tempo and the estimated time signature.
- Status: code completed and deployed; some further experimental work is needed.

Time-Scaling (Enriched Access)
- Technology: time scaling is performed using an FFT-based phase vocoder; percussive onsets are identified using a multi-band onset detection algorithm, and only steady-state portions of the signal are time-scaled, thus preserving the integrity of transients. Coherence in stereo signals is maintained by using a single reference channel for the identification of transient and steady-state frames [ER2005].
- Input/Output: output is the time-scaled audio data.
- Status: a non-optimal implementation is available.
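For reference, the textbook phase-vocoder core behind the Time-Scaling entry can be sketched in a few lines of numpy. This is a minimal sketch only: the transient-preservation and stereo-coherence logic of the actual module is omitted, and all names and parameters are illustrative:

```python
import numpy as np

def phase_vocoder_stretch(audio, rate, nfft=1024, hop=256):
    """Plain FFT-based phase-vocoder time stretch (rate > 1 slows down).
    Magnitudes are resampled along the frame axis while a running
    synthesis phase integrates the measured per-bin phase increment."""
    window = np.hanning(nfft)
    # Analysis STFT.
    frames = np.array([np.fft.rfft(window * audio[i:i + nfft])
                       for i in range(0, len(audio) - nfft, hop)])
    # Expected phase advance per hop for each bin.
    omega = 2 * np.pi * np.arange(nfft // 2 + 1) * hop / nfft
    positions = np.arange(0, len(frames) - 1, 1.0 / rate)
    phase = np.angle(frames[0])
    out = np.zeros(int(len(positions) * hop + nfft))
    for k, pos in enumerate(positions):
        i = int(pos)
        frac = pos - i
        # Linearly interpolated magnitude between neighbouring frames.
        mag = (1 - frac) * np.abs(frames[i]) + frac * np.abs(frames[i + 1])
        spec = mag * np.exp(1j * phase)
        out[k * hop:k * hop + nfft] += window * np.fft.irfft(spec, nfft)
        # Phase increment: expected advance plus the wrapped deviation.
        dphi = np.angle(frames[i + 1]) - np.angle(frames[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase += omega + dphi
    return out
```

The module described above improves on this core by leaving transient frames untouched (using the multi-band onset detector) and by driving both stereo channels from one reference channel's transient/steady-state decisions.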

4. References

[JF2000] J. Foote, "Automatic audio segmentation using a measure of audio novelty", in Proc. IEEE Int. Conf. Multimedia and Expo (ICME 2000), vol. I, New York, Jul. 2000.
[JPB2005] J. P. Bello et al., "A tutorial on onset detection in music signals", IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, September 2005.
[IK2002] I. Kauppinen, "Methods for detecting impulsive noise in speech and audio signals", in Proc. 14th Int. Conf. Digital Signal Processing (DSP2002), vol. 2, Santorini, Greece, Jul. 2002.
[CD2004] C. Duxbury, "Signal Models for Polyphonic Music", PhD Thesis, 2004.
[ER2005] E. Ravelli et al., "Fast implementation for non-linear time-scaling of stereo signals", in Proc. 8th Int. Conference on Digital Audio Effects (DAFx'05), Madrid, Spain, September 20-22, 2005.
[JB1991] Judith C. Brown, "Calculation of a constant Q spectral transform", Journal of the Acoustical Society of America, vol. 89, no. 1, 1991.
[JB1992] Judith C. Brown and Miller S. Puckette, "An efficient algorithm for the calculation of a constant Q transform", Journal of the Acoustical Society of America, vol. 92, no. 5, 1992.
[CH2005] C. Harte and M. B. Sandler, "Automatic chord identification using a quantised chromagram", in Proc. 118th AES Convention, Barcelona, Spain, May 2005.
[MD2004] M. E. P. Davies and M. D. Plumbley, "Causal tempo tracking of audio", in Proc. 5th International Symposium on Music Information Retrieval, October 2004.
[MD2005] M. E. P. Davies and M. D. Plumbley, "Beat tracking with a two state model", in Proc. ICASSP, Philadelphia, USA, March 18-23, 2005.
[CH2006] C. Harte, M. Gasser and M. B. Sandler, "Detecting harmonic change in musical audio", in Proc. AMCMM'06, Santa Barbara, USA, October 27, 2006.
[MC2004] Markus Cremer and Claus Derboven, "A system for harmonic analysis of polyphonic music", in Proc. AES 25th International Conference, London, UK, 2004.
[BP2002] Bryan Pardo and William P. Birmingham, "Algorithms for chordal analysis", Computer Music Journal, vol. 26, no. 2, 2002.
[KN2006] Katy Noland and Mark Sandler, "Key estimation using a hidden Markov model", in Proc. ISMIR, Victoria, Canada, 2006.
[JPB2005A] J. P. Bello and J. Pickens, "A robust mid-level representation for harmonic content in music signals", in Proc. 6th International Symposium on Music Information Retrieval, London, 2005.
[PK1986] P. Kabal and R. P. Ramachandran, "The computation of line spectral frequencies using Chebyshev polynomials", IEEE Trans. on Acoustics, Speech and Signal Processing, vol. ASSP-34, no. 6, 1986.
[FI1975] F. Itakura, "Line spectrum representation of linear predictive coefficients of speech signals", J. Acoust. Soc. Amer., vol. 57, S35, 1975.
[NC2005] N. Chetry et al., "Musical instrument identification using LSF and K-means", in Proc. AES 118th Convention, Barcelona, Spain, May 2005.
[NC2006] N. Chetry, "Computer Models for Musical Instrument Identification", PhD Thesis, 2006.
[ML2006A] M. Levy et al., "New methods in structural segmentation of musical audio", in Proc. EUSIPCO 2006.
[ML2006B] M. Levy et al., "Extraction of high-level musical structure from audio data and its application to thumbnail generation", in Proc. ICASSP 2006.
[FFMPEG]

[MPEG7] ISO/IEC JTC1/SC29/WG11, "Information Technology - Multimedia Content Description Interface - Part 3: Multimedia Description Schemes", ISO/IEC FDIS.
[MP7XM] MPEG-7 eXperimentation Model (XM).
[EMD2006] Dan Barry et al., "EASAIER Metadata & Features", Internal Note, ver. 1.0 Draft.
[ED312006] EASAIER Deliverable 3.1: "Retrieval System Functionality and Specifications", ver. 1.12, November 1, 2006.


More information

Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features

Semantic Video Annotation by Mining Association Patterns from Visual and Speech Features Semantic Video Annotation by Mining Association Patterns from and Speech Features Vincent. S. Tseng, Ja-Hwung Su, Jhih-Hong Huang and Chih-Jen Chen Department of Computer Science and Information Engineering

More information

Key Estimation Using a Hidden Markov Model

Key Estimation Using a Hidden Markov Model Estimation Using a Hidden Markov Model Katy Noland, Mark Sandler Centre for Digital Music, Queen Mary, University of London, Mile End Road, London, E1 4NS. katy.noland,mark.sandler@elec.qmul.ac.uk Abstract

More information

Music Mood Classification

Music Mood Classification Music Mood Classification CS 229 Project Report Jose Padial Ashish Goel Introduction The aim of the project was to develop a music mood classifier. There are many categories of mood into which songs may

More information

Music technology. Draft GCE A level and AS subject content

Music technology. Draft GCE A level and AS subject content Music technology Draft GCE A level and AS subject content July 2015 Contents The content for music technology AS and A level 3 Introduction 3 Aims and objectives 3 Subject content 4 Recording and production

More information

GCE APPLIED ICT A2 COURSEWORK TIPS

GCE APPLIED ICT A2 COURSEWORK TIPS GCE APPLIED ICT A2 COURSEWORK TIPS COURSEWORK TIPS A2 GCE APPLIED ICT If you are studying for the six-unit GCE Single Award or the twelve-unit Double Award, then you may study some of the following coursework

More information

Video Affective Content Recognition Based on Genetic Algorithm Combined HMM

Video Affective Content Recognition Based on Genetic Algorithm Combined HMM Video Affective Content Recognition Based on Genetic Algorithm Combined HMM Kai Sun and Junqing Yu Computer College of Science & Technology, Huazhong University of Science & Technology, Wuhan 430074, China

More information

Establishing the Uniqueness of the Human Voice for Security Applications

Establishing the Uniqueness of the Human Voice for Security Applications Proceedings of Student/Faculty Research Day, CSIS, Pace University, May 7th, 2004 Establishing the Uniqueness of the Human Voice for Security Applications Naresh P. Trilok, Sung-Hyuk Cha, and Charles C.

More information

UNIVERSITY OF CENTRAL FLORIDA AT TRECVID 2003. Yun Zhai, Zeeshan Rasheed, Mubarak Shah

UNIVERSITY OF CENTRAL FLORIDA AT TRECVID 2003. Yun Zhai, Zeeshan Rasheed, Mubarak Shah UNIVERSITY OF CENTRAL FLORIDA AT TRECVID 2003 Yun Zhai, Zeeshan Rasheed, Mubarak Shah Computer Vision Laboratory School of Computer Science University of Central Florida, Orlando, Florida ABSTRACT In this

More information

In: Proceedings of RECPAD 2002-12th Portuguese Conference on Pattern Recognition June 27th- 28th, 2002 Aveiro, Portugal

In: Proceedings of RECPAD 2002-12th Portuguese Conference on Pattern Recognition June 27th- 28th, 2002 Aveiro, Portugal Paper Title: Generic Framework for Video Analysis Authors: Luís Filipe Tavares INESC Porto lft@inescporto.pt Luís Teixeira INESC Porto, Universidade Católica Portuguesa lmt@inescporto.pt Luís Corte-Real

More information

3D Content-Based Visualization of Databases

3D Content-Based Visualization of Databases 3D Content-Based Visualization of Databases Anestis KOUTSOUDIS Dept. of Electrical and Computer Engineering, Democritus University of Thrace 12 Vas. Sofias Str., Xanthi, 67100, Greece Fotis ARNAOUTOGLOU

More information

Structural Health Monitoring Tools (SHMTools)

Structural Health Monitoring Tools (SHMTools) Structural Health Monitoring Tools (SHMTools) Getting Started LANL/UCSD Engineering Institute LA-CC-14-046 c Copyright 2014, Los Alamos National Security, LLC All rights reserved. May 30, 2014 Contents

More information

AMUSICAL key and a chord are important attributes of

AMUSICAL key and a chord are important attributes of IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 16, NO. 2, FEBRUARY 2008 291 Acoustic Chord Transcription and Key Extraction From Audio Using Key-Dependent HMMs Trained on Synthesized

More information

EDM SOFTWARE ENGINEERING DATA MANAGEMENT SOFTWARE

EDM SOFTWARE ENGINEERING DATA MANAGEMENT SOFTWARE EDM SOFTWARE ENGINEERING DATA MANAGEMENT SOFTWARE MODERN, UPATED INTERFACE WITH INTUITIVE LAYOUT DRAG & DROP SCREENS, GENERATE REPORTS WITH ONE CLICK, AND UPDATE SOFTWARE ONLINE ipad APP VERSION AVAILABLE

More information

Image Compression through DCT and Huffman Coding Technique

Image Compression through DCT and Huffman Coding Technique International Journal of Current Engineering and Technology E-ISSN 2277 4106, P-ISSN 2347 5161 2015 INPRESSCO, All Rights Reserved Available at http://inpressco.com/category/ijcet Research Article Rahul

More information

Florida International University - University of Miami TRECVID 2014

Florida International University - University of Miami TRECVID 2014 Florida International University - University of Miami TRECVID 2014 Miguel Gavidia 3, Tarek Sayed 1, Yilin Yan 1, Quisha Zhu 1, Mei-Ling Shyu 1, Shu-Ching Chen 2, Hsin-Yu Ha 2, Ming Ma 1, Winnie Chen 4,

More information

Ericsson T18s Voice Dialing Simulator

Ericsson T18s Voice Dialing Simulator Ericsson T18s Voice Dialing Simulator Mauricio Aracena Kovacevic, Anna Dehlbom, Jakob Ekeberg, Guillaume Gariazzo, Eric Lästh and Vanessa Troncoso Dept. of Signals Sensors and Systems Royal Institute of

More information

City of Dublin Education & Training Board. Programme Module for. Music Technology. Leading to. Level 5 FETAC. Music Technology 5N1640

City of Dublin Education & Training Board. Programme Module for. Music Technology. Leading to. Level 5 FETAC. Music Technology 5N1640 City of Dublin Education & Training Board Programme Module for Music Technology Leading to Level 5 FETAC Music Technology 5N1640 Music Technology 5N1640 1 Introduction This programme module may be delivered

More information

Grant: LIFE08 NAT/GR/000539 Total Budget: 1,664,282.00 Life+ Contribution: 830,641.00 Year of Finance: 2008 Duration: 01 FEB 2010 to 30 JUN 2013

Grant: LIFE08 NAT/GR/000539 Total Budget: 1,664,282.00 Life+ Contribution: 830,641.00 Year of Finance: 2008 Duration: 01 FEB 2010 to 30 JUN 2013 Coordinating Beneficiary: UOP Associated Beneficiaries: TEIC Project Coordinator: Nikos Fakotakis, Professor Wire Communications Laboratory University of Patras, Rion-Patras 26500, Greece Email: fakotaki@upatras.gr

More information

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION

BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION BLIND SOURCE SEPARATION OF SPEECH AND BACKGROUND MUSIC FOR IMPROVED SPEECH RECOGNITION P. Vanroose Katholieke Universiteit Leuven, div. ESAT/PSI Kasteelpark Arenberg 10, B 3001 Heverlee, Belgium Peter.Vanroose@esat.kuleuven.ac.be

More information

AUDIO CODING: BASICS AND STATE OF THE ART

AUDIO CODING: BASICS AND STATE OF THE ART AUDIO CODING: BASICS AND STATE OF THE ART PACS REFERENCE: 43.75.CD Brandenburg, Karlheinz Fraunhofer Institut Integrierte Schaltungen, Arbeitsgruppe Elektronische Medientechnolgie Am Helmholtzring 1 98603

More information

FREE TV AUSTRALIA OPERATIONAL PRACTICE OP- 59 Measurement and Management of Loudness in Soundtracks for Television Broadcasting

FREE TV AUSTRALIA OPERATIONAL PRACTICE OP- 59 Measurement and Management of Loudness in Soundtracks for Television Broadcasting Page 1 of 9 1. SCOPE This Operational Practice is recommended by Free TV Australia and refers to the measurement of audio loudness as distinct from audio level. It sets out guidelines for measuring and

More information

AN ENVIRONMENT FOR EFFICIENT HANDLING OF DIGITAL ASSETS

AN ENVIRONMENT FOR EFFICIENT HANDLING OF DIGITAL ASSETS AN ENVIRONMENT FOR EFFICIENT HANDLING OF DIGITAL ASSETS PAULO VILLEGAS, STEPHAN HERRMANN, EBROUL IZQUIERDO, JONATHAN TEH AND LI-QUN XU IST BUSMAN Project, www.ist-basman.org We present a system designed

More information

Mike Perkins, Ph.D. perk@cardinalpeak.com

Mike Perkins, Ph.D. perk@cardinalpeak.com Mike Perkins, Ph.D. perk@cardinalpeak.com Summary More than 28 years of experience in research, algorithm development, system design, engineering management, executive management, and Board of Directors

More information

A TOOL FOR TEACHING LINEAR PREDICTIVE CODING

A TOOL FOR TEACHING LINEAR PREDICTIVE CODING A TOOL FOR TEACHING LINEAR PREDICTIVE CODING Branislav Gerazov 1, Venceslav Kafedziski 2, Goce Shutinoski 1 1) Department of Electronics, 2) Department of Telecommunications Faculty of Electrical Engineering

More information

DeNoiser Plug-In. for USER S MANUAL

DeNoiser Plug-In. for USER S MANUAL DeNoiser Plug-In for USER S MANUAL 2001 Algorithmix All rights reserved Algorithmix DeNoiser User s Manual MT Version 1.1 7/2001 De-NOISER MANUAL CONTENTS INTRODUCTION TO NOISE REMOVAL...2 Encode/Decode

More information

Digital Asset Management. Content Control for Valuable Media Assets

Digital Asset Management. Content Control for Valuable Media Assets Digital Asset Management Content Control for Valuable Media Assets Overview Digital asset management is a core infrastructure requirement for media organizations and marketing departments that need to

More information

SPRACH - WP 6 & 8: Software engineering work at ICSI

SPRACH - WP 6 & 8: Software engineering work at ICSI SPRACH - WP 6 & 8: Software engineering work at ICSI March 1998 Dan Ellis International Computer Science Institute, Berkeley CA 1 2 3 Hardware: MultiSPERT Software: speech & visualization

More information

Audio Content Analysis for Online Audiovisual Data Segmentation and Classification

Audio Content Analysis for Online Audiovisual Data Segmentation and Classification IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, VOL. 9, NO. 4, MAY 2001 441 Audio Content Analysis for Online Audiovisual Data Segmentation and Classification Tong Zhang, Member, IEEE, and C.-C. Jay

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES

CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES CLOUDDMSS: CLOUD-BASED DISTRIBUTED MULTIMEDIA STREAMING SERVICE SYSTEM FOR HETEROGENEOUS DEVICES 1 MYOUNGJIN KIM, 2 CUI YUN, 3 SEUNGHO HAN, 4 HANKU LEE 1,2,3,4 Department of Internet & Multimedia Engineering,

More information

Available from Deakin Research Online:

Available from Deakin Research Online: This is the authors final peered reviewed (post print) version of the item published as: Adibi,S 2014, A low overhead scaled equalized harmonic-based voice authentication system, Telematics and informatics,

More information

Greenwich Public Schools Electronic Music Curriculum 9-12

Greenwich Public Schools Electronic Music Curriculum 9-12 Greenwich Public Schools Electronic Music Curriculum 9-12 Overview Electronic Music courses at the high school are elective music courses. The Electronic Music Units of Instruction include four strands

More information

Event Detection in Basketball Video Using Multiple Modalities

Event Detection in Basketball Video Using Multiple Modalities Event Detection in Basketball Video Using Multiple Modalities Min Xu, Ling-Yu Duan, Changsheng Xu, *Mohan Kankanhalli, Qi Tian Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore, 119613

More information

Bandwidth Adaptation for MPEG-4 Video Streaming over the Internet

Bandwidth Adaptation for MPEG-4 Video Streaming over the Internet DICTA2002: Digital Image Computing Techniques and Applications, 21--22 January 2002, Melbourne, Australia Bandwidth Adaptation for MPEG-4 Video Streaming over the Internet K. Ramkishor James. P. Mammen

More information

AUTOMATIC VIDEO STRUCTURING BASED ON HMMS AND AUDIO VISUAL INTEGRATION

AUTOMATIC VIDEO STRUCTURING BASED ON HMMS AND AUDIO VISUAL INTEGRATION AUTOMATIC VIDEO STRUCTURING BASED ON HMMS AND AUDIO VISUAL INTEGRATION P. Gros (1), E. Kijak (2) and G. Gravier (1) (1) IRISA CNRS (2) IRISA Université de Rennes 1 Campus Universitaire de Beaulieu 35042

More information

engin erzin the use of speech processing applications is expected to surge in multimedia-rich scenarios

engin erzin the use of speech processing applications is expected to surge in multimedia-rich scenarios engin erzin Associate Professor Department of Computer Engineering Ph.D. Bilkent University http://home.ku.edu.tr/ eerzin eerzin@ku.edu.tr Engin Erzin s research interests include speech processing, multimodal

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

AUTOMATIC TRANSCRIPTION OF PIANO MUSIC BASED ON HMM TRACKING OF JOINTLY-ESTIMATED PITCHES. Valentin Emiya, Roland Badeau, Bertrand David

AUTOMATIC TRANSCRIPTION OF PIANO MUSIC BASED ON HMM TRACKING OF JOINTLY-ESTIMATED PITCHES. Valentin Emiya, Roland Badeau, Bertrand David AUTOMATIC TRANSCRIPTION OF PIANO MUSIC BASED ON HMM TRACKING OF JOINTLY-ESTIMATED PITCHES Valentin Emiya, Roland Badeau, Bertrand David TELECOM ParisTech (ENST, CNRS LTCI 46, rue Barrault, 75634 Paris

More information

WATERMARKING FOR IMAGE AUTHENTICATION

WATERMARKING FOR IMAGE AUTHENTICATION WATERMARKING FOR IMAGE AUTHENTICATION Min Wu Bede Liu Department of Electrical Engineering Princeton University, Princeton, NJ 08544, USA Fax: +1-609-258-3745 {minwu, liu}@ee.princeton.edu ABSTRACT A data

More information

School Class Monitoring System Based on Audio Signal Processing

School Class Monitoring System Based on Audio Signal Processing C. R. Rashmi 1,,C.P.Shantala 2 andt.r.yashavanth 3 1 Department of CSE, PG Student, CIT, Gubbi, Tumkur, Karnataka, India. 2 Department of CSE, Vice Principal & HOD, CIT, Gubbi, Tumkur, Karnataka, India.

More information

Hardware Implementation of Probabilistic State Machine for Word Recognition

Hardware Implementation of Probabilistic State Machine for Word Recognition IJECT Vo l. 4, Is s u e Sp l - 5, Ju l y - Se p t 2013 ISSN : 2230-7109 (Online) ISSN : 2230-9543 (Print) Hardware Implementation of Probabilistic State Machine for Word Recognition 1 Soorya Asokan, 2

More information

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song

A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song , pp.347-354 http://dx.doi.org/10.14257/ijmue.2014.9.8.32 A Sound Analysis and Synthesis System for Generating an Instrumental Piri Song Myeongsu Kang and Jong-Myon Kim School of Electrical Engineering,

More information

Training Ircam s Score Follower

Training Ircam s Score Follower Training Ircam s Follower Arshia Cont, Diemo Schwarz, Norbert Schnell To cite this version: Arshia Cont, Diemo Schwarz, Norbert Schnell. Training Ircam s Follower. IEEE International Conference on Acoustics,

More information

Industrial IT Ó Melody Composer

Industrial IT Ó Melody Composer Overview Industrial IT Ó Melody Composer Features and Benefits Support of concurrent engineering for Control Systems Operation on Windows NT and Windows 2000 Multiple client/server architecture Off-Line

More information

Control of affective content in music production

Control of affective content in music production International Symposium on Performance Science ISBN 978-90-9022484-8 The Author 2007, Published by the AEC All rights reserved Control of affective content in music production António Pedro Oliveira and

More information

Journal of Industrial Engineering Research. Adaptive sequence of Key Pose Detection for Human Action Recognition

Journal of Industrial Engineering Research. Adaptive sequence of Key Pose Detection for Human Action Recognition IWNEST PUBLISHER Journal of Industrial Engineering Research (ISSN: 2077-4559) Journal home page: http://www.iwnest.com/aace/ Adaptive sequence of Key Pose Detection for Human Action Recognition 1 T. Sindhu

More information

PCM Encoding and Decoding:

PCM Encoding and Decoding: PCM Encoding and Decoding: Aim: Introduction to PCM encoding and decoding. Introduction: PCM Encoding: The input to the PCM ENCODER module is an analog message. This must be constrained to a defined bandwidth

More information

Beethoven, Bach und Billionen Bytes

Beethoven, Bach und Billionen Bytes Meinard Müller Beethoven, Bach und Billionen Bytes Automatisierte Analyse von Musik und Klängen Meinard Müller Tutzing-Symposium Oktober 2014 2001 PhD, Bonn University 2002/2003 Postdoc, Keio University,

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Flattening Enterprise Knowledge

Flattening Enterprise Knowledge Flattening Enterprise Knowledge Do you Control Your Content or Does Your Content Control You? 1 Executive Summary: Enterprise Content Management (ECM) is a common buzz term and every IT manager knows it

More information

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29.

Broadband Networks. Prof. Dr. Abhay Karandikar. Electrical Engineering Department. Indian Institute of Technology, Bombay. Lecture - 29. Broadband Networks Prof. Dr. Abhay Karandikar Electrical Engineering Department Indian Institute of Technology, Bombay Lecture - 29 Voice over IP So, today we will discuss about voice over IP and internet

More information

Audio Coding Algorithm for One-Segment Broadcasting

Audio Coding Algorithm for One-Segment Broadcasting Audio Coding Algorithm for One-Segment Broadcasting V Masanao Suzuki V Yasuji Ota V Takashi Itoh (Manuscript received November 29, 2007) With the recent progress in coding technologies, a more efficient

More information

Audacity 1.2.4 Sound Editing Software

Audacity 1.2.4 Sound Editing Software Audacity 1.2.4 Sound Editing Software Developed by Paul Waite Davis School District This is not an official training handout of the Educational Technology Center, Davis School District Possibilities...

More information

MP3 Player CSEE 4840 SPRING 2010 PROJECT DESIGN. zl2211@columbia.edu. ml3088@columbia.edu

MP3 Player CSEE 4840 SPRING 2010 PROJECT DESIGN. zl2211@columbia.edu. ml3088@columbia.edu MP3 Player CSEE 4840 SPRING 2010 PROJECT DESIGN Zheng Lai Zhao Liu Meng Li Quan Yuan zl2215@columbia.edu zl2211@columbia.edu ml3088@columbia.edu qy2123@columbia.edu I. Overview Architecture The purpose

More information

Internet Video Streaming and Cloud-based Multimedia Applications. Outline

Internet Video Streaming and Cloud-based Multimedia Applications. Outline Internet Video Streaming and Cloud-based Multimedia Applications Yifeng He, yhe@ee.ryerson.ca Ling Guan, lguan@ee.ryerson.ca 1 Outline Internet video streaming Overview Video coding Approaches for video

More information

Parametric Comparison of H.264 with Existing Video Standards

Parametric Comparison of H.264 with Existing Video Standards Parametric Comparison of H.264 with Existing Video Standards Sumit Bhardwaj Department of Electronics and Communication Engineering Amity School of Engineering, Noida, Uttar Pradesh,INDIA Jyoti Bhardwaj

More information

Transana 2.60 Distinguishing features and functions

Transana 2.60 Distinguishing features and functions Transana 2.60 Distinguishing features and functions This document is intended to be read in conjunction with the Choosing a CAQDAS Package Working Paper which provides a more general commentary of common

More information

Practical Tour of Visual tracking. David Fleet and Allan Jepson January, 2006

Practical Tour of Visual tracking. David Fleet and Allan Jepson January, 2006 Practical Tour of Visual tracking David Fleet and Allan Jepson January, 2006 Designing a Visual Tracker: What is the state? pose and motion (position, velocity, acceleration, ) shape (size, deformation,

More information

Azure Machine Learning, SQL Data Mining and R

Azure Machine Learning, SQL Data Mining and R Azure Machine Learning, SQL Data Mining and R Day-by-day Agenda Prerequisites No formal prerequisites. Basic knowledge of SQL Server Data Tools, Excel and any analytical experience helps. Best of all:

More information

LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING

LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING LOW COST HARDWARE IMPLEMENTATION FOR DIGITAL HEARING AID USING RasPi Kaveri Ratanpara 1, Priyan Shah 2 1 Student, M.E Biomedical Engineering, Government Engineering college, Sector-28, Gandhinagar (Gujarat)-382028,

More information

How To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm

How To Fix Out Of Focus And Blur Images With A Dynamic Template Matching Algorithm IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 10 April 2015 ISSN (online): 2349-784X Image Estimation Algorithm for Out of Focus and Blur Images to Retrieve the Barcode

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

HOW MUSICAL ARE IMAGES? FROM SOUND REPRESENTATION TO IMAGE SONIFICATION: AN ECO SYSTEMIC APPROACH

HOW MUSICAL ARE IMAGES? FROM SOUND REPRESENTATION TO IMAGE SONIFICATION: AN ECO SYSTEMIC APPROACH HOW MUSICAL ARE IMAGES? FROM SOUND REPRESENTATION TO IMAGE SONIFICATION: AN ECO SYSTEMIC APPROACH Jean-Baptiste Thiebaut Juan Pablo Bello Diemo Schwarz Dept. of Computer Science Queen Mary, Univ. of London

More information

Habilitation. Bonn University. Information Retrieval. Dec. 2007. PhD students. General Goals. Music Synchronization: Audio-Audio

Habilitation. Bonn University. Information Retrieval. Dec. 2007. PhD students. General Goals. Music Synchronization: Audio-Audio Perspektivenvorlesung Information Retrieval Music and Motion Bonn University Prof. Dr. Michael Clausen PD Dr. Frank Kurth Dipl.-Inform. Christian Fremerey Dipl.-Inform. David Damm Dipl.-Inform. Sebastian

More information

Sub-class Error-Correcting Output Codes

Sub-class Error-Correcting Output Codes Sub-class Error-Correcting Output Codes Sergio Escalera, Oriol Pujol and Petia Radeva Computer Vision Center, Campus UAB, Edifici O, 08193, Bellaterra, Spain. Dept. Matemàtica Aplicada i Anàlisi, Universitat

More information

STUDY OF MUTUAL INFORMATION IN PERCEPTUAL CODING WITH APPLICATION FOR LOW BIT-RATE COMPRESSION

STUDY OF MUTUAL INFORMATION IN PERCEPTUAL CODING WITH APPLICATION FOR LOW BIT-RATE COMPRESSION STUDY OF MUTUAL INFORMATION IN PERCEPTUAL CODING WITH APPLICATION FOR LOW BIT-RATE COMPRESSION Adiel Ben-Shalom, Michael Werman School of Computer Science Hebrew University Jerusalem, Israel. {chopin,werman}@cs.huji.ac.il

More information

Module 8 VIDEO CODING STANDARDS. Version 2 ECE IIT, Kharagpur

Module 8 VIDEO CODING STANDARDS. Version 2 ECE IIT, Kharagpur Module 8 VIDEO CODING STANDARDS Version ECE IIT, Kharagpur Lesson H. andh.3 Standards Version ECE IIT, Kharagpur Lesson Objectives At the end of this lesson the students should be able to :. State the

More information

HIDDEN MARKOV MODELS FOR SPECTRAL SIMILARITY OF SONGS. Arthur Flexer, Elias Pampalk, Gerhard Widmer

HIDDEN MARKOV MODELS FOR SPECTRAL SIMILARITY OF SONGS. Arthur Flexer, Elias Pampalk, Gerhard Widmer Proc. of the th Int. Conference on Digital Audio Effects (DAFx 5), Madrid, Spain, September 2-22, 25 HIDDEN MARKOV MODELS FOR SPECTRAL SIMILARITY OF SONGS Arthur Flexer, Elias Pampalk, Gerhard Widmer The

More information

Video Authentication for H.264/AVC using Digital Signature Standard and Secure Hash Algorithm

Video Authentication for H.264/AVC using Digital Signature Standard and Secure Hash Algorithm Video Authentication for H.264/AVC using Digital Signature Standard and Secure Hash Algorithm Nandakishore Ramaswamy Qualcomm Inc 5775 Morehouse Dr, Sam Diego, CA 92122. USA nandakishore@qualcomm.com K.

More information

SIPAC. Signals and Data Identification, Processing, Analysis, and Classification

SIPAC. Signals and Data Identification, Processing, Analysis, and Classification SIPAC Signals and Data Identification, Processing, Analysis, and Classification Framework for Mass Data Processing with Modules for Data Storage, Production and Configuration SIPAC key features SIPAC is

More information

Tracking and Recognition in Sports Videos

Tracking and Recognition in Sports Videos Tracking and Recognition in Sports Videos Mustafa Teke a, Masoud Sattari b a Graduate School of Informatics, Middle East Technical University, Ankara, Turkey mustafa.teke@gmail.com b Department of Computer

More information

Machine Learning with MATLAB David Willingham Application Engineer

Machine Learning with MATLAB David Willingham Application Engineer Machine Learning with MATLAB David Willingham Application Engineer 2014 The MathWorks, Inc. 1 Goals Overview of machine learning Machine learning models & techniques available in MATLAB Streamlining the

More information

VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS

VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS VEHICLE LOCALISATION AND CLASSIFICATION IN URBAN CCTV STREAMS Norbert Buch 1, Mark Cracknell 2, James Orwell 1 and Sergio A. Velastin 1 1. Kingston University, Penrhyn Road, Kingston upon Thames, KT1 2EE,

More information

CHANWOO KIM (BIRTH: APR. 9, 1976) Language Technologies Institute School of Computer Science Aug. 8, 2005 present

CHANWOO KIM (BIRTH: APR. 9, 1976) Language Technologies Institute School of Computer Science Aug. 8, 2005 present CHANWOO KIM (BIRTH: APR. 9, 1976) 2602E NSH Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA 15213 Phone: +1-412-726-3996 Email: chanwook@cs.cmu.edu RESEARCH INTERESTS Speech recognition system,

More information

Waves Trans-X. Software Audio Processor. User s Guide

Waves Trans-X. Software Audio Processor. User s Guide Waves Trans-X Software Audio Processor User s Guide Waves Trans-X software guide page 1 of 8 Chapter 1 Introduction and Overview The Waves Trans-X transient processor is a special breed of dynamics processor

More information

How To Use A High Definition Oscilloscope

How To Use A High Definition Oscilloscope PRELIMINARY High Definition Oscilloscopes HDO4000 and HDO6000 Key Features 12-bit ADC resolution, up to 15-bit with enhanced resolution 200 MHz, 350 MHz, 500 MHz, 1 GHz bandwidths Long Memory up to 250

More information

A HIGH PERFORMANCE SOFTWARE IMPLEMENTATION OF MPEG AUDIO ENCODER. Figure 1. Basic structure of an encoder.

A HIGH PERFORMANCE SOFTWARE IMPLEMENTATION OF MPEG AUDIO ENCODER. Figure 1. Basic structure of an encoder. A HIGH PERFORMANCE SOFTWARE IMPLEMENTATION OF MPEG AUDIO ENCODER Manoj Kumar 1 Mohammad Zubair 1 1 IBM T.J. Watson Research Center, Yorktown Hgts, NY, USA ABSTRACT The MPEG/Audio is a standard for both

More information

Security and protection of digital images by using watermarking methods

Security and protection of digital images by using watermarking methods Security and protection of digital images by using watermarking methods Andreja Samčović Faculty of Transport and Traffic Engineering University of Belgrade, Serbia Gjovik, june 2014. Digital watermarking

More information

Figure 1: Relation between codec, data containers and compression algorithms.

Figure 1: Relation between codec, data containers and compression algorithms. Video Compression Djordje Mitrovic University of Edinburgh This document deals with the issues of video compression. The algorithm, which is used by the MPEG standards, will be elucidated upon in order

More information

How To Create A Beat Tracking Effect On A Computer Or A Drumkit

How To Create A Beat Tracking Effect On A Computer Or A Drumkit Audio Engineering Society Convention Paper Presented at the 122nd Convention 2007 May 5 8 Vienna, Austria The papers at this Convention have been selected on the basis of a submitted abstract and extended

More information

Salisbury Township School District Planned Course of Study - Music Production Salisbury Inspire, Think, Learn, Grow Together!

Salisbury Township School District Planned Course of Study - Music Production Salisbury Inspire, Think, Learn, Grow Together! Topic/Unit: Music Production: Recording and Microphones Suggested Timeline: 1-2 weeks Big Ideas/Enduring Understandings: Students will listen to, analyze, and describe music that has been created using

More information

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS

More information