Audio-visual vibraphone transcription in real time
Tiago F. Tavares #1, Gabrielle Odowichuck *2, Sonmaz Zehtabi #3, George Tzanetakis #4

# Department of Computer Science, University of Victoria, Victoria, Canada
1 [email protected], 3 [email protected], 4 [email protected]

* Department of Electrical and Computer Engineering, University of Victoria, Victoria, Canada
2 [email protected]

Abstract - Music transcription refers to the process of detecting musical events (typically consisting of notes, starting times and durations) from an audio signal. Most existing work in automatic music transcription has focused on offline processing. In this work we describe our efforts in building a system for real-time music transcription for the vibraphone. We describe experiments with three audio-based methods for music transcription that are representative of the state of the art. One method is based on multiple pitch estimation and the other two are based on factorization of the audio spectrogram. In addition, we show how information from a video camera can be used to impose constraints on the symbol search space based on the gestures of the performer. Experimental results with various system configurations show that this multi-modal approach leads to a significant reduction of false positives and increases the overall accuracy. This improvement is observed for all three audio methods, and indicates that visual information is complementary to the audio information in this context.

Index Terms - Audiovisual, Music, Transcription

I. INTRODUCTION

There is an increasing trend of interfacing musical instruments with computers. The most common approach is to create specialized digital instruments that send digital signals about what is played using protocols such as the Musical Instrument Digital Interface (MIDI) or Open Sound Control (OSC). However, these instruments frequently lack the haptic response and expressive capabilities of acoustic instruments. A more recent approach has been to retrofit acoustic instruments with digital sensors to capture what is being played. These so-called hyperinstruments [1] combine the haptic feel and expressive capabilities of acoustic instruments while providing digital control information. However, they tend to be expensive, hard to replicate, and require invasive modifications to the instrument, which many musicians dislike. Indirect acquisition [2] refers to the process of acquiring the control information provided by invasive sensors by instead analyzing the audio signal acquired by a microphone. When successful, such systems can provide capabilities similar to those of hyperinstruments without requiring invasive modification. They are also easy to replicate and have significantly lower cost.

MMSP'10, October 4-6, 2010, Saint-Malo, France. © 2010 IEEE.

The focus of this paper is the indirect acquisition of control information for the vibraphone, which is a member of the pitched percussion family. The major components of a vibraphone are aluminum bars, tube resonators placed under the bars, and a damper mechanism that controls whether notes are sustained after they are struck. There are digital controllers with a layout similar to the vibraphone that simply trigger digital samples and do not produce acoustic sound. It is also possible to augment acoustic vibraphones with digital sensors to obtain information about what bar is being played and when.
However, this is costly and hard to accomplish without altering the acoustic response of the instrument. In this paper we describe how information about what bar is being played, and when, can be extracted by computer analysis of audio-visual information. The audio signal, acquired by a single microphone that is not attached to the instrument, is used as input to automatic transcription algorithms. Such algorithms rely on digital signal processing techniques to detect and characterize musical events (typically, the onset, offset and pitch of musical notes). The detection of musical notes is generally based on the harmonic model, that is, a note with a certain pitch $f_0$ is modeled by a sum of $M$ sinusoidal signals whose frequencies are multiples of $f_0$ and whose amplitudes and phases are individually defined [3], as in:

$$\sum_{m=1}^{M} a_m \cos(2\pi m f_0 t + \phi_m). \qquad (1)$$

The output of an automatic transcription algorithm is a set of notes characterized by their pitch (that is, which note in the equal-tempered scale is being played), onset (the instant when the note is played) and offset (the instant when the note is damped). Figure 1(a) shows a piano roll representation of a short excerpt. In this representation, boxes represent the detected notes. They are positioned in the row of their corresponding note and span horizontally from their onset to their offset times. The spectrogram corresponding to that same excerpt is shown in Figure 1(b). The goal of our system is to produce the piano roll representation given as input the spectrogram as well as visual information from a camera. This process needs to be carried out in real time, which requires all the processing to be causal.
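As an illustration of this model, the short numpy sketch below synthesizes a note according to (1); the number of partials, their amplitudes and their phases are arbitrary illustrative values, not parameters of our system.

```python
import numpy as np

def harmonic_note(f0, amps, phases, sr=48000, dur=1.0):
    """Synthesize a note as a sum of M harmonic partials, as in (1)."""
    t = np.arange(int(sr * dur)) / sr
    x = np.zeros_like(t)
    for m, (a_m, phi_m) in enumerate(zip(amps, phases), start=1):
        x += a_m * np.cos(2 * np.pi * m * f0 * t + phi_m)
    return x

# Example: an F4 (f0 ~ 349.23 Hz) with four partials of decreasing amplitude.
note = harmonic_note(349.23, amps=[1.0, 0.5, 0.25, 0.125], phases=[0.0] * 4)
```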
Fig. 1. Piano roll representation (a) and spectrogram (b) for a short excerpt.

When several harmonic signals are summed, the joint detection of their pitches becomes more difficult. This is especially true in cases where the harmonic series of multiple signals overlap, as can be seen in Figure 1(b), where the third harmonic of note F4 coincides with the fourth harmonic of note C4 at 0.55 seconds (3 × 349.2 Hz ≈ 1047.7 Hz, against 4 × 261.6 Hz ≈ 1046.5 Hz). This is reflected in the results of most automatic event detection algorithms, which often yield results that correspond to a harmonic or sub-harmonic of the ground truth. As music signals frequently contain combinations of notes with overlapping harmonics, the separation and detection of multiple sound sources is, in some ways, more challenging in music than in speech signals.

Another source of errors in event detection algorithms is noise. Most event detection algorithms assume that the analyzed signal contains only audio from musical sources, but this is often not the case. Some noises are inherent to the audio acquisition environment, like the noise of computer fans, heating systems, steps, or even crowds in the case of live performances. These sources can be misidentified as musical notes, which may harm the performance of the detection algorithm by increasing the number of false positives.

It is important to note, however, that musical note events are also correlated with certain physical gestures performed by the musician. It has been shown that the combination of audio and visual information tends to increase the performance of speech recognition systems [4]. The use of audio and visual information for music transcription was, to the best of our knowledge, first proposed by Gillet and Richard [5] in a system capable of integrating data acquired from a microphone and a video camera to detect and characterize the hits on a drum kit. In this work we apply a similar approach to the live detection of note events from a vibraphone. Gillet and Richard show that the fusion of audio and video features can lead to better transcription results than those obtained using audio or video alone. Later, Kapur et al. [6] used multiple sensor fusion to detect certain gestures in sitar performances which may easily be confused with each other if only auditory information is considered.

This paper proposes a multi-modal solution for the detection of events in vibraphone performances, aimed at reducing the harmonic errors and the errors due to ambient noise when compared to the traditional audio-only solution. Three different audio signal processing methods were tested: Non-Negative Least Squares fitting (NNLSQ) [7], Probabilistic Latent Component Analysis (PLCA) [8], and the auditory-inspired multiple fundamental frequency detection method proposed by Klapuri [9]. The computer vision technique used in the paper relies on a video camera mounted on top of the vibraphone to estimate the position of the mallets in relation to the bars. This information is combined with the audio-based results so that positioning the mallet over a bar is a necessary condition for a note to be detected.

The paper is organized as follows. In Section II, the proposed system and its modules are described. In Section III, the evaluation methods are described and the results are presented. Finally, Section IV presents further discussion and concluding remarks.
II. SYSTEM DESCRIPTION

The system proposed in this paper relies on both audio and video data, which are jointly analyzed in order to detect when a musician plays a certain note. The audio data is analyzed by a real-time transcription algorithm, which consists of a signal processing front-end and an event detection method. The three signal processing front-ends considered were based on NNLSQ [7], PLCA [8], and the method proposed by Klapuri [9]. The event detection method is based on adaptive thresholding. The video data is processed by a computer vision algorithm that is responsible for detecting the position of the mallets over the instrument's body. Since it is necessary to hit a bar with the mallet in order to produce sound, the information on the location of the mallets is used to inhibit the detection of notes that are clearly not being hit at that moment.

This section is divided into two parts. In the first, the computer vision algorithm and the hardware requirements for the video camera are described. In the second, the different audio analysis algorithms are described, as well as how the audio and visual information is integrated.

A. Computer vision

In this work, computer vision techniques are used to track the position of the percussionist's mallet tips relative to the vibraphone bars. This allows us to determine which bars of the vibraphone are covered by the mallets and are therefore active. The assumption is that if the mallets are covering a certain bar, then it is likely that the pitch associated with that bar will be heard. Of course, there is also the possibility of the mallet covering a bar without the note being played, as the performer moves the mallets across the instrument.

Fig. 2. Computer vision mallet detection with (a) blob detection and (b) a virtual scaled copy of the vibraphone.

Using a VGA webcam (in this work, the one integrated in the Microsoft Kinect) mounted above the vibraphone and computer vision libraries [10], we are able to track the performer's mallet tips based on their color. This process involves filtering out everything except a desired color range, and then performing contour detection on the filtered image. The color filtering process is shown in Figure 3. The color image is first separated into Hue, Saturation, and intensity Value (HSV) components. A binary threshold is applied to each of these components, setting all values within a desired range to 1 and the rest to 0:

$$x_{\text{binary}} = \begin{cases} 1 & \text{if } \min < x < \max \\ 0 & \text{otherwise.} \end{cases} \qquad (2)$$

Recombining these images by performing boolean AND operations yields an image where nearly everything except the tips of the mallet sticks has been removed.

Contour detection is performed on the binary image using a technique called border following [11]. This technique involves traversing edges between 0 and 1 in the binary image and creating a sequence. If this sequence returns to its starting point, then a contour has been detected. For our purposes, a bounding rectangle around each contour is returned. Finally, the candidate results are limited by area, excluding contours that are too large or too small.

Tracking objects with blob detection algorithms returns position data relative to the pixel space of the camera. However, we would like these positions to exist in a virtual space that also includes a 3D model of the vibraphone bars. Using explicit measurements, a scalable model of the vibraphone was created. Assuming linearity in the position output from the camera, the mallet positions are transformed into our virtual space using the linear normalization shown in Expression (3). Currently, a calibration stage is needed to determine the position of the vibraphone relative to the camera pixel space.

$$p_{\text{norm}} = \frac{p_o - p_{\min}}{p_{\max} - p_{\min}} \qquad (3)$$

Although this method is simple, sensitive to lighting conditions, and requires manually setting the color of the mallet tips, it has given results that are accurate enough for the purposes of this paper.

Fig. 3. Computer vision flow diagram.
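The pipeline above can be sketched in a few lines with OpenCV (assuming the 4.x Python API). The HSV bounds, the contour-area limits and the helper names are illustrative placeholders that would have to be calibrated for the actual mallet-tip color and camera setup; this is a sketch of the technique, not our implementation.

```python
import cv2
import numpy as np

# Placeholder HSV range and contour-area limits; both must be
# calibrated for the actual mallet-tip color and camera distance.
HSV_MIN = np.array([20, 100, 100])
HSV_MAX = np.array([40, 255, 255])
AREA_MIN, AREA_MAX = 30.0, 500.0

def track_mallet_tips(frame_bgr):
    """Return the pixel-space centers of the detected mallet tips."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # inRange thresholds each HSV channel to its [min, max] range and
    # ANDs the results, which is exactly the binarization of (2).
    mask = cv2.inRange(hsv, HSV_MIN, HSV_MAX)
    # Contour detection by border following [11].
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    tips = []
    for c in contours:
        if AREA_MIN <= cv2.contourArea(c) <= AREA_MAX:
            x, y, w, h = cv2.boundingRect(c)
            tips.append((x + w / 2.0, y + h / 2.0))
    return tips

def normalize(p, p_min, p_max):
    """Linear normalization into the virtual space, Expression (3)."""
    return (p - p_min) / (p_max - p_min)
```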
B. Machine listening

The audio analysis techniques we use are based on the assumption that an audio signal resulting from the mixing of several different sources is, in terms of physical measures, the sum of the signals corresponding to each individual source. Similarly, the human perception of listening to that sound is essentially the superposition of the sensations triggered when listening to each individual source. Therefore, it is reasonable to assume that, to a certain extent, the phenomenon of detection of sound sources may be described using the following linear model:

$$X = BA. \qquad (4)$$

In this model, B is a set of basis functions forming a dictionary, that is, a set of vector representations of the sources to be identified; X is the representation of several measurements of the phenomenon in the same domain as B; and A is a set of activation coefficients that represent how much each basis function (corresponding to a source) defined in B is active in each measurement. Figure 4 shows an example activation matrix, in which each row represents a different note and each column represents a different time frame.
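Read operationally, Equation (4) says that once the dictionary B is fixed, each incoming spectrum frame (a column of X) can be explained by one column of activations in A. A minimal SciPy sketch of this framewise, causal estimation follows; the function name is ours, and this is an illustration of the idea rather than the authors' code:

```python
import numpy as np
from scipy.optimize import nnls

def activation_matrix(B, X):
    """Estimate A in X ~= B A, one frame (column) at a time, so the
    computation remains causal for real-time use."""
    n_bases, n_frames = B.shape[1], X.shape[1]
    A = np.zeros((n_bases, n_frames))
    for t in range(n_frames):
        A[:, t], _ = nnls(B, X[:, t])  # non-negative least squares fit
    return A
```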
According to the harmonic model described in Expression (1), a particular note tends to have a stationary representation when considering the magnitude of its Discrete Fourier Transform (DFT). This means that the magnitude spectrum is, for harmonic signals, one possible representation that enables the use of the linear model in Equation (4).
Fig. 4. Example activation matrix (base number vs. time in seconds).

The assumption of linearity in the harmonic model has been used successfully in several experiments, both using Non-Negative Matrix Factorization (NMF) [12], [13], in which the basis matrix B is obtained by unsupervised learning, and using the NNLSQ approach [14], in which the basis matrix B is obtained in advance by means of supervised learning and only the activation matrix is estimated. Although NNLSQ requires a previous training stage, it allows causal processing, which is essential for real-time applications.

Another way of calculating a conceptually similar activation matrix is by means of Probabilistic Latent Component Analysis (PLCA) [8]. This technique assumes that both B and A denote probability distribution functions and applies the Expectation-Maximization (EM) algorithm to obtain them. In the implementation used in this paper, the basis matrix B is obtained in a previous stage from training data and is not adapted (supervised version). Both NNLSQ and PLCA are search algorithms that aim at minimizing the norm of the approximation error X - BA, so their results are similar, differing only due to the optimization process utilized.

An alternative method to obtain the activation matrix A is to perform an auditory-inspired search for multiple fundamental frequencies, as proposed by Klapuri [9]. This search method incorporates psycho-acoustic knowledge both to design the basis matrix and to obtain the activation of each base. Klapuri's algorithm is based on calculating a salience function for each fundamental frequency candidate, which indicates how active it is. The most salient candidate is then subtracted from the spectrum, and these two steps are performed iteratively until the desired number of notes is obtained. In this paper, the algorithm was adapted so that the activation values for the four notes corresponding to the first four estimated fundamental frequencies are equal to their saliences, and the activations for the other notes are equal to their saliences after the removal of those four notes.

The audio processing algorithms rely on a framewise spectrogram representation, calculated as follows. The input signal is sampled at 48 kHz and processed in discrete frames of 46 ms, with a 23 ms overlap. Each frame is multiplied by a Hanning window, zero-padded to twice its original length and transformed to the frequency domain using the DFT, obtaining X. In order to minimize the influence of variation in harmonic amplitudes, X is logarithmically scaled as $Y = \log_{10}(1 + |X|)$ [14] (the logarithmic scaling is bypassed when using Klapuri's method, as it already performs spectral normalization [9]). Finally, the frequency domain representation is trimmed, ignoring values outside the frequency range in which the vibraphone operates.

The basis functions are obtained by taking the spectrogram of a recording containing a single note hit and averaging the first few frames. Only these few frames are used because, in vibraphone sounds, the harmonics decay quickly; using more data from a particular note would converge to a spectrum dominated by the fundamental frequency of the series instead of all the harmonics. The spectrum is then normalized to have unit variance (but not zero mean). This normalization ensures that the values in each row of the activation matrix A represent the energy of the activation of the corresponding note.
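The front-end and template construction just described can be summarized in the following numpy sketch. The number of averaged frames and the omission of the frequency-range trimming are simplifying assumptions, and the function names are ours:

```python
import numpy as np

SR = 48000
FRAME = int(0.046 * SR)  # 46 ms analysis frames
HOP = int(0.023 * SR)    # 23 ms hop (50% overlap)

def log_spectrogram(x):
    """Framewise log-magnitude spectrogram: Hanning window, zero-padding
    to twice the frame length, then Y = log10(1 + |X|)."""
    win = np.hanning(FRAME)
    frames = []
    for start in range(0, len(x) - FRAME + 1, HOP):
        spec = np.abs(np.fft.rfft(x[start:start + FRAME] * win, n=2 * FRAME))
        frames.append(np.log10(1.0 + spec))
    return np.array(frames).T  # frequency bins x time frames

def note_template(single_hit, n_frames=4):
    """Basis vector for one note: average of the first few frames of a
    single-hit recording, normalized to unit variance (n_frames = 4 is
    an assumed value; the text only says 'the first few frames')."""
    Y = log_spectrogram(single_hit)
    b = Y[:, :n_frames].mean(axis=1)
    return b / b.std()
```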
For the factorization-based methods (NNLSQ and PLCA), a set of additional basis functions, called noise basis functions, is also included in the basis matrix (or dictionary). Noise basis functions are calculated as triangles in the frequency domain, which overlap by 50% and have center frequencies that start at 20 Hz and increase by one octave from one noise base to the next. This shape aims to give the system flexibility in modeling background noise, specifically non-harmonic sounds, as filtered white noise. This way, background noise is less likely to be modeled as a sum of basis vectors corresponding to notes, that is, it is less likely to affect the contents of the activation matrix.

For each incoming frame of audio, the activation vector is calculated and then analyzed by an event detection method that yields decisions as to when notes are played. The method works as follows. In order to trigger the detection of a note, the corresponding value in the activation matrix must be over a fixed threshold α, and its derivative (that is, $a_{n,t} - a_{n,t-1}$) must be over another threshold β. When a note is found, an adaptive threshold τ is set to the detected activation level multiplied by an overshoot factor γ. The adaptive threshold then decays at a known rate θ, following the rule $\tau_{t+1} = \tau_t (1 - \theta)$. A new note may only be detected if its activation value is greater than τ, in addition to satisfying the thresholding rules regarding α and β. Figure 5 shows artificial activation values and the corresponding detection threshold $\max(\alpha, \tau)$. Three positions are marked, showing when false positives are avoided due to the fixed threshold related to α, to the adaptive threshold defined by γ and θ, and, finally, to the minimum derivative rule related to β.

Fig. 5. Example detection by thresholding. False positives are avoided due to the fixed threshold, the adaptive threshold and the derivative rule.
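A compact sketch of these per-note detection rules is given below, with the parameter values reported in Section III as defaults; the class structure is ours, since the paper does not publish code:

```python
class NoteOnsetDetector:
    """Per-note onset detection combining the fixed threshold (alpha),
    the derivative threshold (beta) and the adaptive threshold (tau)."""

    def __init__(self, alpha=0.05, beta=0.1, gamma=1.2, theta=0.1):
        self.alpha, self.beta = alpha, beta
        self.gamma, self.theta = gamma, theta
        self.tau = 0.0   # adaptive threshold
        self.prev = 0.0  # activation in the previous frame

    def step(self, a):
        """Process one activation value; return True on a detected onset."""
        onset = (a > self.alpha and             # fixed threshold
                 a - self.prev > self.beta and  # minimum derivative
                 a > self.tau)                  # adaptive threshold
        if onset:
            self.tau = a * self.gamma           # overshoot after detection
        else:
            self.tau *= (1.0 - self.theta)      # decay rule
        self.prev = a
        return onset
```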
In addition to the thresholding related to each individual note, the system deals with polyphony by assuming that a certain activation level only denotes an onset if it is greater than a ratio φ of the sum of all activation values for that frame.

If the audio signal passes all the tests described above, the bar estimated for the mallet using the computer vision algorithm is used as a necessary condition for detecting a particular note. With this additional condition, it is expected that octave errors and other common mistakes regarding ghost notes can be avoided. It was observed, however, that the signal containing position data was noisy, presenting several false negatives. To deal with this problem, the signal was transformed into a multichannel binary signal, where a 1 in channel b means that the mallet is detected over bar b during a certain time frame. The length and hop size of the frames, for this processing, are identical to the ones used in the audio processing algorithm. After this conversion, the binary signal is filtered with a decay filter, defined as:

$$y[n] = \begin{cases} x[n] & x[n] \ge x[n-1] \\ d\,x[n-1] & x[n] < x[n-1], \end{cases} \qquad (5)$$

where the coefficient d is the decay amount. The filtered signal is then thresholded at 0.5 (that is, it is set to 0 if y[n] < 0.5 and to 1 otherwise). This smoothing process forces the system to consider that the mallet is over a bar if it was actually detected there in the last few frames, inhibiting false negatives in that detection.
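The smoothing of the per-bar binary signal and the audio-visual gating can be sketched as follows; this assumes the literal form of Equation (5) with d = 0.9 from Section III, and the helper names are ours:

```python
import numpy as np

def smooth_bar_channel(x, d=0.9):
    """Apply the decay filter of Equation (5) to one binary channel and
    threshold the result at 0.5, bridging short detection drop-outs."""
    y = np.zeros(len(x), dtype=float)
    y[0] = x[0]
    for n in range(1, len(x)):
        y[n] = x[n] if x[n] >= x[n - 1] else d * x[n - 1]
    return (y >= 0.5).astype(int)

def gate_onsets(audio_onsets, mallet_over_bar):
    """A note is reported only if the audio detector fires while the
    (smoothed) vision signal places a mallet over the matching bar."""
    return audio_onsets & mallet_over_bar
```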
In the next section, the objective evaluation of the effects of using this multi-modal approach for transcription is considered.

III. EVALUATION

The experiments described in this section were conducted using audio and video recordings of an acoustic vibraphone. The acoustic data was gathered using a microphone placed in the middle of the vibraphone. The video data was acquired by a video camera placed on top of the instrument. The recordings intentionally contain harmonically-related notes and heavy use of the sustain pedal. The signal-to-noise ratio varied between -40 dB and 40 dB, with noise coming from the room (heating system, computer fans, etc.), from the instrument itself, and from artificially-mixed crowd noise.

The values for the parameters used in the experiments described here were α = 0.05, β = 0.1, γ = 1.2, θ = 0.1, φ = 0.2 and d = 0.9. These parameters were obtained empirically, by observing the typical values found in the activation matrix and optimizing for the best results.

The melodies in the evaluation dataset were manually annotated, providing a ground truth for evaluation. The evaluation data consisted of a total of around 5 minutes of recordings, with 150 annotated notes. The melodic patterns intentionally include polyphony with harmonically-related intervals such as major thirds, fifths and octaves. The number of simultaneous notes varied from one, in simple monophonic phrases, to eight, when the sustain pedal was used. The dataset, as well as the source code used, are available upon request.

The transcription algorithm was executed over the evaluation data, yielding a series of symbols. In order to evaluate the performance of the system, the symbols in the automatic transcription are matched to the symbols in the ground truth using a proximity criterion [15]. This matching procedure finds the note in the automatic transcription that was the best attempt (considering a distance measure that accounts for time and pitch deviation) to describe each note in the ground truth. A particular note in the automatic transcription is considered correct if its pitch is equal to the pitch in the ground truth and its onset does not deviate from the ground truth by more than 100 ms. The time tolerance accounts for an inevitable deviation due to the framewise nature of the transcription algorithm, as well as for the inherent human error introduced when the ground truth reference is built. This allows calculating the Recall (R, the number of correct events divided by the total number of events in the ground truth), the Precision (P, the number of correct events divided by the total number of yielded events) and, finally, their harmonic mean, called the F-Measure (F = 2RP/(R + P)).
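For illustration, a simplified scoring routine is sketched below; it uses a greedy pitch-and-onset matching with the 100 ms tolerance, whereas the actual evaluation uses the proximity criterion of [15]:

```python
def evaluate(transcribed, ground_truth, tol=0.1):
    """Compute Recall, Precision and F-Measure over (pitch, onset) pairs.
    Greedy matching: a transcribed note is correct if some unmatched
    ground-truth note has the same pitch and an onset within tol seconds."""
    unmatched = list(ground_truth)
    correct = 0
    for pitch, onset in transcribed:
        for ref in unmatched:
            if pitch == ref[0] and abs(onset - ref[1]) <= tol:
                unmatched.remove(ref)
                correct += 1
                break
    if not transcribed or correct == 0:
        return 0.0, 0.0, 0.0
    recall = correct / len(ground_truth)
    precision = correct / len(transcribed)
    return recall, precision, 2 * recall * precision / (recall + precision)
```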
This process was executed utilizing the three different audio analysis algorithms (Klapuri's, PLCA and NNLSQ). All three were also tested in a configuration that does not use computer vision ("blind"). We also consider a modification of the algorithm that uses only vision data (that is, a "deaf" algorithm), which works by considering an onset as soon as a mallet is placed over a bar and an offset when it is removed from the bar. Two different pieces were used, one with heavy use of the sustain pedal and the other without. The F-Measure for both was calculated in each test, and the average is reported in Figure 6.

Fig. 6. Average F-Measure for each modification of the proposed algorithm (deaf, blind and multimodal variants; Klapuri, PLCA and NNLSQ audio methods).

The Recall, Precision and F-Measure for each variation of the algorithm, averaged over the two test tracks, are reported in Table I.

TABLE I. Results for the different algorithms and pieces (Variant, Audio method, R, P, F).

The results show that the use of computer vision information consistently improved the transcription accuracy of the tested systems. Also, regardless of the method used to obtain the activation matrix A, the improvement obtained was roughly the same.

In order to evaluate the changes in the pitch errors due to the use of the vision algorithm, a more detailed evaluation was conducted. In this evaluation, each note yielded by the transcription system was related to the note that is closest to it in the ground truth, according to the same distance metric used to obtain the results above [15]. The pitch errors were classified as octave errors (the same pitch class, but in a different octave), tone errors (a pitch difference of one or two semitones) and other errors (all other pitch deviations). The fraction of errors in each of these categories, considering the blind and multimodal variants of the method that performed best in our experiments (NNLSQ), as well as the deaf approach, is shown in Figure 7.

Fig. 7. Pitch errors (octave, tone and other) for the deaf approach and the blind and multimodal NNLSQ-based variants of the algorithm.

As can be seen, when the multimodal approach is used, the fraction of octave errors is reduced. At the same time, the fraction of tone errors increases, representing a performance tradeoff and indicating further possibilities for improvement.

IV. CONCLUSION

This paper has presented a method for the multi-modal real-time transcription of the vibraphone. The method relies on a causal transcription algorithm that operates in parallel with a computer vision algorithm. The transcription algorithm is responsible for detecting, in the audio signal, characteristics that indicate note onsets. The computer vision algorithm yields information on the position of the mallets. Both are combined so that an onset is only detected if the transcription algorithm indicates so and, at the same time, the mallet is over the corresponding bar of the instrument.

Three different signal processing techniques were tested for the audio analysis part of the algorithm: factorization using NNLSQ [7] and PLCA [8], and the multiple fundamental frequency method proposed by Klapuri [9]. The results obtained show that all of these algorithms yield roughly the same transcription accuracy. The results also show that, regardless of the signal processing used for audio analysis, the use of information from computer vision consistently improved the transcription accuracy.

The proposed system is not accurate enough for applications such as performance analysis or documentation. However, it may still be used in applications that tolerate more transcription errors, like chord recognition, automatic harmonization or creative soundscape design.

There are several interesting directions for future work. The computer vision algorithm could be modified so that tracking works under different lighting conditions (an important factor for live performance applications) and in three dimensions, so that the proximity between the mallet and a certain bar can be measured more accurately. We also plan to explore how our system can be used in applications such as automatic score following, chord detection and harmonization, which can take advantage of imperfect transcription results.

ACKNOWLEDGMENT

Tiago Fernandes Tavares thanks CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) for funding. The support of the National Science and Research Council of Canada is gratefully acknowledged.
The authors thank Shawn Trail for his assistance with the vibraphone and the research, as well as Steven Ness for help with the figures.

REFERENCES

[1] T. Machover and J. Chung, "Hyperinstruments: Musically intelligent and interactive performance and creativity systems," in Proc. International Computer Music Conference (ICMC), 1989.
[2] C. Traube, P. Depalle, and M. Wanderley, "Indirect acquisition of instrumental gesture based on signal, physical, and perceptual information," in Proc. 2003 Conference on New Interfaces for Musical Expression (NIME-03), 2003.
[3] H. F. Olson, Music, Physics and Engineering, 2nd ed. Dover Publications Inc., 1967.
[4] G. Galatas, G. Potamianos, A. Papangelis, and F. Makedon, "Audio visual speech recognition in noisy visual environments," in Proc. 4th International Conference on PErvasive Technologies Related to Assistive Environments (PETRA '11). New York, NY, USA: ACM, 2011, pp. 19:1-19:4.
[5] O. Gillet and G. Richard, "Automatic transcription of drum sequences using audiovisual features," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), vol. 3, March 2005, pp. iii/205-iii/208.
[6] A. Kapur, G. Percival, M. Lagrange, and G. Tzanetakis, "Pedagogical transcription for multimodal sitar performance," in Proc. International Society for Music Information Retrieval Conference (ISMIR), September 2007.
[7] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems. Philadelphia: SIAM, 1995.
[8] P. Smaragdis, B. Raj, and M. Shashanka, "Sparse and shift-invariant feature extraction from non-negative data," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), April 2008.
[9] A. Klapuri, "Multiple fundamental frequency estimation by summing harmonic amplitudes," in Proc. 7th International Conference on Music Information Retrieval (ISMIR), Victoria, BC, Canada, October 2006.
[10] Intel Corporation, Open Source Computer Vision Library (OpenCV).
[11] S. Suzuki and K. Abe, "Topological structural analysis of digitized binary images by border following," Computer Vision, Graphics, and Image Processing, vol. 30, no. 1, pp. 32-46, 1985.
[12] N. Bertin, R. Badeau, and G. Richard, "Blind signal decompositions for automatic transcription of polyphonic music: NMF and K-SVD on the benchmark," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, April 2007, pp. I-65 to I-68.
[13] S. Phon-Amnuaisuk, "Transcribing Bach chorales using non-negative matrix factorisation," in Proc. 2010 International Conference on Audio, Language and Image Processing (ICALIP), November 2010.
[14] B. Niedermayer, "Non-negative matrix division for the automatic transcription of polyphonic music," in Proc. ISMIR, 2008.
[15] T. F. Tavares, J. G. A. Barbedo, and A. Lopes, "Towards the evaluation of automatic transcription of music," in Anais do VI Congresso de Engenharia de Áudio (AES Brasil), May 2008.