Audio Signal Separation and Classification: A Review Paper

Silk Smita 1, Sharmila Biswas 2, Sandeep Singh Solanki 3
Birla Institute of Technology, Mesra, Ranchi, India 1,2,3

ABSTRACT: Music signals are hard to characterize on their own because they are usually mixed with other audio signals. Mixed audio signals contain music mixed with speech, voice and even background noise. Thus, the components of a mixed signal need to be classified separately. Researchers have developed many algorithms to solve this problem, keeping in mind the characteristic features of music signals: timbre, harmony, pitch, loudness, etc. ICA (Independent Component Analysis), which is based on Blind Source Separation, HSS (Harmonic Structure Stability), the SOSM approach (Second Order Statistical Measures approach) and sinusoidal-parameter-based audio classification using FDMSM are some of the mixed-signal classification algorithms. This paper highlights all these existing methods and their experimental results.

KEYWORDS: Harmonic structure stability; ICA; HSS; FDMSM; SOSM; audio separation; voice; music

I. INTRODUCTION

In this digital world, efficient management of the digital content of audio data has become an important area for researchers. Because manual indexing and labeling are time-consuming and expensive, an automatic content-based classification system for mixed-type audio signals is a must for the many applications in which such signals are used today. Separation of voice (speech), music and noise in a mixed audio signal is an important problem in music research. Here, voice means the singing voice in a song, and music means the instrumental part. Separation of a mixed signal is helpful for many other areas of music research, such as music retrieval, classification and segmentation, multipitch estimation, etc. Along with audio signal separation, audio classification is important for the following reasons:
(a) Different audio types should be processed differently.
(b) After classification, the search space is reduced to a particular subclass during the retrieval process.
This approach reduces the computational complexity, and one can choose a specific field of research for a particular signal. For example, speech signals can be used for speech recognition, music signals for music analysis and voice for speaker recognition.

II. RELATED WORK

In this review, music signals are of prime interest. Music signals are characterized according to their pitch, loudness, duration, harmonic structure and timbre. Timbre is mainly used for the recognition of sound sources. Music signals are more ordered than voice: the entropy of music is much more constant in time than that of speech [1].

Among the various methods [1-16] for signal separation, the harmonic structure model approach [4,5] uses an extended multipitch estimation algorithm to extract harmonic structures and a clustering algorithm to calculate harmonic structure models. Signals are then separated by using these models to distinguish the harmonic structures of different signals. ICA (Independent Component Analysis) is one of the methods for extracting individual signals from mixtures. Its power resides in the physical assumption that different physical processes generate unrelated signals. The simple and generic nature of this assumption allows ICA to be successfully applied in a diverse range of research fields [5]. Vanroose used ICA to remove background music from speech by subtracting the ICA components with the lowest entropy [6].

General signal separation methods do not sufficiently utilize the special character of music signals. Gil-Jin Jang and Te-Won Lee proposed a probabilistic approach to single-channel blind signal separation [7], which is based on exploiting the inherent time structure of sound sources by learning a priori sets of basis filters. In the Harmonic Structure Stability approach [4], training sets are not required, and all information is learned directly from the mixed audio signal. Feng et al. applied FastICA to extract singing and accompaniment from a mixture [11].

Based on different features and characteristics, audio classification is used to classify most audio into speech, music and noise [2-3]. Many features are used, including the cepstrum [4], [5], and the power spectrum or zero-crossing rate (ZCR) proposed in [6] with a Support Vector Machine (SVM) classifier. The sinusoidal-parameter-based audio classification method [5] introduces a new unsupervised feature selection method to determine which features are optimal in the sense of high classification accuracy. It is demonstrated that acceptable classification accuracy is achievable using only the selected features. In addition, the SVM and the Relevance Vector Machine (RVM) are employed, since they have been reported in much of the literature as the best classifiers for musical genre recognition. This classification system categorizes audio based on new audio features: sinusoidal parameters denoting harmonicity as well as subband energy. These sinusoidal parameters jointly represent the harmonicity ratio (HR) and the energies of different frequency subbands.

Another method used for classification is Statistical Discrimination and Identification of Some Acoustic Sounds [6]. This method provides a preprocessing technique that allows discrimination between different sounds. The technique is based on the similarity measure $\mu_{Gc}$ introduced by Bimbot and used for speaker recognition tasks.

Copyright to IJIRCCE www.ijircce.com

III. METHODOLOGY

A. INDEPENDENT COMPONENT ANALYSIS [9]

Blind Signal Separation (BSS) or Independent Component Analysis (ICA) is the identification and separation of mixtures of sources with little prior information.
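To make the blind-separation setting concrete, the following minimal sketch mixes two synthetic sources with a matrix that the algorithm never sees, then unmixes them with a small FastICA-style fixed-point iteration. It is an illustration only and not the exact procedure of [9]: the sources (a sine tone and Laplacian noise), the mixing matrix, the tanh nonlinearity and all function names are assumptions chosen for the example.

```python
import numpy as np

def whiten(x):
    """Center x (n_signals, n_samples) and rescale it to identity covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T @ x

def decorrelate(w):
    """Symmetric decorrelation: W <- (W W^T)^(-1/2) W."""
    eigvals, eigvecs = np.linalg.eigh(w @ w.T)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T @ w

def fast_ica(x, n_iter=200, tol=1e-8, seed=0):
    """Estimate independent components from mixtures x (illustrative sketch)."""
    z = whiten(x)
    n, n_samples = z.shape
    rng = np.random.default_rng(seed)
    w = decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        g = np.tanh(w @ z)  # nonlinearity applied to the current estimates
        # Fixed-point update: E[z g(y)] - E[g'(y)] w, one row per component
        w_new = g @ z.T / n_samples - np.diag((1.0 - g ** 2).mean(axis=1)) @ w
        w_new = decorrelate(w_new)
        change = np.max(np.abs(np.abs(np.diag(w_new @ w.T)) - 1.0))
        w = w_new
        if change < tol:
            break
    return w @ z

# Hypothetical test signals: a 5 Hz tone and heavy-tailed (Laplacian) noise.
rng = np.random.default_rng(1)
t = np.arange(5000) / 1000.0
sources = np.vstack([np.sin(2 * np.pi * 5 * t), rng.laplace(size=t.size)])
mixing = np.array([[1.0, 0.5], [0.6, 1.0]])  # unknown to the algorithm
mixtures = mixing @ sources

recovered = fast_ica(mixtures)
# Absolute correlation between each true source and each recovered component
match = np.abs(np.corrcoef(np.vstack([sources, recovered]))[:2, 2:])
print(match.round(2))
```

The recovered components come back in arbitrary order and with arbitrary sign and scale, which is why the comparison uses absolute correlations rather than a direct difference.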
Nongaussianity is used to estimate the ICA model. ICA involves two basic steps:

(a) Nonlinear decorrelation. In this step, the matrix $W$ is calculated so that for any $i \neq j$, the components $y_i$ and $y_j$ are uncorrelated, and the transformed components $g(y_i)$ and $h(y_j)$ are uncorrelated, where $g$ and $h$ are suitable nonlinear functions.
(b) Maximum nongaussianity. Find the local maxima of nongaussianity of a linear combination $y = Wx$ under the constraint that the variance of $y$ is constant. Each local maximum gives one independent component.

B. HARMONIC STRUCTURE STABILITY ANALYSIS [4,5]

This algorithm finds the average harmonic structure of the music and then separates the signals by using it to distinguish voice and music harmonic structures. The steps involved are preprocessing, harmonic structure extraction, music Average Harmonic Structure analysis and separation of signals.

Let $s(t)$ be a monophonic music signal:

$$s(t) = \sum_{r=1}^{R} A_r(t)\cos[\theta_r(t)] + e(t)$$

where $A_r(t)$ and $\theta_r(t) = \int_0^t 2\pi r f_0(\tau)\,d\tau$ are the instantaneous amplitude and phase of the $r$-th harmonic, respectively, $R$ is the maximal harmonic number, $f_0(\tau)$ is the fundamental frequency, and $e(t)$ is the non-harmonic or noise component.

Divide $s(t)$ into overlapped frames and calculate $f_0^l$ and $A_r^l$ by detecting peaks in the magnitude spectrum, where $l = 1, \ldots, L$ is the frame index. $A_r^l = 0$ if the $r$-th harmonic does not exist. $f_0^l$ and $[A_1^l, \ldots, A_R^l]$ describe the position and amplitudes of the harmonics. Normalize $A_r^l$ by multiplying it by a factor $\rho^l = C/A_1^l$ ($C$ is an arbitrary constant) to eliminate the influence of the amplitude. Translate the amplitudes into a log scale, because the human ear has a roughly logarithmic sensitivity to signal intensity. The Harmonic Structure Coefficient is then defined from these normalized, log-scaled amplitudes. The timbre of a sound is mostly controlled by the number of harmonics and the ratio of their amplitudes, so $B^l = [B_1^l, \ldots, B_R^l]$, which is free from the fundamental frequency and amplitude, represents the timbre of a sound.
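The normalization and log-scaling steps can be sketched in a few lines, using the coefficient $B_i^l = \log(\rho^l A_i^l)/\log(\rho^l A_1^l)$ from [4,5]. This is a minimal sketch under stated assumptions: the harmonic amplitudes are taken as already extracted, the value $C = 1000$ is an arbitrary choice, and mapping a missing harmonic ($A_r^l = 0$) to a coefficient of 0 is a convention adopted here to avoid $\log 0$, not something the source specifies. The function name is hypothetical.

```python
import numpy as np

def harmonic_structure_coefficients(amplitudes, c=1000.0):
    """Compute B[l, i] = log(rho_l * A[l, i]) / log(rho_l * A[l, 0]).

    amplitudes: array of shape (L frames, R harmonics), with A[l, 0] > 0.
    c: the arbitrary constant C, so that rho_l = C / A[l, 0] (assumed value).
    Missing harmonics (amplitude 0) are mapped to 0 (assumption).
    """
    a = np.asarray(amplitudes, dtype=float)
    rho = c / a[:, :1]              # one normalization factor per frame
    scaled = rho * a                # first harmonic becomes exactly C
    b = np.zeros_like(a)
    present = scaled > 0
    b[present] = np.log(scaled[present]) / np.log(c)  # log(rho * A_1) = log C
    return b

# Two frames with the same harmonic ratios but different loudness.
amps = np.array([[1.0, 0.5, 0.25, 0.0],
                 [2.0, 1.0, 0.50, 0.0]])
b = harmonic_structure_coefficients(amps)
print(b)
```

Both frames yield the same coefficient vector, which illustrates why the representation is described as free from the overall amplitude.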
These coefficients are used to represent the harmonic structure of a sound. Average Harmonic Structure and Harmonic Structure Stability are defined as follows to model music signals and measure the stability of harmonic structures.

Harmonic Structure $B^l$, where $B_i^l$ is the Harmonic Structure Coefficient:

$$B^l = [B_1^l, \ldots, B_R^l], \qquad B_i^l = \frac{\log(\rho^l A_i^l)}{\log(\rho^l A_1^l)}, \quad i = 1, \ldots, R$$

Average Harmonic Structure (AHS):

$$\bar{B} = \frac{1}{L}\sum_{l=1}^{L} B^l$$

Harmonic Structure Stability (HSS):

$$\mathrm{HSS} = \frac{1}{R}\cdot\frac{1}{L}\sum_{l=1}^{L}\left\|B^l - \bar{B}\right\|^2 = \frac{1}{RL}\sum_{r=1}^{R}\sum_{l=1}^{L}\left(B_r^l - \bar{B}_r\right)^2$$

AHS and HSS are the mean and variance of $B^l$. Since the timbres of most instruments are stable, $B^l$ varies little between frames of a music signal, and AHS is a good model to represent music signals. On the contrary, $B^l$ varies much in a voice signal, and the corresponding HSS is much bigger than that of the music signal.

C. Second Order Statistical Measures Approach [7]

This method [5], based on the mono-Gaussian model [3], uses measures of similarity called Second Order Statistical Measures (SOSM). These measures are used to recognize the speaker in each segment of the speech signal. Two measures ($\mu_{Gc0.5}$ and $\mu_{G0.5}$) are used in this experiment. For sound classification, the similarity rate measured by $\mu_{Gc0.5}$ gives an idea of the type of the sound under consideration. If

$$\mu_{Gc0.5} \in [\mathrm{Threshold\_min}_j,\ \mathrm{Threshold\_max}_j]$$

then the considered sound is identified as type «j», where

$$\mathrm{Threshold\_min}_j = \min\big(\mu_{Gc0.5}(\mathrm{speech}, \mathrm{sound}_j)\big), \qquad \mathrm{Threshold\_max}_j = \max\big(\mu_{Gc0.5}(\mathrm{speech}, \mathrm{sound}_j)\big)$$

Here «sound» represents a set of sounds of the same type, «speech» represents a set of reference speakers, and $j$ denotes a certain sound belonging to a certain class $j$.

D. Sinusoidal Parameters based audio classification [8]

A speech signal contains both periodic and non-periodic information, due to the impulsive nature of events or noise-like processes occurring in unvoiced regions.
As a result, a time-window segment of the underlying observed audio signal can be written as follows:

$$y(n) = \sum_{l=1}^{L} a_l \cos(2\pi f_l n + \psi_l) + \varepsilon(n)$$

where $n = 1, \ldots, N_s$ is the sample index, $\theta_l = (a_l, f_l, \psi_l)$ denotes the sinusoidal parameters (amplitude, frequency and phase, respectively) of the $l$-th sinusoidal component, $L$ is the number of sinusoidal components in the signal, and $\varepsilon(n)$ is the observation noise, modeled as a zero-mean, additive Gaussian noise sequence. In general, it is of interest to estimate the sinusoidal parameters, namely the frequencies $f_l$, amplitudes $a_l$ and phases $\psi_l$, and the number of sinusoids $L$. Figure 1 [8] shows the basic block diagram of the process, which is the FDMSM model, whereas Figure 2 shows the block diagram of the proposed audio classification system [8].

Figure 1

To extract the sinusoidal parameters, including amplitudes and their related frequencies in some Mel bands, a modified version of the state-of-the-art sinusoidal model, called the Fixed Dimension Modified Sinusoidal Model (FDMSM), is employed [16]. At the end of the sinusoidal analysis using the FDMSM approach, a triple of features is obtained for each time-window frame:

$$\phi = (a, f, \psi)$$

Figure 2

Feature selection is then performed as a preprocessing step, using an unsupervised method, to find the smallest subset of the original features.

IV. DISCUSSIONS ON RESULTS OBTAINED

A. INDEPENDENT COMPONENT ANALYSIS [9]

The original speech signals are presented in Figure 3 [9] and the observed mixtures of these signals are shown in Figure 4 [9]. The ICA algorithm is used to recover the data in Figure 3 using only the data in Figure 4.

Figure 3

Figure 4

B. HARMONIC STRUCTURE STABILITY ANALYSIS [4,5]

In Figure 5 [4,5], it can be seen that the mixtures are well separated. The distance between music harmonic structures and the corresponding music AHS is small (the mean distances are 0.01 and 0.006 in experiments 1 and 2, respectively), and the distance between voice harmonic structures and the music AHS is bigger (the mean distances are 0.1 and 0.13 in experiments 1 and 2, respectively). So the music AHS is a good model for music signal representation and for voice and music separation.
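The comparison above rests on two statistics of the per-frame harmonic structure vectors $B^l$: their mean over frames (AHS) and their variance around that mean (HSS). The sketch below computes both on synthetic data, as a hedged illustration rather than a reproduction of the experiments in [4,5]. The template vector, the jitter levels standing in for "music-like" and "voice-like" frames, and the use of a mean squared difference as the distance to the AHS are all assumptions made for this example.

```python
import numpy as np

def average_harmonic_structure(b):
    """AHS: mean of the harmonic structure vectors over the L frames."""
    return b.mean(axis=0)

def harmonic_structure_stability(b):
    """HSS: (1 / (R * L)) * sum over r, l of (B[l, r] - AHS[r]) ** 2."""
    return float(np.mean((b - b.mean(axis=0)) ** 2))

def distance_to_ahs(b, ahs):
    """Per-frame mean squared difference to a given AHS (assumed distance)."""
    return np.mean((b - ahs) ** 2, axis=1)

# Synthetic stand-ins: 200 frames, 5 harmonics, same underlying template.
rng = np.random.default_rng(0)
template = np.array([1.0, 0.9, 0.8, 0.7, 0.6])
music_b = template + 0.01 * rng.standard_normal((200, 5))  # stable timbre
voice_b = template + 0.20 * rng.standard_normal((200, 5))  # unstable timbre

music_ahs = average_harmonic_structure(music_b)
hss_music = harmonic_structure_stability(music_b)
hss_voice = harmonic_structure_stability(voice_b)
d_music = distance_to_ahs(music_b, music_ahs).mean()
d_voice = distance_to_ahs(voice_b, music_ahs).mean()

print("HSS music:", hss_music, "HSS voice:", hss_voice)
print("mean distance to music AHS:", d_music, d_voice)
```

Frames whose distance to the music AHS is small can be attributed to the music source and the rest to voice. The decision threshold is not given in the text above and would have to be chosen for the data at hand.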

Figure 5 also shows speech enhancement results obtained with speech enhancement software, which tries to estimate the spectrum of the noise in the pauses of speech and enhances the speech by spectral subtraction.

Figure 5

C. Second Order Statistical Measures Approach [7]

This method has two aims: firstly, the discrimination between speech and other sounds; secondly, the classification of the different acoustic sounds according to the $\mu_{Gc}$ value, used as a measure of similarity, corresponding to each sound. For example, if $\mu_{Gc}$ is within [2.52, 4.92], then the given sound can be stated to be music (Table 1). Table 1 [7] shows the classification of sounds according to the $\mu_{Gc}$ range.

Table 1

μ_Gc range       Probable classification
0.13 - 0.27      Speech of a known speaker
0.27 - 0.78      Speech of a different speaker
0.78 - 1.8       Speech of a different speaker or human noise
1.8 - 2.46       Office noise
2.52 - 4.92      Music
8.23 - 10.34     Background noise

D. Sinusoidal Parameters based audio classification [8]

In this approach, let $S(k)$ be the spectrum of the current speech frame. A hop size of 8 ms is chosen for the peak-picking process. The model has a fixed number of sinusoidal parameters and also keeps the synthesized quality as close as possible to the original signal in terms of perception. This results in a significant reduction as well as better clustering performance. Table 2 [8] shows the mixed-type audio classification accuracy.

Table 2

Classifier   Accuracy (%)                 Parameters                         Kernel
MLP          91.3                         2 input, 6 hidden layer neurons
SVM          93.79, 97.93, 99.31, 98.62   Linear; γ = 0.5; γ = 0.3           RBF, RBF, RBF
KNN          82.00                        Max number of neighbors = 3
KNN          81.09                        Max number of neighbors = 2
RVM          97.14                        γ = 3                              RBF

As the simulation results show, employing such sinusoidal parameters along with the SVM classifier successfully improves the performance of the proposed classification. Figure 6 [8] shows the decision boundaries obtained using SVM [8].

Figure 6

V. CONCLUSION

Signal separation is a difficult problem, and no reliable methods are available for the general case. A detailed study of the work done on separation of audio signals has led us to the conclusion that the harmonic structure characteristic is the most effective of the approaches reviewed, as it preserves the audio quality. Moreover, most instrument sounds are harmonic in nature. The harmonic structure of a music signal is stable, and the harmonic structures of music signals performed by different instruments are different, so different signals can be classified and separated easily. However, this algorithm does not work for non-harmonic instruments, such as some drums. Some rhythm tracking algorithms can be used instead to separate drum sounds. Preprocessing techniques such as the sinusoidal parameter approach and the SOSM approach have also led us to the conclusion that feature selection is an important step in this research field, especially in music. That is why selecting the most efficient and accurate features is essential for audio classification. Classifiers such as SVM are proven to be the best classifiers, as they are more accurate than the others, although this accuracy may be traded off against elapsed time.

REFERENCES

1. J. Pinquier, J. Rouas, and R. A. Obrecht, "Robust speech / music classification in audio documents," International Conference on Spoken Language Processing (ICSLP), pp. 2005-2008, 2002.
2. G. R. Naik and D. K. Kumar, "An Overview of Independent Component Analysis and Its Applications," Informatica 35, pp. 63-81, 2011.
3. P. Vanroose, "Blind source separation of speech and background music for improved speech recognition," The 24th Symposium on Information Theory, pp. 103-108, May 2003.
4. Y. G. Zhang and C. S. Zhang, "Separation of voice and music by Harmonic Structure Stability Analysis," Multimedia and Expo, ICME 2005, pp. 562-565, 2005.
5. Y. G. Zhang and C. S. Zhang, "Separation of Music Signals by Harmonic Structure Modeling," NIPS, 2005.
6. P. M. B. Mahale, M. Rashidi, K. Faez, and A. Sayadiyan, "A New SVM-based Mix Audio Classification," NIPS, 2004.
7. H. Sayoud and S. Ouamour, "Statistical Discrimination and Identification of Some Acoustic Sounds," GCC Conference (GCC), pp. 1-5, 2006.
8. P. Mowlaee Begzade Mahale, A. Sayadiyan, and K. Faez, "Mixed Type Audio Classification using Sinusoidal Parameters," 3rd International Conference on Information and Communication Technologies (ICTTA), 2008.
9. A. Hyvärinen and E. Oja, "Independent Component Analysis: Algorithms and Applications," Vol. 13, pp. 411-430, 2000.
10. A. Ozerov, P. Philippe, F. Bimbot, and R. Gribonval, "Adaptation of Bayesian Models for Single-Channel Source Separation and its Application to Voice/Music Separation in Popular Songs," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, Issue 5, pp. 1564-1578, 2007.
11. S. Koval, M. Stolbov, and M. Khitrov, "Broadband noise cancellation systems: new approach to working performance optimization," in EUROSPEECH 99, pp. 2607-2610, 1999.
12. G. J. Jang and T. W. Lee, "A probabilistic approach to single channel blind signal separation," NIPS, 2003.
13. Y. Feng, Y. Zhuang, and Y. Pan, "Popular music retrieval by independent component analysis," in ISMIR, pp. 281-282, 2002.
14. Anssi Klapuri and Manuel Davy, "Signal Processing methods for transcription," Springer Science, pp. 69, 2006.
15. F. Bimbot, I. Magrin-Chagnolleau, and L. Mathan, "Second-Order Statistical measures for text-independent Broadcaster Identification," Speech Communication, Volume 17, Number 1-2, pp. 177-192, August 1995.
16. S. Z. Li, "Content-based audio classification and retrieval using the nearest feature line method," IEEE Transactions on Speech and Audio Processing, Vol. 8, Issue 5, pp. 619-625, 2000.

BIOGRAPHY

Silk Smita is a Masters in Engineering student in the Electronics Department, Birla Institute of Technology, Deemed University. Her research interests are music signal processing and signal processing.

Sharmila Biswas is a PhD student in the Electronics Department, Birla Institute of Technology, Deemed University. Her research interests are music signal processing and signal processing.

Sandeep Singh Solanki is an Associate Professor in the Electronics Department, Birla Institute of Technology, Deemed University. His research interests are music and speech signal processing and automation.