Stress management with Music Therapy

Sougata Das¹ and Ayan Mukherjee²
¹ Senior Systems Engineer, IBM India Private Limited, Kolkata, India
² Assistant Professor, Dept. of MCA, Brainware Group of Institutions, Barasat, Kolkata, India

Abstract
This paper brings together two active fields of major practical importance: Voice Recognition and Music Therapy. Voice Recognition is under development by commercial and research organizations all over the world. It is used in critical applications such as healthcare and the military, as well as in everyday applications such as smart phones, microwaves and biometric security. However, one part of speech that is less quantitative in nature is the emotion it carries, which is what adds dynamism to speech. This paper is an attempt to extract the emotional part of a speech sample and detect its emotional content. The detection does not depend on any adjective-oriented word that is spoken; it relies instead on the pitch, the tonality and the stress placed on particular words.

Keywords: DCT, DFT, FFT, MFCC, Cepstrum, Mel Scale, Mel Spectrum, Hamming Window, Music Therapy, Emotion Detection.

1. INTRODUCTION

1.1 History of Music Therapy
Music Therapy [6] is the systematic application of music in the treatment of the physiological and psychosocial aspects of an illness or disability. It focuses on the acquisition of non-musical skills and behaviors, as determined by a board-certified music therapist through systematic assessment and treatment planning. It is therefore an allied health profession and one of the expressive therapies, consisting of an interpersonal process in which a certified music therapist uses music and all of its facets (physical, emotional, mental, social, aesthetic and spiritual) to help clients improve or maintain their health. Music therapy in the United States of America began in the late 18th century.
However, the use of music as a healing medium dates back to ancient times. This is evident in biblical scriptures and in the historical writings of ancient civilizations such as Egypt, China, India, Greece and Rome. Today, the power of music remains the same, but music is used very differently than it was in ancient times. The profession of music therapy in the United States began to develop during World War I and World War II, when music was used in Veterans Administration hospitals as an intervention to address traumatic war injuries. Veterans engaged, actively and passively, in music activities that focused on relieving pain perception. Numerous doctors and nurses witnessed the effect music had on the veterans' psychological, physiological, cognitive and emotional state. Since then, colleges and universities have developed programs to train musicians to use music for therapeutic purposes. In 1950, a professional organization was formed by music therapists who worked with veterans, people with intellectual disabilities, people with hearing or visual impairments, and psychiatric populations. This was the birth of the National Association for Music Therapy (NAMT). In 1998, NAMT joined forces with another music therapy organization to become what is now known as the American Music Therapy Association (AMTA).

1.2 History of Voice Recognition
Voice recognition, or more accurately speech recognition, is a technology developed to replace typing or writing with the human voice as an input. Speech recognition was first developed at Bell Labs, the birthplace of many computing inventions. The first voice-operated system was AUDREY, developed in 1952, which could identify spoken digits.
Exactly 10 years later, IBM, another powerhouse of innovation, first demonstrated the Shoebox, a voice recognition system that could recognize 16 spoken English words. The idea of speech recognition gradually spread, and laboratories in the United States, Japan, England and the Soviet Union began developing voice recognition systems along with dedicated hardware to support them. Modest as these efforts may sound, they were an impressive beginning given how primitive computation was at the time. In the 1970s, the US Department of Defense took an interest in voice recognition and a Speech Understanding Research (SUR) program was formed. Research continued all over the world, and Carnegie Mellon University developed the HARPY speech understanding system, which was capable of recognizing 1011 words, roughly the vocabulary of an average three-year-old child. The most interesting aspect of HARPY was its search technique: a heuristic search algorithm called beam search, which finds a good (though not guaranteed optimal) solution efficiently. By the end of the 1970s, voice recognition had moved from handling a single speaker to identifying and operating on the voices of multiple speakers. In addition, thanks to developments in linguistics dedicated to speech recognition, vocabularies grew from a few hundred words to a few thousand words, with the potential to recognize an unlimited number of words.

Volume 3, Issue 6, November-December 2014 Page 273

2. PROBLEM STATEMENT
This project aims to detect negative emotion in a given sound sample and to determine which music should be applied as therapy for the subject. The steps are as follows:
Voice Recording: At application level, a voice is recorded and fed in as input.
Detection of emotion: The program outputs the emotion it detects.
Final Result: On the basis of the detected emotion, a mapping function determines which music will be used as therapy for the subject.

10 OBJECTIVES
The objective of this project is stress management using music therapy; the innovation lies in applying the therapy through an automated method of detecting stress in a voice sample. The primary objective and motivation of the project is to reduce stress problems by utilizing music therapy. The ragas involved in the therapy tend to produce a positive effect on the subject and reduce the temporary negative emotions present.

11 BENEFITS
It is an effective automated system that detects negative emotion and helps keep the subject from slipping into a negative psychological state.
It removes the need for human intervention in the process.
It may reduce mortality by lowering the chances of deaths related to psychiatric issues.
It can be implemented on devices such as smart phones, making it available to everyone.

12 CONTROL FLOW DIAGRAM

12.1 Block Diagram
Figure 1: Block Diagram

12.2 Flow Diagram
Figure 2: Flow Diagram

13 PROCESS DESCRIPTION
Stress Management [1] using Music Therapy combines two well-known fields, Music Therapy and Voice Recognition, and adds the ability to detect emotional content in a given voice or speech sample. The system is divided into two sections. One section deals with detecting the emotion in the given voice or speech sample, and the other maps the detected emotion to the raga that can be used to calm it.

13.1 Framing
A speech sample is generally not stationary over time, so no consistency can be found in its waveform as a whole. In short-term analysis, however, if we consider an interval that is short enough, the waveform within it can be treated as stationary. The non-uniformity of the speech waveform is caused by the movement of the speech articulators (the lips, jaw, tongue and so on), because the rate of change of the voice spectrum depends directly on the rate of change of articulation. Before framing, the speech waveform is cropped, and any silence or acoustic interference at the start or end of the file is removed.

13.2 Windowing
Each frame is processed to remove signal discontinuities at its beginning and end. The idea is to minimize spectral distortion by using a window to taper the signal to zero at both ends of each frame. In other words, the Fourier Transform assumes that the signal repeats, yet the end of one frame does not connect smoothly with the beginning of the next. This introduces glitches at regular intervals, so the ends of each frame must be made smooth enough to connect with each other. Preparing the windows of the signal requires a soft window. The Hamming Window is the most commonly preferred choice because it tapers smoothly at both ends. The Hamming Window function is:

$$w(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1 \tag{1}$$

where
w(n) = the Hamming window function,
n = the index of a sample within the frame,
N = the total number of samples in the frame.

13.3 Fourier Transformation
Each frame is now processed by a Discrete Fourier Transformation (DFT), which converts the frame of N samples from the time domain to the frequency domain. The frames were obtained as amplitude against time; after the transformation they are expressed as amplitude against frequency. The transformation is implemented using the Fast Fourier Transform (FFT), a fast algorithm for computing the DFT, which is defined on the set of N samples {x_k} as follows:

$$X_n = \sum_{k=0}^{N-1} x_k \, e^{-2\pi i k n / N}, \quad n = 0, 1, 2, \ldots, N-1 \tag{2}$$

13.4 Mel-Frequency Wrapping
The spectrum obtained in the frequency domain is the input to this stage. The signal is plotted against the Mel scale [3] to mimic human hearing. Human perception of the frequency content of speech sounds does not follow a linear scale. Thus, for each tone with an actual frequency f, measured in Hz, a subjective pitch is measured on a scale called the Mel scale. The Mel-frequency scale has linear frequency spacing below 1000 Hz and logarithmic spacing above 1000 Hz. As a reference point, the pitch of a 1 kHz tone, 40 dB above the perceptual hearing threshold, is defined as 1000 Mels. We can therefore use the following approximate formula to compute the Mel value for a given frequency:

$$\mathrm{Mel}(f) = 2595 \log_{10}\left(1 + \frac{f}{700}\right) \tag{3}$$

One approach to simulating the subjective spectrum is to use a filter bank, with one filter for each desired Mel-frequency component. The filter bank has a triangular band-pass frequency response, and both the spacing and the bandwidth are determined by a constant Mel-frequency interval.
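Before the filter bank is described further, the framing, windowing and Fourier transform steps above can be sketched in code. This is a minimal illustration rather than the authors' implementation. It assumes the standard Hamming coefficients 0.54 and 0.46, uses a simple radix-2 FFT as one possible fast algorithm for Eq. (2), and runs on a synthetic tone. The function names, the frame length of 256, the hop of 128 and the 8 kHz sample rate are all hypothetical choices made for the example.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of frame_len samples."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


def hamming(n_samples):
    """Hamming window w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1)), as in Eq. (1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def fft(x):
    """Recursive radix-2 FFT computing Eq. (2); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def power_spectrum(frame):
    """Window one frame and return |X_n|^2 for the non-negative frequencies."""
    window = hamming(len(frame))
    spectrum = fft([s * w for s, w in zip(frame, window)])
    return [abs(c) ** 2 for c in spectrum[:len(frame) // 2 + 1]]


# Synthetic 1 kHz tone sampled at 8 kHz (assumed values, for illustration only).
SAMPLE_RATE = 8000
FRAME_LEN = 256
signal = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(1024)]
frames = frame_signal(signal, frame_len=FRAME_LEN, hop=128)
spectrum = power_spectrum(frames[0])

# The strongest bin should sit at the tone frequency.
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
```

With these assumed settings each frequency bin is 8000 / 256 = 31.25 Hz wide, so the 1 kHz tone falls exactly on one bin. Overlapping the frames by half their length is a common choice that compensates for the samples the window attenuates at the frame edges.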
The Mel scale filter bank is a series of l triangular band-pass filters designed to simulate the band-pass filtering believed to occur in the auditory system. This corresponds to a series of band-pass filters with constant bandwidth and spacing on the Mel frequency scale.

13.5 Cepstrum
The name Cepstrum [4] is derived from the word Spectrum by reversing its first four letters: "spec" becomes "ceps". The cepstrum can be described as the Fourier Transform of the logarithm of the Fourier Transform of the windowed signal:

$$\text{Cepstrum} = \mathrm{FT}\big(\log(\mathrm{FT}(\text{windowed signal})) + j2\pi m\big) \tag{4}$$

where m is the integer required to unwrap the phase of the complex logarithm. The real cepstrum uses the logarithm function defined for real values, whereas the complex cepstrum uses the complex logarithm function defined for complex values. The real cepstrum uses only the magnitude of the spectrum, whereas the complex cepstrum holds information about both the magnitude and the phase of the initial spectrum, which allows the signal to be reconstructed. The cepstral representation of the speech spectrum provides a very good representation of the local spectral properties of the signal for a given frame.
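The Mel conversion of Eq. (3) and the triangular filter bank described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code. The function names, the choice of 20 filters, the 256-point transform and the 8 kHz sample rate are hypothetical, and the way filter edges are rounded to FFT bins is one common convention among several.

```python
import math


def hz_to_mel(f_hz):
    """Eq. (3): Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of Eq. (3)."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Build triangular filters whose edges are equally spaced in Mel."""
    n_bins = n_fft // 2 + 1
    top_mel = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points: each filter spans three consecutive points.
    mel_points = [top_mel * i / (n_filters + 1) for i in range(n_filters + 2)]
    bin_points = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
                  for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bin_points[j - 1], bin_points[j], bin_points[j + 1]
        weights = [0.0] * n_bins
        for b in range(left, centre):       # rising slope
            weights[b] = (b - left) / (centre - left)
        for b in range(centre, right):      # falling slope, 1.0 at the centre
            weights[b] = (right - b) / (right - centre)
        bank.append(weights)
    return bank


def log_mel_energies(power_spectrum, bank):
    """Apply each filter to a power spectrum and take the logarithm."""
    floor = 1e-10  # avoids log(0) when a band carries no energy
    return [math.log(max(sum(w * p for w, p in zip(weights, power_spectrum)), floor))
            for weights in bank]


# Assumed settings: 20 filters over a 256-point transform at 8 kHz.
bank = mel_filter_bank(n_filters=20, n_fft=256, sample_rate=8000)
flat_spectrum = [1.0] * (256 // 2 + 1)
energies = log_mel_energies(flat_spectrum, bank)
```

Because the edges are equally spaced in Mel, the filters are narrow at low frequencies and progressively wider at high frequencies, which is how the bank approximates the non-linear frequency resolution of human hearing.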

Algorithmically, the concept of the cepstrum [2] is presented here in the form of a block diagram. The figure below shows the flow chart that describes how to obtain the cepstrum from a signal.

Figure 3: Flow chart of Cepstrum

Figure 3: Cepstral Representation

The log Mel spectrum is then converted from the frequency domain back to the time domain using the Discrete Cosine Transformation (DCT). The MFCCs are calculated from the result of this transformation.

13.6 Mel Frequency Cepstrum Coefficient (MFCC)
Mel Frequency Cepstrum Coefficients are coefficients that represent audio on the basis of perception. Their development is usually credited to Paul Mermelstein, who in turn credits Bridle and Brown for the idea. The procedure generates a 20-dimensional matrix from the signal, and we use its values in an algorithm that deduces which emotion the sample contains. The MFCCs are the amplitudes of the resulting spectrum. This procedure is represented step-wise in the figure below.

Figure 4: MFCC Flow Chart

14 TEST RESULTS
On the basis of the value generated, we play the necessary raga file. A mapping table is provided under the section on MUSIC THERAPY [8], listing the conditions that a particular music file can be used to treat.

Table 1: Classification of various moods according to the Ragas

Mood            Raga
Sad             Kafi
Depression      Kapi
Hypertension    Bageshri
Anger           Sahana
Fear            Mishra Mand

The following set of figures shows the plot of the signal on the left side and the power spectrum on the right.

[Figures 5 to 7: signal and power spectrum plots; panel titles: Fear, Anger, Depression]
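The DCT step that turns log Mel energies into MFCCs, and the lookup from a detected mood to a raga, can be sketched as follows. This is a minimal sketch under stated assumptions. The DCT form used here is the one commonly given in the speaker recognition literature and is not confirmed to be the authors' exact formula. The mood and raga pairs are read from Table 1. The function names and the choice of 12 coefficients are hypothetical. The classifier that turns MFCCs into a mood label is not specified in the paper and is deliberately left out.

```python
import math

# Mood-to-raga pairs as read from Table 1 of the paper.
RAGA_FOR_MOOD = {
    "Sad": "Kafi",
    "Depression": "Kapi",
    "Hypertension": "Bageshri",
    "Anger": "Sahana",
    "Fear": "Mishra Mand",
}


def mfcc_from_log_energies(log_energies, n_coeffs):
    """DCT of the log Mel energies (assumed form):
    c_n = sum over k = 1..K of log(S_k) * cos(n * (k - 0.5) * pi / K)."""
    k_total = len(log_energies)
    return [sum(log_energies[k - 1] * math.cos(n * (k - 0.5) * math.pi / k_total)
                for k in range(1, k_total + 1))
            for n in range(1, n_coeffs + 1)]


def raga_for(mood):
    """Return the raga listed for a detected mood, or None if it is not in Table 1."""
    return RAGA_FOR_MOOD.get(mood)


# A perfectly flat log spectrum has no spectral shape, so every coefficient
# from c_1 upwards is (numerically) zero.
flat = mfcc_from_log_energies([2.0] * 20, n_coeffs=12)

# Hypothetical usage once some classifier has produced a mood label.
selected = raga_for("Anger")
```

Starting the coefficient index at n = 1 drops the zeroth coefficient, which only reflects the average log energy of the frame and therefore carries little information about spectral shape.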

[Figure 8: signal and power spectrum plot; panel title: HyperTension]

Figure 9: Data Store for sad mood

15 CONCLUSION
The project is an example of two distinct fields, Computer Science and paramedicine, merged into a single field. Although it is at a nascent stage with very limited output, it points to a very promising area. The project offers a solution for stress management, a problem that is very prevalent in the modern world, and if it is taken up at a higher level it can also be applied commercially. The technique used to emulate human hearing and perception, the Mel Frequency Cepstrum Coefficient, is very prone to noise. Even after the noise is reduced, the results are sometimes very limited because the emotion cannot be detected properly. However, with proper scales and further research, it should be possible to detect the correct emotion with much greater accuracy. Despite these limitations, we have tried our best to produce a satisfactory result and to provide a solution that maps negative emotions to the ragas that can be used to relieve them.
16 LIMITATIONS AND SCOPE FOR FUTURE
Firstly, this application is still at a nascent stage, and due to some hardware irregularities we have to work on stored audio files instead of real-time recording. Secondly, highly noisy audio input produces deviating results rather than the correct result. Thirdly, not all emotions can be detected at present because exact mood and emotion source audio files are unavailable.
This can be implemented as a smart phone app or web application. The range of emotions can be increased with samples recorded by professional actors. A professional database application can be used for efficient storage and retrieval of data.

References
[1] Kumar, Ch. Srinivasan (2011), Design of an Automatic Speaker Recognition System.
[2] Neiberg, Daniel (2006), Emotion Recognition in Spontaneous Speech Using GMMs.
[3] Cornaz, Christian (2003), An Automatic Speaker Recognition System, February 03.
[4] Tiwari, Vibha (2010), MFCC and Its Applications in Speaker Recognition, February 10.
[5] Sairam, T.V., Music and Moods 2, [Online] (Sept 13, 2013).
[6] Music Therapy, [Online] Music_therapy (Aug 21, 2013).
[7] Mahesh, Anuradha, Music-Therapy for Wellness, [Online] com/music-therapy/ (Sept 20, 2013).
[8] Raga Therapy for Healing Mind and Body, [Online] ntinfo/raga-therapy-for-healing-mind-and-bodyhealing-ragas_print.htm (Aug 30, 2013).
[9] Music as Medicine, [Online] (Aug 29, 2013).