A HIGH PERFORMANCE SOFTWARE IMPLEMENTATION OF MPEG AUDIO ENCODER. Figure 1. Basic structure of an encoder.



Similar documents
STUDY OF MUTUAL INFORMATION IN PERCEPTUAL CODING WITH APPLICATION FOR LOW BIT-RATE COMPRESSION

An Optimised Software Solution for an ARM Powered TM MP3 Decoder. By Barney Wragg and Paul Carpenter

AUDIO CODING: BASICS AND STATE OF THE ART

Image Compression through DCT and Huffman Coding Technique

Digital Audio Compression

MPEG-1 / MPEG-2 BC Audio. Prof. Dr.-Ing. K. Brandenburg, bdg@idmt.fraunhofer.de Dr.-Ing. G. Schuller, shl@idmt.fraunhofer.de

4 Digital Video Signal According to ITU-BT.R.601 (CCIR 601) 43

A Comparison of the ATRAC and MPEG-1 Layer 3 Audio Compression Algorithms Christopher Hoult, 18/11/2002 University of Southampton

MP3 Player CSEE 4840 SPRING 2010 PROJECT DESIGN.


Audio Coding, Psycho- Accoustic model and MP3

Digital Audio Compression: Why, What, and How

Audio Coding Algorithm for One-Segment Broadcasting

Real-Time Audio Watermarking Based on Characteristics of PCM in Digital Instrument

Figure 1: Relation between codec, data containers and compression algorithms.

Voice---is analog in character and moves in the form of waves. 3-important wave-characteristics:

A Secure File Transfer based on Discrete Wavelet Transformation and Audio Watermarking Techniques

Introduction to Medical Image Compression Using Wavelet Transform

High performance digital video servers: storage. Seungyup Paek and Shih-Fu Chang. Columbia University

White Paper: An Overview of the Coherent Acoustics Coding System

The Theory Behind Mp3

Bandwidth Adaptation for MPEG-4 Video Streaming over the Internet

(Refer Slide Time: 2:10)

Digital terrestrial television broadcasting Audio coding

System-On Chip Modeling and Design A case study on MP3 Decoder

MPEG Unified Speech and Audio Coding Enabling Efficient Coding of both Speech and Music

Quality Estimation for Scalable Video Codec. Presented by Ann Ukhanova (DTU Fotonik, Denmark) Kashaf Mazhar (KTH, Sweden)

MP3 AND AAC EXPLAINED

Performance Analysis of medical Image Using Fractal Image Compression

An Efficient Architecture for Image Compression and Lightweight Encryption using Parameterized DWT

Starlink 9003T1 T1/E1 Dig i tal Trans mis sion Sys tem

Reconfigurable Low Area Complexity Filter Bank Architecture for Software Defined Radio

Classes of multimedia Applications


Analog Representations of Sound

Speech Signal Processing: An Overview

RECOMMENDATION ITU-R BO.786 *

For Articulation Purpose Only

Video Coding Basics. Yao Wang Polytechnic University, Brooklyn, NY11201

TUTORIAL FOR CHAPTER 8

White paper. H.264 video compression standard. New possibilities within video surveillance.

FFT Algorithms. Chapter 6. Contents 6.1

3. Interpolation. Closing the Gaps of Discretization... Beyond Polynomials

Watermarking Techniques for Protecting Intellectual Properties in a Digital Environment

Development and Evaluation of Point Cloud Compression for the Point Cloud Library

A Dynamic Programming Approach for Generating N-ary Reflected Gray Code List

Hybrid Lossless Compression Method For Binary Images

Video Sequence. Time. Temporal Loss. Propagation. Temporal Loss Propagation. P or BPicture. Spatial Loss. Propagation. P or B Picture.

MEDICAL IMAGE COMPRESSION USING HYBRID CODER WITH FUZZY EDGE DETECTION

High-Fidelity Multichannel Audio Coding With Karhunen-Loève Transform

BROADBAND AND HIGH SPEED NETWORKS

Final Year Project Progress Report. Frequency-Domain Adaptive Filtering. Myles Friel. Supervisor: Dr.Edward Jones

Introduction to image coding

IBM Research Report. CSR: Speaker Recognition from Compressed VoIP Packet Stream

MULTI-STREAM VOICE OVER IP USING PACKET PATH DIVERSITY

1.4 Fast Fourier Transform (FFT) Algorithm

Digital TV: Overview. Yao Wang Polytechnic University, Brooklyn, NY11201 http: //eeweb.poly.edu/~yao

Audio Coding Introduction

encoding compression encryption

From Concept to Production in Secure Voice Communications

Data Storage 3.1. Foundations of Computer Science Cengage Learning

Convention Paper 5553

MPEG Layer-3. An introduction to. 1. Introduction

CM0340 SOLNS. Do not turn this page over until instructed to do so by the Senior Invigilator.

(51) Int Cl.: H04S 3/00 ( )

Introduction and Comparison of Common Videoconferencing Audio Protocols I. Digital Audio Principles

ALFFT FAST FOURIER Transform Core Application Notes

THE NAS KERNEL BENCHMARK PROGRAM

Non-Data Aided Carrier Offset Compensation for SDR Implementation

Digital Subscriber Line (DSL) Transmission Methods

Module 8 VIDEO CODING STANDARDS. Version 2 ECE IIT, Kharagpur

Computer Networks and Internets, 5e Chapter 6 Information Sources and Signals. Introduction

The enhancement of the operating speed of the algorithm of adaptive compression of binary bitmap images

Simulative Investigation of QoS parameters for VoIP over WiMAX networks

Statistical Modeling of Huffman Tables Coding

Mike Perkins, Ph.D.

Video-Conferencing System

H 261. Video Compression 1: H 261 Multimedia Systems (Module 4 Lesson 2) H 261 Coding Basics. Sources: Summary:

A Digital Audio Watermark Embedding Algorithm

DYNAMIC DOMAIN CLASSIFICATION FOR FRACTAL IMAGE COMPRESSION

RN-Codings: New Insights and Some Applications

Chapter 6 CDMA/802.11i

Course 12 Synchronous transmission multiplexing systems used in digital telephone networks

HDMI Modulator single DVB-T. High Definition. User Manual

Performance Evaluation of AODV, OLSR Routing Protocol in VOIP Over Ad Hoc

Power Reduction Techniques in the SoC Clock Network. Clock Power

H.264/MPEG-4 AVC Video Compression Tutorial

Security Based Data Transfer and Privacy Storage through Watermark Detection

A comprehensive survey on various ETC techniques for secure Data transmission

SPEECH SIGNAL CODING FOR VOIP APPLICATIONS USING WAVELET PACKET TRANSFORM A

Figure1. Acoustic feedback in packet based video conferencing system

SOFTWARE AND HARDWARE-IN-THE-LOOP MODELING OF AN AUDIO WATERMARKING ALGORITHM. Ismael Zárate Orozco, B.E. Thesis Prepared for the Degree of

Transcription:

A HIGH PERFORMANCE SOFTWARE IMPLEMENTATION OF MPEG AUDIO ENCODER Manoj Kumar 1 Mohammad Zubair 1 1 IBM T.J. Watson Research Center, Yorktown Hgts, NY, USA ABSTRACT The MPEG/Audio is a standard for both transmitting and recording compressed audio. The MPEG algorithm achieves compression by exploiting the perceptual limitation of the human ear. The standard denes the decoding process and also the syntax of the coded bitstream. However, there is room for having dierent implementations to generate the compressed bitstream. In this paper we propose a high performance software implementation of the MPEG/Audio encoder. We obtained more than a factor of ve improvement over a straightforward implementation on the IBM PowerPC, Model 250. 1. INTRODUCTION The basic structure of the audio encoder as described in the ISO/IEC 11172-3 document [4] is shown in Figure 1. The input audio samples from each channel pass through a subband lter which divides the input into 32 equal-width frequency subbands. The psychoacoustic model calculates the masking threshold for each subband below which noise is imperceptible to human ear. In the bit-allocation procedure, bits are allocated to the subbands according to the masking threshold provided by the psychoacoustic model. The objective of bit allocation is to minimize the maxima of noise to mask ratio, the maximum taken over all channels and subbands. The subband samples are then scaled, quantized according to the bit allocation, and formatted into an encoded stream together with a header, bit allocation and scaling information. In this paper we give a high performance implementation of the subband lter and bit-allocation procedures. The proposed technique for subband lter computation exploits the architectural features of the RISC machines to get better performance. In particular, we make better use of memory hierarchy, fused multiply-add instruction, and arithmetic pipelines present onatypical RISC machine. The bits are allocated to multiple subbands such that the quantization noise is masked to the greatest extent. This allocation is an iterative process and for each bit allocated, the minimum mask to noise ratio is computed over all subbands of all channels. We describe an ecient implementation of this computation using the heap data structure. PCM Audio Samples Filter Bank Psychoacoustic Model Bit Allocation Formatter Encoded Bitstream Figure 1. Basic structure of an encoder. 2. SUBBAND FILTER The analysis subband lter divides the audio signal into 32 equal-width frequency subbands. A high level description of this computation as described in the MPEG ISO/IEC 11172-3 document is given below. 1. Shift in 32 new samples into a 512 point buer x. xi = xi 32 for i = 511 down to 32 next-input for i =31down to 0 2. Window vector x by the coecient vector c. The coef- cients are given in the Mpeg document. 3. Partial calculation. zi = ci xi; for i = 0 to 511: yi = 7X j=0 zi+64j; for i = 0 to 63: 4. Calculate the 32 output subband samples si by matrixing. si = 63X k=0 mik yk; for i = 0 to 31: The coecient for the matrix mik are given by, mik = cos ((2i + 1)(k 16)=64); for i = 0 to 31; and for k = 0 to 63:

Steps 1-3 calculates the polyphase components of the prototype lter. Step 4 is the polyphase implementation of the subband lters. The vector c in Step 2 is the coecients for the prototype lter of the polyphase lter bank. Several researchers in the past have reformulated the above computation to reduced the number of operations. This is basically achieved by using a DCT or DFT algorithms for some part of the computation [1, 3, 2]. For example, the formulation in [1, 2] exploits the symmetry of cosine function to replace the 32 64 matrix, mik, bya32 32 DCT matrix. (This requires some pre-computation to reduce the size of the y vector to 32.) The operation count can be further reduced by using a fast DCT formulation (FDCT). However, we discovered that the FDCT formulation is of limited advantage because of short DCT lengths (32 point DCT). In fact, as we discuss later, one level of FFT like decomposition of the DCT matrix is enough to give FDCT like performance. We also found that the rest of the computation (Steps 1-3) is a signicant part of the overall computation, and hence we focused on those steps. The key reasons for the poor performance of Steps 1-3 are: (i) The number of loads and store per operation is quite high. Once the data item is brought into the register, there is much not reuse. (ii) There is no use of fused multiply add store instruction available on most of the RISC machines. We addressed these problems by restructuring the computation and doing it for a group of 12 blocks of 32 samples together. A high level description of the new technique is outlined below. 1. Shift in 32*12 new samples into a 864 point buer x. xi = xi 384 for i = 863 down to 384 next-input for i = 383 down to 0 2. Compute y's for a group of 12 samples as follows. for j =0to 63 do tk =0,0k<12 for i =0to 7 do m=i64 + j c0 =cm t 0=t 0+c0xm+352; t 1 = t 1 + c0 xm+320 t 2 = t 2 + c0 xm+288; t 3 = t 3 + c0 xm+256 t 4 = t 4 + c0 xm+224; t 5 = t 5 + c0 xm+192 t 6 = t 6 + c0 xm+160; t 7 = t 7 + c0 xm+128 t 8 = t 8 + c0 xm+96; t 9 = t 9 + c0 xm+64 t 10 = t 10 + c0 xm+32; t 11 = t 11 + c0 xm y 0;j = t 0; y 1;j = t 1; y 2;j = t 2; y 3;j = t 3 y 4;j = t 4; y 5;j = t 5; y 6;j = t 6; y 7;j = t 7 y 8;j = t 8; y 9;j = t 9; y 10;j = t 10; y 11;j = t 11 3. Calculate the 32*12 output subband samples si by matrixing. As outlined in [1, 2], we rst exploits the symmetry of cosine function to replace the 32 64 matrix, mik, by a 32 32 DCT matrix rik. The coecient for the DCT matrix rik are given by, rik = cos ((2i +1)k=64); for i = 0 to 31; and for k = 0 to 31: The necessary pre-computation to reduce the size of the y vector to an f vector of size 32 is given by, for k =0to 11 do fk;0 = yk;16 for j =0to 16 do fk;j = yk;j+16 + yk;16 j for j =17to 31 do fk;j = yk;j+16 yk;80 j As mentioned before, we can reduce the numberofoperations further by one level of FFT like decomposition of the DCT matrix. This decomposition is clear from the structure of the DCT matrix. The even columns of r are symmetric about their center, and the odd columns are symmetric with a change of sign. The code which uses this property along with the inner-loop unrolling to minimize loads and stores is illustrated below. for k =0to 11 do for i =0to 15 do s0 =0 s1=0 for j =0to 31 in steps of 2 do s0 =s0+ri;j fk;j s0 =s0+ri;j+1 fk;j+1 sk;i = s0+s1 sk;31 i = s0 s1 Note that in the above restructured code we have merged the widowing and partial calculation step (Steps 2 and 3). This allow us to use the fused multiply and add instruction of the PowerPC architecture. Also, by performing computation for a group of 12 samples together, the inner loop of the restructured code consists of (i) twelve independent multiply-add operations, and (ii) a signicant reuse of elements of c vector. As a result the arithmetic pipelines of the PowerPC architecture are fully utilized, and the number of loads/store per arithmetic operation is minimized. 3. BIT ALLOCATION Allocation of bits to a subband reduces the quantization noise in that subband. For each audio frame (384 samples/channel for layer 1 and 1152 samples/channel for layer 2), bits must be distributed across the subbands from a pool of predetermined number of bits. Before this bit distribution is attempted, a psychoacoustic model is used to determine the quantization noise for each subband that will be masked in the human auditory system. This masking level in each subband gives us the masking curve shown in Figure 2, which is a function of short term Fourier transform of the audio frame being coded and the psychoacoustic model used.

element being inserted Figure 2. Masking curve. The objective of the bit allocation process is to minimize the noise to mask ratio (NMR) over all subbands. (Here, noise is the quantization noise introduced by the bit allocation process.) As a result the quantization noise curve after bit allocation is an oset of the mask curve. The high level description of the bit allocation computation in the MPEG ISO/IEC is an iterative procedure. Bits are allocated one at a time to the subband with the maximum NMR, and that subband's NMR is reduced accordingly. Furthermore, for each bit allocated, the NMR values of all subbands in all channels are scanned to select the subband-channel pair with maximum NMR value. This is a very time consuming process. We optimize the bit allocation process by using the heap data structure. A heap is a complete binary tree with the property that the value (key) of each node is at least as large as the value of its children nodes (if they exist). For our case, NMR is used as the value. There is a heap element for every channel-subband pair which has yet not received a pre-determined maximum number of bits. The commonly used operations on a heap are delete and insert. A heap is built from a collection of elements by starting with a one element heap, and iteratively inserting all elements in it. The insert operation is performed by initially placing the element to be inserted at the left most vacant location of the bottom level. Then the element being inserted is exchanged with its parent if its value or key (NMR in our case) is less than that of its parent. We keep exchanging the element being inserted with its parent until it reaches a position where its value is less than that of its parent. The insert process is shown in Figure 3. An element with maximum value is at the top of the heap. If we delete it, the heap must be rebalanced as shown in Figure 4. The right most element in the bottom level of the heap is moved to the vacant spot at the top of the heap. Then this element is compared with both of its child nodes, and if its key is not greater than that of both child nodes, it is exchanged with the child node having the larger key. This exchange process is repeated until the exchanges take place. In the bit allocation process, we repeatedly delete the element at the top of the heap, modify its NMR value (key) by allocating a bit to it, and then reinsert this element back into the heap. This delete-insert operation pair can be combined to eliminate the overhead of insert operation. Essentially after deleting the element from top of the heap, Figure 3. Adding an element to a heap. 2 Move 1 3 Delete Balance heap Figure 4. Deleting an element from a heap. Step 1 in Figure 4, we do not carry out Steps 2 and 3 to balance the heap immediately. When the deleted item has to be reinserted in the heap with a modied NMR value, we bring it back to the top of the heap instead of using Step 2 of the delete process, and then carry out Step 3 of the heap deletion procedure shown in Figure 4. In joint stereo coding, once we select the channel-subband pair with the maximum NMR value, bits are allocated and NMR values are modied for that particular subband in all channels. Furthermore, any of these channel-subband pairs may get the predetermined maximum number of bits as a result of this allocation, and therefore must be deleted from the heap. We note that the subtree under any element of the heap has all the properties of the heap. Thus, any element of the heap can be deleted by replacing it by the rightmost element in the last level of the heap, and then balancing it as outlined in Step 3 of Figure 4. Similarly, the delete-insert operation pair can be combined for interior heap elements just as is done for the top element of the heap. The advantages of this approach over the traditional one using a linear search are: (i) for every bit allocated we need to work on logn elements as opposed to n elements (here n is the list size), and (ii) the heap size becomes smaller as the subband becomes ineligible for further bit allocation resulting in the reduced search time. 4. PERFORMANCE RESULTS We evaluated the performance of the MPEG audio encoder on a Model 250, 66MHz, PowerPC 601 machine. The performance results are summarized in Figure 4. For comparison, we have also included the performance for the code

Figure 5. Execution time for encoding a 30 sec. stereo audio sampled at 44.1KHz. The audio is compressed to 384 Kbps using Layer 2 Psychoacoustic Model 2. based on the description in the MPEG ISO/IEC 11172-3 document. For all our experiments the output of both the implementations were bit compatible. Note that we were able to get a factor of ve overall performance improvement over the standard code. A twenty fold improvement was observed for bit allocation procedure. The major factor for this was the new proposed algorithm for bit allocation based on heap data structure. We saw an improvement of a factor of ve for subband lter computation. Again, this was due to the proposed new algorithm as outlined in Section 2. The improvement in the Psychoacoustic model routine was due to some major modications which will be discussed elsewhere. The rest of the modules were optimized using standard optimization techniques. REFERENCES [1] D. Pan, \Digital Audio Compression," Digital Technical Journal, Vol. 5, No.2, 1993, pp.28-40. [2] K. Konstantinides, \Fast subband ltering in MPEG Audio Coding," IEEE Signal Processing Letters, 1(2), Feb. 1994, pp.26-28. [3] H.J. Nussbaumer and M. Vetterli, \Computationally Ecient QMF Filter Banks," Proceeding International Conference IEEE ASSP, IEEE Press, N.J., 1984, pp. 11.3.1-11.3.4. [4] International Standard. Information Technology { Coding of Moving Pictures and Associated Audio Information: Audio, 1993. ISO/IEC 11172-3.

A HIGH PERFORMANCE SOFTWARE IMPLEMENTA- TION OF MPEG AUDIO ENCODER Manoj Kumar 1 and Mohammad Zubair 1 1 IBM T.J. Watson Research Center, Yorktown Hgts, NY, USA The MPEG/Audio is a standard for both transmitting and recording compressed audio. The MPEG algorithm achieves compression by exploiting the perceptual limitation of the human ear. The standard denes the decoding process and also the syntax of the coded bitstream. However, there is room for having dierent implementations to generate the compressed bitstream. In this paper we propose a high performance software implementation of the MPEG/Audio encoder. We obtained more than a factor of ve improvement over a straightforward implementation on the IBM PowerPC, Model 250.