HELSINKI UNIVERSITY OF TECHNOLOGY 30.11.2004 Telecommunications Software and Multimedia Laboratory T-111.550 Multimedia Seminar Fall 2004: Mobile Multimedia Application Platforms Mobile Audio from MP3 to AAC and further Henri Autti 51194K Johnny Biström 21548C
Henri Autti and Johnny Biström
HUT, Telecommunications Software and Multimedia Laboratory
henri.autti@hut.fi johnny.bistrom@hut.fi

Abstract

The purpose of this paper is to evaluate the advanced audio codecs and to reflect on their suitability for the mobile needs of today and tomorrow. The historical development of different codecs for different purposes is analyzed. The features of the most common codecs are discussed together with performance and other criteria. The capabilities of mobile devices and the telecommunication possibilities now and in the future are also considered in the analysis. Finally, some comparisons of codec performance are made. Some existing mobile audio applications are presented and ideas for future audio applications are discussed. Some of the questions to be answered here are: Can one codec that is superior for the mobile world be found, or do we have to prepare for a wide diversity in the future? Will the codecs continue to develop as rapidly as they have done so far?

1 INTRODUCTION

A look at audio applications for the mobile world today reveals that the diversity in implementations is wide. The solutions chosen for the representation of audio streams differ greatly from one situation to another, and the number of codecs used for encoding and decoding audio data streams is large. It is not obvious which codec should be used for what purpose. The selection of a codec depends on several factors, such as the content type of the audio material, the available communication speed and the quality requirements of the listening situation. Other factors that might influence the selection are the standardization situation, the licensing policy and the competitors' choices in the market. During the last years the mp3 format has been a great success, but it does not fit well into mobile devices.
Lately more efficient codecs such as AAC and AMR have been presented, and they have been refined for mobile audio purposes. The purpose of this paper is to evaluate the most important audio codecs by examining the technical principles of the encoding and decoding, the standardization situation and the suitability of each codec in relation to the technology available and the market needs. The analysis also takes into consideration the development of mobile and telecommunication hardware and software. Using technical literature and documented
listener testing combined with mobile manufacturer specifications and published white papers, we try to find out whether there is one superior codec for mobile audio applications and which codec it would be. To do that, existing and future audio applications in the market have to be analyzed to clarify the needs and expectations placed on mobile audio. Finally, the result is compared with the development trends of mobile technology and the longevity of the chosen solution is judged.

2 BACKGROUND

In this chapter we take a look at the background of mobile audio formats. First we present some basic facts about the development of audio codecs and the reasons for developing them. Then we discuss some facts about the development of mobile devices including phones, PDAs and laptops. Later in this part we take a quick look at the applications available today and the demands these applications pose.

The short history of audio codecs dates back to the mid-1980s, when the Fraunhofer Institut in Erlangen, Germany (Fraunhofer, 2004) first began working on high quality, low bit-rate audio coding with the help of Dieter Seitzer, a professor at the University of Erlangen. Their project was financed by the European Union as a part of the market-oriented Eureka research program, where it was commonly known as EU-147. In Germany in 1989, Fraunhofer was granted a patent for mp3, which we are going to discuss more thoroughly in the next chapter. A few years later it was submitted to the International Organization for Standardization (ISO), and mp3 was introduced as a part of the official MPEG-1 standard in 1992. In January 1995 Fraunhofer applied for a patent on mp3 in America as well, and it was granted in November 1996.
The revolutionary result was that, using mp3 compression, PC users were for the first time able to compress an ordinary music CD to one tenth of its original size with only a small sacrifice in sound quality. Thus 12 hours of music could be stored on a recordable CD, which could in turn be played on an mp3 CD player or an ordinary PC.

In the rapidly evolving world of mobile content development, things have changed a lot since those days. Nowadays mobile devices, ranging from small laptops through palmtops to phones, are more widely available, and high-speed wireless networks are improving day by day. At the same time speech and audio compression have advanced rapidly, spurred on by cost-effective digital technology and diverse commercial applications. Wideband speech and high fidelity audio compression have also made great progress in recent years, accelerated by the commercial success of consumer and professional digital audio products. Telephone speech, wideband speech and wideband audio signals differ not only in bandwidth and dynamic range, but also in listener expectations of the offered quality. The use of wideband not only improves the intelligibility and naturalness of speech, but also adds a feeling of transparent communication and eases speaker recognition.

The commercial applications in the mobile content area of today are also developing at a growing rate. According to Ericsson (Bruhn, 2004), mobile device services comprise streaming, messaging, downloading and broadcasting. Streaming scenarios include news listening, monitoring of sports events, audio books, music listening, commercial
advertisements, access to information systems and interactive gaming. Broadcasting scenarios are very close to streaming scenarios and include web casting and Internet radio broadcasting. They have become especially popular, allowing listeners to "stream" audio on their computers. Unlike downloaded audio files, streamed audio files are not stored on the user's hard drive, but are broadcast like traditional radio through the user's audio player. Messaging scenarios are also similar to streaming, but with size limitations, and include business-to-person and person-to-person scenarios. Download scenarios include downloading music, books and comics over the network. Important for all of the scenarios named above is the ability to handle mixed content covering music, speech, speech-between-music and speech-over-music.

The demands these applications pose today on audio codecs for mobile services include the ability to cope with generic content, sufficient and consistent quality at the lowest rates, best quality at the lowest rates, and high quality operation with a relaxed bit rate requirement. The new audio codecs also have to be optimized for low-resource devices (low memory and computational resources) and have to support a variety of operating systems, e.g. Symbian, WinCE, Palm OS5 and OS6.

The development and standardization of the codecs is at the moment focused on 3GPP, which is the body standardizing GSM, evolved GSM, UMTS and 3G. In the next chapter we are going to introduce some of the most important audio standards and codecs, which play an important role in the 3GPP.

3 AUDIO STANDARDS AND CODECS

In this chapter we describe what we consider the most important audio standards and codecs at the moment. In the first part of this chapter we discuss the 3GPP audio standard format families AAC and AMR, introducing the underlying technology. First we present mp3 (predecessor to AAC), AAC, HE-AAC and EAAC+, and then the challengers: AMR, AMR-WB and AMR-WB+.
In the second part we discuss an open source codec, Vorbis Ogg ACM, and some of the most important nonstandard audio formats that use streaming technology, Windows Media Audio and RealMedia. For terminology, architecture and technology see Wales (2004) and ARM Developer's Guide (2004).

3.1 MPEG-1 (mp3)

Mp3 stands for MPEG-1 Audio Layer III. It is not a separate format, but a part of the MPEG-1 video encoding format, as described earlier. Mp3 is a lossy data compression method (meaning that compressing a file and then decompressing it yields a file that may well be different from the original, but is "close enough"). It stores good quality audio in small files by using psychoacoustics to discard the parts of the audio that most humans can't hear.

Mp3 bit rates vary from 8 kbps to 320 kbps. When the mp3 phenomenon began in 1996, most audio files were encoded at a 128 kbps bit rate, which is still the most popular bit rate in the world, although most people agree that at slightly higher bit rates, like 192 kbps or 256 kbps, the audio quality is comparable to CD quality.
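The figures quoted so far (a music CD shrinking to roughly one tenth of its size, and about 12 hours of music on one recordable CD) can be checked with a little arithmetic. The sketch below is only an illustration: it assumes standard CD audio (44.1 kHz, 16 bits, stereo) and a nominal 700 MB disc counted as 700 million bytes, and the function names are made up for this example.

```python
# Uncompressed CD audio parameters (standard Red Book values).
CD_SAMPLE_RATE_HZ = 44_100
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2
CD_BITRATE_KBPS = CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS / 1000

# Assumption for illustration: a nominal 700 MB recordable CD, 10^6 bytes per MB.
CDR_CAPACITY_BYTES = 700 * 1_000_000


def compression_ratio(codec_kbps):
    """How many times smaller the coded stream is than raw CD audio."""
    return CD_BITRATE_KBPS / codec_kbps


def playing_time_hours(capacity_bytes, codec_kbps):
    """Hours of audio that fit in capacity_bytes at a constant bit rate."""
    seconds = capacity_bytes * 8 / (codec_kbps * 1000)
    return seconds / 3600


if __name__ == "__main__":
    for rate in (128, 192, 256):
        print(rate, "kbps:",
              round(compression_ratio(rate), 1), "to 1,",
              round(playing_time_hours(CDR_CAPACITY_BYTES, rate), 1), "hours per disc")
```

Under these assumptions 128 kbps gives a ratio of about 11 to 1 and a little over 12 hours per disc, which agrees with the figures in the text; the higher bit rates trade playing time for quality.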
The problem with mp3 appears at lower bit rates (64 kbps and below), where the sound starts to lack high frequency components. The reason is that at these bit rates mp3 runs out of bits to code the music in full audio bandwidth and with significant detail. Mp3PRO was created to solve the problem of bandwidth-limited mp3 files. To improve the sound quality of mp3 at lower bit rates, an enhancement technology that gives the high frequency components back to the sound has been developed. The technology is called "Spectral Band Replication" (SBR). SBR is a very efficient method for generating the high frequency components of an audio signal. The resulting audio format is composed of two components, the mp3 part for the low frequencies and the SBR or "PRO" part for the high frequencies. The first part analyses the low frequency band information and encodes it into a normal mp3 stream. This lets the encoder concentrate on less information and do a better job of encoding it. It also maintains complete compatibility with existing mp3 players. The second part analyses the high frequency band information and encodes it into a part of the mp3 stream that is normally ignored by existing mp3 decoders. Detailed information can be found at mp3pro Zone (2004).

3.2 MPEG-2 AAC

AAC (Advanced Audio Coding), also known as MPEG-2 AAC, is a lossy data compression scheme intended for audio streams. AAC was designed to replace mp3. It is part of the MPEG-2 standard introduced in 1994 and was developed by the MPEG group that includes Dolby, Fraunhofer (FhG), AT&T, Sony, and Nokia, companies that have also been involved in the development of audio codecs such as mp3 and AC3 (also known as Dolby Digital). Unlike older MPEG audio encoding methods, MPEG-2 AAC is not backwards compatible with older MPEG audio formats. For example, mp3 is backwards compatible with mp2.
AAC is based on a wideband audio coding algorithm that exploits two primary coding strategies to dramatically reduce the amount of data needed to convey high-quality digital audio. First, the signal components that are "perceptually irrelevant" and can be discarded without a perceived loss of audio quality are removed. Next, redundancies in the coded audio signal are eliminated. Efficient audio compression is achieved by a variety of perceptual audio coding and data compression tools.

When compared side by side with its predecessor, mp3, AAC is proving itself worthy of replacing mp3 as the new Internet audio standard. It has improved compression, which provides higher-quality results with smaller file sizes. It supports multi-channel audio with up to 48 full frequency channels, higher resolution audio with sampling rates from 8 up to 96 kHz, and improved decoding efficiency that requires less processing power for decoding. These result in higher quality output at lower data rates, allowing even modem users to hear a difference. It also gives the listener better and more stable quality than mp3 at equivalent or slightly lower bit rates. Depending on the AAC profile and the mp3 encoder, 96 kbps AAC can give nearly the same or better perceptual quality than 128 kbps mp3.
3.3 MPEG-4 HE-AAC

MPEG-4 High Efficiency AAC is the combination of MPEG-2 AAC and the SBR Bandwidth Extension amendment, which is based on SBR (Spectral Band Replication) technology. HE-AAC is not a replacement for AAC, but rather a superset that extends the reach of high-quality MPEG-4 audio to much lower bit rates (as low as 32 kbps). HE-AAC is able to achieve superior audio quality without losing treble or collapsing the stereo image. HE-AAC decoders will decode both plain AAC and the enhanced AAC plus SBR. The result is a backward compatible extension of the standard that nearly doubles the efficiency of MPEG-4 audio.

As discussed before, SBR is a unique bandwidth extension technique that doesn't replace the core codec, but operates in conjunction with it to create a more efficient superset that can cut the required bit rate in half. Present in both the encoding and the decoding process, SBR exploits the correlation between the low and the high frequencies in an audio signal to describe the high end of the signal using only a very small amount of data. This SBR data describing the high frequencies is coupled with the low-frequency compressed data from the AAC codec. Once combined, the complete HE-AAC bit stream contains enough data to recreate the original signal (see figure 1). For example, to create 48 kbps stereo HE-AAC, the encoder generates two signals: an MPEG AAC signal at about 42 kbps and an SBR signal at about 6 kbps. The SBR signal is then placed into the MPEG AAC auxiliary fields as defined in MPEG-4 and sent out as a complete 48 kbps MPEG-4 HE-AAC bit stream.

Figure 1. The encoding and decoding process of HE-AAC.

Because the SBR data is placed within the AAC auxiliary fields, the enhanced signal will be accepted by both an existing AAC decoder and a new HE-AAC decoder. If sent to an AAC decoder, only the low-frequency audio signal will be recognized and decoded.
If sent to an HE-AAC decoder, both the SBR and the AAC data will be decoded to recreate the full frequency signal. This technique makes the new profile forward compatible with AAC. Because the HE-AAC decoder contains a full-fledged AAC decoder, it is also able to decode both the plain AAC and the HE-AAC MPEG-4 Audio profiles. This combination makes HE-AAC backward compatible with AAC.

As a result, HE-AAC delivers CD-quality stereo at 48 kbps and 5.1 channel surround sound at 128 kbps. This level of efficiency is ideal for Internet content delivery and
fundamentally enables new applications in the markets of mobile and digital broadcasting. However, according to Frerichs (2003), HE-AAC is not good enough for two-way communications because of its very high delay.

3.4 EAAC+

Enhanced AAC+ was introduced in the 3GPP release-6 standard in 2004. Its optimal operating range is 18 kbps and higher. According to the 3rd Generation Partnership Project, the enhanced AAC+ general audio codec consists of MPEG-4 AAC, MPEG-4 SBR and MPEG-4 Parametric Stereo. AAC is a general audio codec, SBR is a bandwidth extension technique offering substantial coding gain in combination with AAC, and Parametric Stereo enables stereo coding at very low bit rates.

According to the IBC 2003 Conference Papers, the basic principle behind parametric stereo is similar to the SBR principle: a guided reconstruction of a stereo signal based on a transmitted mono signal. In addition to a coded mono mixdown of the stereo input signal, parameters describing the stereo image are transmitted. The stereo parameters require only a small fraction of the total bit rate, ensuring a high quality of the mono signal at the given bit rate. Two parameters are used to describe the stereo information, a panorama parameter and an ambience parameter. The panorama parameter contains information about the left-to-right level differences within different frequency bands. Similarly, the ambience parameter describes the stereo ambience for a set of frequency bands. The encoding of both parameters uses the same principle of entropy coding of time- or frequency-direction differences as is used for the SBR envelopes. In addition, the quantization steps are frequency dependent.

Compared with the older codecs, the Enhanced AAC+ decoder also includes three additional tools. Error concealment tools for AAC, SBR, and PS make the decoder robust against transmission errors like frame loss. These tools mitigate the audible effects of such errors.
The stereo-to-mono down mix tool enables a decoder capable only of mono output to down mix a stereo bit stream. For the AAC part this is done in the time domain after the stereo decoding, but for SBR it is done on the SBR parameters, which saves complexity since only a mono decoding of SBR is needed. The spline resampler tool makes it possible to resample the output to a sampling frequency different from the one supplied in the bit stream. This allows, for example, handsets with a D/A converter capable only of a 16 kHz sampling frequency to play bit streams encoded with a 22.05 kHz sampling frequency.

Figure 2 shows a block diagram of the EAAC+ encoder. The encoder basically consists of the AAC waveform encoder, the SBR high frequency reconstruction encoding tool and the PS encoding tool. The encoder operates in a dual rate mode, in which the SBR encoder operates at the encoding sampling rate fsenc as delivered from the IIR resampler and the AAC encoder at half of this sampling rate, fsenc/2. Consequently a 2:1 down sampler is present at the input to the AAC encoder. The PS tool is used for low bit rate stereo coding, i.e. up to and including a bit rate of 32 kbps.

The AAC encoder implementation complies with the AAC Low Complexity Object Type and is a highly optimized low-resource implementation, requiring only little computational complexity and memory. This is basically achieved by mapping the psychoacoustic threshold estimation directly to scale factor amplification values to shape the encoding quantization noise according to the input signal
characteristics, rather than employing time-consuming iterative analysis-by-synthesis methods.

The SBR encoder consists of a QMF (Quadrature Mirror Filter) analysis filter bank, which is used to derive the spectral envelope of the original input signal. Furthermore, the SBR related modules control the selection of an input signal adaptive grid partitioning of the QMF samples on the time axis (i.e. control the framing), analyze the relation of noise floor to tonal components in the high band, collect guidance information for the transposition process in the decoder and detect missing harmonic components which could not be reconstructed by pure transposition. This gathered information about the characteristics of the input signal, together with the spectral envelope data, forms the SBR stream. The number of bits used for the SBR stream is subtracted from the bits available to the AAC encoder in order to achieve a constant bit rate encoding of the multiplexed EAAC+ stream.

The Parametric Stereo encoding tool in the EAAC+ encoder estimates parameters characterizing the perceived stereo image of the input signal. These stereo parameters are embedded in the SBR stream. At the same time, a signal adaptive mono down mix of the input signal is generated in the QMF domain and fed into the SBR encoder operating in mono. This down mix is also processed by a down sampled QMF synthesis filter bank to obtain the time domain input signal for the AAC core encoder with the sampling rate fsenc/2. In this case, the 2:1 IIR down sampler is not active.

Figure 2. EAAC+ encoder overview (3rd Generation Partnership Project).

In the decoder (figure 3) the bit stream is de-multiplexed into the AAC and the SBR stream. Error concealment, e.g.
in case of frame loss, is achieved by designated algorithms in the decoder for AAC, SBR and PS: the AAC core decoder employs signal-adaptive, spectrally shaped noise generation for error concealment, while in the SBR and PS decoders error concealment is based on extrapolation of guidance, envelope, and stereo information.

For the SBR processing, a Low-Power tool of SBR is used for full stereo decoding in order to keep the peak computational complexity as low as possible over all channel
modes. Using the SBR Low-Power tool brings the computational complexity of an HE-AAC stereo decoder into the same range as that of plain AAC stereo decoders.

The low band AAC time domain signal, sampled at fsenc/2, is first fed to a 32-channel QMF analysis filter bank. The QMF low band samples are then used to generate a high band signal, with the transmitted transposition guidance information used to best match the original input signal characteristics. The transposed high band signal is then adjusted according to the transmitted spectral envelope to best match the original's spectral envelope. Missing components that could not be reconstructed by the transposition process are also introduced. Finally, the low band and the reconstructed high band are combined to obtain the complete output signal in the QMF domain.

In the case of a stream using parametric stereo, the mono output signal from the underlying HE-AAC+ decoder is converted into a stereo signal. This processing is carried out in the QMF domain and is controlled by the parametric stereo parameters embedded in the SBR stream.

Figure 3. EAAC+ decoder overview (3rd Generation Partnership Project).

3.5 AMR

The AMR (Adaptive Multi-Rate) standard was introduced in 1998. Its main function is mobile baseline speech. It operates at variable mono bit rates in the range of 4.75 to 12.2 kbps in its narrowband (bandwidth 3.5 kHz) configuration. It was adopted by the 3GPP as the mandatory codec for 3G wireless systems based on the evolved GSM core network (WCDMA, EDGE, GPRS). The philosophy behind AMR is to lower the codec rate as the interference increases, thus enabling more error correction to be applied. The AMR codec is also used to harmonize the codec standards amongst different cellular systems. It is based on a technology called ACELP (Algebraic Code Excited Linear Prediction). ACELP is a
speech compression system, used to provide a good standard of speech quality when the network is operating at low data rates (narrow bandwidth). The analogue voice signal is converted to a digital data signal so that it can be compressed for transmission over the network, and the process is then reversed at the other end, where the digital data is converted back to an analogue voice signal. The reproduced speech at the receiving phone appears to be of much better quality than it would be without the ACELP system.

3.6 AMR-WB

AMR-WB (wideband extension) is a speech coding standard developed after AMR and using the same ACELP technology. The AMR Wideband codec was standardized by ETSI/3GPP in December 2000, and selected and approved by the ITU-T in July 2001 and January 2002, respectively. The ITU-T standard is referred to as G.722.2. The codec provides excellent speech quality due to its wider speech bandwidth of 50-7000 Hz, significantly improving the intelligibility and naturalness of speech and adding a feeling of face-to-face communication.

The AMR-WB speech codec consists of nine speech codec modes with mono bit rates of 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85 and 6.6 kbps. The lowest bit rate providing excellent speech quality in a clean environment is 12.65 kbps. Higher bit rates are useful in background noise conditions and in the case of music. The lower bit rates of 6.60 and 8.85 kbps also provide reasonable quality, especially when compared to narrowband codecs. A background noise mode is designed to be used in discontinuous transmission (DTX) operation in GSM and as a low bit-rate source dependent mode for coding background noise in other systems.

AMR-WB can also carry narrowband signals. It eliminates the need for transcoding and eases the implementation of wideband applications and services across a wide range of wireless and wireline communication systems and platforms. AMR-WB is already standardized for future usage in networks such as UMTS.
There it provides so much higher speech quality that it seems probable that older networks, too, will gradually have to be upgraded to support wideband.

3.7 AMR-WB+

Adopted as an audio codec standard in September 2004 by ETSI/3GPP, AMR-WB+ is an audio extension of AMR-WB. It utilizes a hybrid of two technologies, ACELP and TCX (Transform Coded Excitation), to deliver very high sound quality for both speech and audio content types, including music, voice-between-music, and voice-over-music. AMR-WB+ adds support for stereo signals and higher sampling rates. In addition, high-efficiency parametric stereo (HE-PS), as discussed under EAAC+, provides high-fidelity stereo image reproduction at the lowest bit rates. Another main improvement is the use of transform coding in addition to ACELP. This greatly improves generic audio coding. Automatic switching between transform coding and ACELP provides very good quality for both speech and other audio at moderate bit rates. Sound quality is not
compromised even in networks where the bandwidth is limited. The AMR-WB+ codec has a wide bit-rate range, from 6 to 48 kbps. Mono rates are scalable from 6 to 36 kbps, and stereo rates are scalable from 8 to 48 kbps, reproducing bandwidth up to 24 kHz (approaching CD quality). Moreover, it provides backward compatibility with AMR wideband. AMR-WB+ brings speech and music to mobile phones (VoiceAge, 2004).

3.8 Vorbis Ogg ACM

Due to numerous patenting and licensing issues with various parts of the MPEG specifications, there has been a significant movement to create and popularize audio formats and algorithms that are free of such restrictions. The most popular of these is probably Ogg Vorbis, which is a completely open and free codec project from the Xiph.org Foundation (2004).

Vorbis was started as a result of a plan to charge licensing fees for the mp3 format, which was announced in September 1998. Version 1.0 of the codec was released on July 19, 2002. The latest version is 1.1.0, released on September 22, 2004.

The Ogg Vorbis format has proved popular among open source communities; they argue that its higher fidelity and completely free nature make it a natural replacement for the entrenched mp3 format. In the commercial sector, Vorbis has already had success, with many newer video game titles employing Vorbis instead of mp3.

Given stereo input at 44.1 kHz, the standard CD audio sampling frequency, the current encoder produces output at 45-500 kbps, depending on the specified quality setting. Though Vorbis 1.0.1 is tuned for bit rates of 16-128 kbps per channel, it is still possible to encode at arbitrary bit rates chosen by the user. Such figures are only approximate, however, as Vorbis is inherently variable bit rate.

Vorbis uses the modified discrete cosine transform (MDCT) for converting sound data from the time domain to the frequency domain.
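As a rough illustration of the transform itself, the sketch below implements a textbook MDCT and its inverse directly from the defining sums and shows that 50% overlapped frames cancel each other's aliasing when added together. It is not taken from the Vorbis source: it applies no window (real codecs, Vorbis included, window each frame), it runs in O(N^2) time, and the function names are chosen only for this example.

```python
import math


def mdct(block):
    """Forward MDCT: 2N time samples in, N frequency coefficients out."""
    n = len(block) // 2
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i, x in enumerate(block))
        for k in range(n)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out."""
    n = len(coeffs)
    return [
        sum(c * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for k, c in enumerate(coeffs)) / n
        for i in range(2 * n)
    ]


def analyse_and_resynthesise(signal, n):
    """Transform frames of 2N samples with hop N, then overlap-add them."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        frame = imdct(mdct(signal[start:start + 2 * n]))
        for i, value in enumerate(frame):
            out[start + i] += value
    return out


if __name__ == "__main__":
    signal = [math.sin(0.3 * i) + 0.1 * i for i in range(16)]
    rebuilt = analyse_and_resynthesise(signal, 4)
    # Samples covered by two overlapping frames are recovered; the first
    # and last N samples are covered by one frame only and stay aliased.
    worst = max(abs(rebuilt[i] - signal[i]) for i in range(4, 12))
    print("largest error in the overlapped region:", worst)
```

Each frame of 2N samples yields only N coefficients, so the overlapped transform does not increase the amount of data, which is one reason the MDCT is popular in audio coding.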
The frequency-domain data produced by the MDCT is broken into noise floor and residue components, and then quantized and entropy coded using a codebook-based vector quantization algorithm. The decompression algorithm reverses these stages.

3.9 Windows Media

Windows Media Audio (WMA) is a proprietary compressed audio file format used by Microsoft. It has a large user base through Windows. It was initially a competitor to the mp3 format, but with the introduction of Apple's iTunes Music Store it has positioned itself as a competitor to the AAC format used by Apple. It is part of the Microsoft Corporation (2004) Windows Media framework.

An initial reason for the development of WMA might have been that mp3 technology is patented and would have had to be licensed, for inclusion in the Microsoft Windows operating system, from Thomson, which controls licensing of the mp3 patents in many countries including the United States of America. WMA includes multi-channel coding.
With the publishing of Windows Media Audio 9, the codec was updated to WMA Pro, and a new lossless codec was introduced to accompany the existing lossy codec. WMA Pro is considered to come close to AAC in quality. Support for variable bit rates has also been introduced. WMA Pro has not been reverse engineered yet.

Microsoft's Windows Media Audio (WMA) file format, which Microsoft claims offers higher audio quality at smaller file sizes, is starting to gain more acceptance as it comes bundled as the standard audio format in Windows 98/2000/XP. Microsoft might be able to challenge the dominance of mp3 or at the very least offer a second, popular audio format choice.

3.10 Real Media

RealAudio is a proprietary audio codec developed by RealNetworks. It is especially designed for low bandwidths, and it can be used as a streaming audio format. As a matter of fact, RealNetworks was one of the first in the world to offer streamed audio software. For high bit rates, Real Media uses AAC. Many radio stations use RealAudio to stream their programming over the Internet in real time. The first version of RealAudio was released in 1995. The current version of the codec, RealAudio 10, was published in 2004. It includes multi-channel coding (RealNetworks Incorporated, 2004).

4 DEVELOPMENT TRENDS AND COMPARISON OF CODECS FOR THE APPLICATIONS OF TOMORROW

The hardware of the mobile platforms is developing rapidly, and thus new software and applications can be expected in the mobile devices of tomorrow. The capacity of the central processing unit is growing and more memory is already available at a lower price. This chapter extrapolates what will happen to the devices in the near future. The wireless communication channels are also developing, which leads to faster transmission to the mobile devices. Is there any need for compression as efficient as that offered by HE-AAC, or will it pass into history as the limitations of today disappear?
4.1 The Features of new Mobile Phones

The main target of this hardware study is the mobile phone, as the number of mobile phones is much larger than the number of PDAs. The mobile phone is also a good low-end representative of mobile device platforms, as size and weight are always among the main requirements for a phone.

According to Symbian Ltd. (Symbian, 2004), the leading manufacturer of 3G operating systems for mobile phones, the latest Symbian operating system, OS 8, is already used in Series 60 Platform 2.0 based 3G phones such as the Nokia 6630, giving them wide support for audio codecs such as NB-AMR, WB-AMR, MP3, AAC and RealAudio. As the phone has 10 MB of internal dynamic memory and 64 MB on an MMC, it offers fairly good possibilities for audio and video applications in the mid-price range. More expensive phones such as the Nokia 7710, in Series 90 with Symbian OS 7, support the same audio codecs even in stereo. The Nokia 7710 has 90 MB of RAM
and can handle an MMC of 512 MB, which makes it an excellent choice for audio and multimedia applications. The same applies to the Nokia Communicator 9500. Thus the hardware limits for audio have been eliminated in the mid- and high-end mobile phones. In the low-end mobile phones there are still some relevant hardware restrictions on the use of audio, mainly because of the low price requirement, but they will disappear in the near future.

4.2 The Telecommunication Features of the Mobile Networks

The basic GPRS (General Packet Radio Service) network still used in many mobile phones supports communication speeds of 30-50 kbps. The EDGE (Enhanced Data rates for GSM Evolution) or EGPRS technology increases the speed for the end user to rates of 120-150 kbps and even a bit higher. EGPRS is available in most mid-end and even some low-end phones, so it can be considered the standard today. EGPRS is, however, available only in urban and suburban areas today.

UMTS (Universal Mobile Telecommunication System) offers data speeds from 384 kbps (TDD Mode) to 2 Mbps (TDD Mode) (Compagnie Financière Alcatel, 2004), which removes some of the limitations. So far UMTS is available only in high-end mobile phones and only in urban areas. Finland will not be covered by UMTS networks in the near future, which means that EGPRS will still be the fastest alternative for a large group of phone users here. The speed of EGPRS is, however, enough for streaming music applications if the latest codecs are used.

4.3 Comparison of Mobile Audio Codecs

There are many methods for comparing the quality of audio streams. One method is to use an audience to judge the quality. A test used by the European Broadcasting Union (EBU) called MUSHRA is often used as a reference. MUSHRA stands for MUlti Stimulus test with Hidden Reference and Anchors and is an advanced testing method developed and proposed by the EBU Project Group B/AIM. The method has been submitted to ITU for standardization.
MUSHRA (Stoll and Kozamernik, 2000) is a subjective test in which listeners in different EBU member countries compare different types of audio to a reference signal and grade them on a scale from 0 to 100, where 81-100 is considered excellent, 61-80 good, 41-60 fair, 21-40 poor and 0-20 bad. Different types of music such as classical, folk, jazz and pop are tested. Broadcast programmes, from both studio and live environments, with female and male voices, are also tested. According to these listener tests, performed by the EBU, only a small difference can be heard between stereo CD quality and HE-AAC compression at 48 kbps. The test results are described by Kozamernik (2003). This is also illustrated in figure 4, which shows that aacplus, also called HE-AAC, gets the highest MUSHRA index of 80, compared to mp3pro, which gets 76. At 48 kbps the better-known RealMedia Real 8, mp3 and MS Windows Media 8 codecs get much lower ratings. The EBU has not yet reported MUSHRA testing of the AMR-WB+ codec.

Figure 4. European Broadcasting Union MUSHRA testing at 48 kbps stereo (Coding Technologies, 2004).

The 3rd Generation Partnership Project (3GPP) is a collaboration agreement that was established in December 1998. 3GPP has conducted a standardization process for Packet Switched Streaming (PSS) and Multimedia Messaging Services (MMS). Two bit-rate ranges have been defined:

1. a low-rate range up to 24 kbps, where the candidates are AMR-WB+, HE-AAC (aacplus) and Enhanced AAC+
2. a high-rate range, with rates higher than 24 kbps, where the candidates are HE-AAC (aacplus) and Enhanced AAC+

The comparison tests that 3GPP conducted for the selection of an audio coding standard show the following quality scalability for AMR-WB+ in a MUSHRA test. Figure 5 shows that the MUSHRA score for AMR-WB+ at 48 kbps is 83, which exceeds the EBU figure for aacplus.
Figure 5. Quality scalability of AMR-WB+ based on a MUSHRA test (Bruhn, 2004).

The comparison tests between EAAC+ and AMR-WB+ that 3GPP conducted for the selection of low-rate-range audio coding showed that AMR-WB+ is a slightly better codec for stereo at rates lower than 24 kbps, which can be seen in figure 6. Both codecs, however, represent leading-edge coding technology, giving the highest quality possible for mobile devices today.

Figure 6. Comparison of AMR-WB+ and EAAC+ by 3GPP (Mäkinen, J. et al., 2004).
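The MUSHRA grade bands quoted earlier in this section can be written down as a small lookup, which makes it easy to see where the reported scores fall. The following is only an illustrative sketch: the function name and the table layout are hypothetical, and treating every score below 81 as belonging to the next lower band (so that a fractional score such as 80.5 counts as good) is an assumption, since the text gives the bands for whole numbers only.

```python
# Illustrative sketch: map a MUSHRA score (0-100) to the verbal grade bands
# quoted in section 4.3. Names are hypothetical, not part of any test tool.

# (lower bound of band, label), ordered from the best band downwards
MUSHRA_BANDS = [
    (81, "excellent"),
    (61, "good"),
    (41, "fair"),
    (21, "poor"),
    (0, "bad"),
]


def mushra_grade(score):
    """Return the verbal grade for a MUSHRA score between 0 and 100."""
    if not 0 <= score <= 100:
        raise ValueError("MUSHRA scores lie between 0 and 100")
    for lower_bound, label in MUSHRA_BANDS:
        if score >= lower_bound:
            return label


# Scores quoted in the text, all at 48 kbps stereo
quoted_scores = {
    "aacplus (EBU test)": 80,
    "mp3pro (EBU test)": 76,
    "AMR-WB+ (3GPP test)": 83,
}

for codec, score in quoted_scores.items():
    print(f"{codec}: {score} -> {mushra_grade(score)}")
```

On this scale the two EBU results fall in the good band and the AMR-WB+ result in the excellent band. The AMR-WB+ score comes from a different test campaign, so the comparison across the two tests should be read as indicative only.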
4.4 Support for the Latest Codecs in Mobile Phones

Codecs such as AAC and AMR-WB are already supported in mid- and high-end mobile phones, so they can be used if the target consumer is in the mid- or high-end segment, as office mobile phone users generally are. The latest codecs using SBR, however, are not yet supported by mobile phones. This means that HE-AAC (aacplus), EAAC+ and AMR-WB+ cannot yet be used in mobile applications. It should not take long, however, before these codecs are supported too, as they have been approved by 3GPP and the hardware manufacturers have already implemented them in their products. Nokia also signed an aacplus license agreement in July 2004 (www.3g.co.uk, 2004), which indicates that aacplus will be available on Nokia mobile phones soon.

Open codecs such as Ogg Vorbis do not seem to be as successful on the commercial mobile market. They are generally not supported as a standard feature, but interested users can install the codec and a player. There is an open source player called OggPlay by Leif H. Wilden (2004) for the Symbian OS. This player currently supports ogg, mp3 and aac files on Series 60 phones running Symbian OS 7 or later.

Windows Media has not yet achieved the same position in the mobile phone market as it has in the PC market. No phones other than Microsoft's own brands include a Windows Media Player. Real Media players are, however, available for most mobile phones, and RealMedia has thus established a special position in the mobile market.

4.5 Existing and upcoming applications for high-quality audio in mobile devices

Applications for downloading music content to the mobile phone already exist. There are at least two commercial players (MP3go and UltraMP3) that support the playing of mp3-based music. These players also support the creation and use of playlists. The main disadvantage at the moment is the memory required (3-5 MB/song) for high-quality stereo mp3 music.
If HE-AAC or AMR-WB+ could be used, the size of a song would be below 1 MB. This would allow low-end mobile phones to store more music than today. The high-end mobile phones already have enough memory available on MMCs. The download time for a song over the EGPRS network would decrease from about five minutes to one. Normally, however, songs are not downloaded over the network but transferred directly from a PC via cable, Bluetooth or IR.

The UK mobile phone company MMO2 (mm02, 2004) is launching a service for downloading music to mobile phones in November. It uses a special music player called O2 Digital Music Player. The music files will be encoded in the MPEG-4 aacplus format and should be about a megabyte in size, MMO2 says. One song would take roughly 90 seconds to download across a GPRS connection. The copy-protection technology will be provided by the Swiss company Secure Digital Container (SDC).
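The file sizes and download times mentioned in this section follow from simple arithmetic on bit rate, duration and link speed. The sketch below reproduces that arithmetic under stated assumptions: the function names are hypothetical, the four-minute song length and the 128 kbps and 32 kbps bit rates are illustrative choices rather than figures from the sources, and protocol overhead is ignored.

```python
# Back-of-the-envelope sketch of the file-size and download-time figures in
# section 4.5. Function names, the 4-minute song length and the chosen bit
# rates are illustrative assumptions, not values taken from the sources.


def file_size_mb(bitrate_kbps, duration_s):
    """Size of a constant-bit-rate stream in megabytes (1 MB = 10**6 bytes)."""
    return bitrate_kbps * 1000 * duration_s / 8 / 1_000_000


def download_time_s(size_mb, link_kbps):
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    return size_mb * 1_000_000 * 8 / (link_kbps * 1000)


SONG_SECONDS = 4 * 60  # assumed song length
EGPRS_KBPS = 120       # lower end of the EGPRS range quoted in section 4.2

for label, bitrate in [("mp3 at 128 kbps", 128), ("HE-AAC at 32 kbps", 32)]:
    size = file_size_mb(bitrate, SONG_SECONDS)
    seconds = download_time_s(size, EGPRS_KBPS)
    print(f"{label}: {size:.2f} MB, about {seconds / 60:.1f} min over EGPRS")
```

Under these assumptions the mp3 file comes to roughly 3.8 MB and a little over four minutes of download time, while the HE-AAC file stays just under 1 MB and takes about one minute, which is in line with the figures given above. By the same arithmetic, downloading a 1 MB file in 90 seconds corresponds to an effective rate of about 89 kbps.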
Streaming applications for mobile phones already exist. Both music and video can be enjoyed on the mobile phone. The Finnish Broadcasting Company YLE, for example, sends the news as 20-50 kbps streams for the GPRS network. This speed was selected to make the news available anywhere in Finland. The most common format today is RealMedia, but other formats will certainly be available in the near future. One of the problems today is the quality of the content, due to the low bandwidth of the GPRS network. In the near future the quality will be much better thanks to both increased bandwidth and more efficient codecs, which will greatly improve the user experience.

The American market research centre In-Stat (In-Stat, 2004) expects the American streaming video market to start growing in the next two years, but it will not reach 15 % of total wireless revenues until 2009, which is not very encouraging. Another study shows that 11 % of mobile phone users today are very or extremely interested in buying music over the mobile phone network.

5 CONCLUSIONS

The development of audio codecs for mobile phones has been very rapid in the past few years. Enhancements to codecs have been released yearly, and there always seem to be new technologies that can be applied to the compression procedure. Technologies that have changed the world of encoding include MP3, AAC and SBR. At this moment the leading codecs for audio seem to be AMR-WB+ and HE-AAC (aacplus), depending on what kind of audio material is encoded. This is most likely not the last step in codec development. New codecs will probably continue to be introduced yearly.

Figure 7. Applications for Mobile Audio (Mäkinen, J. et al., 2004).
The need for more efficient codecs will probably decrease gradually as telecommunication speeds continue to grow even beyond 3G networks. According to the telecommunication company Alcatel's white paper on mobile network evolution, the expected communication speed for mobile phones will approach 1 Gbps in 2010-2015 (see figure 8).

Figure 8. Evolution of mobile networks from 2G to B3G (Hurel, J-L et al., 2004).

On the other hand, new applications utilizing these possibilities will certainly be introduced on the market. These products also act as a driver for the technology, as they demand more computing power, more memory, better graphics and better audio, which in turn demand more efficient telecommunication. As long as there is a need for those applications and a willingness to pay the price of using them, the development process is secured. Severe limiting factors that could stop the development of mobile audio applications seem hard to find.
REFERENCES

ARM Developer's Guide 2004-2005. Convergence Promotions. Developer's Guide (Online) 2004. [Referenced 25.11.2004]. Available: http://arm.convergencepromotions.com/catalog/m_home.htm

Bruhn, S. 2004. Bridging the gap between speech and audio coding - AMR-WB+ - The codec for mobile audio. Ericsson Research, Multimedia Technologies. Available: http://www.s3.kth.se/radio/courses/s3_seminar_2e1380_2004/presentations/ericssonaudio-040506.pdf

Coding Technologies. aacplus. Products and Technologies. Promotion Page (Online) 2004. [Referenced 25.11.2004]. Available: http://www.codingtechnologies.com/products/aacplus.htm

Compagnie Financière Alcatel. Mobile Networks. Solutions. Technology Overview Page (Online) 2004. [Referenced 25.11.2004]. Available: http://www.alcatel.com/mobilenetworks/mobileinternet/

Fraunhofer Institute for Integrated Circuits IIS. Audio & Multimedia. MPEG Audio Layer-3. Technology Report (Online) 2004. [Referenced 25.11.2004]. Available: http://www.iis.fraunhofer.de/amm/techinf/layer3/

Frerichs, D. 2003. New MPEG-4 High-efficiency AAC Audio: Enabling new applications. Coding Technologies. Available: http://www.telossystems.com/techtalk/hosted/m4-in-30100%20(m4if_he_aac_paper).pdf

Henn, F., Böhm, R., Meltzer, S. & Ziegler, Th. 2003. Spectral Band Replication (SBR) Technology and its Application in Broadcasting. Available: http://www.broadcastpapers.com/radio/ibc2003codingsbr04.htm

Hurel, J-L., Lerouge, C., Evci, C. & Gui, L. 2004. Mobile Network Evolution: From 3G Onwards. Technical White Paper. Compagnie Financière Alcatel. Available: http://www.alcatel.com/doctypes/articlepaperlibrary/pdf/atr2003q4/t0312-Mobile-Evolution-EN.pdf

In-Stat, American Market Research Centre. Mobile Consumer Data & Multimedia Services. Information Service (Online) 2004. [Referenced 25.11.2004]. Available: http://www.instat.com/catalog/wcatalogue.asp?id=230
Kozamernik, F. 2003. EBU subjective listening tests on low-bitrate audio codecs. EBU Listening Tests. Tech 3296. June 2003. Available: http://www.ebu.ch/cmsimages/en/tec_doc_t3296_tcm6-10497.pdf?display=en

mp3pro Zone 2004. Coding Technologies. Developer's Guide (Online) 2004. [Referenced 25.11.2004]. Available: http://www.mp3prozone.com/

Microsoft Corporation. Windows Media Home. Technology Page (Online) 2004. [Referenced 25.11.2004]. Available: http://www.microsoft.com/windows/windowsmedia/default.aspx

mm02. O2 Digital Music Player. Cellular Phone Operator. Promotion Page (Online) 2004. [Referenced 25.11.2004]. Available: http://www.o2.co.uk/o2-digital-musicplayer.html

Mäkinen, J. et al. 2004. AMR-WB+: A new audio coding standard for 3rd generation mobile audio services. Nokia Research Center, Finland. Submitted to ICASSP 2005. Figures available: http://www.tml.hut.fi/opinnot/t-111.550/Mobileaudioformats2004-10-26.pdf

RealNetworks Incorporated. Real Player Page. Technology Page (Online) 2004. [Referenced 25.11.2004]. Available: http://www.real.com/player/?src=realaudio

Stoll, G. and Kozamernik, F. 2000. EBU Listening Tests on Internet Audio Codecs. EBU Technical Review June 2000. Available: http://www.ebu.ch/trev_283-kozamernik.pdf

Symbian Ltd. Symbian OS Phones. Technology Promotion Page (Online) 2004. [Referenced 25.11.2004]. Available: http://www.symbian.com/phones/index.html

The 3rd Generation Partnership Project; Technical Specification Group Services and System Aspects; General Audio Codec audio processing functions; Enhanced aacplus General Audio Codec; General Description (Release 6). 2004. Available: http://www.3gpp.org/ftp/tsg_sa/tsg_sa/tsgs_24/docs/pdf/sp-040428.pdf

Vilermo, M. 2004. Audio Codecs. AES/RTI-Audiopäivät 2004 Conference, Helsinki, 25.-26.5.2004. Audio Engineering Society Finnish Section. Available: http://www.aes.fi/audiopaivat2004/vilermo.pdf
VoiceAge Corporation Licencing Service. AMR-WB+ FAQs. Technologies. Frequently Asked Questions (Online) 2004. [Referenced 25.11.2004]. Available: http://www.voiceage.com/amrsite/tech_wbplus_faqs.php

Wales, J. 2004. Wikipedia - The Free Encyclopedia. Wikimedia Foundation. Electronic Encyclopedia (Online) 2004. [Referenced 25.11.2004]. Available: http://en.wikipedia.org/

Wilden, L. H. Ogg Vorbis Player for Symbian OS phones. Technology Page (Online) 2004. [Referenced 25.11.2004]. Available: http://symbianoggplay.sourceforge.net/

www.3g.co.uk. aacplus 2.5 / 3G License with Nokia. News service for 3G. News Service Site (Online) July 2004. [Referenced 25.11.2004]. Available: http://www.3g.co.uk/pr/july2004/8026.htm

Xiph.Org Foundation. The Ogg Vorbis CODEC project. Ogg. Developer's Page (Online) 2004. [Referenced 25.11.2004]. Available: http://www.xiph.org/ogg/vorbis/