Quality of Service for Streamed Multimedia over the Internet

Nicola Cranley*, Ludovic Fiard, Liam Murphy*
*Performance Engineering Laboratory, School of Electronic Engineering, Dublin City University, Ireland
Broadcom Eireann Research Ltd, Dublin, Ireland

1. Introduction

Multimedia is any combination of text, graphics, audio, video, animation and data. Multimedia applications over the Internet include Video on Demand (VoD), interactive video, and videoconferencing. However, these applications have limitations, as it is often required that a multimedia file be completely downloaded before it can be played or viewed. Streaming is the ability to start processing data before all of it has arrived, thus making delivery in real-time or near real-time possible. Streaming technologies are designed to overcome the problem of limited bandwidth; the implication is that multimedia files of any size can be played or displayed over the Internet in real-time or near real-time. To date, there has been no definitive way to transmit streamed MPEG-4 files across the Internet with an associated Quality of Service. One possibility is to write a control protocol on top of TCP/IP, which manages the flow of multimedia data. In this paper, an alternative approach using a protocol stack comprising a Real-time Transport Protocol (RTP) layer over a User Datagram Protocol (UDP)/Internet Protocol (IP) layer is described. Firstly we provide a brief description of MPEG/MPEG-4 and RTP/RTCP, followed by a description of the system implemented and plans for its future development.

Quality of Service (QoS)

In computer and telecommunications networks, Quality of Service (QoS) refers to the network operator's commitment to providing and maintaining acceptable values of parameters or characteristics of user applications. In certain cases the network operator may be able to guarantee (perhaps probabilistically) the QoS level a given user's application will receive.
This is of particular concern for the continuous transmission of high-bandwidth video and multimedia information. QoS can be characterised by a number of specific parameters, such as throughput, delay, delay variation, loss rate, and so on. The client specifies its desired QoS parameters when it requests a connection to a server. The client and server must agree on what constitutes acceptable service. Both the desired and minimum acceptable values may be exchanged and negotiated. If the client and server cannot agree on these QoS parameters then the connection is refused. This may happen if the server is already heavily loaded with existing connections, since the server's first priority is to maintain the QoS of already-accepted connections. Providing QoS guarantees is difficult or impossible in networks that offer "best effort" service, such as the Internet's IP layer. Therefore a lot of work has been carried out recently on how to add QoS support to the Internet service model. Examples of this include the intserv (Integrated Services) and diffserv (Differentiated Services) approaches. The RTP/RTCP approach is an attempt to add QoS support mechanisms above the Transport layer (TCP or UDP). However, the use of RTCP messages to provide and maintain QoS guarantees for multimedia streams is still under investigation.

2. MPEG/MPEG-4

The term MPEG stands for Moving Picture Experts Group, the body that standardised the MPEG formats (syntax and semantics). MPEG is a standardised method of compressing video and is used in many current and emerging products. It is at the heart of television set-top boxes, DSS, HDTV decoders, DVD players, video conferencing, Internet video and other applications. High compression is needed so that less storage space is required for archived video information, and/or so that the video information occupies less bandwidth when transmitted from one point to another.
There have been a number of MPEG standards defined, each for a specific application context. For example, MPEG-1 was designed specifically to store movies on CD-ROM in CD-I and CD-Video format. MPEG-4 is a new standard designed for video conferencing and other video applications. Unlike MPEG-1, where a video sequence is decomposed into
frames, MPEG-4 decomposes the scene into objects, called audio-visual objects (AVOs), each with its own audio and video track that can vary over time. MPEG-4 is very flexible: for example, if the network is congested or overloaded, not all the objects need to be sent to the client, only those that keep the video sequence coherent. For example, a simple video clip of a newsreader will have two objects, the newsreader and the background. If the network is overloaded, the background information can be discarded to ensure that the video sequence is transmitted to the client on time and coherently. With previous MPEG standards, the image would instead flicker or be noticeably delayed.

Fig.1. MPEG-4 layered architecture: the Compression Layer (media aware, delivery unaware), the Sync Layer (media and delivery unaware), and the Delivery Layer DMIF (media unaware, delivery aware), which flexmultiplexes SL streams and provides transport access to the delivery technology, e.g. Internet RTP/UDP/IP.

The Compression Layer produces the compressed representation of the traditional audio/visual elementary streams (ESs) and associated streams such as the Binary Format for Scenes (BIFS), object descriptors (ODs) and initial object descriptors (IODs) that compose a scene. The BIFS ES syntax allows dynamic scene description. The OD ES syntax allows the description of hierarchical relations, location and properties of the different ESs through a dynamic set of ODs. ODs refer to natural or synthetic objects and serve as a grouping of one or more Elementary Stream Descriptors that refer to a single media object. A complete set of ODs can be seen as an MPEG-4 resource or session description. The compressed content produced by the compression layer is organised into Elementary Streams, and the Sync Layer provides the synchronisation between these ESs. ESs are organised into Access Units (AUs).
The AU is the smallest element to which a timestamp can be attributed, and each AU is unique. AUs produced by the encoders are passed on to the SL layer, either complete or segmented (i.e. partial AUs), through the Elementary Stream Interface (ESI). Each AU is marked with indications of its boundaries, random access points and timestamps such as the desired decoding time, arrival time and composition time of the decoded AU. The Synchronisation Layer takes these AUs and encapsulates them into SL packets, or SL-PDUs (packet data units). The SL packet header can be adapted to the needs of the ES to be carried; it provides a means for continuity checking in case of data loss and carries a coded representation of the timestamps and
associated information. The configuration of each individual SL packet header is conveyed in the SLConfigDescriptor. SL-PDUs of varying instantaneous bit rates can be multiplexed or interleaved by means of the FlexMux tool. The FlexMux tool is optional and adds only a low multiplexing overhead to the multiplexed SL streams. A FlexMux stream is a succession of FlexMux packets. Each FlexMux packet is built from a fixed-length FlexMux packet header and a FlexMux payload; the header is composed of a one-byte index followed by a one-byte length field. There are two FlexMux packet management modes, MuxCode Mode and Simple Mode. In Simple Mode, the FlexMux packet payload corresponds to one complete SL packet. In MuxCode Mode, the FlexMux payload can consist of a number of SL packets. FlexMux describes the bitstream syntax used to multiplex SL streams. It is also used for the reconstruction of the correct timing of an MPEG-4 bitstream, supported by the MPEG-4 SL stream syntax and the MPEG-4 FlexMux stream syntax. The reconstruction of correct timing of an MPEG-4 FlexMux stream is possible under some QoS constraints closely related to the reduction of network jitter: MPEG-4 FlexMux assumes that there is a nearly constant transmission delay. The MPEG-4 Delivery Application Interface supports content-location-independent protocols, firstly for establishing the MPEG-4 session and secondly for accessing transport channels. The DMIF monitors transport channels for the QoS requirements assigned to the SL streams, and supports the multiplexing of the SL streams by means of the FlexMux tool.

3. RTP/RTCP (Real-time Transport Protocol / RTP Control Protocol)

The main problem with UDP/IP as a transport mechanism is that there is no guarantee that the packets will arrive, and once lost or delayed past their playtime they are discarded.
However, using another transport mechanism such as TCP/IP would be wasteful, as it has a larger header overhead and requests retransmission of all lost or delayed packets. Retransmission of all lost packets is usually unsuitable for real-time applications, as this would cause undue traffic on the network and the retransmitted packets may arrive too late for play-out. So far there are no standards in existence for requesting the re-transmission of streamed media. RTP is a transport protocol designed for real-time applications. RTP has been used with other network protocols such as AAL5/IP and TCP/IP, but applications typically run RTP on top of UDP/IP to make use of UDP's multiplexing and checksum facilities. RTP and RTCP are independent of the underlying transport and network. RTP and RTCP add extra functionality to UDP/IP, such as sequence numbers and timestamps. RTCP is particularly useful as it enables the integration of some quality of service mechanisms by means of client feedback, thus allowing the server to modify its transmission to the client in response to network conditions. In essence, it is the client's feedback which dictates the server's transmission. The MPEG-4 data is broken up into packets constituting the payload of the RTP/UDP/IP packets. Each layer in the stack adds a header of information so that the packets can be routed to, and decoded by, the client who has requested them. RTP should be able to adjust its transmission to match the receiver's requirements and abilities and the network conditions. However, in multicast sessions it is difficult for the server to determine how it should adapt: to the average of the users, to the lowest common denominator, or to the highest. Instead, the client controls rate adaptation by combining layered encoding with layered transmission. RTP provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio or video, over multicast or unicast network services.
There is no guarantee ensuring timely delivery or QoS, nor does RTP provide multiplexing/de-multiplexing facilities. RTP does not assume that packets will arrive in sequence and so it incorporates packet sequence numbers within its header so that the receiver can sort the sender's packet sequence before decoding. The RTP payload is the data transported by the RTP packet, which in this case is the MPEG-4 systems data. An RTP packet consists of a fixed header, a list of contributing sources and the payload data. Typically one RTP packet is encapsulated within one UDP/IP packet. RTP should use an even destination port number and the corresponding RTCP stream should use the next higher (odd) destination port number. In a unicast session, both participants need to identify a port pair for receiving RTP and RTCP packets. When RTP data is sent in both directions, each participant must issue RTCP Sender Reports. However it cannot be assumed that the source of incoming RTP data will be the destination for outgoing RTP data. The Synchronisation source (SSRC) is the source of a stream of RTP packets. All packets from the same source share the same timing and sequence numbering space. This is important for video-conferencing, as
there will be data from many different sources, such as microphones, cameras etc. A synchronisation source can change its data format over time. The SSRC is just a 32-bit random number, but it will be globally unique throughout the RTP session. The SSRC identifiers are bound through the RTCP information. The Contributing source (CSRC) is the source of a stream of RTP packets that has contributed to the combined stream produced by an RTP mixer. For example, in an audio conference several people's speech will be interleaved into one RTP packet; the CSRC list identifies whose speech was combined, and thus who the current speaker is. The fields in the RTP packet header that are of interest are the extension bit and the marker bit. The extension bit, denoted X in Fig.2, when set indicates that the fixed header must be followed by one header extension. It is through extending and tailoring this header that QoS will be integrated into RTP. The marker bit, denoted M in Fig.2, is also of interest, as its interpretation can be defined by a profile. Markers are used to allow significant events such as frame boundaries to be marked in the packet stream. A profile may define additional marker bits, or specify that there is no marker bit, by changing the number of bits in the payload type field. In the system we have implemented, this marker bit is assumed to indicate packet priority.

Fig.2. RTP packet structure: V, P, X, CC, M, PT, sequence number, timestamp, synchronisation source (SSRC) identifier, contributing source (CSRC) identifiers.

It is possible to multiplex RTP sessions, although the number of multiplexing points should be minimised. In RTP, multiplexing is provided by the destination address (network address and port number), which defines the RTP session.
In the case of audio-video conferences where audio and video are encoded separately, one RTP channel must be set up for the audio stream and another for the video stream, each with its own destination transport address in addition to the RTCP channel. The audio and video streams are transmitted and treated separately, but they share the same canonical name identifying them so that they can be associated with each other at the receiver. The RTP header is flexible and can be adapted for profile-specific modifications and additions. This facility is provided by the extension mechanism; however, the header extension is intended for limited use only. If the extension bit is set to 1, this indicates that there is a variable-length header appended to the RTP header, following the CSRC list. RTP receivers provide quality feedback using the RTCP report packets. RTCP is used to facilitate the monitoring of RTP data delivery and quality of service throughout the session, and to convey statistics about the transmission to the client and server periodically. RTCP performs four main functions:

1. Provide feedback on the quality of the data distribution. This feedback could be used directly to enable the server to dynamically adapt its transmission, and also to aid the diagnosis of problems on the network by re-routing the RTCP packets to a third party.
2. Carry a persistent transport-level identifier for an RTP source, called the canonical name or CNAME. This name is used to keep track of all the participants within the session. The CNAME is also required to associate multiple data streams and then synchronise them together, for example audio and video streams.
3. In the case of conferencing, all participants send RTCP packets to all other participants; each participant can independently observe all the other participants.
4. Convey minimal session control information.
Each RTCP packet begins with a fixed header part followed by structured elements that may be of variable length. RTCP packets are stackable, so multiple RTCP packets can be concatenated without the need for a separator, forming an RTCP compound packet, which is then encapsulated in a UDP/IP packet.
RTCP is designed to carry a variety of control information. The Sender Report (SR) is used for the transmission and reception statistics of participants that are also active senders. The Receiver Report (RR) is used for the reception statistics of participants that are not active senders and, in combination with SR, for active senders reporting on more than 31 sources. The Source DEScription (SDES) packet carries items identifying the source, such as the canonical name (CNAME). The APPlication (APP) packet is profile-defined and specific to the session type. There are, however, a few constraints on RTCP:

1. Reception statistics (SR and RR) should be sent as often as possible, therefore each periodically transmitted compound RTCP packet must include a report packet.
2. New receivers need to receive a CNAME for a source as soon as possible, to identify the source and to begin associating media. A compound RTCP packet must therefore also include the SDES CNAME.

So RTCP packets must be sent in a compound packet containing at least one SR or RR together with one SDES packet.

Fig.3. RTCP compound packet structure: SR and RR report blocks for the reported sites, SDES items (CNAME, phone, loc) and a BYE packet.

Control traffic is not self-limiting: if there are many participants in a session and all transmit control packets at the same time, this causes undue congestion on the network. This is prevented by dynamically adapting the RTCP transmission interval. It is recommended that the control traffic be limited to approximately one RTCP packet every 5 seconds, although this interval may be scaled to suit network conditions; for example, if the network is highly loaded, the interval could be increased to reduce control traffic. Sender and Receiver reports differ in that Sender Reports include a 20-byte sender information section for use by active senders.
3.1 Extensions to the RTP and RTCP payload type format enabling Quality of Service

The main objective is to adapt RTP to the lower delay requirements of streaming applications by making RTP more reliable, in a sense emulating TCP through selective re-transmissions. In order to realise this, the existing RTP/RTCP payload format must be modified slightly. The underlying transport protocol chosen is UDP/IP (User Datagram Protocol/Internet Protocol), which is unreliable and susceptible to severe packet loss when transmitting compressed MPEG video streams in congested networks. One simple solution is to use increased redundancy by sending multiple copies of data packets; however, this adds an extra load on the network. Another solution, retransmission of all lost packets, is unsuitable for real-time or near real-time streams, as retransmitting causes additional propagation delays and also increases the load on the network.

Fig.4. RTP header extension for selective retransmission: PA, PT, PR, DTI, SNHP, Diff Time Stamp, NULL padding.

Fig.5. RTCP header extension for selective retransmission: V, P, RXP, PT=Rreq, Length, SSRC, C1, SNHP1, C2, SNHP2.

The fields of interest in the RTP header extension (Fig.4) are the priority bit, denoted PR, and the Sequence Number of RTP packet with High Priority, denoted SNHP. The Priority (PR) bit identifies the priority of the RTP packet, and will be assumed to be equal to the existing marker bit in the system
implemented. If set, it indicates the presence of a data packet with a high priority. The Sequence Number of RTP packet with High Priority (SNHP) indicates the sequence number of the RTP packet with high priority. This number increases with each high-priority packet sent. If the PR bit is not set, then this field indicates the sequence number of the last high-priority packet sent, thus allowing a log to be kept of all the high-priority packets sent within all packets sent, both high and low priority. In the RTCP header extension (Fig.5), the fields of interest are the retransmission protocol, denoted RXP, and the Control bits, denoted C1. RXP indicates the re-transmission protocol to be used. The Control bits indicate the number of packets lost and, when used in conjunction with the SNHP fields, identify which packets in the sequence were lost. In the case of MPEG-4 systems streams, certain packets are more important than others; so, instead of arbitrarily re-transmitting all lost packets, the approach is to selectively re-transmit lost packets that have a high importance to the coherence, decoding and playout of the MPEG-4 stream. This will reduce the number of retransmissions requested. The priority bit (PR) in the RTP extension is used to indicate a packet of high priority; however, this can only indicate two levels of priority. Detection of a lost high-priority packet is a straightforward matter of comparing the SNHP of the recently received packet with the client's log of high-priority packets received. There is also a retransmission judgement, whereby the client estimates whether the retransmission request for the important packet will be served in time for playout. There can also be multiple retransmission attempts, based on the retransmission judgement.
Requesting a retransmission should be avoided but, if it is necessary, the client should estimate the round-trip propagation delay between sending the retransmission request and receiving the requested retransmitted packet. As mentioned earlier, the RTCP interval is set to approximately one RTCP packet every 5 seconds. This value is not fixed: if all receivers in a multicast session were to transmit their control packets at the same time, this could cause flooding at the server side. In the unicast scenario a fixed interval is unsuitable, as the request for the retransmission of lost packets could take up to 5 seconds, which would be useless for real-time or near real-time applications since the retransmitted packet would arrive after the playout time. The RTCP interval should be dynamically adaptable so that severe packet loss can be signalled to the server as soon as possible, indicating that there are problems with the network.

3.2 Using RTP as a transport mechanism for MPEG-4 FlexMux streams

MPEG-4 applications can involve a large number of ESs and thus a large number of RTP sessions. Allowing a selective bundling scheme or multiplexing of ESs may be necessary for certain MPEG-4 applications. MPEG-4 FlexMux streams can be synchronised with other RTP payloads. MPEG-4 FlexMux streams and other real-time data streams can be combined into a set of consolidated streams through the use of RTP mixers and translators. The delivery performance of the MPEG-4 stream can be monitored via the RTCP control channel. An MPEG-4 FlexMux stream is mapped directly to the RTP payload without the addition of any extra header fields or the removal of any FlexMux packet header. Each RTP packet contains a sender clock reference timestamp that is used to synchronise the FlexMux clock. On the client side, the FlexDemultiplexor does not make use of the RTP timestamp; the purpose of the RTP timestamp is to determine the network jitter and the propagation delay between server and client.
An RTP packet should begin with an integer number of FlexMux packets.

4. System Implementation

The overall system is implemented using a client-server architecture, as shown in Figure 6. The server architecture consists of four main components: the source for MPEG-4 data; a translator; an MPEG/RTP interface; and the RTP/UDP interface. The main function of these is to take the MPEG-4 source data and package the information into UDP packets, which are then transmitted to the client over the Internet. The client architecture is the inverse of the server, as it must extract the MPEG-4 data from the packets by removing the header appended by each layer in the server stack so that it can extract the stream file. Three programs have been written in C++ (the client, the server and the multiplexor) to implement the system, and they have been tested using a local loop on one PC. As there is no player available yet to decode the transmitted MPEG-4 file on the client side, playback of the transmitted video is not yet possible. To ensure the stream file is transmitted correctly to the client, the sequence numbers and timestamps of the transmitted RTP packets and the statistics conveyed through the RTCP packets are monitored on both the
client and server side. The statistics collected by the server about the client are: the fraction of packets lost; the cumulative number of packets lost; the highest sequence number received; the jitter; the timestamp of the last reception report; and the time at which that report was received. The server uses these statistics to dynamically adapt its transmission, to optimise the use of network resources and improve the quality of the received file for the client. The statistics collected on the client side about the server are the number of packets received and the cumulative number of bytes contained in those packets. The statistics collected about the server are not as important as those collected by the server about the client; after all, it is the client who is in the better position to judge the QoS. The translator program is based on the BIFS encoder and multiplexer developed by CSELT for offline encoding of the MPEG-4 source data. This translator program is slightly modified to produce the FlexMux packets with a reduced SL header, i.e. duplicated fields are removed. The program reads the MPEG-4 source data from a text file and a script file and produces several files. These files are the encoded and packetised version of the MPEG-4 source data, and comprise the AVO (audio-visual object) files, the scene description files (BIFS and OD) and the IOD (initial object description) files. The translator creates the SL (Sync Layer) packets and then removes the redundant data (i.e. fields duplicated in the RTP header). These SL-PDUs are then multiplexed using the FlexMux tool operating in MuxCode Mode. The output of the translator program is a muxrtp file. This muxrtp file is the input to the server program, which reads the data in the muxrtp file and creates an RTP packet for each muxrtp packet. Also developed is an interface program, which decodes the headers of the FlexMux packets.
Normal system operation is as follows. The client requests an MPEG-4 stream file to be played from the server over a reliable TCP/IP connection. The server acknowledges the file request and spawns two duplex UDP connections between the client and the server, one for the RTP channel and one for the RTCP channel.

Fig.6. System overview: the translator program (BIFS encoder and FlexMux muxer) converts the MPEG-4 source (File.TXT, File.SCR, producing File.MP4, File.BIF, File.OD, File.LST) into a muxrtp file, which the server sends through its MPEG/RTP interface and RTP/RTCP stack over UDP/IP to the client's de-multiplexor, MPEG/RTP interface and decoder/player.

Within the server there are three main threads. The use of threads ensures that the different processing elements of the program can work simultaneously and with near-independence. The first thread creates and transmits the RTP packets until the end of the file is reached or until the client terminates the session. A mutex status variable is used to define a transmission profile; for example, one such profile sends just high-priority packets. The RTP packets are then multiplexed and encapsulated into UDP packets and transmitted over the Internet to the client. The second thread loops periodically, sending RTCP packets. The third thread loops until the end of the session, listening for RTCP packets from the client. When an RTCP packet is received from the client, the server extracts the relevant statistical information about the client. This information is used to modify the transmission profile to the client by locking the mutex status variable, changing its value if necessary, and releasing the lock. The new status value will then be picked up by the RTP transmission thread, thus modifying the RTP transmission profile. On the receiving side there are three main threads in operation.
One thread loops continuously until the end of the session, or until the session is terminated, receiving the RTP packets: the UDP/IP header information is removed to extract the RTP packet, then the RTP header is removed and the
original muxrtp packet is extracted. These muxrtp packets are passed on to an MPEG-4 decoder that will play the MPEG-4 video stream. The second thread also loops until the end of the session, or until the session is terminated, receiving RTCP packets from the server. The third thread loops periodically, sending RTCP packets to the server containing the client's reception statistics.

5. Future Development

The current version of the system is capable of creating and transmitting the MPEG-4 stream file to a client using an RTP/UDP/IP transport stack. The next stage is to harness and exploit the characteristics of both the transport media and MPEG-4 so as to implement QoS parameters. The extensions to the RTP and RTCP packets have yet to be implemented; their intended purpose is to add selective retransmission to the system. The extended RTP and RTCP packets are to be used to ensure that the client receives all essential packets, i.e. those with the PR bit set to one. Currently, the server assumes that the marker bit and the priority bit are equal. The server can transmit according to different transmission profiles as defined by the status variable; however, it is unable to change the transmission profile dynamically within the session. Also, research must be done to identify what characterises and constitutes a change in transmission profile. For example, when should the server resort to prioritised transmission of high-priority packets, and when should it adopt transmission redundancy to send high-priority packets? Ideally, prioritised encoding transmission (PET) should be adopted when the MPEG-4 file is encoded in real-time; however, in the system implemented, encoding of the MPEG-4 file is offline. Should any high-priority packets be lost, this will be detected by comparing the SNHP field in the RTP packet with the client's log of received high-priority packets.
The modification to the system will be to use the RTCP packet to indicate to the server that it should modify its transmission to suit the capacity of the network. However, if there is serious packet loss on the network (for example, ten consecutive high-priority packets are lost), there is no way of signalling this to the server immediately. The client will automatically wait until the next RTCP interval has timed out before it can notify the server of this situation, and even then there is no guarantee that the RTCP packet sent from the client will reach the server. RTCP packets are unsuitable for this type of urgent reporting back to the server, as they are designed only for statistics reporting and are sent periodically. Therefore a profile-specific RTCP packet of a more urgent nature needs to be defined, to be sent only when needed. There are many aspects of RTP that have yet to be defined and finalised, and so for the moment a basic draft version of the RTP packet is being used.

References

- G. Muntean and L. Murphy, "An Object-Oriented Prototype System for Feedback Controlled Multimedia Networking", submitted to ISSC 2000
- The MPEG Home page
- The MPEG-4 Forum Home page
- RFC 1889: RTP: A Transport Protocol for Real-Time Applications
- RFC 1890: RTP Profile for Audio and Video Conferences with Minimal Control
- Internet draft draft-podolsky-avt-rtprx-00.txt: An RTCP-based Retransmission Protocol for Unicast RTP Streaming Multimedia