ADAPTATION MECHANISMS FOR STREAMING SERVER APPLICATIONS OPTIMIZED FOR THE USE ON MOBILE DEVICES WITH LIMITED RESOURCES


Faculty of Computer Science, Institute of Systems Architecture, Chair of Computer Networks

Diploma Thesis

ADAPTATION MECHANISMS FOR STREAMING SERVER APPLICATIONS OPTIMIZED FOR THE USE ON MOBILE DEVICES WITH LIMITED RESOURCES

Submitted by: Rico Meyer
Matriculation number: 3333315
Year of matriculation: 2006
E-mail: rico.meyer@mailbox.tu-dresden.de
Supervisor, Technische Universität Dresden: Dr.-Ing. Daniel Schuster
Supervisors, Università Parthenope: Prof. Angelo Ciaramella, Prof. Antonino Staiano
Responsible university teacher: Prof. Dr. rer. nat. habil. Dr. h. c. Alexander Schill
Submitted on 31st August 2013

CONTENTS

1 Introduction

2 Streaming - Basic Technologies
2.1 Definition Of Streaming
2.2 Types Of Streaming
2.2.1 On Demand Streaming
2.2.2 Live Streaming
2.3 Components Needed For Streaming
2.3.1 Transport Protocols
2.3.1.1 Transmission Control Protocol
2.3.1.2 User Datagram Protocol
2.3.1.3 Real-Time Transport Protocol
2.3.1.4 Comparison Of TCP And UDP/RTP
2.3.2 Streaming Protocols
2.3.2.1 Real Time Streaming Protocol
2.3.2.2 HyperText Transport Protocol
2.3.2.3 HTTP Live Streaming
2.3.2.4 MPEG-Dynamic Adaptive Streaming over HTTP
2.4 Summary

3 Adaptation Mechanisms
3.1 Quality Of Experience
3.2 Adaptive Bit Rate Streaming
3.2.1 Evaluation on server side
3.2.2 Evaluation on client side
3.2.3 Proxy-based evaluation
3.3 Hierarchical Codecs
3.3.1 Basics of hierarchical codecs
3.3.2 Scalable codecs
3.4 Signalling
3.4.1 Single-Layer Approach
3.4.2 Cross-Layer Approach
3.5 Cognitive Radio
3.5.1 Definition Of Cognitive Radio
3.5.2 Cognition Tasks And Cognitive Circle
3.6 Summary

4 The Conception Of A Mobile Streaming Server
4.1 Use Cases For A Mobile Streaming Server
4.1.1 Scenario
4.1.1.1 Scenario 1: The Journalist
4.1.1.2 Scenario 2: Saving Media On Server
4.1.1.3 Scenario 3: Conferences From Underdeveloped Regions
4.1.2 Basic Requirements
4.2 Problems To Solve In Mobile Context
4.2.1 Multiple Network Interfaces
4.2.2 Private Networks Using NAT-Routers and Firewalls
4.2.3 Dynamic Internet-Protocol Addresses
4.3 Infrastructure
4.3.1 The Mobile Device
4.3.2 The Proxy Server
4.3.3 The Client
4.4 Communication And Protocol Stack
4.4.1 Communication Between Mobile Device And Proxy Server
4.4.2 Communication Between Proxy Server And Clients
4.5 Session Control Protocol
4.6 Adaptation

5 A Prototype Application For Live Streaming From A Mobile Device
5.1 Related Work And Used Libraries
5.1.1 SpyDroid
5.1.2 Apache Commons Libraries
5.2 Scenario Setup
5.3 Selection And Monitoring Of Networks
5.3.1 Check For Usable Network Devices
5.3.2 Selecting The Active Network Interface
5.3.3 Monitoring Of Network Connections And Handovers
5.4 Session Management
5.5 Capturing Video Data On The Mobile Device
5.5.1 Isolating Video Data
5.5.2 Video Quality Profiles And Adaptation
5.5.2.1 Quality Profiles
5.5.2.2 Quality Adaptation
5.5.2.3 Signalling Changes Of Video Quality
5.6 The Streaming Service
5.7 The Storage Service

6 Statistical Results
6.1 Test Configuration
6.2 Measured Data And Measure Methods
6.2.1 Adaptation Time
6.2.2 Handover Time
6.2.3 Packets Per Frame And Frame Size
6.2.4 Packet Loss And Loss Rate
6.2.5 Relations Between Loss, Adaptation And Frame Size
6.2.6 Difference Between Adaptation With And Without Resolution Changes

7 Conclusion And Future Works
7.1 Future Works
7.1.1 Simultaneous Usage Of All Network Devices
7.1.2 Handling Adaptation Of Frame Resolution
7.2 Conclusion

Appendix
A Erklärung / Confirmation
B List of Figures
C List of Tables
D Bibliography

1 INTRODUCTION

More and more people are using tablet computers, smartphones or both kinds of devices to navigate the Internet. Checking e-mails, reading books and chatting with friends and colleagues is nowadays possible in any place at any time. In addition, there is an increasing number of offers for online storage and Software as a Service, such as Google Documents [1], iCloud [2] or Microsoft Office Web Apps [3]. Watching videos, receiving video streams or participating in a video conference using a mobile device becomes easier and more common as the availability of mobile networks for fast data transfers increases. In general, streaming video has become more popular and accounts for more than half of all Internet traffic (see [AKH11]). Mobile networks for fast data transmission include technologies like Wireless Local Area Network (WLAN), Universal Mobile Telecommunications System (UMTS) and Long Term Evolution (LTE).

Receiving a video stream using a mobile device as a client does not require much effort: the power consumption is acceptable and the central processing unit (CPU) does not get overloaded. During the last fifteen years of research and development, many protocols and mechanisms for streaming to mobile clients were implemented, protocols like the Real-Time Transport Protocol (RTP) and the Real-Time Transport Control Protocol (RTCP), or mechanisms like the adaptation of the bit rate according to the quality of the network connection. Modern mobile devices are getting more and more powerful, some of them using multiple processing units or having an additional graphics processing unit (GPU) for encoding and decoding video data, while the processing units consume less power. The average period for one battery cycle is eight hours of talking on the phone [4].

Further on, there is a tendency towards synchronization of the different computers in one household or company, including mobile devices. Users want to save videos captured with a mobile device on the hard disk of their desktop PC at home or on platforms like YouTube. Business people use mobile devices to dictate the latest reports and afterwards send them to the secretary's workstation. Journalists use the built-in cameras of their devices to record scenes and transmit them to the broadcasting stations.

[1] https://docs.google.com/
[2] https://www.icloud.com/
[3] http://office.microsoft.com/it-it/web-apps/
[4] Average talking time, comparing smartphones on the market in March 2013, according to techcrunch.com.

Deriving from those scenarios, the need arises for streaming applications that are able to handle all the problems occurring in such mobile contexts.

Within this thesis the author gives an overview of the existing technologies for streaming and adaptation mechanisms. Taking this as a basis, the opportunities for implementing these mechanisms on a mobile device, which will be used as a streaming content source and streaming server, will be discussed. Therefore it is necessary to evaluate audio and video encoding, connection quality and the efficiency of the mechanisms; the main aspect has to be their advantages and disadvantages in the mobile context. In the end, the reduction of costs is one of the most significant parameters to evaluate, as the resources of a mobile device are limited.

In order to verify the results, one part of this thesis is the implementation of a client-server application which provides the server on a mobile device. The client could be any streaming application for receiving audio and video streams from the Internet; it does not matter whether the client is running on a mobile device or a classical desktop PC. To reach that target, the author shows, step by step, all technologies and mechanisms needed for the implementation of such an application. Starting with ordinary streaming applications, the reader is then introduced to the special aspects of mobile networks and the problems which have to be solved. As a last point, today's adaptation mechanisms are described, before the author presents his concept and the implementation of the server application.

2 STREAMING - BASIC TECHNOLOGIES

This chapter introduces the reader to the basic technologies and protocols which are needed in order to stream media over an IP-based network. It will help the reader to understand the mechanisms of the implemented prototype and the decisions for the used protocols. At the beginning a definition of streaming is given, followed by the different transport protocols, including their advantages and disadvantages in the context of streaming from a mobile device in a wireless network environment. At the end of this chapter the most common streaming protocols are presented. A decision for a certain technology will not be included in this part; for that, the reader is referred to chapter 4.

2.1 DEFINITION OF STREAMING

Streaming is a way to provide data on the Internet. The information is packed into data packets and sent to the client. If the received information is presented on the client side while the transmission is still in progress, it is called streaming. The information is often called media. There are different types of media, such as audio and video, but text information is media, too. Streamed media is called streaming media. It is possible to combine different types of media; the word multimedia is used to describe this mix of different types.

The media can be categorized in two groups: discrete and continuous. Discrete media could be a text or a simple JPEG picture; in the case of streaming, discrete media is not of interest. Continuous media contains audio and video data providing a continuous stream of information, usually in real time. To specify the media used for streaming, it is common to add the media type: in case of audio data this would be audio streaming; analogous to this, it would be video streaming if the data contains video information.

When talking about streaming, people usually think of audio and video streaming, but the basics of streaming are valid for all types. It does not matter whether text for a live ticker, pictures or audio is streamed. Discussing streaming always includes the fact that data is transmitted over the Internet. The network layer protocol of the Internet is the Internet Protocol (IP); therefore the main unit for transmitting data in one time unit is one IP packet.

A stream received by a client is presented instantly after receiving and buffering the data. In case of audio or video it is necessary to process the data in real time. This aspect is important later for choosing the appropriate encoding and the minimal quality requirements to be fulfilled by the network connection. For the synchronization of different media types it is necessary to add a timestamp to each packet. Received data will not be stored completely on the client's hard disk; at most, data will be buffered to avoid jitter effects. Most streaming protocols allow the user to control the stream while receiving it: it is possible to pause or stop the stream and, if it is not a live stream, to forward or rewind to any point of the stream in the time dimension. Some streaming protocols will be discussed later in this chapter.

The main characteristics of streaming can be summarized as follows:

- a stream consists of a continuous sequence of IP packets
- received data is presented instantly
- data is not downloaded completely; local buffering is possible
- controlling the stream is possible

2.2 TYPES OF STREAMING

In order to stream data to the user, there are two different possibilities for providing the media: streaming on demand and live streaming.

2.2.1 On Demand Streaming

Streaming on demand is usually used by video hosting platforms like YouTube or websites of television broadcasting stations, which provide an online rerun of their program or additional information such as documentaries or podcasts. The media are already registered and present on the server. It is common to offer the media in different qualities and different encodings; providing different qualities of one video allows the server to adapt the video quality to the conditions of the network connection. The media can be requested at any time and from any point in the time dimension. An on demand stream allows all common control commands: it is possible to start, pause or stop the stream, forward or rewind it. On demand streaming will not be the subject of this thesis; the interest is focused on the second type of streaming.

2.2.2 Live Streaming

Live streaming is the second way to provide media streams. Based on the fact that data is transmitted instantly after recording, the control options are limited in this mode. As a first restriction, given by physical laws, it is not possible to forward to a future point of the stream. Some providers allow pausing the stream or going back to a point in the past. This function is called time shifting, but it is not often used in combination with live streams. Time shifting is usable in combination with live streaming over HTTP if the server saves the older segments for longer periods in physical memory and keeps the references to these segments in the index file (see [LOH11] for further information). The more common method, used when resuming a live stream, is to wait for the next packet received from the server and to discard all previous packets. Going back to a past point in the time dimension is not possible with this method; in fact, it is only possible to resume the stream at the most recently provided position or to stop it.

Providing a live stream is more cost intensive than streaming on demand, because the data has to be encoded in real time. This results in the need for powerful processing units as well as efficient and fast encoding and compression algorithms. Offering adaptation mechanisms for changing the quality of the streamed data requires even more resources. There are two major application scenarios used in everyday life:

- web radio
- web TV

A huge number of radio stations transmit their program, in addition to the classical broadcast, on the Internet. Not only audio data is streamed, but also additional information like song title, artist and more. Television stations offer some parts of their program as live streams on the Internet. Receiving the whole program of one television channel via live streaming is also a solution that is becoming more popular: so-called IP television offers have existed for some years now and are available even for mobile devices.

2.3 COMPONENTS NEEDED FOR STREAMING

In order to provide a multimedia stream, some basic components are required on server and client side. Figure 2.1 illustrates, in a schematic way, all components using the example of web radio, i.e. audio streaming; components for other types of streaming do not differ significantly from this scheme. The stream provider captures the stream data from a sensor source, in the example case a microphone. The use of more than one sensor source, for example a video camera and a microphone, is possible, too. In case of on demand streaming, the source would be a video file from the file system. The data from the sensor sources is passed to the encoding unit. The encoding unit encodes the raw audio data from the microphone into the target audio format.

Figure 2.1: required components for streaming

The network unit organizes the transmission of data and control streams between client and server. Special streaming protocols are used for this aim; these protocols will be discussed in section 2.3.1. It is possible to use a combination of protocols instead of one single protocol. An example of such a combination is the Real-Time Transport Protocol (RTP), where the Real-Time Transport Control Protocol (RTCP) is used in addition for control messages. The network unit packs the data into IP packets and sends them to the clients.

At least on client side the network unit provides a buffer. It is necessary to avoid errors caused by jitter. Jitter is the variation of the delay of packets occurring while transmitting the data: the single packets can be transmitted on different routes with different transmission times. On server side a buffer is not obligatory; one could be implemented to ensure a constant transmission. After the data is received by the client's network unit, it is passed to the decoding unit. The decoding unit decodes the audio data in real time and passes it to the output unit, in this example the speakers. The quality of the stream is defined by the weakest element of this chain.

2.3.1 Transport Protocols

Before describing the different streaming protocols themselves, this section recalls the basic knowledge about the protocols of the transport layer, namely the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP).

2.3.1.1 Transmission Control Protocol

TCP is a connection-oriented protocol which guarantees full duplex connections with checked and unmodified sequence order. For streaming video or audio a simplex connection is sufficient, as data is sent in one direction only; control connections instead need a duplex connection. To secure the packet order and avoid packet loss, every packet gets acknowledged by the receiving side. In case of packet loss a retransmission of the missing packet will be requested. TCP keeps the packet order during transmission using sequence numbers. As a mechanism for flow control, TCP offers the slow-start algorithm to avoid overload situations on the network. After the detection of a congestion the data rate is reduced and the transmission is relaunched slowly, as shown in figure 2.2.

Figure 2.2: slow-start algorithm of TCP, taken from [BOC13]
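The effect of the slow-start algorithm can be made concrete with a minimal simulation. The following sketch is a strong simplification of TCP's congestion control (it ignores duplicate ACKs, fast retransmit and timeouts); the segment size and threshold are example values only.

    # Minimal model of slow start followed by congestion avoidance.
    MSS = 1460              # maximum segment size in bytes (typical value)
    ssthresh = 16 * MSS     # slow-start threshold (example value)

    cwnd = MSS              # congestion window starts at one segment
    for rtt in range(1, 11):
        print(f"RTT {rtt:2d}: cwnd = {cwnd // MSS:3d} segments")
        if cwnd < ssthresh:
            cwnd *= 2       # slow start: exponential growth per round trip
        else:
            cwnd += MSS     # congestion avoidance: about one segment per RTT

The exponential phase explains the start-up delay that early streaming applications wanted to avoid: the full data rate is only reached after several round-trip times.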

Because of these mechanisms, in the early development of streaming applications and protocols TCP did not fulfil the real-time criteria for transmission, and it was not common to use TCP as the transport layer protocol. Nowadays, recent research has demonstrated that TCP can be used without quality problems: [LOH11] showed that using TCP/HTTP can reduce the overhead below the level of UDP/RTP if the segment duration is chosen appropriately.

2.3.1.2 User Datagram Protocol

UDP is a simple extension to the Internet Protocol. It provides the multiplexing of datagrams between different applications by means of port numbers. In addition to the multiplexing, UDP offers mechanisms to generate and verify checksums. UDP does not acknowledge any packet and does not report transmission errors. If further control mechanisms like sequence order checking are needed, they have to be implemented by other protocols on higher layers; RTP is an example of such an additional protocol (see section 2.3.1.3 for more details). While using UDP it is not possible to control the data flow in case of congestion. Because of that, UDP provides the full data rate during the complete period of transmission. Therefore it was often used by multimedia applications for transmitting the data stream: the cost of ignoring or extrapolating missing packets usually is lower than that of requesting a retransmission (compare to [RIE03]).

2.3.1.3 Real-Time Transport Protocol

In combination with UDP, the Real-Time Transport Protocol is often used as a streaming transport protocol. It was designed for use in multimedia applications, like video conference systems, with the need for real-time processing of data. RTP is defined close to the transport layer. Using UDP as the transport protocol guarantees a high data rate from the beginning of the transmission, because UDP does not implement any data flow control like the slow-start algorithm in TCP. RTP adds features on top of UDP and enables the checking of sequence order,

synchronization of data and the possibility for simple adaptation mechanisms.

RTP Packet

The RTP packet consists of two parts: the header and the payload. The header provides additional information about the data in the payload and about the packet itself. Figure 2.3 shows the scheme of an RTP header.

Figure 2.3: RTP header

The first nine bits contain general information: two bits for the protocol version, one bit for the padding flag, one bit for the extension flag, four bits for the contributing source count and one marker bit. As these first bits are not the point of interest of this work, the author recommends reading [RFC03] for more details. The next seven bits provide information about the media type in the payload. When sending a stream this value can change if the format of the data in the payload has changed in reaction to lower or higher bandwidth of the connection used for transmission.

The next two bytes are assigned to the sequence number. The sequence number allows multimedia applications to check and rebuild the order of arriving packets, as offered by TCP. In order to reorder packets, they have to be buffered; packets arriving highly delayed can simply be discarded instead of being presented to the user and causing errors in the replay. The timestamp allows, in addition, setting the current packet in relation to the entire stream and thus ensuring real-time processing of the data contained in the packet payload. In case of separate streams it should be ensured that the timestamps are synchronized to the same clock.

The synchronization source identifier (SSRC) assigns each stream a quasi-unique ID. The range of numbers for the SSRC is sized in a way that the probability of assigning the same ID twice at the same time to two different streams is close to zero ([RIE03]). Using the SSRC enables the client to synchronize two separate streams, for instance separate streams for the audio and video data of a movie. The synchronization source identifier is assigned at the beginning of the session by the server; in case one SSRC is assigned twice, the server simply assigns another identifier. The contributing source identifier is used for aggregated data streams. It is possible to aggregate up to 15 different streams in one packet; the number of different streams is specified in the contributing source count. For more details on aggregation see [STE00] or [RFC03].

Real-Time Transport Control Protocol

As the Real-Time Transport Protocol uses UDP, there is no possibility to get statistical information about jitter or packet loss without the help of additional protocols. Therefore the Real-Time Transport Control Protocol (RTCP) was introduced together with RTP in [RFC03]. It allows the exchange of such information between all participants of an RTP session. RTCP implements five different packet types:

- sender report (SR): contains transmission and reception statistics from actively sending participants of an RTP session.
- receiver report (RR): contains reception statistics from participants which are not active senders; it is also used in combination with the sender report when active senders report on more than 31 sources.
- source description (SDES): provides information about a given SSRC identifier, including the CNAME. The CNAME is a persistent transport-level identifier for an RTP source.
- bye: indicates the end of a client's participation.
- app: reserved for application-specific functions.

The statistical information for the sender and receiver reports can be provided by the transmitting or receiving application or by third-party monitoring applications. As sender and receiver reports are sent periodically, it is important to define the interval between the last and the next report in a way that does not limit the bandwidth for the data packets. In [RFC03] it is recommended to use five percent of the available bandwidth for the control packets; further, it is recommended to reserve 25 percent of the control bandwidth for actively sending participants. Figure 2.4 illustrates the interaction and influences between RTP and RTCP.

Figure 2.4: interaction and influences between RTP and RTCP
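To make the header layout from figure 2.3 and the bandwidth recommendation above concrete, the following sketch parses the fixed twelve-byte part of an RTP header and computes the recommended RTCP shares. It is a minimal reading of [RFC03]: the handling of CSRC entries and header extensions is omitted, and the function names are the author's own illustration.

    import struct

    def parse_rtp_header(packet: bytes) -> dict:
        """Parse the fixed 12-byte RTP header (RFC 3550). The CSRC list and
        a possible header extension that follow are ignored in this sketch."""
        b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
        return {
            "version":      b0 >> 6,           # 2 bits
            "padding":      (b0 >> 5) & 0x1,   # 1 bit
            "extension":    (b0 >> 4) & 0x1,   # 1 bit
            "csrc_count":   b0 & 0x0F,         # 4 bits
            "marker":       b1 >> 7,           # 1 bit
            "payload_type": b1 & 0x7F,         # 7 bits, media type of payload
            "sequence":     seq,               # 2 bytes, for reordering
            "timestamp":    ts,                # 4 bytes, for synchronization
            "ssrc":         ssrc,              # 4 bytes, stream identifier
        }

    def rtcp_shares(session_bandwidth_bps: float) -> tuple:
        """Split the recommended 5% RTCP share: 25% of it for active
        senders, the remaining 75% for receivers (RFC 3550)."""
        rtcp = 0.05 * session_bandwidth_bps
        return 0.25 * rtcp, 0.75 * rtcp

For a session bandwidth of 1 Mbit/s, rtcp_shares yields 50 kbit/s of control bandwidth, of which 12.5 kbit/s are reserved for actively sending participants.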

2.3.1.4 Comparison Of TCP And UDP/RTP

For a fast transmission with a low delay, most applications used to implement the User Datagram Protocol as the transport protocol. UDP does not have a slow-start algorithm like the Transmission Control Protocol, but offers the full data rate from the beginning (compare to [STE00]). Recent research projects have shown that the usage of TCP does not reduce the quality of experience while streaming media either: in [AKH11] and [LOH11] the results indicate that there are no problems caused by the TCP flow control mechanisms.

[LOH11] also compares the theoretical minimal header overhead between UDP/RTP and TCP/HTTP. They assumed a video with 30 frames per second (8 kbytes per frame) and a 24 kHz audio sampling rate. Table 2.1 illustrates their results. Mainly the MPEG-DASH implementation from [LOH11] and a simple UDP/RTP application setup were compared. During a live stream the UDP/RTP solution caused a constant header overhead of 12 kbps. The MPEG-DASH solution using TCP/HTTP caused different overhead sizes depending on the segment duration; the scenario included durations of one, two and ten seconds. Segments of one or two seconds of encoded video caused 31 kbps respectively 17 kbps, values higher than that of the UDP/RTP solution. With a segment duration of ten seconds it is possible to reduce the header overhead to 6 kbps, half that of the UDP/RTP implementation.

Table 2.1: header overhead while streaming, from [LOH11]

    Packetization                  Total overhead
    HTTP, segment duration 1 s     31 kbps
    HTTP, segment duration 2 s     17 kbps
    HTTP, segment duration 10 s     6 kbps
    RTP                            12 kbps

The comparison further included the delay of the two approaches relative to the normal television broadcast. Using RTP, the total delay compared to the television broadcast was only 20-30 milliseconds; TCP caused an additional delay of almost three seconds. According to [AKH11] and [LOH11], the best results are achieved with TCP using segment durations of ten seconds and a guaranteed minimal TCP throughput of two times the bit rate of the transmitted stream.

The major part of the research related to this topic concludes that TCP nowadays is the better solution for streaming. Thanks to new protocol specifications it is possible to increase efficiency and reduce the overhead. Furthermore, most solutions based on TCP are using HTTP, which enables the usage of HTTP cache features for on demand streaming; this leads to a reduction of the load on servers and infrastructure. In addition, the well-known port for HTTP, port 80, is open for traffic on most firewalls and network address translation routers (NAT routers), while RTP ports are usually closed for security reasons. RTP does not even have any well-known ports assigned, because, by default, ports are allocated dynamically during the initialization of the RTP connection.
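The RTP value in table 2.1 can be approximated with a back-of-envelope calculation. The packet rate below is an illustrative assumption; [LOH11] does not publish the exact packetization behind the table, so the sketch only shows how such an estimate is obtained.

    # Rough estimate of the RTP header overhead (illustrative assumptions;
    # the exact packetization used in [LOH11] is not published).
    RTP_HEADER_BYTES = 12     # fixed RTP header without CSRC entries
    PACKETS_PER_SECOND = 125  # assumed packet rate for video plus audio

    overhead_bps = PACKETS_PER_SECOND * RTP_HEADER_BYTES * 8
    print(f"RTP header overhead: {overhead_bps / 1000:.0f} kbps")  # 12 kbps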

2.3.2 Streaming Protocols

After the discussion of the basic components needed for streaming and the protocols of the transport layer, this section presents the basics of some selected streaming protocols. There are many more used by applications, but these are the most common ones. When using UDP as the transport protocol it is common to separate the transport part and the control part of a protocol. Control protocols offer the possibility to control the stream: start, pause and stop it, and go forward or backward in time. Transport protocols try to provide a stable transmission of data in order to offer fluent playback of the media; they usually work close to the transport layer.

2.3.2.1 Real Time Streaming Protocol

The Real Time Streaming Protocol (RTSP) is a typical control protocol for streaming applications. The first version was published in 1998; the current version is defined in RFC 2326 ([RFC98]). Its aim is to offer a free and independent streaming control protocol. Therefore RTSP does not require a special transport protocol: messages can be transmitted over a TCP or UDP connection. Port number 554 was assigned as the well-known server-side port for both transport protocols. As RTSP is a pure streaming control protocol, an additional streaming transport protocol is needed. RFC 2326 does not specify one; in practice it is common to use RTP, which was described in section 2.3.1.3.

Figure 2.5: basic structure of an RTSP message

The structure of RTSP is similar to HTTP 1.1. It is a plain text message protocol which offers operations similar to HTTP; the status codes were adapted from HTTP, and RTSP uses a client-server architecture. This means messages can be divided into requests and responses. Figure 2.5 illustrates the basic structure of an RTSP message; figures 2.6 and 2.7 show an example request and response. The general header and the entity header do not differ between requests and responses; they contain information like date or caching configuration. The request header contains information like the user agent (used media player), accepted encodings, language, authorisation (username, password, etc.) and certain parameters of the messages (see [RFC98] for more details).

SETUP rtsp://192.168.1.130/stream/12345678.wav RTSP/1.0
CSeq: 1
Session: 12345678
Transport: RTP/AVP/UDP;unicast;client_port=2080

Figure 2.6: example RTSP request

RTSP/1.0 200 OK
CSeq: 2
Session: 12345678
Transport: RTP/AVP/UDP;unicast;client_port=2080;server_port=5001

Figure 2.7: example RTSP response

One of the main differences is the use of separate streaming transport protocols. HTTP sends the stream data included in the HTTP response to the client (see section 2.3.2.2 for more details). Because external streaming transport protocols are used, it is possible to control several streams with RTSP, all time-synchronized. Therefore it is a so-called out-of-band protocol. The out-of-band structure requires an additional streaming server apart from the web server to manage the streams; both server applications can be executed on the same computer.

To control the streamed media, the Real Time Streaming Protocol defines eleven different methods. The most important ones are:

- SETUP
- PLAY
- DESCRIBE
- PAUSE
- TEARDOWN

For a detailed description of the single methods the reader is referred to [RFC98]. These methods are used in eight different phases. Figure 2.8 illustrates the course of a typical session. For initializing an RTSP session, the client will first establish a connection to the web server and request the meta file including the address of the streamed media. This phase is not part of the RTSP session and is not an obligatory step: if the URI of the stream is known by the user, it is possible to connect directly to the streaming server using a media player application. As described above, RTSP is an out-of-band protocol. At the beginning of a session the client establishes a control connection to the server, which is kept alive for the entire session. During this phase the client receives all necessary information from the server, like additional audio tracks for a movie or subtitle files.
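A minimal sketch of how a client could issue the SETUP request from figure 2.6 over the control connection. The server address, stream path and session values are the example values from the figures; a real client would first send DESCRIBE and parse the returned session description, and error handling is omitted here.

    import socket

    SERVER, PORT = "192.168.1.130", 554   # 554 is the well-known RTSP port

    request = (
        "SETUP rtsp://192.168.1.130/stream/12345678.wav RTSP/1.0\r\n"
        "CSeq: 1\r\n"
        "Session: 12345678\r\n"
        "Transport: RTP/AVP/UDP;unicast;client_port=2080\r\n"
        "\r\n"
    )

    with socket.create_connection((SERVER, PORT)) as sock:
        sock.sendall(request.encode("ascii"))
        reply = sock.recv(4096).decode("ascii")
        print(reply)   # expected first line: "RTSP/1.0 200 OK" (figure 2.7)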

Figure 2.8: phases of an RTSP session

After receiving all information regarding the stream, the client can choose the stream configuration it wants to use: the video quality, the language of audio and/or subtitles and so on. Within the next phase a connection for the data transmission is established; all negotiations required by the used streaming transport protocol are done in this phase. During the transmission of the stream, the user can influence the replay in different ways using the media control, for example interrupting the stream (PAUSE) or going forward/backward in the time dimension. But not only the client or user can influence the presentation of the stream; the server can, too. Using the transport control, the server collects statistical information about the transmission, such as jitter or delay, and evaluates these data in order to adapt the stream to new situations. One option could be to reduce the video resolution if the data rate of the connection decreases. The transport connection is closed by the server when the end of the stream has been reached, or by the client if the stream was stopped. A successful closing of the transport connection invokes the closing of the control connection as the last phase of an RTSP session.

2.3.2.2 HyperText Transport Protocol

The HyperText Transport Protocol (HTTP) is applied more and more in streaming scenarios. Technologies like HTTP Live Streaming (HLS) and MPEG-DASH, which use adaptive bit rate streaming, are the main reasons for this success. In addition, most firewalls and NAT routers allow HTTP data streams by default, so the maintenance that is obligatory for UDP/RTP streaming is not necessary any more (see [LAZ12]).

Basic HTTP Streaming

Streaming multimedia files using pure HTTP does not support live streaming. If a web browser requests data stored on a web server, the web server responds in a similar way as RTSP does: the requested data is packed into the response message and sent as a stream to the client. Figure 2.9 illustrates the process.

Figure 2.9: streaming over HTTP

This procedure has three big disadvantages (see [RIE03]):

1. HTTP is based on a connection using TCP. This implies a delay at the beginning of the transmission, caused by the slow-start algorithm.
2. All data is sent as a stream to the client, but it is not presented or passed to a media player before it has been received completely.
3. By default, HTTP does not define any messages or requests for controlling the stream. Because of that, it is not possible to go forward or backward.

As HTTP is inseparably connected to TCP, the first point is not resolvable. For the second point it is possible to reduce the negative effects: to avoid the complete download of the data before presenting it, a uniform resource locator (URL) is given within a meta file, which is sent to the client instead of the data. The client's web browser passes the URL to the media player, which opens the resource remotely and can therefore present it instantly. The missing control options cannot be resolved using pure HTTP; additional protocols are the only possibility to implement control methods.

2.3.2.3 HTTP Live Streaming

HTTP Live Streaming is a technology developed by Apple for the operating system iOS. The basic idea is to split the media file into small fragments with a length of a few seconds. The most common durations are two seconds (used by Microsoft Smooth Streaming) and ten seconds; a duration of ten seconds offers a good equilibrium between header overhead and delay (see [AKH11, LOH11]).

Figure 2.10: architecture of an HLS system, from [APP11]

HTTP Live Streaming has, until now, not passed the state of an Internet draft; that means the protocol is not an official standard.

Encoding

The HLS protocol allows streaming on demand and live streaming of video and audio. Every media segment is put into an MPEG-2 Transport Stream ([ISO00]). The implementations developed by Apple use the following encodings (according to [APP11]):

- video: H.264 Baseline Level 3.0, Baseline Level 3.1 and Main Level 3.1
- audio: HE-AAC or AAC-LC up to 48 kHz, stereo; or MP3 (MPEG-1 Audio Layer 3), 8 kHz to 48 kHz, stereo

Officially, Apple does not limit the usable encoders in the protocol specification ([PAN12]).

Index File

All available segments for one stream are listed in order in an index file, an extended M3U playlist file; the HLS protocol defines some additional tags for this purpose (see [PAN12] for more details). The index file provides information about the URL, the duration and the quality of a segment. For live content the index file defines a tag for reload intervals: if set, the client reloads the index file for the stream after the given interval has expired. To offer different qualities of the requested stream content, the HLS protocol allows specifying the URLs of alternate index files instead of media segments. The main index file describes each alternate index file in terms of resolution and bit rate; the alternate index files list the segments of the stream in the given quality.
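To make the two-level index structure concrete, here is a minimal, hypothetical pair of playlists; all URLs, bandwidth values and durations are invented for illustration. The first file is a main index referencing two quality variants:

    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=900000,RESOLUTION=640x360
    low/index.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
    high/index.m3u8

The second file is the beginning of one alternate index file, listing ten-second media segments in order:

    #EXTM3U
    #EXT-X-TARGETDURATION:10
    #EXT-X-MEDIA-SEQUENCE:0
    #EXTINF:10.0,
    segment0.ts
    #EXTINF:10.0,
    segment1.ts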

Security

The HLS protocol enables the user to encrypt the transmitted stream data. It offers AES-128 encryption with 16-octet keys. The media stream segmenter, available from Apple, supports three modes for configuring encryption:

1. Specify a path to an existing key file on the hard disk. The URL of the existing key file will be listed in the index file; all media files are encrypted using this key.
2. The segmenter generates a random key file and saves it in a specified location instead of using an existing one.
3. The segmenter generates a new random key file every n media segments. Each group of n files is encrypted using a different key.

Further, the HLS protocol allows the client to request the index file and the single segment files via TLS/SSL-secured connections using HTTPS.

2.3.2.4 MPEG-Dynamic Adaptive Streaming over HTTP

MPEG-Dynamic Adaptive Streaming over HTTP (MPEG-DASH) is a protocol similar to HTTP Live Streaming by Apple. It is the first approach to adaptive bit rate streaming over HTTP that became an international standard. It was developed by the Moving Picture Experts Group (MPEG) and became a standard in November 2011, published in April 2012 in [ISO12].

Figure 2.11: MPEG-DASH architecture, from [SOD11]

The basic principle is the same as used by HTTP Live Streaming: the streamed media is divided into small segments of two to ten seconds, and every segment is present in different qualities.

Encoding

Like the HTTP Live Streaming draft, MPEG-DASH does not define a certain encoding for the media stream. It works well with H.264, WebM and other codecs. In [ISO12] two guidelines for the container formats are defined:

the ISO base media file format [ISO08] and the MPEG-2 Transport Stream. Other formats are not excluded by the standard, but support is not guaranteed.

Media Presentation Description

The Media Presentation Description (MPD) file lists the different available stream representations. The MPD file is an XML file with the hierarchical structure shown in figure 2.12.

Figure 2.12: structure of an MPD file, from [SOD11]

Each MPD file defines one or more periods. Within a period, one or more adaptation sets are declared. An adaptation set is a combination of different media; for instance, one adaptation set contains the German audio track of a video, another set the Italian one. The adaptation sets consist of one or more representations. Within a representation the different qualities of the media are defined: a representation lists the single segments of the media stream in temporal sequence. The MPD is parsed by the MPD parser on client side and the adequate representation is chosen. This mode is more complex than offering a simple playlist file, but it provides all information in one single file instead of a separate playlist file for each representation.

Security

The MPEG-DASH standard allows the usage of digital rights management (DRM) and the common encryption standard defined in [ISO11]. Each adaptation set can use one or multiple content-protection descriptors to describe the supported DRM scheme, as long as the client recognizes at least one of them. The common encryption standard defines an encryption scheme for media content which is declared in the MPD file. Using this standard, the content can be encrypted once and streamed to clients supporting different DRM license systems: each client gets the decryption keys and other required information using its particular DRM system and requests the commonly encrypted content from the same server.
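The hierarchy described above can be illustrated with a heavily shortened, hypothetical MPD skeleton; all identifiers, URLs and attribute values are invented, and a standard-conforming file carries further mandatory attributes:

    <MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
      <Period>
        <!-- one adaptation set per media component, e.g. the video track -->
        <AdaptationSet mimeType="video/mp4">
          <!-- one representation per offered quality -->
          <Representation id="720p" bandwidth="2500000" width="1280" height="720">
            <SegmentList duration="10">
              <SegmentURL media="video-720p-seg1.mp4"/>
              <SegmentURL media="video-720p-seg2.mp4"/>
            </SegmentList>
          </Representation>
          <Representation id="360p" bandwidth="900000" width="640" height="360"/>
        </AdaptationSet>
      </Period>
    </MPD>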

2.4 SUMMARY

Within this chapter the basics of streaming were presented. Starting with a definition of streaming and the two different types, the chapter continued with the different available protocols. Beginning with layer four, the reader was introduced to TCP as well as UDP in combination with RTP as a kind of extension in functionality and efficiency. The main mechanisms of the protocols were described and their advantages and disadvantages were named and compared. Finally the author came to the conclusion that TCP is nowadays used more often than UDP/RTP, as a result of its combination with HTTP, which does not cause as many conflicts with firewalls and NAT routers as RTP does. As a result, the configuration time is reduced and fewer bypass solutions for firewalls have to be implemented.

After the basics of layer four, this chapter set the focus on the different streaming protocols. The first of the described protocols was the Real Time Streaming Protocol, which is a pure control protocol and is used in combination with RTP/RTCP for data transport. It is similar to HTTP in the structure of its messages as well as in its status codes. To be able to compare it to HTTP, the next section introduced the reader to the basics of the HyperText Transport Protocol. Concluding with the problems occurring when using pure HTTP for streaming media, different extension protocols like HLS and MPEG-DASH were presented as possible solutions to the named disadvantages. The main problem when using pure HTTP for streaming is the full download of the media file before presenting it in a media player. The extension protocols solve this problem by splitting the media into small segments of one up to ten seconds length and listing the URLs of these segments in an index or playlist file. The most efficient duration of one segment was found to be ten seconds.

3 ADAPTATION MECHANISMS

Mobile devices are mostly connected to a network using wireless connections. A wireless link is more prone to errors during data transmission; those errors are caused by changes of the available bandwidth or by a complete loss of the connection. To react to changing bandwidth, adaptation mechanisms are implemented by most streaming solutions. In this chapter the main adaptation mechanisms will be presented. The first two mechanisms adapt parameters of the data to be transmitted; the third describes a method to change the network itself. Before starting with the adaptation mechanisms themselves, this chapter introduces the reader to the concept of Quality of Experience.

3.1 QUALITY OF EXPERIENCE

One of the intentions of this work is to provide the highest possible video quality according to the available resources like battery, network, encoding etc. But quality can be estimated from different points of view: it is possible to evaluate the quality of media in an objective way, based on technical parameters, or in a subjective way, based on the user's experience while consuming the media. The first way is referred to in the literature as quality of service (QoS); the second is referred to as quality of experience. In order to achieve a high quality of the media, the quality of experience has to be taken into account, too. Therefore it will be shortly discussed in the next section and the minimal parameters will be defined. Before this, a definition of quality of experience has to be given. The ITU-T defines quality of experience as follows (see [ITU07]):

The overall acceptability of an application or service, as perceived subjectively by the end-user.

Notes:

1. Quality of Experience includes the complete end-to-end system effects (client, terminal, network, services infrastructure, etc).

2. Overall acceptability may be influenced by user expectations and context.

The measurement of quality of experience is a non-trivial subject and still an object of ongoing research. Several methods were developed over the last years and set down in ITU-T recommendations like ITU-T P.800 [ITU96] for audio or ITU-T P.910 [ITU08] for video data. The target of recent research is to figure out the relation between the network parameters and the resulting video quality on one side and the subjectively perceived video quality on the other side.

Figure 3.1: results of the survey made by Lee, taken from [LEE10], for the scene DucksTakeOff; the values below each picture are, in order of appearance: bit rate, spatial resolution, frame rate, pixel bit rate

In 2010 the group of Jong-Seok Lee presented their studies about how people evaluate the quality of a video if one of the three dimensions in scalable video coding changes ([LEE10]). They used the scalable video coding of three different scenes with different information rates in the spatial and temporal dimension and showed two different qualities of the same scene simultaneously to a group of persons. The participants' task was to decide which video had the better quality. As figure 3.1 shows, the better spatial resolution does not always provide the better quality of experience. Further on, the quality of experience depends not only on the three

dimensions, but also on the bit rate and the content shown in the video. For a bit rate below one Megabit per second, the spatial resolution of 640 × 360 pixels was rated best; for bit rates higher than 1.5 Megabit per second, the spatial resolution of 1280 × 720 pixels was rated best.

Based on these results, the following points will be taken into account within this thesis in order to provide high quality, based not only on the quality of service but also on the quality of experience:

1. For low bit rates the resolution is more important than the frame rate.
2. The more the bit rate increases, the more the importance of the frame rate increases.
3. Higher bit rates result in a better quality of experience.

These points will have a significant influence on the decision making for the adaptation of the video quality; the envisioned adaptation mechanisms will be discussed in section 4.6. To keep the quality of experience high, the main goal of the decision making will be to use high bit rates as long as possible. Before the bit rate is reduced, frame rate and spatial resolution should be decreased; which parameter is decreased first depends on the current bit rate. If neither decreasing the frame rate nor reducing the spatial resolution helps to adapt to the current resource conditions, the bit rate will be reduced. The other way round, it will be the bit rate that is increased first.

3.2 ADAPTIVE BIT RATE STREAMING

Adaptive bit rate streaming is based on the concept of changing the video/audio bit rate while transmitting the stream. There are mainly two approaches to deciding which bit rate to use for encoding:

- evaluation of TCP statistics on server side
- evaluation of the network characteristics on client side

The main idea is to offer several streams in different qualities; in the literature this technique is called simulcast. According to the available bandwidth, the most sufficient stream is chosen for transmission. Figure 3.2 illustrates the principle of this technology. Most examples of streaming scenarios with adaptive bit rate streaming use HTTP, such as HTTP Live Streaming and MPEG-DASH. The adaptation decisions are influenced by different context aspects and media options (see [HOM08]). This problem can be formalized as a process p(c_i, m_i), where the m_i characterize the media options and the c_i describe the context parameters. In [HOM08], media options are for instance bit rate, resolution etc.; context parameters contain information about the available bandwidth, screen resolution, the user's preferences and other values. A minimal sketch of such a decision process is given below.
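The following sketch shows one possible reading of the decision process p(c_i, m_i), combining the simulcast selection with the quality-of-experience ordering from section 3.1. The profile values and the safety margin are illustrative assumptions derived from the observations in [LEE10]; they are not prescribed by [HOM08].

    # Illustrative decision process: pick the best media option (m_i) that
    # the measured context (c_i, here only the available bandwidth) sustains.
    # Profiles are ordered best to worst following section 3.1: keep the bit
    # rate high; at low bit rates favour resolution over frame rate.

    # (bit rate in bit/s, resolution, frames per second) - example values
    PROFILES = [
        (2_500_000, (1280, 720), 30),
        (1_500_000, (1280, 720), 24),
        (900_000, (640, 360), 24),    # below ~1 Mbit/s keep the smaller
        (500_000, (640, 360), 15),    # resolution and lower the frame rate
    ]

    def select_profile(available_bps: float, safety: float = 0.8) -> tuple:
        """Return the highest-quality profile whose bit rate fits into the
        measured bandwidth, keeping a margin against fluctuations."""
        for profile in PROFILES:
            if profile[0] <= available_bps * safety:
                return profile
        return PROFILES[-1]    # fall back to the lowest profile

    print(select_profile(2_000_000))   # -> (1500000, (1280, 720), 24)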

Figure 3.2: Adaptive bit rate streaming - principle

3.2.1 Evaluation on server side

One possibility of handling the decision process is to evaluate, entirely on the server side, the statistical values sent by the client through TCP acknowledgements and other quality of service (QoS) protocols. The server application collects all information necessary for the decision process by monitoring the connections to the clients. Context information that can only be provided by the client is requested through special QoS protocols. Based on this information the server decides on the most appropriate and sufficient media stream, which is then transmitted to the client.

The theoretical advantage of server-side evaluation is that one stream is provided instead of several streams of the same medium in different qualities. Assuming an underlying homogeneous network infrastructure¹, the decision process can use the average of all client context information. In a more complex setup, for instance a heterogeneous network infrastructure, the server-side decision process may be very problematic. Potential disadvantages result from the fact that content servers need to be upgraded and that clients may not offer all the context information needed for the decision process.

3.2.2 Evaluation on client side

Most streaming applications that implement adaptive bit rate streaming, like HLS or MPEG-DASH, evaluate the network conditions entirely on the client side. All media options and context information needed for the decision process are collected by the client, and the client itself decides on the appropriate stream. Apart from the media, the server provides only meta information about the offered representations. The advantages of client-side evaluation are:

- reduced traffic between client and server, since the client's context information is not transmitted to the server,
- less workload for the server, since no decision has to be made for each individual client (relevant if no homogeneous network infrastructure is available), and
- improved flexibility with respect to user preferences and heterogeneous network infrastructures.

The disadvantages result from the increased power consumption on client devices, caused by the decision process, and from the need for advanced media players that support the decision making. A sketch of the throughput estimation such players typically perform is given below.

¹ Network infrastructure includes the technologies used on layers one and two of the OSI reference model as well as the client devices.
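As an illustration of the client-side evaluation, the following minimal sketch estimates the available bandwidth from the download times of recent media segments, in the style of HLS or MPEG-DASH players. The smoothing factor and the safety margin are assumptions, not values mandated by either standard.

```python
class ThroughputEstimator:
    """Exponentially weighted moving average over per-segment throughput."""

    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha          # weight of the newest sample
        self.estimate_kbps = 0.0

    def on_segment_downloaded(self, size_bytes: int, duration_s: float) -> None:
        sample_kbps = (size_bytes * 8 / 1000) / duration_s
        if self.estimate_kbps == 0.0:
            self.estimate_kbps = sample_kbps          # first sample
        else:
            self.estimate_kbps = (self.alpha * sample_kbps
                                  + (1 - self.alpha) * self.estimate_kbps)

    def usable_kbps(self, safety: float = 0.8) -> float:
        # leave headroom so short-term drops do not immediately stall playback
        return self.estimate_kbps * safety
```

The usable_kbps value would then be fed into a selection process such as the one sketched in section 3.2 in order to choose the representation for the next segment.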

3.2.3 Proxy-based evaluation

Proxy-based adaptation can be seen as a variant of server-based adaptation. The adaptation logic is distributed between the streaming server and a proxy. It is also possible to host the entire decision process on a proxy server, which then provides all needed meta information to the clients instead of the original media source. This solution can be taken into consideration in case of limited resources on the server, as will be the case if the server is running, for instance, on a mobile phone. The server just transmits the media to the proxy, where the entire adaptation process takes place. The main disadvantage is the additional need for hardware and infrastructure.

3.3 HIERARCHICAL CODECS

Another possibility for the adaptation of a media stream, instead of encoding several streams in different qualities, is to use hierarchical coding. With hierarchical codecs only one stream has to be encoded; lower qualities can be derived from it.

3.3.1 Basics of hierarchical codecs

The idea of hierarchical codecs is to use a layered structure in which the lowest layer, the so-called base layer, contains the minimum of information necessary for decoding the media stream. By adding the information of a higher layer, the amount of total information and thus the quality of the decoded media stream increases. As a result, this type of encoding is well suited for transmission in packet-switched networks with shared network resources and a high expectation of losses and errors (compare [CRO98]). Each packet contains information of only one layer, so packets can be marked according to their layer and, therefore, according to their importance for the client. These markers can be used to decide which packets are to be dropped in case of congestion; a minimal sketch of such a dropping policy is given at the end of this section. In combination with the priority bits of the Internet Protocol, a priority can be assigned to a certain layer.

A typical technique used in hierarchical coding is the hierarchical B picture technique (see [XIE11]). It increases video compression efficiency, because B frames have a higher compression rate than I or P frames. Typical implementations define four layers from the bottom up, whereby the base layer contains only I and P frames; higher layers contain only B frames. Figure 3.3 illustrates the mechanism. The principles of hierarchical codecs were implemented, for example, in H.261, H.263 and H.264.
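The layer markers can be exploited, for example, by a network element that has to shed load under congestion. The following minimal sketch drops complete enhancement layers from the top down until the remaining packets fit a given byte budget; the packet representation is a hypothetical simplification.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    layer: int       # 0 = base layer, higher numbers = enhancement layers
    payload: bytes

def shed_layers(packets: list[Packet], budget_bytes: int) -> list[Packet]:
    """Drop whole layers, highest first; the base layer is never dropped."""
    if not packets:
        return []
    max_layer = max(p.layer for p in packets)
    while max_layer > 0:
        size = sum(len(p.payload) for p in packets if p.layer <= max_layer)
        if size <= budget_bytes:
            break
        max_layer -= 1
    return [p for p in packets if p.layer <= max_layer]
```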

Figure 3.3: Hierarchical codecs - principle

3.3.2 Scalable codecs

Scalable codecs are similar to hierarchical codecs; more precisely, they are an extension of them. The only officially standardized codec for scalable video coding is the extension of H.264/AVC, namely H.264/SVC ([ITU12]; SVC can be found in Annex G). As is usual in hierarchical coding, one bit stream contains one or more valid substreams, which can be derived by dropping packets. The concept of scalable video coding usually consists of three dimensions of scalability ([LEE10]):

temporal dimension
The temporal dimension refers to the temporal resolution of the encoded video. A typical adaptation in this dimension is, for instance, the reduction of the number of frames contained in one second of the video. In the temporal scalability dimension, the techniques used for scalable video coding do not differ from those implemented in normal H.264 advanced video coding; hierarchical prediction structures such as the hierarchical B pictures presented in section 3.3.1 are used.

spatial dimension
The spatial dimension refers to the spatial resolution of the encoded video. To adapt the bit stream in this dimension, the number of pixels within one video frame can be reduced. Like the temporal dimension, the spatial scalability dimension is based on the multilayer model. Each layer represents one of the supported spatial resolutions and can be identified by a dependency identifier D.

quality dimension
The quality dimension, commonly called the signal-to-noise ratio (SNR) dimension, describes the possibility of reducing the quality of the encoded video. This is achieved by extracting and decoding coarsely quantized pixels from the compressed bit stream.

These dimensions can be combined in any way. By adjusting one or more of them, the video stream can be adapted in a flexible manner, which allows a good transmission even over resource-constrained networks ([LEE10]). A sketch of substream extraction along these dimensions is given below.
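In H.264/SVC, each NAL unit header carries a dependency identifier (D), a temporal identifier (T) and a quality identifier (Q) that locate it in these three dimensions. The following minimal sketch extracts a valid substream by dropping all NAL units above a target operating point; the data structure is a simplified assumption, not a full NAL unit parser.

```python
from dataclasses import dataclass

@dataclass
class NalUnit:
    d: int        # dependency_id: spatial layer
    t: int        # temporal_id: temporal layer
    q: int        # quality_id: SNR layer
    data: bytes

def extract_substream(units: list[NalUnit],
                      max_d: int, max_t: int, max_q: int) -> list[NalUnit]:
    """Keep only NAL units at or below the target layer in every dimension."""
    return [u for u in units
            if u.d <= max_d and u.t <= max_t and u.q <= max_q]
```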

Scalable video coding, as defined in H.264/SVC, can be used in combination with HTTP (HLS or MPEG-DASH) as well as with RTP. The MPEG-DASH standard allows the usage of scalable video coding by default. For RTP, the standard payload types were extended by RFC 6190 ([RFC11a]). Implementations using HTTP were presented in [SAN11]; examples using UDP/RTP can be found in [LEI08, OJA12, YIM12]. For a detailed explanation of scalable video coding the reader is referred to [SCH07, ITU12].

3.4 SIGNALLING

All adaptation mechanisms have one thing in common: if adaptation is necessary, it must be signalled somehow. In this section some of the more recent approaches are presented.

3.4.1 Single-Layer Approach

In the beginning, research focused on single-layer approaches in order to comply with the ISO/OSI reference model of network architecture. The main advantage of the layering paradigm is the modularity in protocol design, which enables interoperability and improves the design of communication protocols. Moreover, a protocol within a given layer is described in terms of the functionalities it offers, while implementation details and internal parameters are hidden from the remaining layers ([KLI11]). Within this approach, signalling is not applied between different layers, but only within the current layer of the protocol. For instance, RTP receives its information about throughput, packet loss etc. from RTCP in the same layer; direct information retrieval from the transport layer is not implemented. Most research was done on the link layer and the transport layer, where the different solutions tried to solve the problems occurring in the mobile context. The single-layer approach is losing importance, especially in the context of mobile devices in wireless networks; the more common approach today is the cross-layer approach.

In 2012 the group of Yi-Mao presented a single-layer signalling approach in [YIM12] using scalable video coding, based on a UDP/RTP connection. All adaptation decisions are made on the server side, based on the loss information received from the client. The adaptation control itself is divided into two parts: micro-control and macro-control.

The macro-control has two main tasks and, in general, concerns spatial scalability. The first task happens before the stream starts: it requests information about the client's hardware, such as CPU power, GPU and maximum screen resolution. The second task consists of a measurement of the available bandwidth. According to the received information, the macro-control decides on the appropriate resolution and the initial frame rate of the substream for the client. The hardware parameters are set only once during the session. If the client signals the loss of an I-frame, the second task of the macro-control is repeated in order to ensure that the bandwidth is still sufficient for the chosen resolution. A sketch of such a macro-control decision is given below.
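The following minimal sketch illustrates such a macro-control decision. It is not the algorithm from [YIM12]; the thresholds, the candidate resolutions and the capability keys are illustrative assumptions.

```python
def macro_control(client_caps: dict, measured_kbps: float):
    """Choose an initial (resolution, frame rate) for the client's substream."""
    max_w, max_h = client_caps["max_resolution"]   # queried once per session
    # candidate spatial layers, each with an assumed minimum bandwidth
    candidates = [((1280, 720), 2000), ((640, 360), 800), ((320, 180), 300)]
    for (w, h), min_kbps in candidates:
        if w <= max_w and h <= max_h and measured_kbps >= min_kbps:
            return (w, h), 30        # initial frame rate in fps
    return (320, 180), 15            # fallback for very constrained clients
```

On the loss of an I-frame, the bandwidth would be measured again and macro_control re-run with the new value, as described above.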

The micro-control is responsible for the temporal scalability. Based on the packet loss reported by the client, the micro-control calculates how many B- and P-frames have to be dropped to fit into the available bandwidth. If no further dropping is possible, the macro-control starts a new measurement to detect the currently available bandwidth.

3.4.2 Cross-Layer Approach

Most of the recent research projects in the field of streaming propose a cross-layer solution for the signalling. Cross-layer solutions help to increase efficiency in energy consumption, data transfer performance and quality of service (see [KLI11]). Particularly in the mobile context, inter-layer communication is important in order to react in an appropriate way. In general, two approaches to cross-layering are defined in the literature (from [KLI11]):

Weak cross-layering: Weak cross-layering enables interaction among entities at different layers of the protocol stack. It thus represents a generalization of the adjacency interaction concept of the layering paradigm to include non-adjacent interactions.

Strong cross-layering: Strong cross-layering enables the joint design of the algorithms implemented within any entity at any level of the protocol stack. In this case, individual features related to the different layers can be lost due to the cross-layering optimization. Potentially, strong cross-layer design may provide higher performance at the expense of narrowing the possible deployment scenarios and increasing cost and complexity.

In 2012 the team of Tiia Ojanperä presented in [OJA12] a prototype of a streaming framework that uses scalable video coding and the cross-layer approach for session and adaptation signalling. This framework supports the basic protocols for video transmission and signalling. In addition, the framework is capable of collecting data about system performance and context information from different layers and, furthermore, from different network nodes. The implemented architecture is based on a triggering architecture described in [MAK07]. The trigger engine collects information from network interfaces, QoS monitoring tools, mobility managers, video streaming clients, etc. and evaluates it before triggering events. The trigger engine therefore defines interfaces for information collection and delivery as well as a specific format for the triggers. The latter contains three fields: ID, type, and value; ID and type identify the trigger and its producer, while the value represents the actual context information (see the sketch below). The trigger engine is connected to a mobility expert system (MES). The MES decides on the quality the client can receive, based on the triggers arriving from the trigger engine, and signals to the SVC filter on the server which level it wishes to receive. Figure 3.4 shows the scheme of the system. A similar system was implemented in 2008 by the group of Wolfgang Leister and presented in [LEI08].
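The following minimal sketch models the trigger format with its three fields and a toy stand-in for the MES decision. The field types, the type naming scheme and the thresholds are assumptions, not taken from [OJA12] or [MAK07].

```python
from dataclasses import dataclass

@dataclass
class Trigger:
    id: str        # identifies the trigger
    type: str      # identifies the producer, e.g. "qos.bandwidth"
    value: float   # the actual context information

def decide_svc_level(triggers: list[Trigger]) -> int:
    """Map the most recent bandwidth trigger to an SVC layer to request."""
    bandwidth = next((t.value for t in reversed(triggers)
                      if t.type == "qos.bandwidth"), 0.0)
    if bandwidth >= 2000.0:    # kbit/s
        return 2               # highest enhancement layer
    if bandwidth >= 800.0:
        return 1
    return 0                   # base layer only
```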

Figure 3.4: The streaming application implemented by the team of Ojanperä, from [OJA12]

3.5 COGNITIVE RADIO

As this work is related to mobile devices, which are usually equipped with several network interfaces to different network infrastructures, such as UMTS, WLAN and Bluetooth, an algorithm for choosing the appropriate network is needed. The cognitive radio approach is a possible option to achieve this aim. Several recent research projects, such as [OJA12] or [DAS12], confirm that an efficient implementation is possible.

3.5.1 Definition Of Cognitive Radio

Cognitive radio is an approach to increase the efficiency of the usage of the available frequency spectrum for wireless data transmission. With the rapid growth of wireless communications, the problem of inefficient utilization of the officially assigned spectrum has arisen. For instance, the 2.4 GHz spectrum gets more and more crowded by applications of significant diversity, including voice, short messages, web and multimedia, which leads to a reduced level of user satisfaction because of a low quality of experience. The United States Federal Communications Commission (FCC) analysed the usage of the assigned spectrum in 2002 ([FCC02]). The results show that the usage of the spectrum within the range from 0 to 6 GHz varies from 15 percent up to 85 percent: frequencies assigned to amateur radio, paging or television are under-utilized, while the 2.4 GHz band is reaching its capacity limit. This report was the basic motivation for the cognitive radio approach. The first projects tried to use the spectrum assigned to analogue television broadcast, and the Institute of Electrical and Electronics Engineers (IEEE) formed a working group for cognitive radio, IEEE 802.22.

One of the pioneers in cognitive radio is J. Mitola. In [MIT00] he gave the following definition of cognitive radio:

The term cognitive radio identifies the point at which wireless personal digital assistants (PDAs) and the related networks are sufficiently computationally intelligent about radio resources and related computer-to-computer communications to detect user communications needs as a function of use context, and to provide radio resources and wireless services most appropriate to those needs.

3.5.2 Cognition Tasks And Cognitive Circle

One of the required features of a cognitive radio is intelligence. The definitions of intelligence differ in the literature. Mitola defined nine levels of intelligence, shown in table 3.1, starting with level 0 and going up to level 8, which describes a system that acts autonomously and makes decisions based on self-learned rules. According to Mitola, each level has to be implemented in a cognitive radio. Presenting the state of the art of cognitive radio, Vianello described other views on the topic in [VIA10]. Some authors split the definition into three parts: the adaptive radio, the cognitive radio and the intelligent radio (see figure 3.5). Mapped onto the levels defined by Mitola, an adaptive radio would include levels 0 and 1, a cognitive radio additionally includes levels 2 to 5, and an intelligent radio includes all levels.

Level  Capability             Task Characteristics
0      Pre-programmed         The radio has no model-based reasoning capability
1      Goal-driven            Goal-driven choice of RF band, air interface, and protocol
2      Context Awareness      Infers external communications context (minimum user involvement)
3      Radio Aware            Flexible reasoning about internal and network architectures
4      Capable of Planning    Reasons over goals as a function of time, space, and context
5      Conducts Negotiations  Expresses arguments for plans or alternatives to user, peers, networks
6      Learns Fluents         Autonomously determines the structure of the environment
7      Adapts Plans           Autonomously modifies plans as learned fluents change
8      Adapts Protocols       Autonomously proposes and negotiates new protocols

Table 3.1: Characteristics of radio cognition tasks, from [MIT00]

In all the different definitions three key features can be found, which are the following:

observation
The radio is able to get information about its environment in a direct or indirect way.

adaptation
The radio is able to change its state and/or operational mode.

intelligence
The radio is able to change its state and/or operational mode after making a decision, based on the observed information, to reach a given goal.

Figure 3.5: Venn diagram illustrating the different types of radio and their relation, as in [VIA10]

These key features lead to the cognitive circle, which is described in similar ways by Mitola in [MIT00] and by Thomas in [THO07]. Thomas described the cognitive circle for radio using the OODA (observe - orient - decide - act) circle from Boyd ([BOY86]), as shown in figure 3.6.

Figure 3.6: OODA circle, from [THO07]

The radio observes the environment in its surroundings using the mechanisms of spectrum sensing. This information is evaluated and preprocessed to determine its priority (orient). After the orientation, the radio decides and acts in the appropriate way. The cognitive circle described by Mitola includes two additional parts: plan and learn. In the planning phase, the radio takes the alternatives into consideration and checks whether there is a more appropriate solution. The learning phase evaluates the taken decision for possible future events. For further details the reader is referred to [MIT00]. A minimal sketch of an OODA-style loop for network selection is given below.
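The following minimal sketch applies such an OODA-style loop to the network selection problem motivated at the beginning of this section. The interface objects, their measure() method and the scoring weights are hypothetical assumptions.

```python
import time

def observe(interfaces: dict) -> dict:
    """Gather link metrics per interface (measure() is assumed to exist)."""
    return {name: iface.measure() for name, iface in interfaces.items()}

def orient(metrics: dict) -> dict:
    """Score each interface; the weighting is an illustrative choice."""
    return {name: m["bandwidth_kbps"] * (1.0 - m["loss_rate"])
            for name, m in metrics.items()}

def decide(scores: dict, current: str, hysteresis: float = 1.2) -> str:
    """Switch only if another interface is clearly better (hysteresis)."""
    best = max(scores, key=scores.get)
    if scores[best] > hysteresis * scores.get(current, 0.0):
        return best
    return current

def ooda_loop(interfaces: dict, act, period_s: float = 5.0) -> None:
    """Observe - orient - decide - act, repeated until the process stops."""
    current = next(iter(interfaces))
    while True:
        chosen = decide(orient(observe(interfaces)), current)  # observe, orient, decide
        if chosen != current:
            act(chosen)          # act: hand the stream over to the new interface
            current = chosen
        time.sleep(period_s)
```

The hysteresis factor prevents the radio from oscillating between two interfaces with similar scores, which would otherwise interrupt the stream repeatedly.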