Competitive Analysis of Adaptive Video Streaming Implementations




Tom Cloonan, Ph.D., Chief Strategy Officer
Jim Allen, Ph.D., Staff Engineer
ARRIS
2400 Ogden Avenue, Suite 180, Lisle, IL 60532
Ph: 630-281-3050  Fax: 630-281-3362
E-mail: tom.cloonan@arrisi.com

Keywords
IP Video, Adaptive Streaming, Quality of Experience, Simulation, ABR Video

Abbreviations and Acronyms
3GP- Third Generation Partnership Project file format
AAC- Advanced Audio Coding
ABR- Adaptive Bit Rate
ACK- Acknowledgement Message
AVC- Advanced Video Coding
Bcast- Broadcast
CM- Cable Modem
CMTS- Cable Modem Termination System
DOCSIS- Data Over Cable System Interface Specification
DS- Downstream
DTA- Digital Terminal Adaptor
GOP- Group of Pictures
HD- High Definition
HFC- Hybrid-Fiber Coax
HSD- High-Speed Data
HTTP- Hypertext Transfer Protocol
HTTP GET- HTTP Data Request
I, P, & B Frames- Intra-coded, Predicted, & Bi-directionally predicted video frames
IP- Internet Protocol
IP TV- Internet Protocol Television
ISO- International Organization for Standardization
KB- Kilobytes
Kbps- Kilobits per second
Mbps- Megabits per second
MP3- MPEG-1 Audio Layer III
MPEG- Moving Picture Experts Group
MPEG-4- Moving Picture Experts Group (version 4)
MPEG-TS- MPEG Transport Stream
MSO- Multiple System Operator
NAT- Network Address Translation
Ncast- Narrow Cast
OTT- Over The Top
QAM- Quadrature Amplitude Modulation (or Quadrature Amplitude Modulator)

QoE- Quality of Experience
QoS- Quality of Service
RTP- Real-time Transport Protocol
RTT- Round Trip Time
SDV- Switched Digital Video
TCP- Transmission Control Protocol
Tmax- Maximum Sustained Traffic Rate
UDP- User Datagram Protocol
VC-1- Video Coding codec standard (SMPTE 421M)
VoD- Video on Demand
VoIP- Voice over Internet Protocol
VP8- Video compression format
WMA- Windows Media Audio

1 Introduction

Adaptive Internet Protocol (IP) Video Streaming (also called Adaptive Bit Rate (ABR) Video) using Hypertext Transfer Protocol (HTTP) and Transmission Control Protocol (TCP) has rapidly evolved to become one of the most popular techniques for delivering IP Video to personal computers and hand-held devices. Its deployment by many Over-The-Top (OTT) content providers on the Internet has led to such successful service offerings that its very existence has changed the mix of traffic on the Internet. Over-The-Top IP Video traffic constitutes approximately half of Internet bandwidth in recent traffic samplings, and its growth rate is largely driving the overall bandwidth growth trends on the Internet. [San]

As a result of this rising popularity, Multiple System Operators (MSOs) are beginning to plan their networks to ensure that two important goals are satisfied in the future:
1) Their subscribers will continue to receive good-quality delivery of Over-The-Top IP Video streams, as subscriber Quality of Experience is increasingly determined by the quality of this IP Video delivery.
2) The MSOs will be able to offer an enhanced version of IP Video delivery through an MSO-managed service that offers better-quality video (through better content libraries and guaranteed QoS-oriented delivery systems that do not transit the unpredictable links of the Internet backbone).
Such an MSO-managed service will provide subscribers with a unified experience across all three screens: the personal computer, the hand-held device, and the television. To satisfy these two goals, it is clear that MSOs will need to add High-Speed Data bandwidth capacity using more Data Over Cable System Interface Specification (DOCSIS) channels within their Hybrid-Fiber Coax (HFC) networks to provide needed future capacity. In addition, it is also clear that MSOs must be able to accurately predict how client-based Adaptive Video Streaming algorithms (whether used for Over-The-Top delivery or MSO-managed delivery) will respond to various network conditions that the IP Video traffic may encounter.

This paper will explore the performance of Adaptive IP Video Streaming algorithms under various network conditions. It attempts to identify corner-case scenarios where undesirable Quality of Experience levels may surprise MSOs as they begin delivering and deploying IP Video services. A particular focus was placed on scenarios that yield unfair results, where one set of subscribers experiences excellent Quality of Experience levels while another set experiences terrible ones. Four different Quality of Experience issues will be investigated in various corner-case scenarios:
1) IP Video stream packet loss due to CMTS buffer overflows
2) IP Video stream rendering engine starvation due to IP Video client buffer underflows (during periods of congestion)
3) IP Video resolution dithering due to rapidly-occurring bandwidth adjustments of an IP Video client's Adaptive Streaming algorithm (during periods of congestion)
4) IP Video resolution down-shifting to unacceptably-low resolution levels due to bandwidth adjustments of an IP Video client's Adaptive Streaming algorithm (during periods of congestion)

This paper will also dive into some of the details regarding how different types of ABR Video Rate Selection algorithms might interact with one another.
The results of the paper can be used to help MSOs design their future Hybrid-Fiber Coax networks to ensure good subscriber Quality of Experience for Over-The-Top IP Video delivery, for MSO-managed IP Video delivery, and for combinations of the two that share Channel Sets within a common Service Group.

2 General Background on IP Video

Multiple System Operators (MSOs) are dealing with rapid changes on many fronts as they move into the 2010 decade. At least two of their three primary service types (Video, High-Speed Data, and Voice) are experiencing rapid growth in bandwidth demands, because their Video services require more HD and 3D content and their High-Speed Data services must provide an ever-increasing amount of bandwidth to data subscribers (most of which is consumed by IP Video content). Thus, as MSOs begin to expand their existing Hybrid-Fiber Coax (HFC) networks to support the increasing bandwidth demands of the 2010 decade, they often find that they must deploy many different tools within their toolkit to accommodate growing subscriber demands. As an example, many MSOs have started to implement node splits in an effort to reduce the number of subscribers sharing bandwidth for Narrowcast (Ncast) services such as High-Speed Data (HSD), Voice over IP (VoIP), and Video on Demand (VoD). Some MSOs have also found great benefit in applying Switched Digital Video (SDV) as a technique to reduce the bandwidth consumed by continually-transmitted Broadcast (Bcast) services. Other MSOs have used Digital Terminal Adaptors (DTAs) to convert their legacy analog services into more spectrally-efficient digital services.

All of these modifications are aimed at providing efficiencies or improvements in the delivery of already-existing Video, High-Speed Data, and Voice services. However, in parallel with these important transition efforts, most MSOs are also working on new architectures that will ensure their ability to expand into developing markets that are already beginning to show rapid growth. One of the emerging markets dominating the minds of many forward-looking planners within MSOs is the burgeoning IP Video market.

At a high level, IP Video is simply the delivery of video services throughout the home network using Ethernet/IP packets to carry the content to various end-points in the home. In actuality, most subscribers do not really care how the content was delivered to their home as long as it can be routed around the home using Ethernet/IP. Within MSO Hybrid-Fiber Coax networks, two different approaches are being considered for the actual transport of the IP Video streams from the MSO head-end to the subscriber's home:
1) Over the existing QAM-based MPEG-TS infrastructure
2) Over the existing DOCSIS infrastructure
The first method uses traditional EdgeQAMs to transport the video content to the Set-Top Boxes or Media Gateways, where the content is encapsulated in Ethernet/IP packets for final transmission over the home network.
The second method uses DOCSIS CMTSs to transport the video content to Cable Modems or Media Gateways, and those elements merely relay the packets into the home network. This second method permits the video content to be encapsulated in Ethernet/IP packets for its entire path (from the server to the client), making it more straightforward to adopt modern HTTP delivery methods for retrieving the IP Video content. While both methods will likely find applications in MSO networks of the future, this paper will focus on the second method, the DOCSIS method.

Over-The-Top content providers have used the MSO's DOCSIS infrastructure as a dumb pipe to offer their server-based content to subscribers for many years now, and their subscribers have accessed that content in greater and greater numbers using various types of delivery mechanisms. During the early days of IP Video (in the early 1990s), the content was initially delivered to the home using Traditional Streaming with Ethernet/IP/UDP/RTP encapsulations. Custom players and custom servers were usually utilized. It worked fairly well, but it ran into issues with in-home NAT boxes and congested networks.

As a result of these issues, IP Video began to transition to a new delivery technique in the late 1990s using Progressive Downloading with Ethernet/IP/TCP/HTTP encapsulations. This helped circumvent many of the NAT box and network congestion issues. Progressive Downloading still used custom players (such as browser plug-ins), but it was often able to use inexpensive, standards-based servers. The video content was stored in large files on the server, and a single HTTP GET message from the client to the server would initiate the downloading of the file. Nevertheless, trick modes (such as fast-forward, rewind, and pause) and the large file downloads for partially-watched videos led to wasted bandwidth on the network. And like the Traditional Streaming solution, Progressive Downloading also suffered whenever network congestion intensified.

Both Traditional Streaming and Progressive Downloading were used for over a decade, and they are still used (to some extent) today. However, the latest improvements in IP Video delivery began to be utilized in the mid-2000s. These improvements replaced the single HTTP GET message of Progressive Downloading with a series of repeated HTTP GET messages, each requesting a different, small chunk (or fragment) of the video content file. As a result, only the video content that is to be viewed is actually requested, so the problems associated with wasted bandwidth are minimized. In addition, since the video fragments tended to be fairly short in duration (2-10 seconds was typical), it was very easy to efficiently support trick modes. The short-duration fragments also made it possible for clients to rapidly identify network congestion and adjust their HTTP GET messages to request higher- or lower-resolution fragments that could be accommodated by the available network bandwidth at any instant in time.
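The fragment-by-fragment request loop just described can be sketched in a few lines. The rendition ladder, the 4-second fragment duration, and the downloader callback below are hypothetical; real clients use proprietary rate-selection logic.

```python
# Illustrative sketch of a client's fragment-request loop: fetch a
# fragment, measure the average bit-rate of the download, and use
# that measurement to choose the resolution of the next HTTP GET.

RENDITIONS_BPS = [128_000, 700_000, 1_500_000, 3_000_000, 6_000_000]
FRAGMENT_SECONDS = 4  # fragments of 2-10 s were typical

def next_rendition(measured_bps):
    """Pick the highest rendition whose bit-rate fits the measured
    throughput, mimicking the up-shift/down-shift behavior."""
    best = 0
    for i, rate in enumerate(RENDITIONS_BPS):
        if rate <= measured_bps:
            best = i
    return best

def play(downloader, fragments=5):
    """Request successive fragments; each download time steers the
    resolution of the next request. Returns the rendition history."""
    index = 0
    history = []
    for n in range(fragments):
        bits = RENDITIONS_BPS[index] * FRAGMENT_SECONDS
        seconds = downloader(n, bits)   # time to fetch this fragment
        measured_bps = bits / seconds   # average bit-rate observed
        index = next_rendition(measured_bps)
        history.append(index)
    return history
```

With a simulated 2 Mbps channel (`downloader` returning `bits / 2_000_000`), the loop settles on the 1.5 Mbps rendition, the highest that fits.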
These rapid adjustments in the resolution (and bit-rate) of successively-requested video fragments came to be known generically as HTTP-based Adaptive Streaming.

The creation of a fragmented IP Video file requires several processing steps. First, the video content and audio content must be separately encoded and compressed. At present, several different video coding standards can be utilized, including H.264 (aka MPEG-4 Part 10 or AVC), VP8, MPEG-4 Visual, and VC-1. Video is essentially a sequential series of still images (frames) that are played one after the other to create the illusion of smooth motion to the eye. Various frame rates exist, but a typical 30 frames-per-second frame-rate would display a new image every 1/30 sec = ~33.33 msec. Encoders compress the image information using spatial techniques based on the Discrete Cosine Transform (which converts the image into its spatial frequency components), quantization (which suppresses some of the high-frequency components in the Discrete Cosine Transform), and Run-Length Encoding (which compresses long sequences of repeated numbers within the Discrete Cosine Transform into shorter sequences). The encoders also compress the information using temporal techniques that identify redundancies in successive frames. Some frames (known as I frames) make no use of this temporal compression technique. Other frames (known as P frames) use temporal compression based on the previous I or P frame. Still other frames (known as B frames) use temporal compression based on both previous and successive I or P frames, which produces the highest levels of frame compression. A Group Of Pictures (or GOP) is then defined as a sequence of successive I frames, P frames, and B frames. Each GOP must begin with at least one I frame, which serves as the basis frame from which the P and B frames within the GOP can then be calculated. A full-length video stream covering the duration of a movie is therefore created from a large number of successive GOPs, with each GOP containing a number of I frames, P frames, and B frames. This is illustrated in Fig. 1.

Several different audio coding standards can also be paired with the video coding standards. Typical audio coding standards include AAC, MP3, Vorbis, and WMA. A common pairing might use H.264 video encoding with AAC audio encoding.

Since HTTP-based Adaptive Streaming offers many different resolutions for a particular piece of content, the same content must be encoded and compressed multiple times to produce multiple outputs, each output associated with a different resolution level. For example, the width x height (in pixels) for five different encoder outputs might be 1920x1080, 1280x720, 854x480, 640x360, and 176x144. Each of these encoded outputs would have a different resolution as well as a different average bit-rate (which might range from 12 Mbps down to 128 kbps). Thus, the ratio of the highest to the lowest average bit-rate might yield a factor as large as ~100, so HTTP-based Adaptive Streaming algorithms have a broad range of resolutions and bit-rates to choose from as they adjust their requested IP Video bit-rates to match the available channel capacities.

The next step in the creation of a fragmented and packaged IP Video program requires each of the different-resolution bit-stream outputs from the coding step to be packaged in a containerized format that is then stored inside of a single file. Many different container types have been defined by different vendors, including Flash Video, MPEG2-TS, WebM, 3GP, and Ogg.
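The five encoder outputs mentioned above can be tabulated as a bit-rate ladder. The text gives only the endpoint bit-rates (12 Mbps and 128 kbps), so the three middle values below are illustrative assumptions; the endpoint ratio works out to roughly the factor of ~100 quoted:

```python
# Hypothetical encoding ladder pairing the five resolutions from the
# text with plausible average bit-rates (middle values are assumed).
LADDER = [
    ((1920, 1080), 12_000_000),
    ((1280, 720),   5_000_000),
    ((854, 480),    2_500_000),
    ((640, 360),    1_000_000),
    ((176, 144),      128_000),
]

rates = [bps for _, bps in LADDER]
ratio = max(rates) / min(rates)
print(ratio)  # 93.75, i.e. a factor of roughly 100
```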
Many of these container formats are based on the formatting standards defined by the MPEG-4 Part 14 specification (which is based on the MPEG-4 Part 12 specification, which is in turn based loosely on the Apple QuickTime format). Files that conform to the MPEG-4 Part 14 specification are said to use the standardized ISO Base Media File Format. These files are also known as fragmented MP4 or fmp4 files. Depending on the vendor and the containerization format used, these files are given various file suffixes, including .mp4, .mpeg, .ogg, .flv, .wmv, and .mov. [Gil] [Odr] [Wag]

In addition to creating the fragmented IP Video file, the packaging sub-system is usually responsible for creating a server Manifest file and a client Manifest file. These Manifest files provide several important pieces of information needed to access the fragments in the fragmented IP Video files. The Manifest file information includes pointers (e.g., URL information) that help the IP Video client locate (on an Internet server or MSO-managed server) a particular fragmented IP Video file with a particular desired resolution and bit-rate. The Manifest file also contains a dictionary of insertion indices (or starting-point offsets) that indicate where the start of a particular fragment can be accessed within the fragmented IP Video file. These fragments usually start at the beginning of a GOP and contain an integer number of GOPs, as shown in Fig. 1.
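A minimal sketch of the client-manifest information just described might look like the following. The field names, URLs, and byte offsets are all hypothetical illustrations, not any vendor's actual manifest format:

```python
# Sketch: per-rendition file pointers plus a dictionary of fragment
# starting offsets, as a client manifest conceptually provides.
manifest = {
    "renditions": {
        6_000_000: "http://server.example/movie_6000k.mp4",
        3_000_000: "http://server.example/movie_3000k.mp4",
    },
    # average bit-rate -> byte offset of each fragment (one or more GOPs)
    "fragment_offsets": {
        6_000_000: [0, 3_145_728, 6_291_456],
        3_000_000: [0, 1_572_864, 3_145_728],
    },
}

def fragment_request(bitrate, index):
    """Return the (url, start_byte) pair a client would need to issue
    an HTTP GET for fragment `index` at the chosen bit-rate."""
    url = manifest["renditions"][bitrate]
    offset = manifest["fragment_offsets"][bitrate][index]
    return url, offset
```

With this information alone, the client can construct a request for any fragment at any resolution, which is exactly what makes the up-shift/down-shift decisions purely client-side.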

Fig. 1- Frames, GOPs, and Fragments

Once the client Manifest file for a particular piece of IP Video content is distributed from the MSO head-end to the IP Video client device, the client software is able to generate an appropriate HTTP GET message for any available fragment of any available resolution for that IP Video content. Upon receiving a correctly-formatted HTTP GET message from an IP Video client, the IP Video server is able to access the correct IP Video content file and rapidly chain through several sub-containers in the file to find the requested fragment. That fragment (a piece of the bigger file) is then chopped up into smaller pieces and wrapped up in TCP frames for transport back down to the requesting IP Video client. [Atz] [Per] [Zam]

Different end-to-end IP Video delivery systems were created by many different vendors. These systems included Adobe's Flash Dynamic Streaming, Microsoft's Smooth Streaming, and Apple's HTTP Live Streaming. Each vendor added slightly different specifics (such as minor differences in file formats) within their respective HTTP-based Adaptive Streaming solutions, but the biggest subscriber-visible differences are probably found in their proprietary client algorithms for deciding when to change the adaptive streaming resolution in response to network congestion. At a 10,000-foot level, however, most of these solutions (as well as other proprietary HTTP-based Adaptive Streaming solutions) are implemented in quite similar fashions.

In recent years, different Over-The-Top content providers have used most of these solutions to deliver their content to a growing number of subscribers. MSOs have been cognizant of this growing trend for several years now. As a result of the rising popularity of IP Video, MSOs recognize that they must modify their DOCSIS networks to ensure that they have enough bandwidth for the Over-The-Top services to be successfully delivered to their subscriber base.
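The server-side handling described above (locate the requested fragment in the larger file, then chop it into TCP-sized pieces) can be sketched as follows. The offsets list and the 1460-byte payload size are illustrative assumptions, not any vendor's implementation:

```python
# Sketch of the server's fragment extraction: slice the fragment out
# of the file bytes using its starting offset (from the server
# manifest), then split it into TCP-payload-sized chunks.
def serve_fragment(file_bytes, offsets, index, payload_size=1460):
    """Return the requested fragment as a list of TCP-sized chunks."""
    start = offsets[index]
    end = offsets[index + 1] if index + 1 < len(offsets) else len(file_bytes)
    fragment = file_bytes[start:end]
    return [fragment[i:i + payload_size]
            for i in range(0, len(fragment), payload_size)]
```

For example, a 3000-byte fragment would leave the server as two full 1460-byte chunks plus an 80-byte remainder.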
Most MSOs also strongly believe that they would benefit by offering their own MSO-managed IP Video service. However, they expect that their own offering would be an improvement over the offerings from Over-The-Top providers. In particular, the MSO-managed offering would likely include improved video content (e.g., more live offerings) and could also provide improved Quality of Experience levels for their subscribers by carefully managing the data delivery bandwidth or isolating the MSO-managed video streams onto a separate set of DOCSIS channels.

MSOs planning for this transition to IP Video will likely follow a phased approach. These phases may include:

Phase  Protocol   Service  Managed in Specially-    Requires User Authentication  Network
                           Designated Bandwidth?    & Authorization?
1      Unicast    VoD      No                       Yes                           DOCSIS or Other
2      Unicast    Live     No                       Yes                           DOCSIS
3      Unicast    VoD      Yes                      Yes                           DOCSIS
4      Unicast    Live     Yes                      Yes                           DOCSIS
5      Multicast  Live     Yes                      Yes                           DOCSIS

The Specially-Designated Bandwidth mentioned in the table above refers either to special IP Video-oriented DOCSIS Service Flows or to special IP Video-oriented Channel Sets dedicated to the IP Video service. The order of deployment for the various phases may vary from MSO to MSO, but in the end, most MSOs will likely support most of these delivery systems. It is highly likely that the last delivery system (multicast transmission of Live content) will be enabled only when the take-rate on the IP Video service has grown to sufficiently high levels to yield significant efficiency gains from the added complexity of multicast (which avoids redundant consumption of bandwidth by multiple unicast streams carrying identical content).

3 The Simulation Environment

The fundamental goal of this work was to determine if there are unforeseen Quality of Experience issues that may surface as MSOs encounter more ABR Video traffic.
The authors wanted to ensure that the results of this work would be applicable both to Over-The-Top IP Video services and to MSO-managed IP Video services, because it will be imperative that MSOs provide a good Quality of Experience for each of those service types. The authors also wanted to explore corner cases that might develop over time and surprise MSOs late in the deployment phase. For the most part, these corner cases are related to interesting mixes of IP Video subscribers and Web-surfing subscribers that come together in both congested and un-congested Service Groups.

In an ideal world, we might have emulated each of these different corner-case scenarios in a lab environment using real-world equipment. However, inserting modifications into the real-world Adaptive Streaming algorithms in commercially available IP Video clients was not possible, and creating the levels of congestion needed in the experiments proved to be quite challenging. As a result, the authors turned to simulation methods to help predict the Quality of Experience levels that might be seen by the various mixes of subscribers in these corner-case scenarios.

A Java-based simulator was created to mimic both the behavior of the users and the operation of the various pieces of networking equipment. At a high level, the simulator modeled the operation of all of the elements shown in Fig. 2.

Fig. 2- Network Elements in the Simulation Environment

Two different types of Clients were modeled: ABR Video viewers and Web-surfers. Each of the Clients had configurable parameters that would define their behaviors and activities. The simulation tool permitted different numbers of Clients from each of the client types to be added to the Service Group, so the traffic mixes could be easily altered. The Clients supported HTTP, TCP, and UDP. The HFC plant was modeled to support various types of DOCSIS Service Groups. The number of downstream and upstream channels could be easily changed to alter the size of the Service Group. The CMTS was modeled using most of the Quality of Service-oriented sub-systems found in real-world CMTS equipment. Each Router (or Internet) link has configurable latency, bit-rate, and packet-delay characteristics. The Servers in the simulated network can support HTTP, TCP, or UDP operation, which allows them to work well with any of the Clients. They are able to generate IP Video streams that closely mimic real-world streams that we have sampled.
4 Adaptive Streaming Network Protocol

At a high level, the typical message and packet exchanges between IP Video clients and IP Video servers are shown in Fig. 3. In the figure, the IP Video client is assumed to be along the right-most vertical arrow, and the IP Video server is assumed to be along the left-most vertical arrow. The vertical arrows show the flow of time (which advances in the downward direction). The small interval of time represented in the figure illustrates the IP Video client making three successive HTTP GET requests for three different IP Video fragments from the IP Video server. In each case, the IP Video server responds by sending the requested IP Video fragment in a stream of TCP packets whose basic flow-rate is predominantly determined by the TCP congestion window (which provides congestion control) and the TCP receive window (which provides flow control).

The TCP ACKs sent back from the IP Video client to the IP Video server indicate when transmitted TCP packets have successfully arrived at the IP Video client. They are also used to manipulate the IP Video server's internally-managed congestion window. The successful arrival of TCP ACKs will tend to increase the IP Video server's congestion window, which typically permits more un-acknowledged packets to be launched into the network by the IP Video server. In essence, the actions of these successfully-received TCP ACKs on the congestion window tend to increase the rate at which TCP packets are transmitted by the IP Video server. Delayed or duplicated TCP ACKs (which indicate missing TCP packets at the IP Video client) will decrease the IP Video server's congestion window, which tends to decrease the rate at which TCP packets are transmitted. Thus, TCP offers a very powerful form of feedback from the IP Video client to the IP Video server that can accelerate or decelerate the rate at which packets are transmitted by the IP Video server.

A second form of congestion control within the IP Video transfer of Fig. 3 is shown by the transmissions of the second and third fragments. In the figure, the second fragment takes longer to be transmitted due (presumably) to network congestion. The IP Video client can indirectly recognize this network congestion by measuring the arrival period of the fragment (shown on the right side of the figure) or by measuring the average bandwidth during the time when the fragment is arriving. (Note: These two measurements are closely related to one another.)
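The TCP-level feedback loop described above behaves like the classic additive-increase/multiplicative-decrease (AIMD) window update. The sketch below uses the textbook constants (roughly one MSS of growth per round trip in congestion avoidance, halving on a loss signal); it is not the exact behavior of any particular TCP stack:

```python
# Toy AIMD sketch: successful ACKs grow the congestion window,
# a loss signal (duplicate/late ACKs) shrinks it multiplicatively.

MSS = 1460.0  # assumed maximum segment size in bytes

def on_ack(cwnd, mss=MSS):
    """Congestion avoidance: grow roughly one MSS per round trip
    (each ACK adds mss*mss/cwnd bytes)."""
    return cwnd + mss * mss / cwnd

def on_loss(cwnd):
    """Multiplicative decrease: halve the window on a loss signal."""
    return cwnd / 2

cwnd = 10 * MSS
for _ in range(100):      # a congestion-free stretch of ACKs
    cwnd = on_ack(cwnd)   # server sends faster and faster...
grown = cwnd
cwnd = on_loss(cwnd)      # ...until one loss signal halves the rate
```

This sawtooth behavior of the sender's window is the first, packet-level feedback loop; the fragment-level rate selection discussed next sits on top of it.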
Upon recognizing the network congestion during the transmission of the second fragment, the Adaptive Streaming algorithm operating in the IP Video client can decide to lower the bit-rate associated with the next fragment (fragment #3). In the figure, the IP Video client sends an HTTP GET message for a lower-resolution fragment during the third fragment's transmission. The reader should note that the first two requested fragments were 6 Mbits in length, whereas the third requested fragment was only 3 Mbits in length. As a result, the average bit-rate associated with the third fragment will be roughly half of the bit-rate required for the first two fragments. This changing of the resolution is exactly how Adaptive Streaming Rate Selection algorithms work to down-shift the bit-rate associated with an IP Video stream during congestion. In a similar fashion, the Adaptive Streaming algorithms can also request a higher-resolution fragment and up-shift the bit-rate associated with an IP Video stream whenever they detect that downloads are occurring more rapidly (indicating that bandwidth capacity is unused and available on the network).

Thus, there are two forms of congestion control in play within Fig. 3. These two forms of congestion control can work together to help down-shift or up-shift the bit-rates associated with IP Video streams in response to increases or decreases in network congestion. This is sometimes called a nested double-feedback congestion control algorithm. The fact that both of these effects work together tends to make IP Video streams very compressible and very expandable: they can rapidly down-shift or up-shift to adapt to changes in the available network capacity.

Fig. 3- Typical Message and Packet Exchanges for 3 IP Video Fragments

A mechanism for estimating the available network bandwidth will now be described. The mechanism described here is similar in many respects to those used in existing ABR clients. The average bit-rate experienced by a received video fragment can be easily calculated by recognizing the beginning and the ending of the fragment transmission. If the fragment's transmission size is given by A1 (in bits) and the fragment's arrival period is given by T1 (in seconds), then the average bit-rate B1 for the fragment can be calculated as B1 = A1 / T1 (in bits per second). These calculations are illustrated in the bit-rate vs. time plots of Fig. 4 for the three fragments delivered in Fig. 3.

Within Fig. 4, the top-most plot corresponds to transfers similar to the first fragment transmission in Fig. 3. It represents a high-resolution fragment (a large fragment with a large rectangular area) that is transmitted without any congestion, so the fragment is transmitted in a short window of time at a fairly high bit-rate. The middle plot corresponds to transfers similar to the second fragment transmission in Fig. 3. It represents a high-resolution fragment (a large fragment with a large rectangular area) that is transmitted in the presence of congestion, so the fragment is transmitted in a longer window of time at a lower bit-rate. (Note: The areas under the rectangles of the top and middle plots are identical.) The bottom plot corresponds to transfers similar to the third fragment transmission in Fig. 3. It represents a low-resolution fragment (a small fragment with a small rectangular area) that is transmitted in the presence of congestion, so the fragment is transmitted in a short window of time at a fairly low bit-rate. (Note: The area under the bottom rectangle is less than the area under the top and middle rectangles.)

From the plots in Fig. 4, it should be apparent that an ABR Rate Selection algorithm could use any one of several metrics to indirectly measure the congestion in the network. One approach would be to use the average bit-rate of each fragment transfer (corresponding to the height of the rectangles). Another approach would be to use the fragment arrival period of each fragment (corresponding to the width of the rectangles). A third approach would be to measure the duty cycle for each fragment, where the duty cycle is defined as the fragment arrival period divided by the fragment period, and the fragment period is defined as the ideal period of time between successive fragments. (In the simulation environment described here, the authors opted for the first approach, based on measuring the average bit-rate of each fragment.) It is also common to pass these individual data rate estimates through a smoothing filter to remove most of the short-term statistical variation from the bandwidth estimate. In our simulated ABR Rate Selection algorithm, described in the next section, we call this smoothed bandwidth estimate Rs and compare it against the available IP video content bit-rates to determine which rate should be used in the next IP Video fragment request.
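The candidate congestion metrics above can be sketched as follows. This Python fragment is our own illustration (the function names are not taken from any commercial client); it computes the average bit-rate B = A / T, the arrival period, and the duty cycle for one received fragment, and applies the kind of exponential smoothing used to produce Rs.

```python
def fragment_metrics(size_bits, arrival_period_s, fragment_period_s):
    """Return the three candidate congestion metrics for one received fragment:
    average bit-rate B = A / T (height of the rectangle in Fig. 4),
    arrival period T (width of the rectangle), and
    duty cycle = arrival period / ideal fragment period."""
    avg_bitrate_bps = size_bits / arrival_period_s
    duty_cycle = arrival_period_s / fragment_period_s
    return avg_bitrate_bps, arrival_period_s, duty_cycle

def smooth_estimate(rs_prev, r_new, a=0.1):
    """Exponentially smoothed bandwidth estimate Rs (smoothing weight a)."""
    return (1.0 - a) * rs_prev + a * r_new
```

For example, a 6 Mbit fragment that arrives in 1.2 seconds against a 2-second fragment period yields B = 5 Mbps and a duty cycle of 0.6; the smoothing step then moves Rs only a fraction of the way toward that new measurement.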

Fig. 4- Bit-rate vs. time for various Fragment Transmissions

5 The ABR Rate Selection Algorithm Simulation Model

A key component in the creation of the Java-based simulator was the design of the Adaptive Streaming Rate Selection algorithm model. The authors wanted to be able to mimic the behaviors of the most popular commercially-available Adaptive Streaming clients. In particular, we wanted our results to be relevant to the Rate Selection algorithms used in Microsoft's Silverlight, Apple's HTTP Live Streaming, and Adobe's Flash video clients. Much about these products, however, is proprietary, and we have had to resort to a bit of reverse-engineering in this regard. During the course of this process we came to appreciate that, while the algorithm can be implemented in a fairly straightforward manner from a high-level view, a good deal of art can be employed in the selection of a few critical time constants. These time constants determine the balance between the algorithm's three basic needs: to select the best resolution (as quickly as possible), to hold onto it as tightly as possible, and yet to avoid frequent and annoying resolution changes. Each vendor has made their own (and, most likely, different) best guess at the optimal balance between these tradeoffs, and future vendors will probably do something different still.

As time goes on, we anticipate that video providers will find many new ways to make their algorithms just a little better/quicker at claiming shared channel capacity for their customers at the expense of competing providers' customers. This will, of course, require each competing provider to respond with their own more aggressive version, and a video bandwidth arms race will be underway. (Note: This concept was alluded to by another study described in [Akh].) With these considerations in mind, we set out to implement a generalized ABR Rate Selection algorithm that could be parameterized to emulate currently available vendor products and also to allow some of the clients to be slightly more aggressive than other similar clients sharing the same channel. We used empirical results available in the literature [Akh] to validate our basic implementation.

A high-level description of our generic ABR Rate Selection algorithm is outlined in the steps below. This process is repeated for each incoming IP Video fragment:

1) Let A = Aggressiveness // Between 1 and 5, where 1 indicates Least Aggression
2) Measure R = 0.8 * (Average Bit-Rate during last Fragment Arrival Period)
3) Measure F = Fraction of 10-second buffer that is not yet filled; 1 => Empty, 0 => Full
4) If (R > Rs) then a = 0.04 * A // Increase Resolution very slowly
5) else a = 0.1 / A // Decrease Resolution more quickly
6) a = a + (1 - a) * F^3 // Decrease Very Quickly if Nearly Empty
7) Rs = (1 - a) * Rs + a * R // Rs = exponentially smoothed moving average of R
8) If (Rs > 3.0 Mbps) set Resolution = 3.0 Mbps
   elsif (Rs > 2.1 Mbps) set Resolution = 2.1 Mbps
   elsif (Rs > 1.5 Mbps) set Resolution = 1.5 Mbps
   else set Resolution = 1.0 Mbps

For the sake of clarity, the algorithm above uses constant values corresponding to most of our tests in the place of several operational parameters.
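The eight steps above translate almost line-for-line into code. The following Python sketch is our own transcription (the simulator itself is written in Java); the constants match the listed steps, and the function signature is illustrative.

```python
RATES_BPS = [3.0e6, 2.1e6, 1.5e6, 1.0e6]  # permissible coding rates, high to low

def select_rate(rs, measured_bps, f_empty, aggressiveness=1):
    """One iteration of the generic ABR Rate Selection algorithm.

    rs             - previous smoothed bandwidth estimate Rs (bps)
    measured_bps   - average bit-rate during the last fragment arrival period
    f_empty        - fraction of the 10 s buffer not yet filled (1=empty, 0=full)
    aggressiveness - A, between 1 (least aggressive) and 5

    Returns (updated Rs, selected resolution in bps).
    """
    r = 0.8 * measured_bps                # step 2
    if r > rs:
        a = 0.04 * aggressiveness         # step 4: increase resolution very slowly
    else:
        a = 0.1 / aggressiveness          # step 5: decrease resolution more quickly
    a = a + (1.0 - a) * f_empty ** 3      # step 6: decrease very quickly if nearly empty
    rs = (1.0 - a) * rs + a * r           # step 7: smoothed moving average of R
    for rate in RATES_BPS[:-1]:           # step 8: highest coding rate below Rs
        if rs > rate:
            return rs, rate
    return rs, RATES_BPS[-1]
```

Note how the two regimes interact: with a full buffer (F = 0) and A = 1, Rs creeps toward 80% of the measured rate in small 4% steps, so up-shifts happen slowly; with a nearly empty buffer (F close to 1), step 6 pushes a toward 1 and Rs collapses to the measured rate almost immediately, forcing a rapid down-shift.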
We believe that (appropriately parameterized) it corresponds well to the behavior of many available providers [Akh], though it may differ in many implementation details. It additionally provides a simple mechanism to make some clients slightly more aggressive than others, and we have shown through our simulations that this can result in the more aggressive clients receiving better quality video at the expense of their neighbors. It should be said, however, that we have intentionally omitted a mechanism for artificially limiting the frequency of resolution changes. The algorithm above might make as many as four resolution changes in a single minute, which most customers would find objectionable. Most commercial products would simply defer an indicated resolution increase (deferring a decrease runs the risk of a buffer underflow) until an amount of time that they considered acceptable had passed. Inclusion of such a mechanism in our simulations, however, would have three undesirable effects: 1) network traffic patterns would take longer to stabilize, requiring longer simulation runs, 2) the overall average video quality would be slightly decreased from clients deferring resolution increases, and 3) the clients who defer resolution increases would become easy victims for more aggressive clients who responded more quickly.

The IP Video Adaptive Streaming algorithm described above attempts to provide a simple, but flexible, design that can easily mimic the behaviors of real-world Adaptive Streaming clients. The algorithm tended to respond rapidly to required down-shifts in the bit-rate (since they are sometimes required to keep from overwhelming the Service Group's bandwidth resources), while responding more slowly to permissible up-shifts in the bit-rate. The Aggressiveness parameter (A) permits one to make changes to the two response times. The detailed design of this generic IP Video Adaptive Streaming algorithm may differ from the other commercially-available algorithms on the market. However, it was able to nicely track the behaviors of some of the commercially-available algorithms, as well as the algorithms monitored by [Akh], when the various design parameters were appropriately tuned. The primary configurable design parameter was the Aggressiveness (A). However, one could visualize many other designs that could be realized within the very large design space by modifying the steps that calculate the exponentially smoothed moving average or the steps that determine which resolution should be assigned.
The Adaptive Streaming algorithm design space is extremely large. It is also open and available to all of the commercial algorithm designers, and it is the existence of this very large design space that practically guarantees that End-to-end IP Video Delivery System vendors will be involved in an arms race that leads to continued, rapid improvements and modifications to their algorithms. The authors believe that vendors will work hard to provide better IP Video performance within their algorithms at the expense of the performance of their competitors' service. This complex interplay between different algorithmic designs with differing degrees of Aggressiveness was one of the areas that the authors hoped to study within this paper.

The generic algorithm described above includes a statically-configured parameter labeled Aggressiveness (A). This value determines how quickly the algorithm will up-shift to higher resolutions when bandwidth becomes available, and it also determines how quickly the algorithm will down-shift to lower resolutions when network congestion occurs. Larger values of A are more aggressive, in that they cause up-shifts to occur more rapidly and down-shifts to occur more slowly. The algorithm also includes a dynamically-measured parameter known as the Emptiness Fraction (F). This value describes how much of the IP Video client's rendering buffer remains empty. Values of F that are close to 1.0 indicate that the rendering buffer is nearly empty and in danger of under-flowing. This larger value of F leads to more rapid down-shifts in the bandwidth, which will (hopefully) lead to a more rapid arrival of fragments in the rendering buffer.

6 Monitored Parameters Defined

This section defines the measurements and metrics that have been used to configure the simulated network and to monitor its performance during simulation runs.

Channel Capacity: The Channel Capacity of any network data link is defined here to be the maximum rate (in Mbps) at which user data can be transmitted over the link (excluding network management messages and any other dedicated overhead). For the DOCSIS downstream Service Groups used in our simulations, the Channel Capacity is equal to the Bonding Group Size * the Capacity of a single DS. The Channel Capacity of 2 bonded DOCSIS DS channels, therefore, would be: 2 * 40 Mbps = 80 Mbps.

Actual Utilization: The Actual Utilization of a network data link is defined, over any given interval of time, to be the percentage of that link's Channel Capacity that is actually consumed by user traffic. For example, if 10 MB of user data were transmitted over 2 bonded DOCSIS channels during a 2-second interval, the resulting Actual Utilization would be: 100% * 8 bits/byte * 10 MB / (2 sec * 80 Mbps) = 50%. Actual Utilization is often measured over time intervals of a few seconds or more, since measurements over very short intervals can exhibit a great deal of random variation. It is worth noting that the Actual Utilization measurements defined here can actually reach 100% due to the way that we have defined Channel Capacity (as excluding system overhead). Utilization measurements in the field may not account for system overhead and management messages and might, therefore, never reach 100%.
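The two definitions above reduce to one line of arithmetic each. This short Python sketch (helper names are ours) reproduces the worked example of 2 bonded channels carrying 10 MB in 2 seconds:

```python
def channel_capacity_bps(bonding_group_size, per_channel_bps=40e6):
    """Channel Capacity = Bonding Group Size * capacity of a single DS channel."""
    return bonding_group_size * per_channel_bps

def actual_utilization_pct(user_bytes, interval_s, capacity_bps):
    """Percentage of the link's Channel Capacity consumed by user traffic
    over the given measurement interval."""
    return 100.0 * (user_bytes * 8) / (interval_s * capacity_bps)
```

Here actual_utilization_pct(10e6, 2.0, channel_capacity_bps(2)) evaluates to 50.0, matching the 50% figure in the example.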
Desired Data Rate: We define the Desired Data Rate for each user to be the data rate that would be consumed by that customer on a completely uncongested network. The Desired Data Rate is limited by the user's Maximum Sustained Data Rate (Tmax) and TCP window size, but not by any competing user traffic. As an example, a single IP Video viewer would probably desire getting the highest-resolution video fragment every 2 seconds (the fragment period). If we assume that the highest-resolution video fragment contains (on average) 6 Mbits of data, then the IP Video viewer would obviously desire an average bit-rate of 6 Mbits / 2 seconds = 3 Mbps. If 50 active IP Video viewers were attached to the Service Group during a simulation run, then their aggregated desired bit-rate would be (3 Mbps/viewer)*(50 IP Video viewers) = 150 Mbps. As another example, a single Web-browsing user would probably desire getting his/her 500 kbyte web-page downloaded at the Tmax rate of 22 Mbps, so the entire download would consume (500 kbytes)*(8 bits/byte)/(22 Mbps) = 0.18 seconds of time. He/she would then view that downloaded web-page for an average perusal period of 15 seconds, meaning that he/she probably desires to utilize an average bit-rate on the Service Group capacity of (500 kbytes)*(8 bits/byte)/(0.18 seconds + 15 seconds) = 263.5 Kbps. If 100 active Web-browsing users were attached to the Service Group during a simulation run, then their aggregated desired bit-rate would be (263.5 Kbps/user)*(100 Web-browsing users) = 26.35 Mbps.

Desired Utilization: We now define the Desired Utilization of a network data link to be the sum of the Desired Data Rates for all users on that link, expressed as a percentage of the link's Channel Capacity. The Desired Utilization then represents the percentage of the total Channel Capacity of that link that is required to give every user their highest Quality of Experience level. Thus, if an entire Service Group had 50 active IP Video viewers and 100 active Web-browsing users, as in the examples above, then the total aggregated Desired Data Rate from all 150 of those active subscribers would be 150 Mbps + 26.35 Mbps = 176.35 Mbps. The total Channel Capacity for the Service Group can also be calculated as previously described. For example, with a 4-channel Service Group, the total Channel Capacity would be: (4 channels)*(40 Mbps/channel) = 160 Mbps. The Desired Utilization for this example can now be calculated as follows: (100%)*(176.35 Mbps/160 Mbps) = 110%.

As illustrated in this example, it is possible for the Desired Utilization to be greater than 100%. This can result when each active user has an expectation for a specific amount of bit-rate capacity, and the pool of active users connected to a Service Group at a single instant of time yields a cumulative bit-rate expectation that exceeds the bit-rate capacity of the Service Group. Usually, MSOs architect the channel counts within their Service Groups so that these overload conditions do not normally exist. However, it is possible that transient behaviors of a large pool of subscribers can produce short-lived windows of time when this overload condition might exist.
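The arithmetic in the two examples above can be captured in a short sketch (function names are our own):

```python
def surfer_desired_rate_bps(page_bytes=500e3, tmax_bps=22e6, perusal_s=15.0):
    """Average rate desired by one Web-surfer: the page size divided by the
    total cycle time (download time at Tmax plus perusal time)."""
    download_s = page_bytes * 8 / tmax_bps
    return page_bytes * 8 / (download_s + perusal_s)

def desired_utilization_pct(num_viewers, num_surfers, capacity_bps,
                            viewer_rate_bps=3.0e6):
    """Sum of all users' Desired Data Rates, as a percentage of the link's
    Channel Capacity."""
    total_bps = (num_viewers * viewer_rate_bps
                 + num_surfers * surfer_desired_rate_bps())
    return 100.0 * total_bps / capacity_bps
```

For 50 viewers and 100 surfers on a 160 Mbps Service Group, this returns roughly 110%, matching the example above.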
As a result, this paper will explore Desired Utilization levels ranging from 50% to 200% in an attempt to quantify the Quality of Experience levels during these short-lived windows of time. It should be clear that high Desired Utilization levels can lead to increased buffer depths in the network elements, increased packet delays, and even dropped packets in the network elements (due to congestion control algorithms). The scheduling sub-systems within the CMTS will typically use DOCSIS QoS information to determine which packet streams to throttle down during periods of high Desired Utilization on the channels within a Service Group. In the end, the CMTS will always decrease the throughput on the Service Group's channels such that the Actual Utilization will always be less than 100%. For Desired Utilizations less than 100%, the Actual Utilization level is maintained roughly at the Desired Utilization level (as shown in Fig. 5). However, for Desired Utilizations greater than 100% (indicating the existence of network congestion), the Actual Utilization level is clamped at a level near 100% as packets are delayed and/or dropped to lower the Actual Utilization levels. This could lead to a reduction in the subscriber's perception of his/her Quality of Experience level. Because of this fact, MSOs should try to predict and manage the Desired Utilization levels of their subscribers in an effort to predict and prevent drops in user Quality of Experience levels.

Fig. 5- Actual Utilization vs. Desired Utilization

HTTP Response Time: The simulation environment records the time at which each HTTP GET Request message was initiated by a client, and also records the time when the entire response (the last packet for the requested Web-page or the last packet for the requested IP Video fragment) was received by the client. This quantity was called the HTTP Response Time in the simulation runs, and it was measured for both IP Video fragments and Web-browsing pages. It should be apparent that HTTP Response Times for IP Video streams become problematic as they approach the fragment period (ex: 2 seconds). Human factors studies performed both by ARRIS and other researchers have shown that HTTP Response Times for Web-surfing activities typically become problematic as they start to approach four seconds. [Aka]

Round Trip Time: The simulation environment also monitors the time interval between each TCP packet transmission and the receipt of its corresponding ACK. This quantity was called the TCP Round Trip Time (RTT) in the simulation runs, and it was measured for both IP Video fragments and Web-browsing pages.

Packet Loss Probability: The Packet Loss Probability is a measure of the percentage of the transmitted packets that are lost (or dropped) within a given window of time. In the simulation, the actual number of packets transmitted (P) from the servers in a window of time is measured, and the number of packets lost due to drops (L) in the same window of time is also measured.

The Packet Loss Probability is defined as: Packet Loss Probability = (100%)*(L/P). Packet losses can occur in many elements within any End-to-end IP Video delivery system or a Web-browsing system. In our simulation environment, packet loss is primarily due to CMTS buffer overflows or IP Video client buffer overflows. Each type of loss is monitored in our simulation runs.

Video Errored-Second Probability: The display of a digital video image is typically accomplished without any video degradation. However, it is possible that the image created by the rendering engine is less than perfect (image tiling, freezing of the frame, etc.) if some of the video content information has been either lost or delayed. The simulation environment keeps track of these events, counting the total number of seconds that a video was being displayed and also counting the number of seconds during which the video display had one or more errors (due to packet loss or packet delay). Video Errored-Seconds are monitored for each video display and also in aggregate for selected groups of video displays. The Video Errored-Second Probability is defined as: Video Errored-Second Probability = (100%)*(# Errored Seconds)/(Total # Seconds).

ABR Video Resolution Changes per Minute: For an IP Video viewer, one of the characteristics of the viewing experience that can greatly impact their Quality of Experience level is the number of times that the ABR Rate Selection algorithm changes their resolution within a short window of time. As a result, the simulation environment monitors the number of resolution changes per minute for each ABR Video stream.

Average ABR Video Resolution: One of the characteristics of the viewing experience that can greatly impact an ABR Video viewer's Quality of Experience is the proportion of time during which his/her ABR Video is displayed with an adequately high resolution. What is an adequately high resolution?
It probably depends on the type of device being used for viewing purposes: a viewer on a high-definition television would probably define only the highest resolution available as being acceptable, whereas a viewer on a smart-phone would probably define even very low resolutions as being acceptable. So defining a resolution that is good enough is difficult. As a result, the authors opted to define a metric that could capture the average viewing experience of a large number of viewers in an aggregate sense. We have called this metric the Average ABR Video Resolution, and in this paper, it will be measured in units of Mbps. We calculate the Average ABR Video Resolution for any interval of time (ex: once per second) by adding up the actual IP Video resolution last requested by each subscriber and dividing by the total number of IP Video viewers.

If there are four permissible ABR Video resolution levels (3.0 Mbps, 2.1 Mbps, 1.5 Mbps, and 1.0 Mbps), then one would expect to find the Average ABR Video Resolution level to be somewhere between 3.0 Mbps and 1.0 Mbps. This Average ABR Video Resolution level is created by a combination of both time averaging and spatial averaging. Time averaging occurs when the ABR Video Adaptive Streaming algorithm for a single user dithers his/her resolution up and down between two or more different resolution levels. Thus, a single user that dithers between the 3.0 Mbps resolution and the 2.1 Mbps resolution might display an Average ABR Video Resolution of 2.6 Mbps, even though 2.6 Mbps is not one of the discrete resolution levels provided by the IP Video service provider. Spatial averaging occurs when the ABR Video Adaptive Streaming algorithms of different users tune to different resolution levels. Thus, if there are currently two active users within a Service Group, and one of those users has requested the 2.1 Mbps ABR Video resolution while the other user has requested the 1.5 Mbps ABR Video resolution, then the Average ABR Video Resolution for the Service Group would be 1.8 Mbps, even though 1.8 Mbps is not one of the discrete resolution levels provided by the video service provider.

As a result, the reader might envision the Average ABR Video Resolution as a metric describing the average of a mix of temporally-changing and spatially-changing ABR Video Resolutions being experienced by the pool of subscribers in the Service Group. In general, if the Average ABR Video Resolution level increases, then more viewers are receiving higher resolution levels for longer periods of time (and more viewers would likely register higher Quality of Experience scores). And if the Average ABR Video Resolution level decreases, then more viewers are receiving lower resolution levels for longer periods of time (and more viewers would likely register lower Quality of Experience scores).
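The combination of time averaging and spatial averaging described above can be sketched as follows (our own illustration, not the simulator's internal code):

```python
def average_abr_resolution_mbps(samples):
    """Average ABR Video Resolution over a set of measurement intervals.

    samples: one list per measurement interval, each holding the resolution
    (in Mbps) last requested by every active viewer during that interval.
    Averages spatially (across viewers) within each interval, then
    temporally (across intervals)."""
    per_interval = [sum(viewers) / len(viewers) for viewers in samples]
    return sum(per_interval) / len(per_interval)
```

Two viewers at 2.1 and 1.5 Mbps within a single interval give 1.8 Mbps (spatial averaging), while a single viewer alternating evenly between 3.0 and 2.1 Mbps across intervals gives 2.55 Mbps (time averaging); neither result is itself one of the discrete coding rates.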
It is interesting to note that MSOs can take an active role in determining how to spatially distribute the different ABR Video Resolution levels to their subscribers. For example, they could add hooks to the IP Video Adaptive Streaming algorithms to help ensure that viewers on high-definition televisions would typically get access to higher-resolution IP Video streams, whereas viewers on smart-phones would be permitted to down-shift to much lower-resolution IP Video streams. This could also be managed from a centralized control element, which might be called upon to establish different types of Service Flows and Manifest files for each of the different types of ABR Video clients, with high-definition televisions getting higher-bandwidth Service Flows and Manifest files containing only high-resolution ABR Video streams. Thus, if the Average ABR Video Resolution level permitted in a particular Service Group is 2.6 Mbps, the MSOs could ensure that high-definition televisions remain at the 3.0 Mbps resolution level while the other types of clients would be down-shifted to the 2.1 Mbps resolution level and below. This approach would ensure that demanding subscribers get the highest resolutions available. It would also help to maintain a more stable temporal resolution level for the most demanding subscribers.

7 Simulation Configuration

The authors have explored many different corner-case scenarios in an attempt to identify surprising conditions that might lead to unexpected degradations in the Quality of Experience levels for either IP Video services or Web-browsing services. Various mixes of IP Video and Web-browsing traffic were explored. In addition, many adjustments were made to the simulation environment to explore the effect of changes in areas such as the Desired Utilization levels in the Service Group, the Service Group's bit-rate capacity, the number of IP Video streams per DOCSIS Service Flow, the DOCSIS Service Flow's Maximum Sustained Traffic Rate, the number of bit-rates (resolution levels) for the IP Video content, the TCP Round Trip Times for the flows, etc. Presentation of all of these results would have made this paper much too long. As a result, the authors have decided to present a subset of the results, choosing some of the more surprising and/or valuable results. In most of these simulations, we studied one or more Quality of Experience metrics as a function of Desired Utilization and (usually) some other Service Group or client parameter. In order to create results that could be easily compared across all simulation sets, we have used a consistent set of network attributes for as many of the network properties as possible. The parameterization choices selected (unless otherwise stated) for each of these simulation runs are summarized in the table below.
Properties Consistent Across All Simulations:

Property                  Value                          Notes
DOCSIS DS                 2 Bonded Channels              80 Mbps
DOCSIS US                 1 Channel                      30 Mbps
ABR Fragment Size         2 seconds
ABR Rendering Buffer      10 seconds                     Video Buffer Preload
HTTP Page Size            500 KB                         Random Variations
HTTP Think Time           15 seconds                     Random Variations
TCP RTT (Video & HTTP)    6 msec
Tmax (Video & HTTP)       22 Mbps                        Max Sustained Rate
ABR Coding Rates          1.0, 1.5, 2.1 & 3.0 Mbps
ABR Aggression            1                              Lowest Level
Traffic Mix               60% Video, 40% HTTP Browser

Each of the simulation descriptions that follow will explicitly indicate any of the above properties that have been given a different value for the purpose of examining its effect on the network performance.

Although the above 60/40% traffic mix may not exactly match the mixes found on the Internet today, it does represent a reasonable and consistent point for evaluating interactions between IP Video and Web-surfer traffic types. In general, inactive subscribers were not simulated. The simulator was only initialized with a specific number of clients who represented the active subscribers at a given point in time.

Each simulated Web-surfer was modeled as a user who was associated with a DOCSIS Service Flow having a Maximum Sustained Traffic Rate setting of Tmax = 22 Mbps. Each Web-surfer retrieved web pages with a random size that averaged 500 kbytes. These were downloaded using HTTP and TCP. After receiving the complete web page, the Web-surfer would go silent for a random time that averaged 15 seconds in an effort to mimic the reading (or perusal) of the web page. As a result, if the download experienced no network congestion, then each Web-surfer would typically desire 263 Kbps of network traffic. [Note: 263 Kbps = (500 KB)*(8 bits/byte)/(0.18 seconds + 15 seconds), where 0.18 seconds is the time required to transmit 500 KB at 22 Mbps.]

Each simulated ABR Video client was modeled as a user requesting video fragments that were 2 seconds in length. These fragments were requested via HTTP GET operations. Video fragments were delivered into the client's 10-second video rendering buffer; playout began when the rendering buffer became ½ full. The video client requests fragments having one of four video coding rates as available bandwidth permits.

The method for determining the number of HTTP and Video users to be configured in each individual simulation was as follows:

No. HTTP Browsers = Desired Utilization * Channel Capacity * (HTTP Traffic Mix %) / 263 Kbps
No. Video Viewers = Desired Utilization * Channel Capacity * (Video Traffic Mix %) / 3 Mbps

This procedure produces a traffic profile with both the specified Traffic Mix and Desired Utilization.
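The two user-count formulas above can be sketched as follows (rounding to whole users; the function name is ours):

```python
def user_counts(desired_utilization, capacity_bps, video_mix,
                video_rate_bps=3.0e6, http_rate_bps=263e3):
    """Return (No. HTTP Browsers, No. Video Viewers) needed to produce the
    specified Desired Utilization (e.g. 1.0 = 100%) and Traffic Mix,
    where video_mix is the fraction of offered traffic that is ABR video."""
    offered_bps = desired_utilization * capacity_bps
    n_http = round(offered_bps * (1.0 - video_mix) / http_rate_bps)
    n_video = round(offered_bps * video_mix / video_rate_bps)
    return n_http, n_video
```

At 100% Desired Utilization on the 80 Mbps Service Group with the default 60/40 mix, this yields 122 HTTP Browsers and 16 Video viewers.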
7.1 Simulation #1: Impact of Traffic Mixes on QoE

This simulation series was designed to determine how the traffic mix between Web Browsers and ABR Video viewers within a Service Group affects the ultimate Quality of Experience for IP Video viewers or Web-surfers. Two sets of simulations were made: one that had predominantly Web Browser traffic and one with a majority of ABR Video traffic. Each set contained 17 different simulations at Desired Utilization levels that varied from 50% to 200% of the actual downstream channel capacity.

Settings for Simulation Series #1:

User Group       Traffic Mix   Tmax      Desired Bandwidth/user
HTTP Browsers    70%           22 Mbps   263 Kbps
ABR Video Users  30%           22 Mbps   3 Mbps

Settings for Simulation Series #2:

User Group       Traffic Mix   Tmax      Desired Bandwidth/user
HTTP Browsers    10%           22 Mbps   263 Kbps
ABR Video Users  90%           22 Mbps   3 Mbps

This simulation was designed to determine if a change in the mix of traffic types within a Service Group negatively impacts the ultimate Quality of Experience for IP Video viewers or Web-surfers. Two important Quality of Experience metrics were recorded for each of the simulation runs: the Average ABR Video Resolution (in units of Mbps) for the active ABR Video viewers and the Average Surfer Response Time (in units of seconds) for the active Web-browser users. Each of these measurements is shown in its own chart in the plots in Fig. 6. The two curves within each chart show the results for the two analyzed traffic mixes: one with 30% ABR Video and 70% Web-surfing and the other with 90% ABR Video and 10% Web-surfing.

Fig. 6- Impact of Traffic Mixes on Quality of Experience

Several observations fall out of the results above. First, it is quite apparent that the traffic mix within a Service Group has an impact on both the IP Video viewers' Quality of Experience and the Web-surfers' Quality of Experience. The video users all receive full 3 Mbps video resolution until the Desired Utilization reaches ~95% of the DOCSIS Channel Capacity. At Desired Utilization levels greater than 100%, the Average ABR Video Resolution drops below 3 Mbps in order to reconcile the Desired Utilization with the actual Channel Capacity. Video users in the simulation containing only 30% video traffic must reduce their average video resolution more dramatically because of the smaller number of video streams in that traffic mix. Video users in the 90% video traffic mix, however, have a much greater number of video streams to share the burden of resolution reduction and are able to sustain significantly higher average video resolution at very high levels of congestion (Desired Utilization). At sufficiently high levels of Desired Utilization, all ABR video programs have reduced their rates to the lowest available video resolution (1.0 Mbps), and no further video resolution reduction is possible. Our simulations have shown that larger Desired Utilization levels beyond this point simply produce increasing levels of video errors due to packet delay.

From the right-most chart in Fig. 6 we can see that the Average Response Time for HTTP Browsers in the 90% video traffic mix increases much more slowly with Desired Utilization. The very large video user population is able to absorb the necessary resolution reduction, resulting in lower packet delay for the browsing traffic. In the traffic mix with less video traffic, however, the average surfer response time increases more dramatically at much lower Desired Utilization levels because fewer video flows are available to yield bandwidth.
We can think of this ABR video traffic as being more compressible (assuming that we are willing to accept reduced video resolution) than HTTP traffic. A significant amount of ABR video traffic might actually improve overall system performance due to its ability to cushion less compressible traffic such as HTTP browsing.

7.2 Simulation #2: Impact of ABR Video Client Aggressiveness on QoE

This simulation was designed to determine whether increased ABR Video Client Aggressiveness on the part of some ABR video clients negatively impacts the ultimate Quality of Experience for other, less-aggressive ABR video viewers. Two sets of simulations were made: one comparing ABR clients with modest differences in aggression and one comparing ABR clients with more pronounced differences in aggression. Each simulation set divided the ABR Video clients into two equal groups, A & B, with different amounts of ABR bandwidth aggression. Each set contained 17 different simulations at Desired Utilization levels that varied from 50% to 200% of the actual downstream channel capacity.
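The paper does not spell out how the Aggressiveness parameter is implemented inside the simulated clients. One plausible, entirely hypothetical interpretation is a rate-selection rule whose up-shift safety margin shrinks as aggression grows:

```python
# Hypothetical sketch of an "Aggressiveness" knob in an ABR rate-selection
# rule. The paper does not define how its parameter works; here we simply
# assume a client up-shifts when its estimated throughput exceeds a
# candidate coded rate by a safety margin, and that higher aggression
# shrinks that margin.

RATES_MBPS = [1.0, 1.5, 2.1, 3.0]  # available ABR coded rates from the text

def next_rate(est_throughput_mbps, aggression):
    """Pick the next fragment's coded rate (a sketch, not the paper's code)."""
    margin = 1.5 / aggression          # aggression=1 demands 50% headroom
    feasible = [r for r in RATES_MBPS if r * margin <= est_throughput_mbps]
    return max(feasible) if feasible else RATES_MBPS[0]

# With 3 Mbps of estimated throughput, a timid client (aggression=1) settles
# at 1.5 Mbps while an aggressive one (aggression=2) jumps straight to 3.0.
print(next_rate(3.0, 1))   # -> 1.5
print(next_rate(3.0, 2))   # -> 3.0
```

Under this reading, two clients seeing the same throughput estimate can still choose different rates, which is exactly the asymmetry the simulations below explore.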

Settings for Simulation Series #1:

User Group            Traffic Mix   Aggression   Desired Bandwidth/user
HTTP Browsers         40 %          -            263 Kbps
ABR Video Users (A)   30 %          1            3 Mbps
ABR Video Users (B)   30 %          2            3 Mbps

Settings for Simulation Series #2:

User Group            Traffic Mix   Aggression   Desired Bandwidth/user
HTTP Browsers         40 %          -            263 Kbps
ABR Video Users (A)   30 %          1            3 Mbps
ABR Video Users (B)   30 %          5            3 Mbps

The charts in Fig. 7 show the Average ABR Video Resolution (in Mbps) for each of the simulations.

Fig. 7 - Impact of ABR Bandwidth Aggression on Quality of Experience

From the charts it is quite apparent that ABR Video clients with higher Aggressiveness values do obtain a larger share of the available bit-rate capacity (and therefore experience higher ABR Video resolutions) during periods of congestion. Larger differences in aggression produce larger differences in video resolution. Although not shown in these plots, we have also observed what appears to be a natural trade-off between ABR Video client Aggressiveness and the temporal stability of the ABR Video streams. More aggressive ABR Video clients experience more Resolution Changes per Minute, as they continually try to snap back to higher resolutions whenever fleeting windows of bit-rate capacity appear.

A completely unexpected observation is indicated by the arrows labeled "Similar" in the charts above. These arrows indicate two Average ABR Video Resolution levels (2.1 & 1.5 Mbps) at which the more aggressive client has much less advantage over the less aggressive client. Interestingly, these resolutions also happen to be two of the intermediate available ABR coded rates (i.e., 3.0, 2.1, 1.5 & 1.0 Mbps). We believe these points indicate that ABR flows must use much more temporal averaging to achieve Average ABR Resolutions that fall between the available ABR coded resolutions: flows must switch between the next higher and lower ABR coded resolutions over time. To achieve an Average ABR Resolution that is near one of the available ABR coded resolutions, by contrast, most ABR flows can simply switch to that coded rate and stay there. ABR Bandwidth Aggressiveness thus appears to be more effective when a significant amount of temporal averaging is required to achieve the Average ABR Resolution needed to match the available Channel Capacity. This might mean that a video service provider could defend itself against aggressive competing ABR clients by providing video source material with a greater number of available ABR coded resolutions.

7.3 Simulation #3: Temporal Oscillations vs. Avg. ABR Video Resolution & Client Aggressiveness

This simulation was designed to determine what effect a very large number of ABR video flows might have on network performance. In this set of simulations we repeated the procedure described in section 7.2 with a single exception: we substituted a DS Bonding Group of 4 DOCSIS channels for the 2 DOCSIS channels used in section 7.2. This change doubled our DS Channel Capacity and, therefore, also doubled the number of each type of user required to provide the indicated traffic mix.
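The temporal-averaging argument from section 7.2 reduces to simple duty-cycle arithmetic, sketched below with the coded-rate ladder from the text:

```python
# Duty-cycle arithmetic behind the temporal-averaging observation in
# section 7.2: an average resolution that falls between two coded rates can
# only be achieved by alternating between the bracketing rates over time.

RATES_MBPS = [1.0, 1.5, 2.1, 3.0]  # available ABR coded rates from the text

def switching_fraction(target_avg_mbps):
    """Fraction of time a flow must spend at the higher of the two
    bracketing coded rates to average target_avg_mbps; 0.0 means the flow
    can simply sit at one coded rate with no alternation at all."""
    if target_avg_mbps in RATES_MBPS:
        return 0.0
    for low, high in zip(RATES_MBPS, RATES_MBPS[1:]):
        if low < target_avg_mbps < high:
            return (target_avg_mbps - low) / (high - low)
    raise ValueError("target outside the coded-rate ladder")

# An average sitting exactly on a coded rate needs no switching at all...
print(switching_fraction(1.5))              # -> 0.0
# ...but an average midway between 1.5 and 2.1 forces constant alternation.
print(round(switching_fraction(1.8), 3))    # -> 0.5
```

This is consistent with the "Similar" points at 2.1 and 1.5 Mbps: near a coded rate the required alternation collapses toward zero, leaving an aggression advantage little room to operate.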
We found that the expected relationship between Average ABR Video Resolution and Desired Utilization, described in section 7.2, was much less pronounced for the comparison with the less aggressive clients (i.e., Agg = 2) and hardly recognizable at all for the comparison with the more aggressive clients. Following up on this puzzling outcome, we examined the Average ABR Video Resolution vs. Time for both of these simulation sets at a Desired Utilization level of 150%, as shown in Fig. 8.

Fig. 8 - Average ABR Video Resolution vs. Time

From the charts we can see a great deal of temporal instability (or resolution oscillation) in the more aggressive ABR clients' Average ABR Resolution (shown in green). This instability was not obvious in the simulations of section 7.2, which had fewer ABR video clients. The effect is especially pronounced for the most aggressive ABR clients, shown in the chart on the right. In that chart we see Average ABR Resolution shifts indicating that huge numbers of ABR flows are making resolution changes in synchronism.

The authors hypothesize that these oscillations may be due to the distributed nature of the ABR Video Rate Selection process, caused by the synchronization of the Adaptive Streaming algorithm state variables across many ABR Video streams. Assume that we have N active video streams sharing a channel. If the Adaptive Streaming algorithm of one ABR Video stream down-shifts its bit-rate, the resulting reduction in bandwidth on the Service Group is almost instantly detected by the Adaptive Streaming algorithms in the other (N-1) ABR Video clients. Since they all detect this increase in available bandwidth, they are all simultaneously encouraged to up-shift their bit-rates. If they all do so at roughly the same time, the sudden surge in bandwidth on the Service Group causes even more congestion on the channel than we started with, which is likewise detected by all of the Adaptive Streaming algorithms. Since they all detect this decrease in available bandwidth, they all simultaneously decide to down-shift their bit-rates, and the cycle starts all over again.
Under normal conditions this positive feedback cycle is sufficiently damped by the ABR Rate Selection Algorithm, but if the number of ABR clients grows sufficiently large, the multiplying effect due to the number of clients will eventually overpower whatever damping may be present in each client. More aggressive ABR clients are especially vulnerable to this phenomenon. It seems possible that there is a theoretical limit to the number of ABR clients that can safely share a downstream channel.
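The feedback cycle described above can be illustrated with a toy discrete-time model. This is not the paper's simulator: it assumes identical undamped clients, an assumed 40 Mbps channel, and perfectly synchronized decisions, purely to show how lock-step up-shifts and down-shifts prevent the aggregate load from settling.

```python
# Toy model (not the paper's simulator) of the synchronized up/down-shift
# cycle described above: N identical ABR clients share a fixed channel, and
# at each step every client sees the same spare capacity and reacts at once.

RATES = [1.0, 1.5, 2.1, 3.0]   # Mbps, the coded-rate ladder from the text
CAPACITY = 40.0                # Mbps, an assumed shared channel capacity

def step(levels):
    """One synchronized decision round with no damping or randomization."""
    load = sum(RATES[i] for i in levels)
    new_levels = []
    for i in levels:
        up = min(i + 1, len(RATES) - 1)
        if load > CAPACITY and i > 0:
            new_levels.append(i - 1)   # congested: everyone down-shifts
        elif i < up and load + (RATES[up] - RATES[i]) <= CAPACITY:
            new_levels.append(up)      # spare room: everyone up-shifts
        else:
            new_levels.append(i)
    return new_levels

levels = [1] * 20                      # 20 clients, all starting at 1.5 Mbps
history = []
for _ in range(6):
    history.append(sum(RATES[i] for i in levels))
    levels = step(levels)
print(history)  # the aggregate load flips between ~30 and ~42 Mbps
```

With 20 clients the aggregate never settles: each client judges the spare capacity as if it alone would move, so the group overshoots in both directions, much like the synchronized resolution shifts hypothesized for Fig. 8.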

In general, two conditions are required for the oscillations to occur: 1) some of the IP Video clients must have a high Aggressiveness value, and 2) there must be a large number of IP Video flows in the Service Group.

7.4 Simulation #4: Impact of TCP Round Trip Time on QoE

This simulation set was designed to determine how an ABR Video stream's TCP Round-Trip Time (RTT) impacts its ultimate Quality of Experience. Two sets of simulations were made: one comparing ABR clients with modest differences in RTT and one comparing ABR clients with more pronounced differences in RTT. Each simulation set divided the ABR Video clients into two equal groups, A & B, with different RTT values. In each simulation set the smaller RTT (i.e., 6 msec) also matched the RTT of the HTTP browsers. Each set contained 17 different simulations at Desired Utilization levels that varied from 50% to 200% of the actual downstream channel capacity.

Settings for Simulation Series #1:

User Group            Traffic Mix   TCP RTT    Desired Bandwidth/user
HTTP Browsers         40 %          6 msec     263 Kbps
ABR Video Users (A)   30 %          6 msec     3 Mbps
ABR Video Users (B)   30 %          100 msec   3 Mbps

Settings for Simulation Series #2:

User Group            Traffic Mix   TCP RTT    Desired Bandwidth/user
HTTP Browsers         40 %          6 msec     263 Kbps
ABR Video Users (A)   30 %          6 msec     3 Mbps
ABR Video Users (B)   30 %          200 msec   3 Mbps

In this scenario we might imagine that the ABR Video clients with small Round Trip Times correspond to clients receiving their ABR Video streams from nearby MSO-managed servers, while ABR Video clients with large Round Trip Times correspond to clients receiving their video streams from more distant Over-The-Top servers. The charts in Fig. 9 show Average ABR Video Resolution (in Mbps) vs. Desired Utilization for each of the two comparison simulations.
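A rough intuition for why RTT might matter at all comes from the classic Mathis et al. steady-state TCP estimate, throughput ≈ MSS/(RTT·√p). The sketch below is only a back-of-envelope check, not the simulator's TCP model, and the loss rate used is an assumed value:

```python
# Back-of-envelope sketch (not the simulator's TCP model): the Mathis
# steady-state estimate, throughput ~ MSS / (RTT * sqrt(p)), implies that
# at equal loss rates a TCP flow's throughput scales inversely with RTT.
import math

MSS_BYTES = 1460          # typical Ethernet-sized TCP segment
LOSS_RATE = 0.001         # assumed segment loss probability

def mathis_mbps(rtt_sec, p=LOSS_RATE, mss=MSS_BYTES):
    """Approximate steady-state TCP throughput ceiling in Mbps."""
    return (mss * 8 / (rtt_sec * math.sqrt(p))) / 1e6

for rtt_ms in (6, 100, 200):
    print(f"RTT {rtt_ms:3d} msec -> ~{mathis_mbps(rtt_ms / 1000.0):5.1f} Mbps ceiling")
```

Under these assumed numbers the 6 msec flows have a throughput ceiling far above 3 Mbps, while the 200 msec flows' ceiling drops near the 3 Mbps top coded rate, which suggests why large RTT differences, but not small ones, might produce measurable quality differences.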

Fig. 9 - Impact of Round Trip Times on Quality of Experience

From the charts it is unclear whether small differences in RTT result in any appreciable difference in ABR Video Resolution. It does, however, appear that large RTT differences can result in measurable differences in video quality. There are even signs that the temporal-averaging phenomenon described in section 7.2 may be at work.

7.5 Simulation #5: Impact of DOCSIS Maximum Sustained Traffic Rate on QoE

This simulation set was designed to determine how an ABR Video stream's DOCSIS Maximum Sustained Traffic Rate (Tmax) impacts its ultimate Quality of Experience. Two sets of simulations were made: one comparing ABR clients with modest differences in Tmax and one comparing ABR clients with more pronounced differences in Tmax. Each simulation set divided the ABR Video clients into two equal groups, A & B, with different Tmax values. In each simulation set the larger Tmax value (22 Mbps) also matched the Tmax value of the HTTP browsers. Each set contained 17 different simulations at Desired Utilization levels that varied from 50% to 200% of the actual downstream channel capacity.

Settings for Simulation Series #1:

User Group            Traffic Mix   Tmax      Desired Bandwidth/user
HTTP Browsers         40 %          22 Mbps   263 Kbps
ABR Video Users (A)   30 %          22 Mbps   3 Mbps
ABR Video Users (B)   30 %          18 Mbps   3 Mbps

Settings for Simulation Series #2:

User Group            Traffic Mix   Tmax      Desired Bandwidth/user
HTTP Browsers         40 %          22 Mbps   263 Kbps
ABR Video Users (A)   30 %          22 Mbps   3 Mbps
ABR Video Users (B)   30 %          11 Mbps   3 Mbps

The charts in Fig. 10 show Average ABR Video Resolution (in Mbps) as a function of Desired Utilization.

Fig. 10 - Impact of Maximum Sustained Traffic Rates on Quality of Experience

The charts show that Tmax values have a much more pronounced effect on video resolution than the RTT differences described in the previous section. In the chart on the left we can see that the ABR clients with Tmax = 18 Mbps suffered in comparison to the video clients with Tmax = 22 Mbps, even though both Tmax values are far in excess of the 3 Mbps data rate needed to receive the highest-resolution video. The chart on the right shows that this effect is even more dramatic when the Tmax value is reduced to 11 Mbps. This effect was a bit of a surprise, because the authors originally believed that an ABR Video stream with an average bit-rate of 3 Mbps would not be greatly impacted by a Maximum Sustained Traffic Rate of 11 Mbps (which is much larger than the 3 Mbps bit-rate of the IP Video stream). The effect exists, however, because during congestion a CMTS typically throttles the bit-rates of the various service flows in a fashion that keeps their scheduled bit-rates proportional to their Maximum Sustained Traffic Rates. As a result, Service Flows with lower Maximum Sustained Traffic Rates get a smaller share of the multiplexed channel bit-rate during congestion.
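The proportional-throttling behavior described above can be sketched as a weighted-share calculation (a sketch of the described behavior, not actual CMTS scheduler code; per-flow caps are ignored for simplicity):

```python
# Sketch of the CMTS behavior described above: during congestion, each
# backlogged service flow's scheduled rate stays proportional to its Tmax.
# Not actual CMTS scheduler code; per-flow caps are ignored for simplicity.

def congested_share(tmax_list, capacity_mbps):
    """Split contended capacity across backlogged flows in proportion
    to their Maximum Sustained Traffic Rates."""
    total = sum(tmax_list)
    return [capacity_mbps * t / total for t in tmax_list]

# Two backlogged ABR flows contending for 4 Mbps of spare capacity:
shares = congested_share([22.0, 11.0], 4.0)
print(shares)  # the Tmax=11 flow gets exactly half the Tmax=22 flow's rate
```

With only 4 Mbps of contended capacity to split, the Tmax = 11 Mbps flow's share falls to about 1.3 Mbps, well below the 3 Mbps top coded rate, which is consistent with the penalty seen in Fig. 10.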