Video-over-IP Network Performance Monitoring: What should you measure and why?
Everything starts with high-quality original content. A perfect transmission network and top-of-the-line CPE will still display garbage video if the original content is garbage: garbage in, garbage out. Every source of video content should be constantly monitored if possible: every satellite receiver, every Video-on-Demand server, and every ad insertion server. Original content monitoring is the place to verify basic video quality metrics such as over-compression, under-compression, pixelization, tiling, frozen video, missing audio tracks, poor audio/video synchronization. I was talking with a Comcast engineer one day and he had been pulling his hair out during the Beijing Olympics because the swimming pool (think about all of the Michael Phelps coverage here!) sometimes showed some very strange, very distracting visual artifacts. He was getting irate calls from customers but could find nothing wrong in his network or with any of his equipment. Comcast finally tracked the problem down to a faulty HDTV camera being used by NBC in Beijing: garbage in, garbage out.
Once you have good quality video content, you need to make sure it gets through your transmission network in good shape. We were working with a customer that was experiencing intermittent video impairments on a single video channel, ESPN2. The video would freeze up, the audio would drop out, the video would pixelate and block, you name it and it was going wrong on ESPN2. Well, it turned out that this customer had two switches in his network, a primary Cisco switch and a backup switch. Whenever the Cisco switch got overloaded (typically during routing updates) some of its traffic was offloaded to the backup switch. And generally this scheme worked just fine. Until the load on the backup switch exceeded 400 Mbps, at which point it started dropping packets left and right, unbeknownst to the customer.
It turned out that ESPN2 was NOT the only channel experiencing the video impairments; it was just the most-watched channel experiencing them (the others were more along the lines of The Polish Cooking Channel). You need to monitor the video at as many different points in the transmission network as possible in order to quickly and effectively pinpoint the sources of network-induced impairments.
And, once that great video content gets to your customer, you need to make sure that the equipment he has allows him to enjoy it to the fullest. I was working with Telecom Austria on a channel change time, or channel zap time, issue. Customers used to speedy channel surfing were now complaining about how long it took to change the channel each and every time. Telecom Austria was working with Microsoft at the time and Microsoft kept insisting that its patented Instant Channel Change technology would fix the problem. It didn't; we discovered that the problem was with the set-top box that Telecom Austria had selected for its CPE. It was taking the set-top box a full 2 to 3 seconds to issue each channel change request. Once the request was issued, the network responded in less than 400 ms with the new channel. Microsoft was selling Telecom Austria a network-based solution to a CPE problem!
Here is the fastest explanation of how digital video compression works known to man: you start with a complete still image, as shown here (well, without all of the crazy blue lines). These complete images are called Intra-Frames or I-Frames and are very similar to JPEG images taken by your digital camera. You send such a complete image (AKA an I-Frame) to your customer at least two times each second. Why twice a second? Because the I-Frame rate determines how quickly that customer can recover from visibly detectable errors: twice per second means that 500 milliseconds is the maximum time needed to recover from a single visible impairment. But, since you need to send 30 frames per second for a moving picture, what do you send to the customer for each of the remaining 28 video frames each second?
Video compression algorithms divide each I-Frame up into macroblocks (the crazy blue lines shown here). For the 14 or so video frames that occur between each set of I-Frames (called Predicted (P)-Frames or Bidirectional (B)-Frames), you only send some very select information about each macroblock, rather than the whole shebang. What information about each macroblock do you need to send? Not very much, usually! Just the changes in brightness and color, and any needed motion vectors. The fundamental idea is that most of the image usually either remains the same (think about all of those CNN talking heads here!) or all moves with identical motion vectors (think about a slowly panning camera shot) and thus does not need to be retransmitted; only the changing/moving macroblocks need to be updated.
The pattern of each series of frames that occurs between I-Frames is referred to as a Group of Pictures or a GOP. Digital video encoders are programmed to follow a specific GOP pattern, usually something like I-B-B-P-B-B-P-B-B-P-B-B-P-B-B. You can tell quite a lot about the quality of the video just by looking at the GOP that the encoder is using. How often are I-Frames being sent?
Is it at least twice per second? Is the encoder sending lots of low-information B-Frames in between the I-Frames rather than the more robust P-Frames?
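The GOP questions above can be answered mechanically once you know the pattern the encoder is using. The following is a minimal sketch, not a real analyzer: the function name `gop_stats` and its return keys are made up for illustration, and it assumes the pattern string describes exactly one GOP (one I-Frame at the start) at a fixed frame rate of 30 frames per second.

```python
def gop_stats(pattern: str, fps: float = 30.0) -> dict:
    """Summarise one GOP written as a string such as 'I-B-B-P-B-B-...'.

    Assumption: the pattern covers exactly one GOP, so it starts with
    its only I-Frame and the next I-Frame follows the last listed frame.
    """
    frames = pattern.replace("-", "").replace(" ", "").upper()
    if not frames or frames[0] != "I" or frames.count("I") != 1:
        raise ValueError("expected one GOP starting with a single I-Frame")
    if set(frames) - set("IPB"):
        raise ValueError("pattern may only contain I, P and B frames")

    gop_seconds = len(frames) / fps
    return {
        "frames_per_gop": len(frames),
        "i_frames_per_second": 1.0 / gop_seconds,
        # Worst case: an error right after an I-Frame stays visible
        # until the next I-Frame, one full GOP later.
        "max_recovery_ms": 1000.0 * gop_seconds,
        "b_frame_share": frames.count("B") / len(frames),
    }


if __name__ == "__main__":
    stats = gop_stats("I-B-B-P-B-B-P-B-B-P-B-B-P-B-B")
    for name, value in stats.items():
        print(f"{name}: {value}")
```

A 15-frame GOP at 30 frames per second works out to two I-Frames per second, which matches the "at least twice per second" rule of thumb; a much longer pattern immediately shows up as a longer worst-case recovery time.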
If you are going to invest money in monitoring just a single point, monitoring the quality of your original content as it enters your network is where you want to spend it. And you should monitor absolutely everything possible here: look at the Ethernet packets, look at the MPEG-2 transport stream packets (which will be present even if you are using nothing but MPEG-4 video compression, I promise!), and look as deeply as possible into the raw, compressed video information, be it MPEG-2, H.264, or VC-1. Get as detailed here as you can possibly afford: look at the original compression bit rate compared to the transmitted bit rate, look at the GOP pattern, especially the all-important I-Frame rate, look at the various synchronization timestamps, look at the quantization matrices, the macroblocks, the motion vectors; everything you look at here is one less thing you have to worry about further downstream. But note that only unencrypted video streams can be analyzed this way. That's why you want to do it here, before the Digital Rights Management (DRM) system kicks in and you can't see the compressed video information because it's scrambled. Will you need a Cray Computer to do all this techno-geeky analysis on hundreds of video streams simultaneously? Possibly, but if you can afford it, do it because you will prevent problems BEFORE your customer ever sees them.
A few years ago I was working with Alcatel-Lucent in their Proof-of-Concept lab down in Plano, Texas that was completely devoted to IPTV. As part of the testing process we injected errors into the video stream. At one point we noticed that the errors we were injecting were visible much longer than they should have been. When we investigated, we found that the encoder was only sending an I-Frame once every 4 seconds because it was set to absolutely minimize the transmission bit rate!
Of course, when we first asked, the video technician assured us that the encoder was sending a perfectly normal GOP pattern and he had no idea what we were talking about. Another time in the same test lab, we found that the encoder was severely over-compressing the quantization matrices, also in order to minimize transmission bit rate. Always monitor the quality of your original content as thoroughly as possible.
Once the original content hits your network, most likely getting scrambled in the process if it wasn't already, the next opportunity that you have to evaluate the quality of your video is within your transmission network. And, here again, you want to monitor whatever video quality parameters you can, a topic which we will discuss at great length, at as many different locations throughout the network as possible. Doing so will greatly facilitate and speed fault isolation when issues arise. Having a single piece of network management software that all of your network monitoring points report to is great if you can afford it, but even manually correlating the results across several monitoring points is a relatively easy procedure with today's monitoring tools.
And the best thing about network-based monitoring of video-over-IP quality today is that it can be done using standard PCs and laptops with off-the-shelf network interface cards (NICs) rather than requiring expensive hardware-based probe solutions like it used to! These software-based solutions are typically one-third to one-quarter of the price of hardware-based solutions, and that cost savings alone allows you to monitor more points and have greater confidence in your video quality before it gets to your customers.
If you are monitoring as much as possible at the original content acquisition point and measuring the critical parameters across your transmission network, you should only have to verify once that the CPE you have specified for your customers works appropriately at system turn-up time and then never have another worry out at the customer premises ... yeah, right. Seriously, having problems with CPE is expensive, especially in terms of operating expense. This is absolutely the last place that you want to have a problem happen. So, first do a thorough certification process where you certify that the CPE you have selected works not just to its own stated specifications but also within the specifications of your specific video-over-IP network. Then, monitor the video quality all the way across your transmission network so that you know absolutely, positively that the problem exists out at the customer premises before you ever roll a truck.
Once your original content is encrypted, you are limited to only looking at the network packet information rather than at all of that great techno-geeky information that we talked about earlier. No I-Frames or quantization matrices here (at least not that we can see)! Fortunately, there is still quite a bit to look at and you definitely should.
There are three major organizations that specify digital television broadcasting standards: the Digital Video Broadcasting (DVB) group in Europe, the Association of Radio Industries and Businesses (ARIB) in Japan, and the Advanced Television Systems Committee (ATSC) in North America. Even though each of these organizations has published its own independent standards for digital television broadcasting (of course!), all of the currently published standards share some very important fundamentals. From a network monitoring perspective, the single most important is that they all currently specify the use of a transport layer protocol called MPEG-2 Transport Stream (MPEG-2 TS). The MPEG-2 part of this name causes quite a bit of confusion so let me clarify things a bit. The MPEG-2 standard specifies BOTH a transport protocol (MPEG-2 TS) as well as a video compression algorithm (MPEG-2). The purpose of MPEG-2 TS is to multiplex many different types of digital information - think about how a video channel typically contains streaming compressed video, streaming compressed audio, an electronic program guide, streaming closed-captioning - over a single logical stream that includes synchronization information. This transport protocol is completely independent of the technique used to compress the video it carries. MPEG-2 TS is used all over the world today to transport MPEG-2 compressed video, MPEG-4 compressed video, H.264 compressed video, and VC-1 compressed video.
DVB published the first digital broadcasting specifications in the early to mid 1990s. It produced three main broadcasting standards, DVB-S for Satellite transmissions, DVB-C for Cable transmissions and DVB-T for Terrestrial transmissions. All three specify the use of MPEG-2 TS as the transport protocol. Japan initially used the DVB broadcast standards but, frustrated at how slowly the Europeans moved in updating the standards to make use of newer technology, decided in the late 1990s to publish its own. ARIB now also publishes three main broadcasting standards, ISDB-S, ISDB-C and ISDB-T. All three of these standards use MPEG-2 TS as the transport protocol, and have gained fairly wide acceptance in South America. And here in North America, the ATSC also decided that the DVB standards were not evolving fast enough to support HDTV and other technology leaps, and so published its own standard, ATSC/53 in And ATSC/53 specifies the use of MPEG-2 TS as the transport protocol.
Are there any competitors to MPEG-2 TS? The only other protocol I have ever seen used to transport broadcast television streams is the Voice-over-IP transport protocol, RTP (Real-time Transport Protocol), and it was ALWAYS used along with, not in place of, MPEG-2 TS. We'll talk about why a service provider might want to do something like this in a few minutes.
OK, so now I have hopefully convinced you that any video-over-IP network you have will be using MPEG-2 TS as a transport protocol regardless of what type of video compression the content providers are using. So let's take a closer look at MPEG-2 TS and see what it can and can't do for us from a network monitoring perspective. The MPEG-2 TS specification defines two fundamental types of payload data: something called Elementary Stream (ES) data, which is really just compressed video and audio streams, and something called Program Specific Information (PSI).
PSI, which is called Service Information or SI in Europe, provides the information for Electronic Program Guides (EPGs) among other things. And this is where the DVB standards and the ARIB standards and the ATSC standards differ the most, at least as far as MPEG-2 TS is concerned. There is a very minimal, core set of PSI tables defined in the MPEG-2 TS standard that all of the major broadcast standards utilize but most of the EPG information is totally different across the different geographic regions. OK, so the MPEG-2 TS specification by itself is not going to help us test EPG information; what can it do?
Plenty, it turns out. MPEG-2 TS has some great features for network performance monitoring. For example, the MPEG-2 TS protocol headers are never allowed to be scrambled even when the rest of the content is. It has a packet counter so that dropped packets can be detected, it has a highly accurate 27 MHz clock so that jitter can be calculated, and it has core PSI tables that allow a decoder to de-multiplex the various individual content streams such as the video, audio, and closed-captioning even if the content itself is scrambled.
DVB published the first technical report detailing how to test MPEG-2 TS through the European Telecommunications Standards Institute (ETSI). This specification, known as TR , is the granddaddy of all MPEG-2 TS-based network performance test specifications. Now, I have heard some people grouse about this specification being European and not relevant to North America. That is pure and unadulterated hogwash. TR specifies over 50 MPEG-2 TS measurements split into three different priorities depending on how drastically they affect the content stream. All of the First and Second Priority measurements, around 20 in total, apply to every MPEG-2 TS implementation regardless of the overarching broadcast standard. And about 10 of the 30 or so Third Priority measurements are also not related to DVB-specific functions.
When I was working with that Alcatel-Lucent Proof-of-Concept lab in Plano, Alcatel-Lucent decided to write a white paper entitled "IPTV Test and Measurement Best Practices." During the ensuing meetings to define exactly what those best practice measurements would be, the use of this specification was very hotly debated, with some of the test equipment vendors using the "it's European and doesn't apply here" argument. I, of course, argued that all 30 of the relevant TR measurements should be included in the best practices.
And, while I cannot claim a complete victory, in the end, all of the First Priority and most of the Second Priority TR measurements were included.
OK, besides TR , what other test specifications are out there that can also help you monitor the quality of video-over-IP as it traverses a transmission network? There were three additional specifications used as the basis for the Alcatel-Lucent white paper "IPTV Test and Measurement Best Practices," all three of which were produced by the Internet Engineering Task Force (IETF). The first was RFC 3550, the specification for the Real-time Transport Protocol (RTP). Although this transport protocol was originally designed for Voice-over-IP, it has applications in streaming video as well. Since Alcatel-Lucent was using both RTP and MPEG-2 TS as dual transport protocols in their Ecosystem, they were able to use testing methodologies aimed at both protocols. For testing purposes, RFC 3550 is important because it contains the original definitions for packet loss and inter-arrival jitter monitoring and reporting that are still widely accepted today. It also includes definitions for out-of-order packets as well as other nifty things that can be measured.
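The inter-arrival jitter defined in RFC 3550 is a running estimate: for each packet you compare how far apart two packets arrived with how far apart their RTP timestamps say they were sent, then smooth the absolute difference with a gain of 1/16. Here is a minimal sketch of that calculation. The function name and the input format are made up for illustration, the 90 kHz clock is the one normally used for MPEG-2 TS carried over RTP, and 32-bit timestamp wraparound is deliberately ignored to keep the example short.

```python
def interarrival_jitter(packets, clock_hz: int = 90_000) -> float:
    """Return the RFC 3550 style jitter estimate, in seconds.

    packets: iterable of (rtp_timestamp, arrival_time_seconds) pairs in
    arrival order. Assumption: timestamps do not wrap during the sample.
    """
    jitter_ticks = 0.0
    previous_transit = None
    for rtp_timestamp, arrival_seconds in packets:
        # Relative transit time, in RTP clock ticks.
        transit = arrival_seconds * clock_hz - rtp_timestamp
        if previous_transit is not None:
            difference = abs(transit - previous_transit)
            # Running average with gain 1/16, as in RFC 3550.
            jitter_ticks += (difference - jitter_ticks) / 16.0
        previous_transit = transit
    return jitter_ticks / clock_hz


if __name__ == "__main__":
    # Perfectly paced stream: one packet every 1/30 s, 3000 ticks apart.
    steady = [(i * 3000, i / 30) for i in range(100)]
    print("steady stream jitter (s):", interarrival_jitter(steady))

    # Same stream, but the second packet arrives 16 ms late.
    late = [(0, 0.0), (3000, 1 / 30 + 0.016)]
    print("one late packet jitter (s):", interarrival_jitter(late))
```

Because the estimate is smoothed, a single late packet only moves it by one sixteenth of the delay; sustained jitter is what drives the number up.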
The second specification was RFC 3357, a short and yet fairly heavy tome entitled "One-way Loss Pattern Sample Metrics." This statistical-analysis-heavy specification really does provide a great way to look at packet loss patterns, but it is only truly useful for video-over-IP if you are using RTP as a transport layer protocol: its heavy-duty statistics need a real sequence number counter, such as the 16-bit counter provided by RTP, rather than the fairly cheesy 4-bit continuity counter provided by MPEG-2 TS. Once again, since Alcatel-Lucent was using RTP, this choice made a lot of sense.
And, the final specification was RFC 4445, more commonly referred to as Media Delivery Index or just simply MDI. MDI does not rely on the use of RTP as the first two RFCs did, so it can be used in purely MPEG-2 TS video-over-IP networks, but it does come with its own limitations. The most important limitation is that it is only truly useful for constant bit rate (CBR) video streams. Let me repeat that: MDI is only truly useful for constant bit rate video streams. Before we dive into that potential quagmire, let's briefly review what MDI measures and why.
MDI measures two components: something called the delay factor (DF) and something called the media loss rate (MLR). Let's start with the media loss rate because it is the simpler of the two. The media loss rate is packet loss, plain and simple. Since TR already has us measuring packet loss, MDI:MLR buys us no new information. Delay factor is a much more interesting measurement. TR has implementers measuring something called PCR Jitter but this is a fairly expensive measurement to make, involving quite a bit of heavy math. This was one of the TR measurements that got dropped from the Alcatel-Lucent white paper, and for that very reason. To calculate it, you have to reach into the MPEG-2 TS packets and grab that highly-accurate, 27 MHz program clock reference (PCR) and do quite a bit of 64-bit math with it.
As I said, it's computationally expensive. For constant bit rate streams, MDI:DF allows an implementer to very quickly and easily make a jitter measurement without all of those expensive calculations, making it a measurement that easily scales for hundreds and hundreds of streams simultaneously. As a matter of fact, that was one of the main purposes of the MDI specification to begin with. To quote from the RFC itself: "The MDI is instead intended to specifically address the need for a scalable, economical-to-compute metric that characterizes network impairments"
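To see why the delay factor is so cheap for CBR streams, here is a minimal sketch based on a simplified reading of the virtual buffer model in RFC 4445: count the bytes that arrive, subtract the bytes a perfectly steady decoder would have drained at the nominal media rate, and divide the swing of that virtual buffer by the media rate. The function name and input format are made up for illustration, and the measurement interval is simply the span of the sample rather than the fixed interval a real probe would use.

```python
def delay_factor_ms(arrivals, media_rate_bps: float) -> float:
    """Return an MDI style delay factor in milliseconds for a CBR stream.

    arrivals: list of (arrival_time_seconds, payload_bytes) in arrival
    order, all within one measurement interval (an assumption here).
    media_rate_bps: nominal media rate in bits per second.
    """
    if not arrivals:
        return 0.0
    rate_bytes_per_s = media_rate_bps / 8.0
    start = arrivals[0][0]
    received = 0
    vb_min = float("inf")
    vb_max = float("-inf")
    for arrival_time, size in arrivals:
        # Virtual buffer level just before and just after this packet.
        before = received - rate_bytes_per_s * (arrival_time - start)
        after = before + size
        received += size
        vb_min = min(vb_min, before, after)
        vb_max = max(vb_max, before, after)
    return 1000.0 * (vb_max - vb_min) / rate_bytes_per_s


if __name__ == "__main__":
    rate = 10_528_000  # chosen so that 1316 bytes is exactly 1 ms of media
    steady = [(i * 0.001, 1316) for i in range(20)]
    print("steady DF (ms):", delay_factor_ms(steady, rate))

    # Same stream, but everything from packet 5 onward is held up 5 ms.
    stalled = [(i * 0.001 + (0.005 if i >= 5 else 0.0), 1316) for i in range(20)]
    print("stalled DF (ms):", delay_factor_ms(stalled, rate))
```

Note that nothing here touches the PCR: only packet sizes, arrival times and the known constant bit rate are needed, which is exactly why the same shortcut does not carry over to VBR streams.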
And I believe that MDI:DF does exactly that for CBR video streams. Since I was obviously not hesitant to repeat my opinion about MDI's weaknesses earlier, let me repeat this statement as well: MDI:DF is a wonderfully simplified jitter measurement for CBR video streams. Now the flip side of that statement is: it is not even in the ball park for VBR video streams. RFC 4445 says that MDI:DF can be calculated for VBR streams, but adds that "the variable bit rate case may be somewhat more difficult to calculate." The authors do not specify in the RFC how an implementer might make this difficult calculation, but one of the authors later, literally three years later, wrote a white paper in which he proposed that to calculate MDI:DF for VBR streams, an implementer could just reach into the MPEG-2 TS packets and grab that highly-accurate, 27 MHz PCR clock and do quite a bit of 64-bit math with it! Hmmm, that sounds an awful lot like exactly the same work required to calculate jitter directly to me! And it is. So, once again, for VBR streams, MDI:DF buys us absolutely no advantage over the original, and more widely accepted, TR measurements. Even the Alcatel-Lucent white paper only specified MDI measurements for CBR streams.
Another test standard available for those of us here in North America is the ATSC Recommended Practice: Transport Stream Verification, more commonly called ATSC A/78. This standard explicitly builds on TR , changing some of the thresholds specified for the European market and adding new requirements for testing the ATSC-specific PSIP. This is where you find out how to test your EPG and everything else that is North America-specific (think FCC-mandated stuff here).
OK, so now we've looked at a number of testing specifications for MPEG-2 TS and RTP, and even one for ATSC-specific PSI/EPG. But what should you REALLY measure and why?
The first very important thing to realize is that even if your ultimate goal is to measure your ATSC-specific PSI/EPG, you MUST start with MPEG-2 TS because all PSI, be it DVB- or ARIB- or ATSC-specified, regardless of whether it is called PSI or SI or PSIP, is carried over MPEG-2 TS. And since RTP is only ever an optional protocol (and not very common at that), the only logical place to start for transmission network monitoring is ALWAYS with MPEG-2 TS. So, what do we want to measure at the MPEG-2 TS layer and why do we want to measure it? In short, we want to measure four key things: packet loss, video and audio stream dropouts, PSI table rates, and packet inter-arrival jitter. Let's look at each of these in a little more detail.
Packet loss is the single most common error that I see on video-over-IP networks. And I am not only talking about losing Ethernet packets, either. I was at SBC in Atlanta several years ago and noticed that my software was reporting lost MPEG-2 TS packets but no lost Ethernet packets. How was this possible? It turned out that SBC was taking an incoming original content stream and using some home-grown software to automatically generate a lower-resolution picture-in-picture video stream, and that software had some rather serious issues. Although, apparently, no one watches the quality of their picture-in-picture video very carefully! I'll show what packet loss looks like to your customers in just a moment.
Obviously, video and audio stream dropouts are totally unacceptable unless your company happens to be named something that rhymes with Bomcast. Seriously, I was watching Sunday Night Football on opening weekend and barely got to see Jay Cutler throw all four of his interceptions because the NBC HD channel kept dropping out (and, by the way, the stupid Chicago Bears will live to regret THAT trade - although so will the Broncos, because, really, Kyle Orton?!?).
PSI table rates are probably the most frequently overlooked MPEG-2 TS measurement area that I see. No one understands them! Listen carefully: since PSI tables provide the basic channel demuxing and EPG information, your customers cannot change to a new channel until the encoder sends the new channel's PSI tables. So, although the connection may not be apparent at first glance, the PSI table rates directly impact a customer's channel change (or "zap") time. We will discuss channel zap time in quite a bit more detail in just a minute.
Packet inter-arrival jitter is important because it impacts the buffering requirements for all downstream network and video devices, and extreme jitter can lead to anything from lip-sync problems to the loss of packets because of buffer overflow or underflow.
I was at Oneida Telephone in beautiful downtown Oneida, Illinois when a jitter alarm kept going off. We decided to investigate and discovered that a single channel was experiencing very high jitter, well over 20 milliseconds. Now, some video devices will start experiencing problems with jitter as low as 10 milliseconds and pretty much all video devices will have problems with 20 milliseconds of jitter. When we looked at exactly which channel was experiencing the problem, the network engineer realized that it was QVC and that he had been getting customer complaints about intermittent video quality issues on QVC for several weeks.
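Packet loss at the MPEG-2 TS layer, the first of the four measurements above, can be spotted from the 4-bit continuity counter carried in every (unscrambled) packet header. The sketch below is a simplified illustration rather than a full implementation of any test specification: the function names are made up, it assumes a buffer of back-to-back 188-byte packets, it treats a single repeated counter value as a legal duplicate, and it ignores the discontinuity indicator in the adaptation field.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47
NULL_PID = 0x1FFF


def count_cc_errors(stream: bytes) -> dict:
    """Return {pid: number of continuity counter jumps} for a TS buffer."""
    last_cc = {}
    errors = {}
    for offset in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = stream[offset:offset + TS_PACKET_SIZE]
        if packet[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at byte offset {offset}")
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        has_payload = bool(packet[3] & 0x10)
        cc = packet[3] & 0x0F
        # The counter only advances on packets that carry payload,
        # and it is meaningless on null (stuffing) packets.
        if pid == NULL_PID or not has_payload:
            continue
        previous = last_cc.get(pid)
        if previous is not None and cc != previous and cc != (previous + 1) % 16:
            errors[pid] = errors.get(pid, 0) + 1
        last_cc[pid] = cc
    return errors


def make_packet(pid: int, cc: int) -> bytes:
    """Build a dummy payload-only TS packet (test helper, hypothetical)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x10 | (cc & 0x0F)])
    return header + bytes(TS_PACKET_SIZE - 4)


if __name__ == "__main__":
    packets = [make_packet(0x100, i % 16) for i in range(40)]
    one_lost = b"".join(p for i, p in enumerate(packets) if i != 7)
    print("one packet lost:", count_cc_errors(one_lost))

    # Losing exactly 16 packets in a row is invisible to a 4-bit counter.
    sixteen_lost = b"".join(p for i, p in enumerate(packets) if not 5 <= i <= 20)
    print("sixteen packets lost:", count_cc_errors(sixteen_lost))
```

The second case is the practical weakness of such a small counter: a burst that removes a multiple of 16 packets from one PID leaves the counter looking perfectly continuous.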
Here is what the loss of a single MPEG-2 TS video stream packet looks like to your customers. Think back to what we learned about I-Frames (complete images), P-Frames (small, partial, predicted frames) and B-Frames (very small, very partial, bidirectional frames). When the lost packet is part of a B-Frame, the error is barely visible on the back of the player's jersey and is quickly corrected by the very next P-Frame. When the lost packet is part of an I-Frame, not only is the error more visible but it will remain visible until the next I-Frame is received and the error is finally corrected. And that is how sensitive some video compression techniques, H.264 in this example, are to packet loss.
Now this is not exactly the screen I saw on Sunday Night Football but it's close. My screen would switch from the game, usually just as Jay Cutler was dropping back to pass, to a black screen that said "Channel Not Available" and then back to the game, usually as one of the Green Bay Packers was running the interception down the field with lots of Bears chasing him. The interesting thing about video dropouts is how much they depend on primarily just two things: the video compression technique being used and how much the video is being buffered at the CPE. Which is exactly why they actually occur more frequently on sports channels than on any other type of channel: people love to watch their sports on big, fat HD channels that require a steady influx of lots and lots of packets, and sporting events are typically shown with the minimal amount of buffering to prevent the situation where people can hear a play on the radio a full five seconds before they can see it on the television. Well, that, or maybe the real sports nuts just complain much more loudly and often if their premium sport package channel drops out.
When your beer-drinking, football-watching, sports-nut customer is channel surfing between games, here is what happens each time he changes the channel on a video-over-IP network: the set-top box sends a channel change request to the network (an IGMP Leave request followed by an IGMP Join request for those of you who care), the network then has to find (via some more IGMP Join requests) and forward the new channel to the set-top box, and then the set-top box has to receive, in this order, all of the core PSI tables and then a complete I-Frame; all of this has to happen before the customer ever sees the new channel on his television screen. No wonder it takes forever to change channels these days! And notice that a new protocol has been thrown into the mix as well, the Internet Group Management Protocol, IGMP.
So we have three components that contribute to your customers' channel zap time experience: the set-top box via its IGMP messages, the transmission network via its IGMP messages and how far away the content source is network-wise, and the content provider's encoder that is responsible for delivering the PSI tables and the I-Frames at reasonable intervals; set-top box, network, encoder. I don't want to get on another soap box here, but I have seen a number of test equipment vendors claiming to measure zap time that are only looking at the time between the IGMP Join and the arrival of the very first packet on the requested MPEG-2 TS. The problem with such a measurement is that it just does not reflect what your customer is going to experience.
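The difference between the two ways of measuring zap time is easy to see once the events are laid out on a timeline. The sketch below is purely illustrative: the `ZapEvents` fields, the function name and the sample timestamps are all made up, and a real probe would have to derive each timestamp from captured IGMP messages and MPEG-2 TS packets.

```python
from dataclasses import dataclass


@dataclass
class ZapEvents:
    """Timestamps, in seconds, for one channel change (hypothetical fields)."""
    button_press: float      # customer presses the channel button
    igmp_join_sent: float    # set-top box has sent Leave and Join
    first_ts_packet: float   # first MPEG-2 TS packet of the new channel
    psi_complete: float      # all core PSI tables received
    first_i_frame: float     # first complete I-Frame received


def zap_breakdown(events: ZapEvents) -> dict:
    """Split one zap into set-top box, network and encoder contributions."""
    return {
        "set_top_box_ms": 1000.0 * (events.igmp_join_sent - events.button_press),
        "network_ms": 1000.0 * (events.first_ts_packet - events.igmp_join_sent),
        # PSI tables first, then a complete I-Frame, as described above.
        "encoder_ms": 1000.0 * (events.first_i_frame - events.first_ts_packet),
        "total_ms": 1000.0 * (events.first_i_frame - events.button_press),
        # The narrower figure: Join to first packet only.
        "join_to_first_packet_ms": 1000.0
        * (events.first_ts_packet - events.igmp_join_sent),
    }


if __name__ == "__main__":
    # Invented numbers: a slow set-top box in front of a fast network.
    sample = ZapEvents(0.0, 2.5, 2.9, 3.1, 3.4)
    for name, value in zap_breakdown(sample).items():
        print(f"{name}: {value:.0f}")
```

With these invented numbers the Join-to-first-packet figure looks healthy while the customer still waits several seconds, which is the gap between the two measurements in a nutshell.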
And here is one more wrinkle to channel zap times: AT&T (I think) did a study a number of years ago that showed that customers will accept a longer zap time IF it is consistent. So, although most vendors are striving for zap times of less than 500 milliseconds (because those are perceived as being instantaneous by most people), customers will generally not complain about a one or even a two second zap time as long as it is always the same! Apparently, what drives people completely bonkers about channel change times is when the zap time is a seemingly random amount of time: 1 second, then 5 seconds, then instantaneous, then 4 seconds. So, when you are looking at zap time issues you want to make sure that you look at zap time variability.
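Looking at variability just means reporting the spread of the zap times alongside the average. A minimal sketch, using only the Python standard library; the function name and the sample values are invented for illustration.

```python
import statistics


def zap_summary(samples_ms: list) -> dict:
    """Summarise a series of measured zap times, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one zap time sample")
    return {
        "mean_ms": statistics.mean(samples_ms),
        # Population standard deviation: how far a typical zap strays
        # from the average.
        "stdev_ms": statistics.pstdev(samples_ms),
        "spread_ms": max(samples_ms) - min(samples_ms),
    }


if __name__ == "__main__":
    consistent = [2000, 2050, 1950, 2000]   # slow, but always the same
    erratic = [1000, 5000, 100, 4000]       # sometimes instant, sometimes not
    print("consistent:", zap_summary(consistent))
    print("erratic:   ", zap_summary(erratic))
```

The two series have averages in the same neighbourhood, so the mean alone would hide the very behavior that, per the study mentioned above, customers object to; the standard deviation and spread expose it.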
We have talked about two very different types of jitter today, and I'll bet that you never even noticed! TR specifies that implementers will measure something called PCR Jitter and RFC 3550 specifies that RTP implementers will measure something called Packet Inter-Arrival Jitter. What is the difference and why should you care?
Encoding and decoding a digital video stream requires not one but two of those highly accurate, 27 MHz clocks: one at the encoder and another at the decoder. PCR jitter, whose official definition is provided in TR , is all about measuring the differences between these two separate 27 MHz clocks. It has three components: the clock frequency offset (because each 27 MHz clock is running at some speed that is not quite exactly 27 MHz), the low-frequency clock drift (that naturally occurs in all clock oscillators over time), and high-frequency clock jitter (due to, among other things, the network-induced inter-arrival jitter). TR says that the overall PCR jitter - the sum of the three components just mentioned - will be measured in nanoseconds and will not exceed 500 nanoseconds.
Now, I hope that at least some of you are sitting there thinking, "500 nanoseconds of overall jitter?!? Are you kidding me?!?! What REAL network has jitter that low?!?" And the answer to that question is: Absolutely, positively, none. So what is going on here? And the short answer to THAT question is: Statistics. Statistically, the mean value of network-induced inter-arrival jitter is always 0 when measured over a long enough period of time. So, statistically, PCR jitter, if measured over a long enough period of time, does not include any inter-arrival jitter, and only measures the clock inaccuracies. Voila! A PCR jitter measurement that can be specified and measured in nanoseconds!
"Statistics are like bikinis: what they reveal is suggestive but what they conceal is vital." - Aaron Levenstein
"Torture numbers, and they'll confess to anything." - Gregg Easterbrook
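Both jitter definitions ultimately lean on timestamps, and the PCR itself can be read straight out of the unscrambled MPEG-2 TS packet: it sits in the adaptation field as a 33-bit base counting at 90 kHz plus a 9-bit extension counting at 27 MHz. The sketch below shows only that extraction and a naive comparison of a PCR interval with an arrival interval. It is not the PCR jitter measurement from the test specification; the function names are made up, and PCR wraparound is ignored.

```python
from typing import Optional

PCR_CLOCK_HZ = 27_000_000


def extract_pcr(packet: bytes) -> Optional[int]:
    """Return the PCR in 27 MHz ticks, or None if the packet carries none."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a 188-byte MPEG-2 TS packet")
    has_adaptation_field = bool(packet[3] & 0x20)
    # The adaptation field needs room for the flags byte plus 6 PCR bytes.
    if not has_adaptation_field or packet[4] < 7:
        return None
    if not packet[5] & 0x10:  # PCR_flag
        return None
    b = packet[6:12]
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    extension = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + extension


def pcr_interval_error_ns(pcr_a: int, pcr_b: int,
                          arrival_a_s: float, arrival_b_s: float) -> float:
    """Arrival interval minus PCR interval, in nanoseconds.

    Assumption: pcr_b was stamped after pcr_a with no wraparound between.
    """
    pcr_interval_s = (pcr_b - pcr_a) / PCR_CLOCK_HZ
    return ((arrival_b_s - arrival_a_s) - pcr_interval_s) * 1e9
```

For any single pair of packets on a real network this difference is dominated by network-induced inter-arrival jitter, typically far more than 500 nanoseconds; it is only the long-term averaging described above that leaves the clock inaccuracies behind.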