H-QoS & µ-shaping Bandwidth Performance Optimization

White paper

With the rise of cloud computing, mobile small cell deployments and prioritized content delivery networks for web applications comes increased interconnection of performance-sensitive services over large-scale service provider networks. Data center connectivity is one example, where application, database, storage and desktop hosting make low-latency, high-throughput connections between enterprises and the data center a must. Mobile backhaul is another: as small cell deployments move more end-points closer to customers, the Alternate Access Vendors (AAVs) that provide last-mile connectivity to mobile operators increasingly use a wide range of broadband technologies where bandwidth is more constrained and performance is not easily assured without bandwidth optimization.

Providers offering off-net services have long understood the value of traffic conditioning, shaping and flow prioritization as they strive to deliver the best possible quality of service (QoS) to end-customers over wholesale infrastructure. In these applications the goal is to make the best use of finite bandwidth, optimizing latency and availability to meet service level agreements (SLAs) that often include multiple service tiers.

Having used Hierarchical QoS (H-QoS) and high-performance traffic shaping to assure off-net services, these same operators are now deploying the technology on-net as a service differentiator. First-movers in this area have gained significant share of the business services market by shaping customer traffic into their pipes. H-QoS optimizes the interface between their access links and the customer network to considerably outperform competitive offerings.
Likewise, enterprises that take shaping into their own hands can achieve the same benefits without relying on the operator to understand their application and performance priorities. This allows them to use any provider for connectivity, with optimal results that reflect their IT and business objectives.

May 2014

H-QoS Traffic Conditioning

Hierarchical bandwidth policing (or regulation), combined with advanced µ-shaping techniques, establishes and enforces per-flow QoS at the service edge. Typically employed in the uplink direction on last-mile connections, this same approach can be applied bi-directionally at network-to-network interfaces (NNI), or anywhere link bandwidth changes and traffic needs to be right-sized into a more constricted service.

Simple Bandwidth Policing: Crushing the Edge

Metro Ethernet Forum (MEF) Carrier Ethernet services must conform to a bandwidth profile with a Committed Information Rate (CIR, guaranteed bandwidth) and, in some cases, an Excess Information Rate (EIR, best-effort bandwidth). These bandwidth envelopes are normally policed at the service edge using regulators: any traffic exceeding these predetermined thresholds is dropped, resulting in random packet discard that gives no preference to high-priority over low-priority traffic.

This "crush the edge" technique is effective in preventing bursts of client traffic from entering the provider's network, and is easy to implement. However, it has serious repercussions for client traffic, especially if customer traffic is not prioritized into the provisioned bandwidth profile. Any mismatch in this mapping results in excessive packet loss, accompanied by increased retransmission and latency and, most importantly, an inability to fill the pipe. Utilization can be constricted to 20% of usable bandwidth in many cases, for reasons explored in the sections that follow.
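To make the policing behavior concrete, the following is a minimal sketch of a single-rate token-bucket regulator, one common way such policers are modeled. The class and function names, the 1500-byte packet size, the 1 Gbps port speed and the 15,000-byte burst allowance are illustrative assumptions, not values from any particular network element.

```python
class TokenBucketPolicer:
    """Single-rate policer: CIR in bits/s, burst allowance in bytes (illustrative names)."""

    def __init__(self, cir_bps, cbs_bytes):
        self.cir_bps = cir_bps
        self.cbs_bytes = cbs_bytes
        self.tokens = float(cbs_bytes)  # bucket starts full
        self.last_time = 0.0

    def accept(self, now_s, size_bytes):
        # Refill at the CIR for the elapsed time, never beyond the burst allowance.
        elapsed = now_s - self.last_time
        self.last_time = now_s
        self.tokens = min(self.cbs_bytes, self.tokens + elapsed * self.cir_bps / 8)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False  # out of profile: discarded, whatever its priority


def police_burst(n_packets, size_bytes, port_bps, cir_bps, cbs_bytes):
    """Offer a back-to-back burst at port speed; return (accepted, dropped)."""
    policer = TokenBucketPolicer(cir_bps, cbs_bytes)
    gap_s = size_bytes * 8 / port_bps  # serialization time of one packet
    accepted = sum(policer.accept(i * gap_s, size_bytes) for i in range(n_packets))
    return accepted, n_packets - accepted


# Assumed scenario: 100 x 1500-byte packets sent at 1 Gbps into a 200 Mbps CIR.
accepted, dropped = police_burst(100, 1500, 1e9, 200e6, 15_000)
print(f"accepted={accepted} dropped={dropped}")
```

In this assumed scenario most of the burst is discarded once the bucket empties, and the policer never looks at packet priority when it drops.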

Hierarchical Bandwidth Optimization

Hierarchical QoS, as specified in the MEF 10.3 standard, is a traffic conditioning method that respects a prioritized flow's right of way while allowing lower-priority flows to use bandwidth left over by higher-priority flows, increasing overall multi-service performance. Hierarchical Bandwidth Policing (H-BWP) is the evolution of "crushing the edge": tiered scheduling assures transmission of the highest-priority packets. As a policing mechanism, this process is conducted at wire speed without store-and-forward queuing delays.

Ultra-fast µ-shaping can be applied along with H-BWP to maximize link utilization and greatly reduce packet discard without adding delay to latency-sensitive flows. By queuing and scheduling lower-priority flows into unused bandwidth with packet-by-packet granularity, service flows can approach 100% utilization of available capacity, smoothing out bursty traffic and ensuring faster end-to-end packet delivery. This cost-effective, single-ended optimization method is the most efficient approach to bandwidth performance optimization: no complex configuration is required, and flow prioritization can be easily tuned to a particular client's service mix.

µ-shaping in Action

The results of H-QoS and µ-shaping are dramatic. The mismatch customers often experience between provisioned bandwidth and speed test results can be eliminated with properly implemented H-QoS and µ-shaping at the service edge. Tests with and without µ-shaping on Internet connections of 15 and 30 Mbps show a startling difference: µ-shaped uplink traffic reaches full link capacity, while unconditioned traffic uses only a fraction of the available bandwidth. The main reason for this is the nature of TCP transmission and its relation to traffic bursts and the resulting packet loss.
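The difference between dropping a burst and shaping it can be sketched with a generic pacing queue. This is not Accedian's µ-shaping implementation, only an illustration of the general idea of releasing queued packets at the CIR; the function name, packet size and rates are assumptions chosen for the example.

```python
def shape_burst(n_packets, size_bytes, port_bps, cir_bps):
    """Queue a back-to-back burst and release it at the CIR.

    Returns the queuing delay (seconds) added to each packet. Nothing is
    dropped, assuming the queue is deep enough to hold the burst.
    """
    arrival_gap_s = size_bytes * 8 / port_bps   # spacing at port speed
    service_time_s = size_bytes * 8 / cir_bps   # spacing at the CIR
    delays = []
    next_free_s = 0.0                           # when the link is next idle
    for i in range(n_packets):
        arrival = i * arrival_gap_s
        depart = max(arrival, next_free_s)      # wait only if the link is busy
        next_free_s = depart + service_time_s
        delays.append(depart - arrival)
    return delays


# Assumed scenario: 100 x 1500-byte packets arriving at 1 Gbps, CIR of 200 Mbps.
delays = shape_burst(100, 1500, 1e9, 200e6)
print(f"packets delivered: {len(delays)}, worst added delay: {max(delays) * 1e3:.2f} ms")
```

The cost of losslessness in this sketch is queuing delay for the packets at the back of the burst, which is why the text pairs shaping with prioritization: only lower-priority traffic should wait in the queue.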

One variable that operators can adjust on their provider edge (PE) equipment is the Committed Burst Size (CBS): the amount of instantaneous traffic beyond the CIR that the network element accepts, over sub-millisecond scheduling windows, before discarding packets. Typically CBS is set to the lowest value possible (the default for most network elements) to protect the provider network from traffic bursts. Tuning this parameter upward can increase throughput significantly, but is undesirable for two reasons: (1) allowing bursts into the provider network impacts overall aggregation and core network performance, affecting other customers' traffic over shared infrastructure, and (2) it pushes packet loss deeper into the network, where retransmission is more expensive, resulting in longer delays and wasted provider-network bandwidth.

The more network elements there are along a service transmission path, the less effective increasing CBS will be, because the lowest CBS value of any element the traffic encounters determines end-to-end whether the burst survives. Allowing traffic with a CBS of 512 KB is ineffective if the next network element allows only 64 KB.

Note that the results shown in the graphs for these tests are those reported by Speedtest.net. Test accuracy is somewhat limited, which is why, in some cases, the reported bandwidth actually exceeds the CIR of the Internet connection. Despite these limitations, this is the test customers often run to verify their service performance, and it is a repeatable, relative performance gauge that reflects the true state of the network and service configuration.
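A rough calculation shows why CBS operates on sub-millisecond time scales and why the smallest value on the path dominates. The sketch below assumes a simple token-bucket model with CBS expressed in bytes; the function names and the example figures (1 Gbps port, 200 Mbps CIR, per-hop CBS values) are illustrative assumptions.

```python
def burst_tolerance_s(cbs_bytes, port_bps, cir_bps):
    """Time a line-rate burst can last before a full CBS allowance is used up.

    The allowance drains at the port rate and refills at the CIR, so the
    net drain rate is (port_bps - cir_bps).
    """
    if port_bps <= cir_bps:
        return float("inf")  # traffic can never exceed the CIR
    return cbs_bytes * 8 / (port_bps - cir_bps)


def end_to_end_cbs(cbs_per_hop_bytes):
    """The smallest CBS on the path decides whether a burst survives."""
    return min(cbs_per_hop_bytes)


# Assumed path: three policers with different CBS settings, in bytes.
path = [512_000, 64_000, 256_000]
limit = end_to_end_cbs(path)
print(f"effective CBS: {limit} bytes")
print(f"burst tolerance: {burst_tolerance_s(limit, 1e9, 200e6) * 1e3:.2f} ms")
```

Under these assumptions the 512,000-byte setting on the first hop buys nothing: the 64,000-byte hop absorbs well under a millisecond of line-rate traffic before it starts discarding.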

Quantifying the Benefits of µ-shaping

In controlled tests with precise and accurate instruments, µ-shaping's effect on bandwidth performance optimization is even more dramatic. An improvement of up to 800% can be gained when it is applied to TCP traffic flows, which have accounted for over 98% of Internet traffic since 2002. TCP's share relative to UDP is expected to increase further in the coming years, as the most bandwidth-consuming over-the-top (OTT) media applications turn to TCP to avoid detection by firewalls and traffic policy enforcement devices. For example, Skype, YouTube, Apple TV, Hulu, email, peer-to-peer file transfer and web browsing all transmit using TCP. UDP is predominantly used for VoIP and IPTV transmission over provider networks, where prioritization is controlled to ensure the lowest possible packet loss, with the benefit of the lower average latency UDP provides.

Source: DongJin Lee, Brian E. Carpenter, Nevil Brownlee, 2011

Why the Disconnect?

When a provider turns up a service, standards-based Service Activation Testing (SAT) using RFC 2544 or ITU-T Y.1564 is normally employed to validate the configuration and performance of the service, and to provide a QoS baseline to the customer as proof of compliance with any agreed-upon SLA. How is it that immediately thereafter, a client can experience significantly lower throughput than what was demonstrated at turn-up?

The answer lies in the nature of test traffic vs. actual customer traffic. The goal of turn-up testing is to validate that CIR, EIR, packet loss, delay variation and latency comply with performance objectives. The service is filled with UDP traffic, because UDP can be launched reliably at full line rate: there are no TCP retransmissions to slow down flows if packet loss occurs during the test. UDP doesn't care if packets are discarded, so tests can be conducted reliably and with high repeatability.

But customer traffic is predominantly transmitted using TCP. The rate at which TCP endpoints are willing to transmit is governed by the degree of packet loss in a particular session. TCP requires that every frame is accounted for, with a receipt acknowledgement required to confirm transmission success. However, if the sender waited for each individual packet to be acknowledged before sending the next, throughput would be greatly reduced, especially over wide-area connections.

TCP Windowing

TCP handles this problem with transmission windows: a collection of frames sent together with the expectation that they will all arrive without loss. The size of the TCP transmission window adapts to the success of previous windows.
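The effect of a loss-adaptive window on throughput can be illustrated with a deliberately simplified model. The sketch below assumes the window grows by one packet per loss-free round trip and halves on a loss; real TCP stacks differ in how they grow and recover, and the function name, round counts and loss spacing are assumptions made for illustration only.

```python
def average_window(rounds, max_window, loss_every=None):
    """Average window size (in packets) over a number of round trips.

    Simplified model: +1 packet per loss-free round trip, halved when a
    loss occurs. `loss_every=N` injects one loss every N round trips.
    """
    window = 1
    total = 0
    for r in range(1, rounds + 1):
        total += window
        if loss_every and r % loss_every == 0:
            window = max(1, window // 2)          # loss: cut the window in half
        else:
            window = min(max_window, window + 1)  # success: grow toward the cap
    return total / rounds


clean = average_window(1000, max_window=100)
lossy = average_window(1000, max_window=100, loss_every=10)
print(f"average window without loss: {clean:.1f} packets")
print(f"average window with regular loss: {lossy:.1f} packets")
```

In this model, a window that is regularly cut back never gets close to the size needed to fill the link, even though the loss rate looks small in absolute terms.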
If a packet is lost in a window, all packets after the lost packet are retransmitted, and the window size is reduced by roughly half. When windows are successfully received, the window size increases, slowly at first and then more rapidly with continued error-free transmission. If packets are regularly lost, the window will never grow to the size required to achieve full link utilization.

The mismatch between port (media) speed and the CIR of a link ensures that this issue is ubiquitous. If a CPE connects to an access link at 1 Gbps, but the CIR of the link is limited to 200 Mbps, bursts of traffic beyond the policed 200 Mbps will result in packet loss, TCP window reduction and greatly reduced throughput. Standard traffic shaping is unable to smooth out these bursts effectively, because many occur at a millisecond time-scale (micro-bursts) and the granularity of most shapers is not sufficient to process traffic at this speed. µ-shaping, which optimizes bandwidth on a per-packet basis, is able to groom micro-bursts into the CIR in a lossless manner.

Bandwidth Performance, Optimized

Accedian's H-QoS and µ-shaping technology is recognized as the best available by leading Tier-1 operators. It is implemented in a variety of Accedian network performance elements and combines four main technologies to achieve this performance: priority packet bypass, the BLUE queue management algorithm, H-QoS implementation, and faster-than-packet processing granularity.

Priority Bypass

With instant traffic classification, priority flows bypass shaper queues and are immediately transmitted. The effect is that the most latency-sensitive flows are handled as though no shaping was implemented. Most network elements that perform shaping require all traffic to be buffered long enough to be inspected (the store-and-forward technique), which adds a common latency to all flows, regardless of priority.

The BLUE Algorithm

Developed in 1999 by researchers at the University of Michigan and IBM, the BLUE queue management algorithm greatly reduces queue length and the resulting latency when compared to the standard Random Early Detection (RED) methods used by the majority of network element shapers. It uses statistical metrics to throttle upstream flows in a way that maximizes window length and reduces packet loss before traffic arrives at the flow classifier (http://en.wikipedia.org/wiki/blue_(queue_management_algorithm)).

Packet Processing Granularity

The Accedian flow performance assurance (FPA) processor operates at a 1 ppm (part per million) packet processing rate. On a GbE full-line-rate flow, this means that the processor is running 5x faster than the rate at which packets are received. This allows each packet to be handled individually, resulting in the most granular smoothing available, operating at the µs level.

This processing speed is 1,000x faster than millisecond-length micro-bursts, allowing lower-priority packets to be precisely interleaved into flows where instantaneous capacity is not fully used by higher-priority streams. The result is the best possible bandwidth capacity utilization (fill) without the packet discard associated with coarser, "lumpier" shaping techniques. In addition to packet-handling granularity, Accedian elements offer granular traffic classification into as many as 18 queues per port. This ensures that even the most complex multi-service client traffic can be precisely optimized.

H-QoS Implementation

When the MEF 10.3 specification for hierarchical QoS processing is implemented, a service bandwidth envelope is shared between all flow priorities. CIR is consumed hierarchically: any CIR left unused by a higher-priority flow is passed to the next lower-priority flow, and so on, until all flows have maximized their use of the total service CIR. Any CIR remaining in the envelope is added to the available EIR, and the same process is repeated. Compare this to the standard method of regulating each flow in isolation to ensure a CIR is not exceeded: for example, policing two flows to 20 Mbps each to ensure that a CIR of 40 Mbps is respected leaves bandwidth unused that could have been shared.

Bandwidth Performance Optimization: The Impact

WAN-optimization techniques require expensive appliances at each service end-point, or are subject to performance variation if virtualized. By comparison, purpose-built, affordable, programmable elements can optimize bandwidth performance without variation or setup complexity. Properly implemented H-QoS and µ-shaping can significantly improve bandwidth performance in a wide variety of applications over regional, national and international networks.
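The hierarchical CIR sharing described under H-QoS Implementation can be sketched by comparing it with isolated per-flow policing. This is a simplified illustration of the pass-down idea only, not the full MEF 10.3 bandwidth profile algorithm; the function names and the traffic demands are assumptions, while the two 20 Mbps flows in a 40 Mbps envelope follow the example in the text.

```python
def isolated_policing(demands_mbps, flow_cirs_mbps):
    """Each flow is policed on its own; unused CIR is simply lost."""
    return [min(d, c) for d, c in zip(demands_mbps, flow_cirs_mbps)]


def hierarchical_sharing(demands_mbps, flow_cirs_mbps):
    """Flows are listed highest priority first.

    CIR a flow does not use is carried down to the next lower-priority
    flow. Returns (granted per flow, CIR still unused at the end), where
    the leftover is what the text describes as being added to the EIR.
    """
    granted = []
    carry = 0
    for demand, cir in zip(demands_mbps, flow_cirs_mbps):
        available = cir + carry
        grant = min(demand, available)
        granted.append(grant)
        carry = available - grant
    return granted, carry


# Two flows sharing a 40 Mbps envelope (20 Mbps each). Assumed demand:
# the high-priority flow offers 5 Mbps, the low-priority flow offers 35 Mbps.
demands = [5, 35]
cirs = [20, 20]
isolated = isolated_policing(demands, cirs)
shared, leftover = hierarchical_sharing(demands, cirs)
print(f"isolated: {isolated} -> {sum(isolated)} Mbps of 40 Mbps used")
print(f"hierarchical: {shared} -> {sum(shared)} Mbps of 40 Mbps used")
```

In this sketch the higher-priority flow is never reduced to make room for the lower-priority one; the lower-priority flow only gains what the higher-priority flow leaves idle.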
Bandwidth performance optimization has the most impact where bandwidth is expensive or capacity cannot be easily increased, and where uplink performance is critical to application responsiveness or overall QoS. Services affected by retransmission delays, services with bursty traffic, and services where a mix of traffic priorities competes for limited bandwidth all fall into this category. Examples include off-net service optimization; mobile backhaul, where control plane traffic and inter-cell synchronization must be maintained under heavy traffic loads; financial networks, where algorithmic trading often results in micro-bursts; and data center connectivity, where greatly varying TCP traffic utilization over limited-bandwidth connections affects latency and usability when compared to on-site servers.

Bandwidth performance optimization benefits the provider as well as the client, with smoother traffic entering the operator's network and full purchased capacity delivered to the customer. When implemented properly, it's a win-win situation, with clear results everyone can easily see in the resulting service performance.

© 2014 Accedian Networks Inc. All rights reserved. Accedian Networks, the Accedian Networks logo, SkyLIGHT, Plug & Go, AntMODULE, Vision EMS, Vision Suite, VisionMETRIX, V-NID, R-FLO, Network State+, Traffic-Meter & FlowMETER are trademarks or registered trademarks of Accedian Networks Inc. All other company and product names may be trademarks of their respective companies. Accedian Networks may, from time to time, make changes to the products or specifications contained herein without notice. Some certifications may be pending final approval; please contact Accedian Networks for current certifications.