VoIP Over the Internet: Is Toll Quality Achievable?
Mansour Karam, Technical Lead
SCV Communications Society, May 12, 2004
Agenda
  Introduction
  VoIP versus VoIP over the Internet
  Challenges for VoIP over the Internet
  Technological advances that address today's challenges
  An overview of adaptive networking technology
  Case studies
Introduction: Migration to VoIP is compelling
As VoIP vendors will tell you, migration to a converged voice/data network is compelling for many reasons:
  More effective communications
  Reduction in CapEx and OpEx
  Enhanced flexibility and resiliency (in theory)
Some pointers:
  http://telecomreseller.com/avayaextra/
  http://www.nwfusion.com/columnists/2002/0916taylor.html
  http://www1.avaya.com/enterprise/news/docs/lp/ccs.html?c=sip&n=SIP_AvCom_ThoughtLdrship&t=internal
  http://www.cisco.com/warp/public/cc/so/neso/vvda/iptl/msipt_bc.pdf
Hence the spotlight on VoIP.
Introduction: VoIP versus VoIP over the Internet
Today's VoIP deployments work great over local area networks.
Today's VoIP deployments work reasonably well over dedicated (expensive) private wide area networks: Frame Relay, ATM, leased lines.
However, VoIP deployments do not work so well over the Internet, because:
  The Internet is a best-effort infrastructure.
  Internet infrastructure is shared across a large number of competing applications with widely different characteristics.
  Highly demanding applications (such as voice) experience quality and availability problems.
Regardless of WAN fabric, availability is orders of magnitude away from 99.999% ("five nines").
Introduction: adaptive networking
Enterprises generally engineer some level of redundancy into their networks; in particular, more than one path is commonly available.
Adaptive networking leverages this inherent redundancy through:
  Monitoring of available paths
  Assessment of paths according to a set of criteria
  Dynamic route adjustments that steer traffic through the path that makes the best business sense at any given time
Adaptive networking is air traffic control for your WAN.
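The monitor, assess and adjust steps above can be sketched as one path-selection function. This is a minimal illustration under stated assumptions, not RouteScience's implementation: `PathStatus`, `choose_path`, the path names, the per-path MOS and cost figures, and the 3.6 MOS threshold are all hypothetical. "Best business sense" is modeled here as: among paths whose measured quality clears the threshold, take the cheapest; if none does, take the best-quality path.

```python
from dataclasses import dataclass

# Hypothetical acceptability threshold (lower edge of the "Medium" MOS band).
MOS_THRESHOLD = 3.6

@dataclass
class PathStatus:
    name: str             # hypothetical path identifier
    mos: float            # latest monitored MOS estimate for this path
    cost_per_mbit: float  # hypothetical relative cost of using this path

def choose_path(paths: list[PathStatus], threshold: float = MOS_THRESHOLD) -> str:
    """Assess candidate paths and return the name of the one to steer traffic onto."""
    if not paths:
        raise ValueError("no candidate paths")
    # Assessment: which paths currently deliver acceptable quality?
    acceptable = [p for p in paths if p.mos >= threshold]
    if acceptable:
        # Business policy (assumed): the cheapest path that is good enough.
        return min(acceptable, key=lambda p: p.cost_per_mbit).name
    # No path is good enough: limit the damage with the best-quality path.
    return max(paths, key=lambda p: p.mos).name

paths = [
    PathStatus("frame-relay", mos=4.2, cost_per_mbit=10.0),
    PathStatus("isp1", mos=4.1, cost_per_mbit=1.0),
    PathStatus("isp2", mos=2.9, cost_per_mbit=1.0),
]
print(choose_path(paths))  # isp1: cheapest acceptable path
paths[1].mos = 3.0         # isp1 suffers a brownout
print(choose_path(paths))  # frame-relay: only remaining acceptable path
```

Treating quality as a hard constraint and cost as the tie-breaker is only one possible policy; a real deployment could also weigh user importance or application type.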
What is the public Internet?
A collection of networks that have little incentive to work together.
What is it like to use the public Internet?
  Switches, routers, firewalls, VPN gateways, load balancers, packet shapers
  Maintenance windows
  Software anomalies
  Feature set compatibility issues
  Technology in constant flux
Data collected between RouteScience POP locations, June 2001
(Figure: map of the measurement POP locations, labeled EWR, AND, SJC, THR, ASH.)
Source: A. Markopoulou, F. Tobagi, M. Karam, "Assessing the quality of Voice Communications over Internet Backbones", IEEE/ACM Transactions on Networking, October 2003
Delay and loss characteristics
Delay
  Propagation delay:
    East coast: 3.25-11.8 ms
    Coast-Colorado: 28.3-77.8 ms
    Coast-to-coast: 31.3-47.2 ms
  Delay variability: mainly spikes, during the day
Loss pattern
  Mainly outages (reliability problems)
  Outages happen at least once per day for 6 out of 7 providers
  Outages usually precede changes in the propagation delay
(Figure: delay in ms versus time.)
Source: A. Markopoulou, F. Tobagi, M. Karam, 2003
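One way to separate the two delay components on this slide is to treat the minimum observed delay as the propagation estimate and flag samples well above it as spikes, and to treat runs of consecutive lost packets as outages. The sketch below assumes delay samples in milliseconds and a per-packet received/lost list; the function names, the 50 ms spike margin, the 3-packet minimum outage length and the sample values are hypothetical.

```python
def split_delay(samples_ms, spike_margin_ms=50.0):
    """Return (propagation_estimate_ms, indices_of_spike_samples)."""
    if not samples_ms:
        raise ValueError("need at least one delay sample")
    # The smallest delay seen approximates the fixed (propagation) component.
    base = min(samples_ms)
    spikes = [i for i, d in enumerate(samples_ms) if d - base > spike_margin_ms]
    return base, spikes

def loss_runs(received, min_run=3):
    """Return (start_index, length) for each run of >= min_run consecutive lost packets."""
    runs, start = [], None
    # A trailing True acts as a sentinel so a run at the very end is closed.
    for i, ok in enumerate(list(received) + [True]):
        if not ok and start is None:
            start = i
        elif ok and start is not None:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = None
    return runs

print(split_delay([31.5, 31.3, 32.0, 140.2, 31.4, 95.0]))
print(loss_runs([True, False, True, False, False, False, False, True, False]))
```

Using the minimum as the propagation estimate only holds while the route is stable; a route change shifts the baseline, which is why loss outages followed by propagation-delay changes are worth correlating.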
Converting delay and loss into MOS
Speech transmission quality (user satisfaction)    Mean Opinion Score (MOS)
Best (very satisfied)                              4.3 to 4.5
High (satisfied)                                   4.0 to 4.3
Medium (some users dissatisfied)                   3.6 to 4.0
Low (many users dissatisfied)                      3.1 to 3.6
Poor (nearly all users dissatisfied)               2.6 to 3.1
Not recommended                                    below 2.6
(Chart band labels: Desirable, Acceptable.)
Reference: ITU-T G.107, Annex B
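The MOS values in this table are the E-model rating-factor band edges (R = 100, 90, 80, 70, 60, 50) passed through the R-to-MOS conversion of ITU-T G.107: MOS = 1 + 0.035R + 7·10^-6 · R(R - 60)(100 - R) for 0 < R < 100, with MOS = 1 below that range and 4.5 above it. The sketch reproduces the table; computing R itself from delay and loss needs the rest of the E-model and is not shown. `r_to_mos` and `category` are illustrative names.

```python
def r_to_mos(r: float) -> float:
    """ITU-T G.107 conversion from transmission rating factor R to estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)

# (lower R bound, user-satisfaction category), following G.107 Annex B
CATEGORIES = [
    (90, "Best (very satisfied)"),
    (80, "High (satisfied)"),
    (70, "Medium (some users dissatisfied)"),
    (60, "Low (many users dissatisfied)"),
    (50, "Poor (nearly all users dissatisfied)"),
]

def category(r: float) -> str:
    for lower, label in CATEGORIES:
        if r >= lower:
            return label
    return "Not recommended"

for r in (100, 90, 80, 70, 60, 50):
    print(r, round(r_to_mos(r), 1), category(r))
```

The printed MOS column is 4.5, 4.3, 4.0, 3.6, 3.1 and 2.6, matching the slide.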
Results from joint Stanford/RouteScience study
(Figure: percentage of calls with MOS at most x, for the worst MOS during each call and for the MOS at the end of the call, with the threshold of acceptable quality marked.)
Availability: 97%
63% of the calls exhibit a worst period having unacceptable quality
3% of the calls have unacceptable quality
Source: A. Markopoulou, F. Tobagi, M. Karam, 2003
Results from joint Stanford/RouteScience study
(Figure: percentage of calls, log scale from 1 to 100, with MOS at most x, for the worst MOS during each call and for the MOS at the end of the call, with the threshold of acceptable quality marked.)
Availability: 98%
12% of the calls exhibit a worst period having unacceptable quality
2% of the calls have unacceptable quality
Source: A. Markopoulou, F. Tobagi, M. Karam, 2003
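The two curves on these slides summarize each call's MOS trace in two ways: its worst interval, and its overall score at the end of the call. Availability is the complement of the second figure (97% versus 3%, 98% versus 2%). A minimal sketch of that bookkeeping, assuming each call is a list of MOS values for consecutive intervals and a 3.6 threshold (the slides do not state the value). As a loudly labeled simplification, the end-of-call score here is a plain average of interval scores; the study's own call-quality model is not reproduced. `call_stats` and the traces are hypothetical.

```python
from statistics import mean

def call_stats(calls, threshold=3.6):
    """calls: list of per-call MOS traces (one value per measurement interval).

    Returns (% of calls whose worst interval is below threshold,
             % of calls whose overall score is below threshold,
             availability % = share of calls with an acceptable overall score).
    """
    if not calls or any(len(trace) == 0 for trace in calls):
        raise ValueError("need at least one non-empty call trace")
    n = len(calls)
    worst_bad = sum(1 for trace in calls if min(trace) < threshold)
    # Simplification: overall score = plain average of the interval scores.
    overall_bad = sum(1 for trace in calls if mean(trace) < threshold)
    return 100 * worst_bad / n, 100 * overall_bad / n, 100 * (n - overall_bad) / n

calls = [
    [4.2, 4.1, 4.2],       # clean call
    [4.2, 2.0, 4.2, 4.2],  # one bad period, acceptable overall
    [3.0, 3.1, 3.2],       # poor throughout
]
print(call_stats(calls))
```

The middle call shows why the worst-period curve sits so far above the end-of-call curve: a short brownout hurts one interval without necessarily sinking the call's overall score.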
Results from joint Stanford/RouteScience study
Backbone networks are over-provisioned and thus expected not to be the bottleneck on the path of a flow. Although this might be the case for data traffic, it is not always the case for VoIP traffic. We observed poor VoIP performance on a large number of ISP backbone networks under favorable end-system configurations.
Source: A. Markopoulou, F. Tobagi, M. Karam, 2003
Enterprise customer case study, January 2003
Internet: 345 bad minutes per month (99.2% availability)
  Brownouts: 254 minutes (74%)
  Blackouts: 91 minutes (26%)
PSTN norm: 26 bad seconds per month (99.999% availability)
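The figures on this slide follow from simple downtime arithmetic over a month: 0.8% of a 30-day month (43,200 minutes) is about 345 minutes, and 0.001% of it is about 26 seconds. A small sketch; the 30-day month and the function names are assumptions for illustration.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month (43,200 minutes)

def bad_minutes_per_month(availability_pct: float) -> float:
    """Minutes per month that fall outside the availability target."""
    return (1 - availability_pct / 100) * MINUTES_PER_MONTH

def availability_pct(bad_minutes: float) -> float:
    """Availability implied by a monthly count of bad minutes."""
    return 100 * (1 - bad_minutes / MINUTES_PER_MONTH)

print(round(availability_pct(345), 1))            # Internet case study: 99.2
print(round(bad_minutes_per_month(99.999) * 60))  # PSTN norm, in seconds: 26
```

Each additional nine divides the allowed bad time by ten, which is why the gap between 99.2% and 99.999% amounts to hours versus seconds per month.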
VoIP over a private network
Major VoIP equipment vendors recommend the use of private networks:
  Owned/leased networks
  Frame Relay
  ATM
Frame Relay case study: Online financial services firm
VoIP over Frame Relay
(Figure: network diagram. Headquarters and the Eastern Data Center each host an IP/PBX and are connected over the Internet through ISP 1 and ISP 2, using OC-3 and DS-3 links, and over Frame Relay (DS-3).)
End-to-end measurements collected for 11 days
Example performance problems over Frame Relay and Internet
(Figure: examples of a delay spike, high packet loss and a link failure.)
Configuration    Bad minutes    Reliability (%)
Frame Relay      5.4            99.966
Internet         126.2          99.203
The resulting end-to-end system still does not deliver toll-quality voice.
Cost of private network
  Private links are costly.
  Private link costs increase with distance.
  Intercontinental private link costs are very high.
Technological landscape
(Slide table with columns: technology category, technology function, example companies.)
Adaptive networking: sidestep brownouts (RouteScience)
Other technology categories: QoS, packet sequencing, packet shaping, compression, MPLS, buffer management schemes, other
Technology functions listed: scheduling techniques, metering lights, stuffing more in a packet, traffic engineering, filtering
Other example companies: router vendors, Sitara, Cetacean, Packeteer, Peribit, Caspian
QoS example: DiffServ combined with sophisticated queue scheduling
DiffServ marking (the DSCP field, carried in the former IP ToS byte) allows traffic to be categorized as voice or data.
Scheduling gives voice traffic priority access to resources, using your favorite scheduling technique.
(Figure: voice and data traffic feeding a scheduler, e.g. priority queuing or WRR.)
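A minimal sketch of the scheduler in the figure, assuming packets have already been classified (for example by DSCP value) into a voice queue and named data queues. Voice gets strict priority; data classes share the remaining transmission opportunities by weighted round-robin, counted in packets. `PriorityWrrScheduler`, the class names and the weights are hypothetical; router implementations typically account in bytes and run in hardware.

```python
from collections import deque
from itertools import cycle

class PriorityWrrScheduler:
    def __init__(self, data_weights):
        self.voice = deque()
        self.data = {name: deque() for name in data_weights}
        # Expand weights into a repeating service order, e.g. {"a": 1, "b": 2} -> a, b, b
        order = [name for name, weight in data_weights.items() for _ in range(weight)]
        self._rr = cycle(order)
        self._rr_len = len(order)

    def enqueue(self, cls, packet):
        (self.voice if cls == "voice" else self.data[cls]).append(packet)

    def dequeue(self):
        # Strict priority: any waiting voice packet is sent first.
        if self.voice:
            return self.voice.popleft()
        # Weighted round-robin over data queues; skip empty ones,
        # and give up after one full pass over the service order.
        for _ in range(self._rr_len):
            queue = self.data[next(self._rr)]
            if queue:
                return queue.popleft()
        return None

s = PriorityWrrScheduler({"bulk": 1, "interactive": 2})
for p in ("b1", "b2", "b3"):
    s.enqueue("bulk", p)
for p in ("i1", "i2", "i3"):
    s.enqueue("interactive", p)
s.enqueue("voice", "v1")
print([s.dequeue() for _ in range(8)])
```

Strict priority is what protects voice here, and it is also the weak point noted on the next slide: if voice traffic alone saturates the link, the data queues starve.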
Challenges of traditional QoS
QoS in IP networks largely remains an elusive goal, even though over-provisioning is not economically viable in the long run.
QoS challenges:
  Strict prioritization degrades under load
  The architecture scales, but QoS doesn't
  How is provisioning done for many different QoS requirements?
  Hard translation from delay, jitter and loss requirements to classes of service
  Requires cooperation across different ISP backbones
All of the above challenges impede actual implementation.
Challenges of traditional QoS
"(...) The crucial issue is that we are trying to get deterministic performance for multiple classes of traffic. If the QoS story were just best effort and one premium class, then queuing mechanisms of today work (...) With multiple realtime classes (voice, video, machine-to-machine, telemetry and others), you will degrade back to best effort. It's just too much to do the packet-by-packet routing and the queuing calculations and figure out where and in which queue to stick each packet. Forget it! It's too complicated (...)"
Peter Sevcik, Business Communications Review, September 2003
Alternative: packet sequencing
In effect, circuit switching, in which different types of circuits can be created.
How does it work?
  It creates itineraries for the various flows admitted to the network.
  It ensures that the collection of itineraries meets a given schedule. A schedule is a collection of appointments, where an appointment is the deadline by which a packet of a given size is to be processed by a switch.
  It relies on admission control: a call is accepted only if an itinerary that satisfies each router's schedule is found.
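The admission-control idea can be illustrated with a deliberately simplified, hypothetical model, not Cetacean's or any vendor's algorithm: time at each router is a repeating frame of slots, every packet occupies exactly one slot (ignoring packet size), and an itinerary books one slot per hop, each at least `hop_delay` slots after the previous hop. A call is admitted only if every hop can be booked before the deadline; otherwise nothing is booked. `Router`, `find_itinerary`, `admit_call` and `FRAME_SLOTS` are illustrative names.

```python
from dataclasses import dataclass, field

FRAME_SLOTS = 20  # hypothetical number of slots in one repeating schedule frame

@dataclass
class Router:
    name: str
    booked: set[int] = field(default_factory=set)  # slots already promised to admitted flows

def find_itinerary(path, hop_delay=1, deadline=FRAME_SLOTS):
    """Greedily pick, at each router, the earliest free slot that is at least
    hop_delay slots after the previous hop's appointment."""
    itinerary = []
    earliest = 0
    for router in path:
        slot = earliest
        while slot < deadline and slot in router.booked:
            slot += 1
        if slot >= deadline:
            return None  # no appointment available at this hop before the deadline
        itinerary.append(slot)
        earliest = slot + hop_delay
    return itinerary

def admit_call(path, **kwargs):
    """Admission control: book the whole itinerary, or nothing at all."""
    itinerary = find_itinerary(path, **kwargs)
    if itinerary is None:
        return None  # call rejected; router schedules are left unchanged
    for router, slot in zip(path, itinerary):
        router.booked.add(slot)
    return itinerary

a, b, c = Router("A"), Router("B"), Router("C")
print(admit_call([a, b, c]))  # first call gets slots 0, 1, 2
print(admit_call([a, b, c]))  # second call is shifted to slots 1, 2, 3
```

The all-or-nothing booking is what makes the guarantee possible, and it also shows the deployment challenge on the next slide: every router on the path has to keep and honor such a schedule.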
Packet sequencing
Performance credentials:
  Sub-2-second call setup times
  Has been tested to provide 99.999% (five nines) availability in the lab
Challenges:
  To be effective, the technique requires every network element in the path to support packet sequencing
  Very expensive to deploy
  Difficult to scale
Applications succeed only when ALL of the infrastructure works
  Business objectives are dynamic: new applications, new partners, new policies.
  Applications, and users, are getting more demanding: VoIP, video conferencing.
  Applications are being stretched over longer distances.
  The wide area network is the key point of vulnerability: how do you avoid problems in fabric you don't own?
  Brownouts are sudden and require instant response: how do you spot a brownout?
(Figure: users reaching data centers across the network infrastructure.)
Virtualized infrastructure
As with servers and storage devices, the key is virtualization: redundancy with intelligent oversight.
Redundancy should also apply in the WAN:
  Multiple paths
  A combination of private and public links, architected as appropriate
Define policy for availability requirements.
Manage the performance/cost tradeoff.
Monitor / assess / adjust / notify.
The key to adaptive networking
Quality is assessed in layers, from top to bottom:
  Good/bad quality metric: "bad" means an application quality problem caused by the network; star ratings are comparable across applications
  Application delay: delay for a typical application transaction
  Transport delay: transport-layer impact of the low-level scores
  Low-level measures: raw latency, loss, jitter, etc.
Individual network-level metrics do not determine absolute quality: what is good for one application type may be bad for another.
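The point that raw metrics do not determine quality on their own can be illustrated by judging the same measurement against per-application requirements. The application profiles and every limit below are hypothetical, chosen only to show the effect; `Measurement`, `AppProfile`, `VOICE` and `BACKUP` are illustrative names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    latency_ms: float
    loss_pct: float
    jitter_ms: float

@dataclass(frozen=True)
class AppProfile:
    name: str
    max_latency_ms: float
    max_loss_pct: float
    max_jitter_ms: float

    def verdict(self, m: Measurement) -> str:
        """Good/bad verdict: 'bad' means the network breaks this application's needs."""
        ok = (m.latency_ms <= self.max_latency_ms
              and m.loss_pct <= self.max_loss_pct
              and m.jitter_ms <= self.max_jitter_ms)
        return "good" if ok else "bad"

# Hypothetical requirement profiles (illustrative limits only)
VOICE = AppProfile("voice", max_latency_ms=150, max_loss_pct=1.0, max_jitter_ms=30)
BACKUP = AppProfile("backup", max_latency_ms=500, max_loss_pct=2.0, max_jitter_ms=200)

m = Measurement(latency_ms=250, loss_pct=0.5, jitter_ms=60)
print(VOICE.verdict(m), BACKUP.verdict(m))  # same path: bad for voice, good for backup
```

Because the verdict is per application, a controller can leave bulk traffic on a path that it is simultaneously moving voice away from.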
Closing the loop with automated repair
Inputs: application quality metrics, application needs, user location, user importance, WAN status
RouteScience assesses alternate paths and actively controls the network to:
  Sidestep brownouts
  Increase application performance
  Reduce costs
  React quickly
  Maintain stability
  Validate effectiveness
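"React quickly" and "maintain stability" pull in opposite directions. One common way to balance them, shown here as a hypothetical sketch rather than RouteScience's mechanism, is hysteresis: leave the current path only when it is bad and an alternative beats it by a margin, then keep the new choice for a minimum number of measurement rounds. `PathController`, the threshold, margin, hold time and MOS values are all assumptions.

```python
class PathController:
    def __init__(self, initial_path, threshold=3.6, margin=0.3, hold_rounds=3):
        self.current = initial_path
        self.threshold = threshold      # assumed MOS below which a path counts as bad
        self.margin = margin            # required MOS improvement before switching
        self.hold_rounds = hold_rounds  # minimum rounds a new choice is kept
        self._since_switch = hold_rounds

    def update(self, mos_by_path):
        """Feed one round of per-path MOS estimates; return the path to use."""
        self._since_switch += 1
        current_mos = mos_by_path[self.current]
        best = max(mos_by_path, key=mos_by_path.get)
        if (self._since_switch >= self.hold_rounds
                and current_mos < self.threshold
                and mos_by_path[best] - current_mos >= self.margin):
            self.current = best
            self._since_switch = 0
        return self.current

ctl = PathController("isp1")
rounds = [
    {"isp1": 4.2, "isp2": 4.3},  # isp1 fine: no change despite isp2 being slightly better
    {"isp1": 2.5, "isp2": 4.2},  # brownout on isp1: switch
    {"isp1": 4.4, "isp2": 3.0},  # isp2 degrades, but the hold time prevents flapping
    {"isp1": 4.4, "isp2": 3.0},
    {"isp1": 4.4, "isp2": 3.0},  # hold time over: switch back
]
print([ctl.update(r) for r in rounds])
```

A shorter hold time reacts faster but risks oscillating between paths whose quality fluctuates; the right setting depends on how quickly measurements can be trusted.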
Managing the application through the fabric
What adaptive networking can do for VoIP
  Add a 9 to VoIP availability
  Eliminate 90% or more of bad minutes
  Eliminate network upgrades
  Provide WAN visibility
  Improve quality of 1-800 services to India
Case study: ISP problem during business hours
(Figure: MOS from 7pm to 3pm EST, with the threshold of acceptable VoIP quality marked.)
All ISPs suffer unpredictable performance problems.
No single ISP can deliver sufficient quality for VoIP 24x7x365.
On-net routing (staying within a single provider's network) is not always best.
Adaptive networking delivers sustainable voice quality
(Figure: three panels of MOS versus time, 7pm to 3pm EST, each against the threshold of acceptable VoIP quality, with the improvement highlighted.)
Case study: Online financial services firm
(Figure: the same network as in the Frame Relay case study. Headquarters and the Eastern Data Center each host an IP/PBX and are connected through ISP 1 and ISP 2 over OC-3 and DS-3 links, and over Frame Relay.)
End-to-end measurements collected for 11 days
Sample hour
(Figure: one hour of measurements annotated with a link failure, packet loss, a performance problem and the quality threshold, plus adaptive-networking-induced route changes from the green path to the cyan path and from cyan back to green.)
Default routing: 5 bad minutes
Optimized path: 0.2 bad seconds
Example of delay spike
(Figure: RTT (ms) versus time (hour of day), showing a delay spike.)
Example of delay fluctuations
(Figure: RTT (ms) versus time (hour of day), annotated with an adaptive-networking-induced route change from green to cyan and another from cyan back to green.)
Impact on VoIP availability
Configuration                                 Bad minutes    Reliability (%)
Frame Relay                                   5.4            99.966
Internet                                      126.2          99.203
Optimized Internet                            14.7           99.907
Optimized path over Internet + Frame Relay    0.4            99.997
  Study comparing the suitability of private, public and hybrid options
  Sidestep brownouts on in-flight calls
  Deliver a roughly 10-fold reduction in unavailability
  Reduce bad minutes by up to 90%
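The reliability column can be recomputed from the bad-minute column if, as assumed here, reliability is the share of minutes in the 11-day measurement window that were not bad. The sketch below does that and also computes how much of the Internet's bad time the optimization removed; `MEASUREMENT_MINUTES`, `results` and `reliability_pct` are illustrative names.

```python
MEASUREMENT_MINUTES = 11 * 24 * 60  # assumed window: the 11-day study, 15,840 minutes

results = {  # bad minutes, taken from the table above
    "Frame Relay": 5.4,
    "Internet": 126.2,
    "Optimized Internet": 14.7,
    "Optimized path over Internet + Frame Relay": 0.4,
}

def reliability_pct(bad_minutes: float) -> float:
    """Share of minutes in the measurement window that were not bad."""
    return 100 * (1 - bad_minutes / MEASUREMENT_MINUTES)

for name, bad in results.items():
    print(f"{name}: {reliability_pct(bad):.3f}%")

# Fraction of the Internet path's bad minutes removed by optimization
print(f"{1 - results['Optimized Internet'] / results['Internet']:.0%}")
```

This prints 99.966%, 99.203%, 99.907% and 99.997%, matching the table, followed by 88%: optimization removes close to 90% of the Internet's bad minutes, and the hybrid configuration cuts Frame Relay's bad minutes from 5.4 to 0.4.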
General observations
All networks have quality failures:
  Delay spikes
  Packet loss
  Link failures
  Delay fluctuations
  Large jitter due to layer-2 round-robin
  Congestion effects due to worms
General observations
  Problems rarely occur in all networks at once.
  Bandwidth is clearly not the problem.
  Performance problems cause multi-minute application outages that affect inter-PBX calls.
  Adaptive networking effectively avoids performance problems by implementing route changes.
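Why it matters that problems rarely occur in all networks at once can be shown with a toy calculation: if traffic could always be moved to a good path, the only bad minutes left are those in which every path is bad simultaneously. The per-minute traces below are hypothetical, `bad_minutes_best_of` is an illustrative name, and the sketch assumes ideal, instantaneous rerouting with no detection or switching delay.

```python
def bad_minutes_best_of(paths_bad: dict[str, list[bool]]) -> int:
    """Bad minutes left if traffic could always sit on a good path.

    paths_bad maps a path name to a per-minute list (True = bad minute).
    Assumes ideal, instantaneous rerouting.
    """
    traces = list(paths_bad.values())
    if not traces or len({len(t) for t in traces}) != 1:
        raise ValueError("need one or more traces of equal length")
    # A minute is lost only if every path is bad during that minute.
    return sum(all(minute) for minute in zip(*traces))

# Hypothetical per-minute traces for three paths
isp1 = [False, True, True, False, False, False]
isp2 = [False, False, True, True, False, False]
frame_relay = [False, False, False, False, True, False]

print(sum(isp1), sum(isp2))                                                  # 2 2
print(bad_minutes_best_of({"isp1": isp1, "isp2": isp2}))                     # 1
print(bad_minutes_best_of({"isp1": isp1, "isp2": isp2, "fr": frame_relay}))  # 0
```

Real results fall short of this ideal because a brownout must first be detected before traffic can be moved, which is why the measured improvement is large but not total.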
Conclusion
(Figure: VoIP availability, on a scale from 99% to 99.999%, from 2001 to 2007 for the underlying network and with adaptive networking, compared with toll quality, with the availability gap marked.)
Adaptive networking fills the availability gap, allowing toll-quality VoIP over the Internet to become a reality now.