Benchmarking the Session Initiation Protocol (SIP)




Yueqing Zhang, Arthur Clouet, Oluseyi S. Awotayo, Carol Davids
Illinois Institute of Technology
Email: {yzhan230,aclouet,oawotayo}@hawk.iit.edu, carol.davids@iit.edu

Vijay K. Gurbani
Bell Laboratories, Alcatel-Lucent
Email: vkg@bell-labs.com

Abstract

Measuring and comparing the performance of an Internet multimedia signaling protocol across varying vendor implementations is a challenging task. The tests that measure the performance of the protocol as exhibited on a vendor device, and the laboratory conditions under which those tests run, have to be normalized such that no special favour is accorded to a particular vendor implementation. In this paper, we describe a performance benchmark for a device that includes a Session Initiation Protocol (SIP) proxy function. This benchmark is currently being standardized by the Internet Engineering Task Force (IETF). We implemented the algorithm that has been proposed in the IETF to measure the performance of a SIP server, and we provide the results of running the benchmark on Asterisk, the popular open-source SIP private branch exchange (PBX).

I. INTRODUCTION

The Session Initiation Protocol (SIP [1]) is an IETF-standardized protocol for initiating, maintaining and disconnecting media sessions. The protocol has been adopted by many sectors of the telecommunications industry: IP Private Branch Exchanges (PBXes) and SIP Session Border Controllers (SBCs) are used for business VoIP phone services as well as in the delivery of Next Generation 911 services over the Emergency Services IP Network (ESINet [2]); LTE mobile phone systems are designed around the SIP protocol and will replace the circuit-switched telephone service currently provided over cellular networks; and SIP-to-SIP and SIP-PSTN-SIP ecosystems are used to reduce dependence on, and to supplement, the traditional long-distance trunk circuits of the Public Switched Telephone Network (PSTN). Many commercial systems and solutions based on SIP are available today to meet the industry's requirements. As a result, there is a strong need for a vendor-neutral benchmarking methodology that allows a meaningful comparison of SIP implementations from different vendors.

The term performance has many meanings in many different contexts. To minimize ambiguity and to provide a level field in which to interpret the results, we conduct our work in the Benchmarking Working Group (BMWG, http://datatracker.ietf.org/wg/bmwg/charter/) within the Internet Engineering Task Force (IETF, http://www.ietf.org). BMWG does not attempt to produce benchmarks in live, operational networks, for the simple reason that conditions in such networks cannot be controlled as they would be in a laboratory environment. Furthermore, benchmarks produced by BMWG are vendor-neutral and have universal applicability to a given technology class.

The specific work on SIP in the BMWG consists of two Internet-Drafts: a terminology draft [3] and a methodology draft [4]. The methodology draft uses the established terminology from the terminology draft to define concrete test cases and to provide an algorithm to benchmark the test cases uniformly. We note that the latest versions of the terminology and methodology drafts are awaiting publication as Request For Comments (RFC) documents. The work we describe in this paper corresponds to earlier versions of these drafts (more specifically, to version -09 of each draft [5], [6]).

The remainder of this paper is organized as follows.
Section II contains the problem statement. Section III describes related work. Section IV contains a brief background on the SIP protocol. Section V describes the associated benchmarking algorithm from [6] and the script that we wrote to implement the algorithm; this script is available for public download at the URL indicated in [7]. Section VI contains a description of the test bed. Section VII describes the test harness and the baseline results that can be obtained without a SIP server in the mix, and Section VIII looks at the results when a SIP server is added to the ecosystem. Section IX describes our plans for future work.

II. PROBLEM STATEMENT

Performance can be defined in many ways. Generally the term is used to describe the rate at which a system consumes resources. Time, or processing latency, is often the metric used to quantify that resource consumption; the probability that an operation or transaction will succeed is another metric that has been proposed. These metrics are generally collected while the Device Under Test (DUT) is processing a well-defined offered load. The characteristics of the offered load (whether or not media is used, the specific transport used for testing, the codecs used, and so on) are an important part of the metric definition.

In this study we use a different type of metric and a different type of load. Loads are chosen whose individual calls have a constant arrival rate and a constant duration. The value assigned to a given load is the number of calls per second, also referred to as the session attempt rate. The loads are not necessarily designed to emulate any naturally occurring offered load; rather, they are designed to be easily reproduced and capable of exercising the DUT to the point of failure.

We test to failure, looking for the performance knee or break point of the DUT while applying a well-calibrated, pre-defined load. The measure of performance is the value of the highest offered load that, when applied to the DUT, produces zero failures, the next highest rate attempted having produced at least one failure.

The rationale for such an approach has to do with the nature of SIP servers, the applications they are used to create and the environments in which they are deployed. SIP is used to create a diverse and growing variety of applications, and there will be a diverse set of user communities, each with its own characteristic use pattern. The characteristics of the offered loads will differ and are not easily predicted. Testing to zero failure using a standard, pre-defined load is a reproducible way to learn how a system behaves under a steady load. The approach resembles the use of statistical measures such as means and standard deviations to predict future behavior: it is assumed that the resulting metric will be useful as a predictor of behavior under more realistic loads.

III. RELATED WORK

There is a growing body of research related to various aspects of the performance of SIP proxy servers. In many of these studies, performance is measured by the rate at which the DUT consumes resources when presented with a given work load. SIPstone [8] describes a benchmark by which to measure the request-handling capacity of a SIP proxy or cluster of proxies. The work outlines a method for creating an offered load that will exercise the various components of the proxy. A mix of registrations and call initiations, both successful and unsuccessful, is described, and the predicted performance is measured by the time it takes the proxy to process the various requests. SPEC SIP [9] is a software benchmark product that seeks to emulate the type of traffic offered to a SIP-enabled Voice over IP service provider's network, on which the servers play the role of SIP proxies and registrars. The metric defined by SPEC SIP is the number of users that complete 99.99% of their transactions successfully, where successful transactions are defined to be those that end before the expiration of the relevant SIP timers and with the appropriate 200, 302 and 487 final responses. Cortes et al. [10] define performance in terms of processing time, memory allocation, CPU usage, thread performance and call setup time.

A common theme in these studies is the creation of a realistic traffic load. In contrast, the work described in this paper defines a way to use simple, easily produced (and reproducible) loads to benchmark the performance of SIP proxies in a controlled environment. Such benchmarks can be used to compare performance across vendor implementations.

IV. SIP BACKGROUND

The base SIP protocol defines six methods, of which four (REGISTER, INVITE, ACK and BYE) are used in our work. The REGISTER method registers a user with the SIP network, while the INVITE method is used to establish a session. A third method, ACK, is used as the last step in a three-way handshake, and BYE is used to tear down a session.

Figure 1: SIP REGISTER and subsequent INVITE methods

A SIP ecosystem consists of SIP user agent clients (UACs) and user agent servers (UASs), registrars and proxies (see Figure 1). A UAS registers with a registrar, and a proxy, possibly co-located with the registrar, routes requests from a UAC towards the UAS when a session is established.
Proxies typically rendezvous the UAC with the UAS and then drop out of the chain; subsequent communications go directly between the UAC and the UAS. The protocol, though, has mechanisms that allow all signaling messages to proceed through the proxy. Figure 1 also shows media traversing directly between the UAC and UAS; while this is the preferred arrangement, in some cases it is necessary for the media to traverse a media gateway or media relay controlled by the proxy (e.g., for conferencing or for transcoding between media codecs).

SIP defines six response classes, with the 1xx class carrying provisional responses and the 2xx through 6xx classes being considered final. A SIP transaction is defined as the aggregation of a request, zero or more provisional responses and a final response (see Figure 1). While transactions are associated on a hop-by-hop basis, dialogues are end-to-end constructs: the dialogue is the signaling relationship necessary to manage the media session. For SIP networks where media traverses a proxy, there will be one dialogue established between the UAC and the proxy and another between the proxy and the UAS.

SIP can be used to create calls or sessions that resemble traditional telephone calls. SIP can also be used to create other services and applications, but the present work only examines its use in creating audio/video sessions, which are also referred to as calls. The terms session and call are used interchangeably in this paper, though the term session is more applicable in the context of the Internet and in recognition of the functionality beyond phone calls that SIP offers. One can consider a session to be a three-part vector with a signaling component (SIP), a media component (the Real-time Transport Protocol, or RTP [11]) and a management component (the Real-time Control Protocol, or RTCP [11]).

Figure 2: SIP session as a three-part vector

Figure 2 illustrates this interpretation. Under this interpretation, a traditional call is a SIP session with a non-zero component in the media plane. The interpretation also enables the study of maximum rates for SIP loads other than INVITEs. A load consisting only of REGISTER requests, for example, will create sessions with components only in the signaling plane; such rates, particularly registration rates, are of great interest to the SIP community. A load consisting of INVITE requests with media that traverses the DUT, on the other hand, will require more resources, since the DUT must handle the RTP as well as the SIP messages.

V. THE SESSION ESTABLISHMENT RATE ALGORITHM

The algorithm shown in Algorithm 1 was programmed in the Unix Bash shell scripting language [12] and is available for public download at [7]. The algorithm finds the largest session attempt rate at which the DUT can process requests successfully, with zero failures, for a pre-defined, extended period of time. The name given to this session rate is the Session Establishment Rate (SER [6]). The period of time is large enough that the DUT can demonstrate zero-error processing while the system reaches steady state.

The algorithm is an iterative process. A starting rate of r = 100 sessions/second (sps) is used, and calls are placed at that rate until n = 5000 calls have been placed. If all n calls succeed, the rate is increased to 150 sps, and calls are again placed until n = 5000 calls have been attempted. The attempt rate is ramped up in this manner until a rate is reached at which at least one of the n calls fails. At that point, a new attempt rate is calculated that is higher than the last successful attempt rate by half the difference between the rate at which failures occurred and that last successful rate. If this new attempt rate also produces errors, the step is repeated with the now-smaller interval: the next rate is again the last successful rate plus half the difference between the most recent failing rate and the last successful rate. Continuing in this way, an attempt rate without errors is found. The tester can specify the margin of error using the parameter G, the granularity, which is measured in units of sps; any attempt rate that is within the tolerance G can be reported as the SER.
Algorithm 1: Benchmarking algorithm from [6]

    {Parameters of test; adjust as needed}
    n  <- 5000   {local maximum: number of sessions attempted per iteration while searching for the largest rate}
    N  <- 50000  {global maximum: once the largest session rate has been established, send this many requests before calling the test a success}
    m  <- {...}  {other attributes affecting testing, media for instance}
    r  <- 100    {initial session attempt rate (sessions/sec)}
    G  <- 5      {granularity of results; margin of error in sessions/sec}
    C  <- 0.05   {calibration amount: how much to back down if we have found a candidate rate but cannot sustain it for time T with zero failures}
    {End parameters of test.}

    f <- false   {set when test is done}
    c <- 0       {set when an upper limit has been found}
    repeat
      send_traffic(r, m, n)   {send r req/sec with m media characteristics until n requests have been sent}
      if all requests succeeded then
        r' <- r  {save candidate value of metric}
        if (c = 0) then
          r <- r + (0.5 * r)
        else if (c = 1) and ((r'' - r') > 2G) then
          r <- r + (0.5 * (r'' - r))
        else if (c = 1) and ((r'' - r') <= 2G) then
          f <- true
        end if
      else  {one or more requests fail}
        c  <- 1
        r'' <- r  {save new upper bound for the metric}
        r  <- r - (0.5 * (r - r'))
      end if
    until (f = true)

As an example, assume it is known (through the vendor) that a SIP proxy (the DUT) exhibits an SER of 269 sps. To verify this independently, the benchmark algorithm starts at r = 100 sps and keeps ramping the rate until failures are seen. At, say, 400 sps the DUT exhibits failures, and the algorithm reduces the next attempted rate to 300 sps: the last successful rate (200 sps in this example) plus half the difference between the failing rate (400 sps) and that last success. The algorithm continues bisecting in this fashion until it converges, within the granularity G, on a stable error-free rate.
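Since the published script [7] is written in Bash, the search loop lends itself to a compact shell rendering. The following is a minimal sketch, not the published script: send_traffic is a stand-in that simulates a DUT whose true SER is 269 sps (the example above); in the real harness it would wrap a SIPp run and report whether all n calls succeeded.

    #!/usr/bin/env bash
    # Minimal sketch of the Algorithm 1 search loop, using integer sps.
    # send_traffic is a stand-in: here it simulates a DUT whose true SER
    # is 269 sps; a real harness would launch SIPp at the requested rate
    # and return non-zero if any of the n calls failed.

    G=5          # granularity: acceptable margin of error (sps)
    r=100        # current session attempt rate (sps)
    r_good=0     # highest rate known to succeed (candidate SER, r' above)
    r_bad=0      # lowest rate known to fail (upper bound, r'' above)
    c=0          # becomes 1 once an upper bound has been found
    f=0          # becomes 1 when the search has converged

    send_traffic() {
        [ "$1" -le 269 ]   # simulated DUT; replace with a real SIPp run
    }

    while [ "$f" -eq 0 ]; do
        if send_traffic "$r"; then
            r_good=$r                         # save candidate value of metric
            if [ "$c" -eq 0 ]; then
                r=$(( r + r / 2 ))            # no failure seen yet: ramp up 50%
            elif [ $(( r_bad - r_good )) -gt $(( 2 * G )) ]; then
                r=$(( r + (r_bad - r) / 2 ))  # close half the gap to the bound
            else
                f=1                           # bracketed to within 2G: done
            fi
        else
            c=1                               # found an upper bound for the metric
            r_bad=$r
            r=$(( r - (r - r_good) / 2 ))     # back off halfway to last success
        fi
    done
    echo "Session Establishment Rate (SER): $r_good sps"

Run against the simulated 269-sps DUT, this sketch converges in a handful of iterations and reports an SER of 269.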

VI. EXPERIMENTAL TEST BED

The test bed, shown in Figure 3, consists of five elements: a switch, two hosts generating SIP traffic (with and without RTP), a host acting as the DUT, and a host collecting traffic using Wireshark. The test bed is on a private LAN isolated from the Internet, and a mirror port is configured on the switch so that traces of the calls in the load can be captured without contributing to the resource consumption of any of the functional elements of the test.

Figure 3: Physical architecture of the test bed

The SIP traffic is generated by SIPp [13], an open-source SIP traffic generator. One host acts as the SIP UAC and another as the SIP UAS; a third host runs the open-source Asterisk server [14]. The UAC is an Intel Core 2 6420 at 2.13 GHz with 4 GB of memory; the UAS and the DUT run on Intel Core 2 6320 machines at 1.86 GHz with 4 GB of memory. The Wireshark trace host is a MacBook Pro with a 2 GHz Intel Core i7 and 8 GB of memory.
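The capture described above can be scripted as well as run interactively; the sketch below uses tshark, the command-line companion to Wireshark, on the host attached to the mirror port. The interface name (eth0) and the assumption that all signaling rides on UDP port 5060 are particulars of a typical setup, not requirements of the methodology.

    # Capture everything the switch mirrors to this host, filtered to
    # the default SIP signaling port so that long runs stay manageable.
    tshark -i eth0 -f "udp port 5060" -w run01.pcap

    # Afterwards, count failed transactions (4xx-6xx final responses).
    tshark -r run01.pcap -Y "sip.Status-Code >= 400" | wc -l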
VII. TESTING THE TEST HARNESS

Before testing any DUT, it is necessary to discover the SER of the test harness itself. The harness includes the Bash test script, the SIPp load generator, the platforms upon which these run, and the switch that connects them all. Use of a slow machine to run the SIPp elements, for example, would limit the rate at which calls are generated, so carrier-grade DUTs that operate at or near line speed might not be testable with a harness unable to generate the load necessary to produce failures.

Following the methodology in [6], the SER of the test harness was measured with and without RTP associated with the SIP calls. The algorithm of Section V is a two-step process: a test with a short duration is run and a candidate SER is found; that candidate then becomes the starting rate for a much longer test. The duration of the first test is the length of time it would take to attempt 5000 calls; the second test lasts the length of time it would take to attempt 50,000 calls. SIPp, the software used to implement our tests, offers two ways to end a single test run: in one, the test ends after a certain number of call attempts; in the other, the test ends after a fixed time has elapsed. The methodology described in [6] uses the first method; the tests described here use the second.
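For concreteness, the two termination styles correspond roughly to the following hypothetical SIPp invocations. The flags shown (-sn for a built-in scenario, -r for the call rate, -d for the pause duration in milliseconds, -m for the call-attempt cap) are standard SIPp options, but uas_host is a placeholder, and the time-limited variant is enforced here with the coreutils timeout wrapper rather than with a SIPp-internal option.

    # Count-based run: 100 calls/sec, 9 s hold time (-d is in ms),
    # stopping after 5000 call attempts -- the method described in [6].
    sipp -sn uac -r 100 -d 9000 -m 5000 uas_host:5060

    # Time-based run -- the method used in this paper: the same load,
    # cut off after a fixed 90 s test duration by coreutils timeout.
    timeout 90 sipp -sn uac -r 100 -d 9000 uas_host:5060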

A. Testing the test harness without RTP

First the test harness was tested using SIP calls with no associated RTP. All SIP calls were set to last 9 seconds, and tests were run with granularities of 1, 2 and 3 (the parameter G of Section V). Test results are displayed in Figure 4 and analyzed below.

Figure 4: Test harness performance without RTP (Breakpoint on y-axis is SER)

1) Impact of the value of the granularity parameter: Comparing the results in Figure 4 for different granularities, we see that for a 90-second test duration, a granularity of 1 produced an SER of 2,874; a granularity of 2, an SER of 2,623; and a granularity of 3, an SER of 2,740. The granularity of 1 produced the highest SER. This is to be expected, because a granularity of 1 produces an SER that is only one sps lower than the rate at which the first failure occurs. Note, however, that the granularity of 2 produced a lower value than the granularity of 3, a result we cannot yet explain.

Tests with a higher granularity find an SER more quickly than those with a granularity of 1. Testing to determine the SER takes a long time: each individual loop of the algorithm lasts a fixed time, and the number of iterations for a long test duration can cause the test process to last more than 24 hours. As an example, if the test duration is set to 20 minutes and the iterative process is begun at a call rate of 100 calls per second, the first pass through the loop lasts 20 minutes. If no errors occur, the rate is increased (to 200 calls per second, say) and the second pass lasts another 20 minutes; after another 20 minutes the rate is increased again, and so on. It takes many such 20-minute periods to reach the 2,855 SER recorded in Figure 4. Even though the tests used a granularity of 3 and a relatively short test duration, the time to obtain the SER was greater than 6 hours.

2) Impact of longer test durations: It is expected that longer test durations will produce lower SERs. The longer the DUT sustains a call load at any rate, the more likely it is that a failure such as a SIP time-out will occur: the DUT's operating system as well as its code and hardware are stressed, and the time spent on memory management and other housekeeping accumulates, making delays and eventual failures more likely the longer the test runs. The data show that for a granularity of 1 and test durations of 30 s, 60 s and 90 s, the SERs were 3,941 sps, 3,467 sps and 2,874 sps respectively. Testing with granularities of 2 and 3 over the same series of test durations produced a similarly steep decline in the SERs. Tests for a granularity of 3 were conducted over an extended series of test durations: 5-, 10- and 20-minute durations were attempted, and the SERs produced were 2,799, 2,803 and 2,855 respectively. The slight rise in values may be attributed to the non-deterministic nature of the operating systems of the platforms on which the SIPp applications ran.

The conclusion that can be drawn from these data is that the test harness can deliver a load of around 2,700 sps when there is no associated RTP and the call duration is set to 9 seconds. This means that a system that can support a call rate higher than 2,700 sps cannot be tested with this harness.

B. Testing the test harness with RTP

The next results were obtained for the DUT-less test harness when RTP was associated with the SIP calls. To associate RTP with the SIP session, the PCAP-replay method provided by SIPp was used; in that method, SIPp replays a pre-recorded PCAP file. We recorded our file such that it consisted of 9 total seconds, broken down into 8 seconds of sound bytes plus a 1-second payload of Dual Tone Multi-Frequency (DTMF). The SIP calls with RTP in this section thus lasted 9 seconds. The results for granularities of 1, 2 and 3, test durations of 30 s, 60 s and 90 s, and a call duration of 9 seconds are displayed in Figure 5, as is an extended series of test durations (5 and 10 minutes) for a granularity of 3.

Figure 5: Test harness performance with associated RTP (Breakpoint on y-axis is SER)

All the SER values fall between 281 and 272. This is an order of magnitude less than the case in which RTP was not associated with the calls. Also, the drop that we observed in Figure 4 as the test duration increases does not appear in these data. The difference is attributable to limitations of the platforms on which SIPp runs and to the method SIPp uses to generate the calls in the test. Regarding the platform limitations: the open-file limit on these machines is set to 4,096, so no more than 4,096 PCAP files can be open at any time. SIPp requires two file handles for each call that it sets up: one for the PCAP file that it sends, and one for the PCAP file it receives. In the trace taken while this scenario was running, the UAC sent RTP of 0.09 s instead of 1 s, and the UAS sent an RTP audio packet of 7.04 s instead of 8 s followed by an RTP DTMF packet of 0.21 s instead of 1 s. Thus, in the actual test there were (7.04 + 0.09 + 0.21) or 7.34 s of RTP per call, so the number of calls active at any time is 7.34 s * 277 sps, or about 2,033 calls. The number of file handles needed is therefore 2,033 * 2, or 4,066, which is close to, but less than, the 4,096 maximum configured on the platforms.

In conclusion, the test harness is able to find the SER for devices whose SERs do not exceed 272 sps when RTP is associated with the test calls.
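The file-handle budget above is worth checking on any new load-generating host before an RTP run. A small sketch of the check and of the accompanying arithmetic, with this section's numbers as defaults:

    # Per-process open-file limit on this SIPp host (4,096 in our lab).
    ulimit -n

    # Handles needed for an RTP run: two PCAP handles per call, with
    # roughly rate * RTP-seconds calls concurrently active.
    rate=277        # session attempt rate near the harness SER (sps)
    rtp_secs=7.34   # RTP actually on the wire per call, from the trace
    awk -v r="$rate" -v s="$rtp_secs" 'BEGIN {
        active = r * s
        printf "active calls: %.0f, file handles needed: %.0f\n", active, 2 * active
    }'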
VIII. TESTING THE DUT

We used the Asterisk server [14] as the DUT whose SER we measure. Asterisk is an open-source code distribution that supports different SIP-based telecommunications services, including IP PBXes, conference servers and VoIP gateways. Asterisk can be configured either to pass RTP through the host it is running on, or to let RTP bypass its host and flow directly between the UAC and the UAS. The performance tests involving RTP forced the RTP flows through the machine running Asterisk.

A. Testing the DUT without RTP

The results of testing for Asterisk's SER when there is no RTP associated with the call load are displayed in Figure 6. This test measures the ability of Asterisk to process SIP requests only. SIPp was configured to insert the 9 s pause whether or not RTP was actually sent during the pause; a SIP BYE request is sent 9 seconds after the call starts.

Figure 6: Asterisk performance without RTP (Breakpoint on y-axis is SER)

Comparing the results in Figure 6 for different granularities, we see that granularities 1, 2 and 3 all exhibit the expected drop in SER from test durations of 30 s to test durations of 60 s. For test durations of more than 60 s, the SERs for all three granularities converged to a constant value: 97 for a granularity of 1; 98 for a granularity of 2; and 99 for a granularity of 3. Tests for a granularity of 3 were extended to include test durations of 5 minutes and 20 minutes, and the SER measured in each case was 99.
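Whether RTP traverses Asterisk is a configuration choice, as noted at the start of this section. The sketch below shows the setting we understand to correspond to keeping Asterisk in the media path, assuming the chan_sip channel driver of that era; the option name comes from Asterisk's sip.conf documentation, not from the paper.

    # Keep Asterisk in the media path, as in our RTP tests; setting
    # directmedia=yes would instead let RTP bypass the Asterisk host
    # and flow directly between the UAC and the UAS.
    printf '[general]\ndirectmedia=no\n' >> /etc/asterisk/sip.conf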

B. Testing the DUT with RTP

The results of testing for Asterisk's SER when RTP is associated with the call load are displayed in Figure 7. This test measures the ability of Asterisk to process SIP requests as well as to relay RTP between the UAS and the UAC. SIPp was configured to insert a 9 s pause and to send RTP during that pause; a SIP BYE request is sent 9 seconds after the call starts.

Figure 7: Asterisk performance with associated RTP (Breakpoint on y-axis is SER)

Comparing the results in Figure 7 for different granularities, we see that granularities 1, 2 and 3 all exhibit the expected drop in SER as test durations rise from 30 s to 60 s to 90 s, with values of 88, 85 and 76 respectively for a granularity of 1; 87, 86 and 77 for a granularity of 2; and 85, 85 and 79 for a granularity of 3. Tests for a granularity of 3 were extended to include test durations of 5 minutes and 10 minutes, with resulting SERs of 80 and 77 respectively.

IX. CONCLUSION AND FUTURE WORK

The current work demonstrates the importance of establishing a baseline model (testing the test harness, Section VII) in order to understand the harness's limits and behaviors and how these may impact the test results when an actual DUT is used. Looking at Figures 4 and 5 for a granularity of 1 and a test duration of 90 s, we observe that the performance capacity of the test harness drops almost 90% when RTP is used; clearly, processing RTP impacts the SER. Interestingly, observing the same data (granularity of 1, test duration of 90 s) for the DUT in Section VIII, Figures 6 and 7, we note that there is only a minimal drop in capacity when RTP is used. We postulate that this uniformity in the SER when Asterisk is used as the DUT is attributable to the processing Asterisk performs for each session: the complexity of processing each session is high enough when Asterisk acts as a back-to-back user agent that it cannot achieve the SERs the test harness reaches without RTP, and thus, upon the introduction of RTP, the SER in Asterisk remains essentially the same.

Next steps include additional tests using the existing harness as well as updates to the harness. Some directions for future work include applying the current test harness to a set of virtual implementations of Asterisk, to compare the SERs of the Asterisk code running on different operating systems, each with a different speed and memory size. We plan to test for Asterisk's SER when call-associated RTP does not traverse the DUT; comparing those results with the results of Section VIII would be of interest: are Asterisk's resources used differently when RTP is associated with the call but does not pass through Asterisk than when there is no RTP at all associated with the call? We also plan to obtain the registration rate of Asterisk, and to measure the SERs and registration rates of other open-source SIP servers, including Kamailio [15] and FreeSWITCH [16], as well as commercial SIP servers. Finally, we plan to update the test script to conform to the algorithm in [4]; the new version should enable the user to select a wider variety of SIPp options and to collect more data about the DUT and the test-bed elements, including memory usage and processing times.

REFERENCES

[1] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler, "SIP: Session Initiation Protocol," RFC 3261 (Proposed Standard), Internet Engineering Task Force, Jun. 2002. [Online]. Available: http://www.ietf.org/rfc/rfc3261.txt
[2] (2015, Jan.) NENA i3 Detailed Functional Requirements. [Online]. Available: https://c.ymcdn.com/sites/www.nena.org/resources/resmgr/standards/08-003_detailed_functional_a.pdf
[3] C. Davids, V. K. Gurbani, and S. Poretsky, "Terminology for benchmarking Session Initiation Protocol devices: Basic session setup and registration," IETF Internet-Draft (work in progress), draft-ietf-bmwg-sip-bench-term-12, November 2014.
[4] C. Davids, V. K. Gurbani, and S. Poretsky, "Methodology for benchmarking Session Initiation Protocol devices: Basic session setup and registration," IETF Internet-Draft (work in progress), draft-ietf-bmwg-sip-bench-meth-12, November 2014.
[5] C. Davids, V. K. Gurbani, and S. Poretsky, "Terminology for benchmarking Session Initiation Protocol devices: Basic session setup and registration," IETF Internet-Draft (work in progress), draft-ietf-bmwg-sip-bench-term-09, February 2014.
[6] C. Davids, V. K. Gurbani, and S. Poretsky, "Methodology for benchmarking Session Initiation Protocol devices: Basic session setup and registration," IETF Internet-Draft (work in progress), draft-ietf-bmwg-sip-bench-meth-09, February 2014.
[7] A. Clouet, Y. Zhang, O. S. Awotayo, and C. Davids. (2015, Jan.) SIPp script for benchmarking SIP servers. [Online]. Available: https://github.com/itm546-perfsipbench/sippscript
[8] H. G. Schulzrinne, S. Narayanan, J. Lennox, and M. Doyle, "SIPstone: Benchmarking SIP server performance," Columbia University, Tech. Rep. CUCS-005-02, 2002.
[9] Standard Performance Evaluation Corporation. (2014, Dec.) SPEC SIP_Infrastructure. [Online]. Available: http://www.spec.org/specsip/
[10] M. Cortes, J. R. Ensor, and J. O. Esteban, "On SIP performance," Bell Labs Technical Journal, vol. 9, no. 3, pp. 155-172, 2004.
[11] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications," RFC 3550 (Standard), Internet Engineering Task Force, Jul. 2003, updated by RFCs 5506, 5761, 6051, 6222. [Online]. Available: http://www.ietf.org/rfc/rfc3550.txt
[12] Free Software Foundation. (2015, Mar.) GNU Bash. [Online]. Available: http://www.gnu.org/software/bash
[13] (2014, Dec.) SIPp. [Online]. Available: http://sipp.sourceforge.net/
[14] (2014, Dec.) Asterisk server. [Online]. Available: http://www.asterisk.org/
[15] (2014, Dec.) Kamailio SIP server. [Online]. Available: http://www.kamailio.org/w/
[16] (2014, Dec.) FreeSWITCH. [Online]. Available: https://freeswitch.org/