Update in Methodology of SIP Performance Testing and its Application

MIROSLAV VOZNAK, JAN ROZHON
Department of Multimedia
CESNET
Zikova 4, 160 00 Prague
CZECH REPUBLIC
voznak@ieee.org, rozhon@cesnet.cz http://www.ces.net

Abstract: - The paper deals with performance measurement and assessment of SIP elements in a VoIP network. We have developed a tool for SIP benchmarking and performance testing. Our methodology is based on the recent RFC 6076 and further IETF drafts, so that the results measured by this application comply with the adopted standards. Our earliest implementations of SIP benchmarking relied on open-source software and virtualization tools and were greatly improved during the last year with the emphasis on ease of use, understandability and repeatability. The development of our platform and its current stage are the topic of this paper, which presents both the cornerstone technologies of the new platform and the main aspects of the methodology in relation to the result analysis. We analyzed the last four versions of the SIP PBX Asterisk with our tool and pointed out the differences between the tested releases.

Key-Words: - SIP, Benchmarking, VoIP, Asterisk, RFC 6076

1 Introduction
As the number of VoIP installations based on the Session Initiation Protocol grows, the performance measurement and assessment of SIP-based network elements gains in importance. Since the IETF standard RFC 6076, which addresses this field of SIP communication, was adopted only a year ago, there are still very few options for performing the testing in conformance with this standard [4]. With our methodology for testing and benchmarking SIP infrastructure finished [1], we had the opportunity to perform several series of tests on multiple different platforms. From these tests we realized that it would be very beneficial to modify the existing testing platform to allow us to run separate test scenarios for each of the important SIP dialogs.
This way the movement towards the modular design started. During this work, at the beginning of last year, the new RFC 6076 was adopted, finally standardizing the most essential measured parameters. With the parameters standardized, we have developed the most important testing scenarios, the registration test scenario and the call test scenario, both having their roots in the previously used scenario for complex performance measuring [1], [2]. Each of those scenarios offers a different perspective when defining the SIP server limits and can be run either separately, to test special environments or occasions, or simultaneously, to simulate real VoIP client behavior. The latter presented a big challenge, because the testing software does not inherently allow running multiple scenarios at once. However, this problem was worked around by exploiting a SIP security vulnerability which allows a client at one address to register another. This way the basis of the module-based testing platform was created.

In this paper, the current state of development of our SIP infrastructure performance testing tool is described. In addition, we present results obtained in the performed experiments. Our tool makes it possible to determine the optimal and maximal load of the SIP server, which is useful not only in small and medium business VoIP installations but for IP telephony providers as well.

2 State of the Art
The topic of performance testing and benchmarking of SIP infrastructure gains in importance with each new VoIP installation. The transition from separate voice networks towards converged packet networks that provide services of all kinds, spanning from data to voice and video transmissions, is accelerating, and VoIP infrastructure, mainly based on the SIP protocol, has become commonplace, mainly in small and medium
ISBN: 978-960-474-311-7 203

businesses during the past five years. Despite this high significance of VoIP technology, the means of performance measurement and benchmarking of SIP-based devices are still in their infancy. The software and hardware capable of complex testing that is available on the market comes from the known vendors in this field, and their solutions tend to be vendor-specific, which makes results obtained with solutions from two different vendors incompatible and incomparable. These vendors are not pushed to adopt the newly introduced IETF standards, as they tend to create standards of their own.

This situation was to be changed by introducing the methodology for SIP benchmarking, which was developed and implemented in the Dpt. of Telecommunications of the Technical University of Ostrava [1], [2]. This applied research has been supported by CESNET (Czech Educational and Science NETwork), an association of Czech universities and the Academy of Sciences. The proposed methodology and its implementation utilized the powerful SIP benchmarking software SIPp [3] and introduced scenarios and techniques for SIP benchmarking which, together with SIPp, created a complex and RFC 6076 [4] conformant testing platform.

3 Methodology
In order to perform SIP testing, we simulate both ends of the SIP dialogue to test the main part of the SIP infrastructure, the SIP server. The SIP server represents a set of servers always involving a SIP Registrar and a SIP Proxy or B2BUA (Back to Back User Agent). The latter is the most used solution in the enterprise environment, for both SMEs (Small and Medium sized Enterprises) and LEs (Large Enterprises). The test scenario should be as simple as possible, mainly to reduce the complexity of the test, and also because it is not possible to test the SIP Proxy (or the B2BUA) in all possible configurations. Thus it is useful to focus on the basic default configuration and perform the tests with it.
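To make the idea of simulating both ends of the dialogue concrete, the following minimal sketch builds the command lines for one answering (UAS) and one calling (UAC) SIPp instance using SIPp's built-in default scenarios. It is an illustration under stated assumptions and not necessarily the invocation used by the platform: the helper names, addresses, ports and the callee name are hypothetical, and the commands are only constructed, not executed.

```python
from typing import List


def uas_command(local_ip: str, local_port: int) -> List[str]:
    """Answering side: SIPp's built-in UAS scenario bound to one address."""
    return ["sipp", "-sn", "uas", "-i", local_ip, "-p", str(local_port)]


def uac_command(local_ip: str, local_port: int, server: str,
                callee: str, rate: int, max_calls: int) -> List[str]:
    """Calling side: built-in UAC scenario sending calls to the SIP server."""
    return [
        "sipp", "-sn", "uac",
        "-i", local_ip, "-p", str(local_port),
        "-s", callee,           # user part of the request URI
        "-r", str(rate),        # new calls per second
        "-m", str(max_calls),   # stop after this many calls
        "-trace_stat",          # dump periodic statistics to a CSV file
        server,                 # SIP server under test (host[:port])
    ]


if __name__ == "__main__":
    # Hypothetical documentation addresses; replace with real test hosts.
    print(" ".join(uas_command("192.0.2.20", 5070)))
    print(" ".join(uac_command("192.0.2.10", 5080, "192.0.2.1:5060",
                               "uas1", 10, 1000)))
```

In a real run the callee would additionally have to be registered or routed at the server so that the calls reach the UAS; that configuration is outside the scope of this sketch.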
The output results then carry the information about the best-case scenario, according to which we can judge the SIP server's performance and compare it with its rivals [5].

3.1 Measured Parameters
As mentioned in the Introduction, we use the parameters defined in the IETF standard for all our measurements [4], [6] and [7]. Hardware utilization parameters cannot be measured, although they could prove useful. The main reason is that the DUT (Device under Test) is not reachable at all locations, since it might be operated by another company or organization that cannot allow foreign users access to its infrastructure. The group of parameters is therefore measured at the UAC (User Agent Client) and includes call statistics such as the number of (un)successful calls and the durations of the message exchanges. RTP samples for analysis can be captured here as well. Fig. 1 illustrates the meaning of the evaluated RRD and SRD delays. The complete list of all possible measured parameters includes:
- Number of (un)successful calls.
- Registration Request Delay (RRD), the time between the first REGISTER method and its related 200 OK response [3].
- Session Request Delay (SRD), the time between the first INVITE method and the related 180 Ringing message [3].
- Mean Jitter and Maximum RTP Packet Delay.

Fig. 1. Registration Request Delay and Session Request Delay in SIP Dialog.

3.2 Limit Definition in Results Analysis
The previously defined parameters do not suffice to assess the SIP server's performance. To be able to determine the SIP server's performance from the collected data, we need to define the limit values for each category of the measured parameters. This definition must come out from the features of the

SIP protocol and the generally recognized conventions of IP and classic telephony [8], [9]. The limit definition for the SIP delay characteristics RRD and SRD comes from the nature of the SIP protocol. When a call is set up, the delays between messages should not exceed several hundreds of milliseconds. Although these limitations are tied to the travel of a SIP message from one end of the call to the other, they can be used for our purposes as well, because the same need applies: the call must be set up quickly enough not to bother the user with noticeable delays. From this, we can estimate that the quality boundary for RRD and SRD is somewhere around 300 milliseconds [10], [11]. However, this value may vary according to the needs of each particular user. Generally, we can say that the limit from the SIP transaction point of view is reached when the SRD and RRD characteristics start increasing rapidly. Taking this point as the boundary leaves a small margin as a potential reserve. Fig. 2 shows this in greater detail on one performed SRD and RRD measurement.

Fig. 2. Example of possible trend in RRD and SRD measurement with the noticeable change of trend at 600 concurrent calls.

Fig. 3. Test platform development in time.

It is also possible to increase the complexity of the result analysis by implementing the relative codec translation effectiveness, which expresses, in per cent, the codec translation effectiveness relative to the case without codec translation. This way even platforms with largely different performance can be compared.

4 Platform
To successfully generate high loads of SIP traffic we have been using the open source traffic generator SIPp. Its scenarios are freely modifiable and are defined in the XML format, which makes it possible to generate well-formed SIP messages as well as malformed ones; therefore it can be used for security tests as well.
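As an illustration of the XML scenario format, the sketch below builds a minimal registration scenario as a string and checks that it is well-formed XML. It is a hedged example and not one of the scenarios used in the tests: the function names, the scenario name and the exact header set are assumptions. The bracketed keywords such as [remote_ip] and [call_id] are placeholders that SIPp substitutes at run time, and the rtd attribute, as used in SIPp's default scenarios, marks the response that ends a response-time measurement.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal REGISTER scenario; the header set is an assumption.
REGISTER_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<scenario name="{name}">
  <send retrans="500">
    <![CDATA[
      REGISTER sip:[remote_ip] SIP/2.0
      Via: SIP/2.0/[transport] [local_ip]:[local_port];branch=[branch]
      From: <sip:{user}@[remote_ip]>;tag=[call_number]
      To: <sip:{user}@[remote_ip]>
      Call-ID: [call_id]
      CSeq: 1 REGISTER
      Contact: <sip:{user}@[local_ip]:[local_port]>
      Max-Forwards: 70
      Expires: {expires}
      Content-Length: 0

    ]]>
  </send>
  <recv response="200" rtd="true"/>
</scenario>
"""


def build_register_scenario(user: str, expires: int = 3600,
                            name: str = "register-sketch") -> str:
    """Fill the template; SIPp keywords in [brackets] are left untouched."""
    return REGISTER_TEMPLATE.format(name=name, user=user, expires=expires)


def is_well_formed(xml_text: str) -> bool:
    """Return True if the scenario file parses as XML."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    scenario = build_register_scenario("alice")
    print(is_well_formed(scenario))
```

A deliberately malformed SIP message for a security test would be produced by editing the text inside the CDATA section, while the surrounding XML still has to stay well-formed for SIPp to load the file.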
This software allows for generating both simple and complex SIP dialogs with the emphasis on scenario modifiability. Using the XML scenarios, SIPp can create many calls and route them to the SIP server; however, to successfully stress test the infrastructure we need to create a huge number of calls (simultaneous or generated per time unit). This is where SIPp's inherent limitation comes into play, preventing us from reaching reasonable loads. The limitation comes from SIPp's single-threaded software design, which prevents it from using multiple processor cores to increase its call capacity. This could have been easily worked around by running multiple SIPp processes, if there were not a particular problem in the SIPp source code. This problem appears when multiple virtual network interfaces are being used. SIPp ignores the command line arguments instructing it to use a specific network interface and automatically falls back to the primary network

interface. This problem is connected with the media stream only; therefore it does not influence the SIP signaling messages. In the next phase we focused on correcting the code of SIPp's media capabilities, since it was the only feasible way to make it run on a single computer in multi-process mode. Through analysis of the source code we managed to find the problematic section and fix it. This way we removed the biggest obstacle on our way to the multi-process design. The whole platform synchronization moved from the SSH protocol to internal scripting, which is much more efficient and convenient. The whole process of testing platform development is depicted in Fig. 3, which illustrates the transition from the multi-computer design to the design based on virtual computers and finally to the design based on the multi-process approach.

Fig. 4. Web interface application functionality scheme.

There are several languages to pick from, including PHP, Perl, Python or Java. To allow for rapid application development we decided to use a web framework that would provide us with enough predefined functionality to develop the application as quickly as possible but would still allow us to modify and control its function widely enough to fit our needs. After a series of attempts with different frameworks we settled on the Web2py framework, which is written in Python and which provides both efficiency and user convenience. Using this framework we have developed a web interface application which provides functionality to run and monitor tests, view results and manage the hardware utilization. This application uses these technologies as its important and integral parts:
- Web2py framework for web page generation,
- JSON format to transfer results,
- jquery to display the results graphically,
- Python to run the utilization distribution algorithms,
- SIPp for call signaling and media,
- SQLite database to store the results and test parameters.

Fig. 5. Web interface application with example of test results.

Since all the mentioned technologies and applications are distributed freely under the GPL license or its derivatives, the whole solution, when finalized, will also be distributed under this license. The basic functionality of the web application is depicted in Fig. 4: the user enters the basic parameters of the test into a form which is displayed as a web page in the browser. The data the user enters include IP addresses of

individual UACs and UASs, the address of the SIP server, the scenario which is to be used, etc. These parameters are then passed to the Python algorithm using the POST method. The algorithm then computes the best possible distribution of the load among the CPU cores and recalculates the parameters so that they are usable for the individual SIPp processes, after which it runs them. The result files of the individual SIPp processes are periodically monitored and parsed, and the data stored in them is inserted into the database, from which the user can access it using a browser. The data exchanged between the server and the user's browser are encoded in the JSON format and then interpreted using jquery [6] and the Highcharts [7] library, so that the user has a graphical overview of the ongoing test. An example of test results in graphical view is depicted in Fig. 5. The data about the performed tests and their input parameters are stored for future use, so that the user can repeat a test as many times as needed without having to re-enter the parameters into the form.

Fig. 6. Time domain results of the measurement in the form of graphical chart.

This way the powerful web interface can improve the user experience with our platform and allow inexperienced users to use it.

The results of the measured Session Completion Ratio are shown in Fig. 6 for the last four versions of Asterisk PBX, which is the most widely used software PBX today [15], [16]. All measured versions show a similar performance curve with a clear knee at about 600 simultaneous calls. A further increase of the traffic load causes different behaviour: the older versions 1.6 and 1.8 are able to successfully process more calls than the newer Asterisk 10 and 11, which is a surprising finding. It is probably caused by the growing length of the code responsible for SIP message handling.

5 Results
The main contribution of our research is the developed system for SIP infrastructure benchmarking, which is able to find the limitations and constraints of SIP servers and other elements. The last step in the development of our tool concentrated on the implementation of a graphical interface that allows for test management and control as well as quick and intuitive result analysis. This interface was created with the emphasis on simplicity and conformance with IETF RFC 6076. The cornerstones of the developed platform are the Web2py web framework, jquery and the SQLite database. All these software components are open source, which allows for free distribution of the platform and which we expect will greatly increase the number of users. These users need not be experts in the field of SIP communication, because the simplified approach to conducting tests allows even inexperienced users to perform the basic test without great difficulty. The next contribution of our paper is the comparison of the last four Asterisk distributions and the surprising finding regarding their performance: the older versions 1.6 and 1.8 are able to successfully process more calls than the newer Asterisk 10 and 11.

Acknowledgement
This work has been supported by the Ministry of Education of the Czech Republic within the project LM2010005.

References:
[1] J. Rozhon, M. Voznak, Registration Burst Load Test, In Springer Communications in Computer and Information Science, Vol. 189 CCIS (Part 2), July 2011, pp. 329-336.
[2] M. Voznak, J. Rozhon, Performance testing and benchmarking of B2BUA and SIP Proxy, In Proc. 33rd International Conference on Telecommunication and Signal Processing, Vienna, August 2010, pp. 497-503.
[3] SIPp development team, SIPp: SIP performance testing tool [Online]. Available: http://sipp.sourceforge.net/.
[4] D. Malas, A. Morton, Basic Telephony SIP End-to-End Performance Metrics, Internet Engineering Task Force: RFC 6076, 2011, ISSN 2070-1721.

[5] Transnexus, Performance Test of Asterisk V1.4 as a Back to Back User Agent [Online]. Available: http://www.transnexus.com/.
[6] S. Poretsky, V. Gurbani, C. Davids, Terminology for Benchmarking Session Initiation Protocol (SIP) Networking Devices, IETF draft, March 2011.
[7] S. Poretsky, V. Gurbani, C. Davids, Methodology for Benchmarking SIP Networking Devices, IETF draft, September 2011.
[8] J. Rosenberg, H. Schulzrinne, G. Camarillo et al., SIP: Session Initiation Protocol, IETF RFC 3261, URL: http://www.ietf.org/rfc/rfc3261.txt, (2002).
[9] A. Johnston, SIP: Understanding the Session Initiation Protocol, 3rd ed., Artech House Publishers, Norwood (2009).
[10] M. Kavacky, E. Chromy, L. Krulikovska and P. Pavlovic, Quality of Service Issues for Multiservice IP Networks, In Proc. SIGMAP 2009 International Conference on Signal Processing and Multimedia Applications, Milan, Italy, July 2009, pp. 185-188.
[11] I. Baronak and M. Halas, Mathematical representation of VoIP connection delay, Radioengineering, Volume 16, Issue 3, September 2007, pp. 77-85.
[12] A. Kovac, M. Halas, M. Orgon and M. Voznak, E-model MOS Estimate Improvement through Jitter Buffer Packet Loss Modelling, In Advances in Electrical and Electronic Engineering, Volume 9, Number 5, December 2011, pp. 233-242.
[13] M. Voznak and J. Rozhon, Methodology for SIP infrastructure performance testing, WSEAS Transactions on Computers, Volume 9, Issue 9, September 2010, pp. 1012-1021.
[14] M. Voznak and J. Rozhon, SIP back to back user agent benchmarking, Proceedings - 6th International Conference on Wireless and Mobile Communications, ICWMC 2010, Valencia, September 2010, pp. 92-96.
[15] J. Meggelen, J. Smith and L. Madsen, Asterisk: The Future of Telephony, 2nd ed., O'Reilly, Sebastopol (2007).
[16] S. Wintermeyer and S. Bosch, Practical Asterisk 1.4 and 1.6: From Beginner to Expert, Addison-Wesley Professional, New York (2009).