A Cross Layer Solution for VoIP over IEEE82. F. De Pellegrini, F. Maguolo, A. Zanella and M. Zorzi. depe,maguolof,zanella,zorzi @dei.unipd.it.it Department of Information Engineering, University of Padova, via Gradenigo 6/B, 353 Padova, Italy. CREATE NET via Solteri 38, 38 Trento, Italy Abstract The deployment of WLANs was meant to extend LANs, with no particular care for real time applications. Nevertheless, since VoIP calls are becoming customary, IEEE82. WLANs are challenged with the provision of guaranteed quality VoIP calls. In this paper, we address the concerns on the efficiency of WLAN for VoIP provision, and propose a cross layer scheme aimed to enhance the voice capacity of the current typical WLAN configuration. With the proposed modification, the overhead of the IEEE82. MAC is mitigated allowing the aggregation of VoIP packets of the same flow. We show through extensive simulations that the advantage of our scheme is significant, while being compliant with the current IEEE82. standard. Keywords IEEE82., VoIP, Cross Layer. I. INTRODUCTION The leading RF solution for wireless indoor connectivity is represented by the IEEE82. standard []. Recently, increasing attention has been devoted to the provision of VoIP support over IEEE82. WLANs, and the reason for the interest in such direction is twofold. From the perspective of the mobile phone service providers, there exist the opportunity of leveraging the free ISM band in order to relief the congestion of the cellular networks. In fact, if mobile phones are equipped with IEEE82. RF interfaces and suitable software, handing off users from a cell into existing indoor hot spots represents an appealing economic advantage for providers. In such scenario, the user could leverage the WLAN and the provider could free some channels for additional allocation. The second reason comes from the spreading habit of LAN users to rely on VoIP telephony [2]: with the ongoing migration to wireless connectivity, such users will represents a source of VoIP traffic on WLANs. Some of such applications, anyway, usually are designed on the wired LAN scenario, typically Ethernet LANS. In the wired scenario, the MAC is regulated through the CSMA/CD technique. Despite the nominal throughput over a modern IEEE82.g WLAN link is approaching the throughput offered by a Mb/s Ethernet link, there is evidence [3 6], that the current implementation of VoIP over IEEE82. WLANs is not satisfactory. There exist also a more general reason why the problem of VoIP over WLANs deserve further investigation. Voice applications, in fact, usually prefer UDP transport, and thus increase the injection of non TCP friendly flows in the Network. UDP flows, in turn, compete unfairly with TCP traffic: when such flows are served inefficiently, as it is the case of the typical current WLAN configuration, unfairness towards TCP regulated traffic may experience a sensitive increase. The paper is organized as follows. In Section II we describe the network scenario and the major problems of the existing customary configuration. In section III we revise some related work, and in Section IV we introduce a cross layer scheme designed to mitigate the aforementioned problems. Finally, Section V contains the performance evaluations and Conclusions follow. II. NETWORK SCENARIO The WLAN network considered in this work is depicted in Fig.. VoIP sessions are established towards a alien terminals outside the WLAN. In principle, the Access Point (AP) forwards the traffic towards a gateway which could connect either to a PSTN network or to the Internet. We consider a set of IEEE82. terminals which communicate through an AP, and located each within radio range. The terminals and the AP, then, form a so called one hop network under the Infrastructure mode. This typical WLAN configuration is inherently different from the plain ad hoc mode, where direct communication between terminals is permitted; in this case, in fact, all packets are destined to the AP first, and then forwarded towards the destination. Terminals, and the AP as well, employ the Distributed Coordination Function. We recall that, according to the standard [], there exist an alternative to the DCF mode, namely the Point Coordination Function (PCF). According to some authors, the Point Coordination Function would indeed provide a more performing solution for VoIP support [7, 8]. Nevertheless, the DCF configuration described above is practically mandatory, since, despite several standardization efforts, the large majority of the IEEE82. devices deployed so far run only pure DCF mode, and a sudden replacement of the existing WLANs interfaces is not likely to occur soon. In the use of the DCF, problems such as exposed node, hidden node and related issues might represent a threat for correct operation of WLANs. In the one hop scenario they are not taken into account, since every node is assumed able to listen to the channel and monitor on-going transmissions. Notice that this assumption is not reasonable, in general, since even within buildings and indoor environments, the RF fields are very difficult to be confined. Nevertheless, during the APs placement, a careful frequency assignment can limit such effects. In particular, this is the case when channels assigned to each access point do not clash with those assigned to neighboring APs; to this aim, the IEEE82. standard [] makes available channels to be assigned to WLANs. In the scenario of Fig., some of the IEEE82. terminals establish phone calls with peer voice terminals located in the PSTN/Internet. In reality, we must also account for the background traffic, since some additional stations will exploit the network access for data transfer from/to the Internet [9]. In particular, we will disregard the case of a terminal which requires a voice session and also a data channel concurrently, which might be possible in practice, but such occurrences should be handled using suitable prioritization at the application level. As pointed out in [4,, ], the DCF regulated Infrastructure mode of Fig. is affected by two problems which largely impair the support for VoIP:. asymmetry between uplink and downlink: this phenomenon is due to the symmetric share of the medium reserved to the AP in the scheme of Fig. In presence of several voice traffic sessions, the impact of the inbound flows, managed in the downlink, and the impact of the outbound flows, managed in the uplink, is different. In fact, the CSMA/CA scheme provides statistically a fair share of the medium between terminals, apart from capture issues not considered here. Thus, given voice sessions, the throughput share reserved to each
PSTN or Internet AP Fig.. The WLAN single cell scenario; no direct communication allowed between terminals. inbound voice flow will decrease quadratically with.the maximum number of voice sessions available, in turn, is dictated by the large delays experienced by downlink packets. 2. presence of alien traffic: data flows of IEEE82. terminals cannot be neglected since in the most common scenario several users may decide to establish concurrent data flows towards the Internet and make use of the AP connectivity. In the following we will devise a viable and non disruptive cross layer solution which aims at relieving such problems. III. RELATED WORK Several related papers pointed out concerns on the use of the DCF regulated Infrastructure mode. In particular, a quite common feature of voice codecs is the small payload size of the packets. In the case of the ITU G.7 codec, which packetizes a plain PCM flow, the payload is of per packet, and more enhanced codecs have even smaller payload. Thus, several authors found that the simple packet overhead added by the protocol layers used to convey the voice packet leads per se to a very low efficiency in term of the number of supported sessions. A voice packet is added of RTP overhead, of UDP header, of IP header, and, finally, of MAC header and checksum, thus adding of redundancy. The works [3, 6, ] determined the performance of a IEEE82.b WLAN using DCF with Infrastructure mode in terms of the number of permitted VoIP flows, for several codecs. The analytical study and related simulations show that only connections are possible using G.7 codec ( kbps) and using G.729 codec ( kbps). All the cited papers agree on the fact that a major limitation for the number of concurrent flows comes from the overheads introduced by the RTP/UDP/IP/MAC protocol stack. A certain attention has been given to the effect of the CSMA/CA access method: as remarked in [3], the reduction of the protocol overhead, which can be beneficial in the wired case, is limited in the case of IEEE8. DCF mode since a major overhead is introduced by the exponential backoff mechanism. For example, in the case of the IEEE 82.g PHY, which is under examination in this work, with Mbit/s both for transmit data and ACK, and a G.7 codec, we obtain for the transmission time of a voice packet SIFS DIFS " whereas the most optimistic estimation of the time spent for the average backoff stage is s(seetab.).noticethatthis estimation, does not account neither for collisions or the backoff freezing process. It also explains quite intuitively that decreasing the size of voice packets does not represent per se an advantage in this configuration, and viceversa, that a significant gain can be achieved using longer inter arrival time for voice packets [5]. All the problems described above are exacerbated by the need for coexistence of voice traffic with best effort traffic [6]. At present, very few solutions have been proposed to increase the voice capacity of WLANs, i.e. the maximum number of sustainable VoIP calls. The late 82.e amendments enhance the DCF through a new coordination function, i.e. the Enhanced DCF (EDCF) which is interoperable with the current IEEE82.g and b standards, and offers protection to real-time traffic [2]. The EDCF service differentiation of the IEEE82.e offers protection to real time traffic, but it is not designed to enlarge the VoIP capacity of IEEE82. WLANs. For a solution employing protection of VoIP through a preemptive queue see [9]. Some authors propose adaptive voice source coding in order to relieve voice traffic problems [3]. A solution, based on a booster mechanism, is proposed in [8]. In what follows, anyway, we will not address the control of the voice source, since the bottleneck is located in the downlink, and the AP wireless interface may be located several hops away from the voice sources. Nevertheless such solutions can be fruitfully employed in the uplink, but are out of the scope of the present paper. A recent contribution for the solution of the aforementioned problems is the work in [6]: the aim of the authors is to enhance the performance of VoIP over a WLAN without the need for a massive replacement of currently deployed terminals, and modifying the AP only. Our research line is along the same line, but our solution is different. The work in [6] relieves the scalability problems of IEEE82. WLANS leveraging the broadcast transmission from the access point. Such a solution, though, is prone to channel errors, since all pending downlink packets are enveloped in a long, unacknowledged broadcast frame. The aggregation technique we propose in this work represents, to some extent, a complementary solution. IV. A CROSS LAYER SOLUTION The solution presented in this paper is aimed at enhancing the number of voice sessions with no need of modifying the IEEE82. MAC protocol. In order to do so, we introduce an additional application aware module, which should be placed logically above the MAC layer. The solution we present in the following was implemented using the NS2 simulator [4]. Such a module is able to monitor all the active VoIP flows and take actions accordingly. Clearly, this means that we discriminate among different applications at the link layer, and to this end we purposely broke the transparency rule. Our simple idea was inspired IP Link Layer MAC IEEE82. Phy IEEE82. Enqueued packet VoIP flow $ Incoming packet Fig. 2. Aggregation queue; the AP monitors all the active VoIP dowlink flows and aggregates packets accordingly. on the results found in [5], where authors showed that a critical parameter is the inter arrival time of voice packets. The rational behind the scheme is that when the number of VoIP flows increases, the service time at the MAC layer becomes larger; then, it is more likely that several packets of the same VoIP flow are enqueued waiting for service. Our simple aggregation procedure, represented
G J Characteristic IEEE 82.b IEEE 82.g Voice packet inter-arrival-time( ) Voice packet length RTP layer overhead UDP layer overhead IP layer overhead MAC layer overhead ACK packet size Data rate Mbit/s Mbit/s ACK rate Mbit/s Mbit/s SlotTime ( ) SIFS DIFS(SIFS! ) Preamble length ( $ & ( $ ) * +, - ) / PLCPHeaderLength ( $ & ( $ + * ) / SIGNAL length ( $ & ( $ 2 4 ) ShortRetryLimit 5 5 Signal 8 9 extension ( 6 ) N/A / 8 9, ; = (units of SlotTime), + A (units of SlotTime) Tab.. Parameters adopted for the IEEE82.g Network; parameters of IEEE82.b are reported for comparison. in Fig 2, aims at mitigating the service time increase. Basically, the VoIP packets are monitored on a per flow basis: downtarget packets from the IP layer to the MAC layer are classified as either VoIP packets or data packets. This is feasible by inspection of the IP and RTP headers. Normal data traffic packets are simply enqueued, whereas new incoming VoIP packets are compounded togheter with VoIP packets belonging to the same flow and waiting for MAC service. To this aim, only the intermediate module represented in Fig 2 should be introduced at the Link Layer. At the terminal level, only the aggregation of the VoIP flows from the application will occur, whereas the AP will apply the same aggregation procedure to each multiplexed flow. Notice that the MAC layer is a plain IEEE82. protocol. In addition, we require to the AP the ability to operate per flow monitoring relatively to all the active VoIP sessions. The heuristic solution described before relies on the fact that the larger the service time at the MAC layer, the larger the number of VoIP packets compounded by the aggregation queue, and the larger the MAC overhead reduction. This solution, to some extent, employs a primitive form of adaptive control on the redundancy introduced by the IEEE 82. MAC. V. PERFORMANCE EVALUATION In this section we report the performance figures of the IEEE82.g WLAN solutions described before. Our main effort should be supporting as many voice sessions as possible per AP, and this is also a typical parameter against which we evaluate the performance figures [3]. We first explore the behavior of a plain IEEE82.g network. Thereafter, we will examine the capabilities offered by the IEEE82.g standard enforced with our cross layer solution. The simulation outcomes were obtained using the NS2 simulator [4], with the parameters of Tab. In this series of simulations, we considered only constant bit rate voice codecs; we expect that this is a worst case compared to the use of variable bit rate codecs, since the required average throughput is smaller in this case. We remark that the adoption of a constant bit rate encoding with no silence suppression helps to maintain UDP bindings or to fill persistently the TCP pipe [2]. Also, such codecs are quite popular due to their simplicity: the G.7 codec, for example, which we used in our simulations, is adopted by Microsoft MSN Messenger for the VoIP application [9]. The parameters of the G.7 voice codec are a generated bit rate of Kb/s and an inter frame interval of ms, corresponding to a constant payload size of. In order to assess the performance of the system for increasing number of sessions, two main reference parameters are considered in our simulation study. First, the average delay experienced by voice packets over the wireless link is related to the stability of the WLAN. This is a typical steady state parameter, and from the average delay we cannot infer the real time behavior of voice packets; nevertheless, we obtain a first general description of the impact of voice stations on the WLAN. The second parameter, the useful ratio, is the ratio of the number of packets which arrived in time and the total number of transmitted packets: it is indicative of the perceived speech quality, since each codec has a reference maximum number of loss packets over which the speech must be considered of bad quality. For the G.7 codec, the ratio should be larger than B C. We notice that there exist some degree of arbitrariness on such measures, since they should depends on the subjective perceived speech quality, which in turn depends on the codec used and also on the path over the wired part, which here is not considered. For the sake of simplicity, we resorted to ITU recommendations, and assumed a maximum reference delay budget, i.e. ms, which is also the maximum recommended tolerable one way delay in order to maintain conversation between parties [5]. We assumed that the delay over the wired path should be on the order of tenth of ms, and thus has a smaller impact on the overall delay. For larger delays on the wired path, the voice capacity figures should be scaled accordingly. Notice that, we limited to a homogeneous scenarios where all the wireless nodes use the same transmission rate. Moreover, we assumed that the ACK frame is transmitted with the highest transmission rate. Frame error events are considered by assuming a simple AWGN channel where bit errors occur independently during a packet reception, with a BER. Also, we neglected the effects due to the path loss since the distance from the AP is assumed small. A. Plain IEEE82. The first set of simulations is aimed at catching the behavior of the plain IEEE82. solution. The values reported for this set of simulations are restricted to the downlink, since the uplink packets are far within the allowed delay budget in the region where the system is stable. The up link, in particular, proves quite insensitive both to the number of voice stations and to the BER. In our experiments uplink voice packets showed an average delay never larger than ms. The typical behavior for the downlink is shown in Fig 3 for two values of the bit error rate (BER) under an AWGN channel model. The system shows an abrupt passage from stability to instability, as already found in [3, ]. Under very low BER (D E ), the workload at the AP diverges after voice H ) the bound sessions, while in the case of higher BER (D B E
G.25.2.5 Reference delay. BER=.83 6 BER=4.25 5 Downlink with ftp.25.2. BER=.83 6 BER=4.25 5.5 Reference delay Downlink with ftp.5.5 6 8 2 22 24 26 28 2 25 3 35 4 45 5.25.2.5 Reference delay. BER=.83 6 BER=4.25 5.25.2. BER=.83 6 BER=4.25 5.5 Reference delay Uplink with ftp.5.5 6 8 2 22 24 26 28 2 25 3 35 4 45 5 Fig. 3. Average voice packet delay vs. number of voice sessions. AWGN channel. Voice traffic only and ftp sessions in background. Fig. 5. Average voice packet delay vs. number of voice sessions; AWGN channel Uplink Downlink..8.6.4.2 BER=.83 6 BER=4.25 5 Downlink with ftp 6 8 2 22 24 26 28.25.2..5 BER=.83 6 BER=4.25 5.5 Reference delay 2 25 3 35 4 45 5.8.6.4.2 BER=.83 6 BER=4.25 5 6 8 2 22 24 26 28.25.2.5 Reference delay..5 BER=.83 6 BER=4.25 5 Uplink with 4 ftp 2 25 3 35 4 45 5 Fig. 4. Ratio of useful received packets. AWGN channel. Voice traffic only and ftp sessions in background. Fig. 6. Average voice packet delay vs. number of voice sessions; AWGN channel Uplink Downlink. is lowered to voice sessions. We repeated the same simulations in presence of concurrent downlink FTP sessions Fig 3. We experienced a reduction of the capacity the system of sessions only. The ratio of useful packets, is reported in Fig 4, and shows a strict correspondence with the stability bound determined before. The loss of voice capacity due to concurrent ftp flows, depicted in Fig 4, is in line with the shift in the stability bound measured by means of the average delay. B. Cross Layer Solution In the second set of simulations, we evaluated the enhancements produced by our cross layer solution. The advantage is rather apparent, the shift towards congestion at the AP is smoother compared to the case of the plain scheme, as depicted in Fig 5 and Fig 6. A similar pattern was collected under concurrent ftp sessions, as depicted in Fig 6, showing a capacity drop of roughly voice sessions. The improvement in the voice capacity is shown in the graphs reporting the ratio of useful packets, confirming the trend proved for the average delay. For small BER values sessions are allowed, whereas the higher BER value causes a drop to sessions. Compared to the plain IEEE82.g scheme, we measured a relative improvement of and, respectively, in the maximum number of simultaneous VoIP sessions. The effect of the concurrent FTP sessions causes a drop of roughly sessions (Fig 8) in the number of sustainable voice sessions. We also made a cautionary check in order to ensure that the proposed scheme, which to some extent prioritizes VoIP flows, does not improve voice capacity causing data traffic starvation. The graph of Fig 9 shows, indeed, that the delay experienced by data packets, for the same number of voice sessions, is close to the service time experienced by voice sessions. Finally, close to the voice capacity of the network, i.e. voice sessions and BERD E, the average number of compounded packets is on the order of. We ascribe the sensitivity of our scheme to the BER to the increased size of aggregated packets; this might suggest either to apply further header compression on the IP and UDP headers, or FEC protection at the packet level. VI. CONCLUSIONS AND FINAL REMARKS One important conclusion of our work is that the voice capacity of IEEE82. infrastructured DCF networks can be increased through a non invasive procedure, simply feeding the MAC layer through a compounding module, able to aggregate the packets which belong to the same VoIP flow, and to deliver them in the same frame. We showed that this simple solution is able to relieve congestion at the AP, which represents the bottleneck of
E J.8.6.4.2.6.4.2 BER=.83 6 BER=4.25 5 Downlink with ftp 2 25 3 35 4 45 5.8 BER=.83 6 BER=4.25 5 Uplink with ftp 2 25 3 35 4 45 5 infrastructured DCF regulated WLANs, using a plain IEEE82.g MAC. Overall, we showed that through simple aggregation we a) increase the number of voice sessions served compared to a plain IEEE82.g solution; b) serve several concurrent best effort sessions ( in our experiments); c) avoid starvation of the best effort traffic. As a final remark, we notice that the RTP protocol contains recommendations for packet compounding [6], 6.2. Thus, concerning terminals, the aggregation module for the uplink could be implemented with a simpler mechanism relying only on RTP procedures. Conversely, at the AP, due to the need to multiplex a larger number of flows, an hardware/firmware implementation is due. Thus, based on the indications of this preliminary work, our research direction will be to design a solution able to increase the number of VoIP sessions sustained by a WLAN, being completely compatible with existing terminals, and modifying the AP only..6.4.2 Fig. 7..8.6.4.2 BER=.83 6 BER=4.25 5 Ratio of useful received packets. 2 25 3 35 4 45 5.8.25.2..5 BER=.83 6 BER=4.25 5 Uplink with 4 ftp 2 25 3 35 4 45 5 Fig. 8. Voice packet Data packet Reference delay.5 Ratio of useful received packets. Downlink voice packet delay vs. ftp data packet delay (BER=4.25 5 ) 2 25 3 35 4 45 5 REFERENCES [] IEEE LAN MAN Standards, Part : Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, ANSI/IEEE Std 82., 999 Edition. [2] S. A. Baset and H. Schulzrinne, An Analysis of the Skype Peerto-Peer Internet Telephony Protocol, Technical Report, Columbia University, CUCS-39-4, 24. [3] S. Garg and M. Kappes, Can I add a VoIP Call?, in the proceedings of IEEE International Conference on Communications, Anchorage, 5 May, 23. [4] S. Garg and M. Kappes, On the Throughput of 82.b Networks for VoIP, Technical Report, Avaya Lab. Res., 22. [5] K. Medepalli, P. Gopalakrishnan, D. Famolari and T. Kodama Voice Capacity of IEEE 82.b, 82.a and 82.g WLAN Systems, in the proceedings of IEEE GLOBECOM, Dallas, Texas, 29 Nov. 3 Dec., 24. [6]W.Wang,S.C.LiewandV.O.K.Li, Solutions to Performance Problems in VoIP Over a 82. Wireless LAN, IEEE Transaction on Vehicular Technology, vol. 54, no., Jan. 25. [7] M. Veeraraghavan, N. Cocker, and T. Moors, Support of Voice Services in IEEE 82. wireless LANs, in the proceedings of INFOCOM 2, pp. 488 497, Anchorage, Alaska, April 22 26, 2. [8] A Kopsel and A. Wolisz, Voice Transmission in an IEEE 82. WLAN Based Access Network, in the proceedings of the ACM WoWMoM 2, pp. 24 33, Rome, Italy, July 2. [9] J. Yu, S. Choi and J. Lee, Enhancement of VoIP over IEEE 82. WLAN via dual queue strategy, in the proceedings of IEEE ICC, Paris, 2 24 June 24. [] D. P. Hole and F. A. Tobagi, Capacity of an IEEE 82.b Wireless LAN Supporting VoIP, in the proceedings of IEEE ICC, Paris, 2 24 June 24. [] M. Coupechoux, V. Kumar abd L. Brignol Voice Over IEEE 82.b Capacity, in the proceedings of ITC Specialist Seminar on Performance Evaluation of Wireless and Mobile Systems, Aug. 3 Sept. 2, 24. [2] S. Mangold, S. Choi, G. R. Hiertz, O. Klein and B.Walke, Analysis of IEEE 82.e for QoS Support in Wireless LANs, IEEE Wireless Communications, pp. 4 49, Dec. 23. [3] A. Servetti and J.C. De Martin, Adaptive Interactive Speech Transmission over 82. Wireless LANs, in the proceedings of Workshop on DSP in Mobile and Vehicular Systems, Nagoya (Japan), April 3 4, 23. [4] The Network Simulator - ns-2, available online at http://www. isi.edu/nsnam/ns/. [5] ITU-T Recommendation G4, Enhancement Transmission Systems and Media General Characteristics of International Telephone Connections and International Telephone Circuits, February 996 [6] Schulzrinne et al., RTP: A Transport Protocol for Real-Time Applications, RFC 889, January 996, available online http://www. ietf.org/rfc/rfc889.txt?number=889. Fig. 9. Delay of ftp packets vs. delay of voice packets; BER.