Comparison of QoS Guarantee Techniques for VoIP over IEEE Wireless LAN

Fanglu Guo, Tzi-cker Chiueh
Computer Science Department, Stony Brook University

ABSTRACT

An emerging killer application for enterprise wireless LANs (WLANs) is voice over IP (VoIP) telephony, which promises to greatly improve the reachability and mobility of enterprise telephony service at low cost. None of the commercial IEEE WLAN-based VoIP products can support more than ten G.729-quality voice conversations over a single IEEE b channel on real-world WLANs, even though the physical transmission rate is more than two orders of magnitude higher than an individual VoIP connection's bandwidth requirement. There are two main reasons why the effective throughput of these VoIP systems is significantly lower than expected: VoIP's stringent latency requirement and substantial per-WLAN-packet overhead. Time-Division Multiple Access (TDMA) is a well-known technique that provides per-connection QoS guarantees and also improves radio channel utilization efficiency. This paper compares the effective throughput of IEEE , IEEE e and a software-based TDMA (STDMA) protocol that is specifically designed to support WLAN-based VoIP applications, on the same commodity IEEE WLAN hardware. Empirical measurements from a VoIP over WLAN testbed show that the numbers of G.729-quality voice conversations that IEEE , IEEE e and STDMA can support over a single IEEE b channel are 18, 22 and 50, respectively.

1. INTRODUCTION

Voice over Wireless LAN (VoWLAN) is touted as a killer application for enterprise WLANs because it significantly improves the coverage and mobility of enterprise telephony services. In fact, vendors such as Aruba Networks, Symbol, SpectraLink and Cisco have been shipping VoIP phones specifically designed for IEEE based WLANs. However, several technical barriers facing the VoWLAN technology still need to be overcome before it can truly take off.
The first barrier is the lack of Quality of Service (QoS) support. ITU-T G recommends that the maximum one-way voice packet delay be below 150 msec. Because 150 msec is the end-to-end path delay budget, the wireless LAN channel access delay must be considerably less than 150 msec. In addition, most voice codec specifications such as G require the packet loss ratio in voice connections to be less than 1% to avoid audible errors. The original IEEE WLAN standard only supports a best-effort service model. Almost all existing WLAN deployments operate in the DCF (Distributed Coordination Function) mode, in which each wireless station accesses the shared radio channel using an Ethernet-like medium access mechanism. This mechanism is inadequate for real-time voice applications for two reasons. First, it is impossible to have precise control over the exact transmission timings of voice frames because of collision and random back-off. When the injected traffic load is high, there is more collision and interference, which increases packet loss rate, packet delay and packet delay jitter. Second, there is no way to prioritize one traffic flow over another. Consequently, large-volume non-real-time data traffic such as FTP may consume a large proportion of the network capacity, leaving time-sensitive voice traffic to suffer the consequences.

The second technical barrier associated with IEEE WLAN is substantial per-packet transmission overhead. Because the maximum transmission rate of an IEEE b WLAN link is 11 Mbps, in theory an IEEE b link should be able to support hundreds of VoIP connections if each of them requires only 8 Kbps, as is the case with G.729. In practice, several benchmarking tests [3, 4] reported that existing VoWLAN products on the market cannot support more than ten concurrent VoIP calls of toll-call quality over a single IEEE b channel.
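The "hundreds of connections" figure follows from dividing the link rate by the codec rate. The short sketch below makes that arithmetic explicit; it assumes that each call consists of two 8 Kbps streams (one per direction) and deliberately ignores every per-packet overhead, so it is an idealized upper bound rather than a capacity model. The function and constant names are illustrative.

```python
# Idealized capacity estimate: link rate divided by codec rate, no overhead.
LINK_RATE_BPS = 11_000_000   # maximum rate of the WLAN link (from the text)
CODEC_RATE_BPS = 8_000       # G.729 bit rate per direction (from the text)


def naive_call_capacity(link_bps: int, codec_bps: int, directions: int = 2) -> int:
    """Number of calls that fit if only codec payload bits used the channel.

    `directions` is an assumption: 2 counts a call as an upstream plus a
    downstream stream, 1 counts one-way streams.
    """
    return link_bps // (codec_bps * directions)


if __name__ == "__main__":
    print("two-way calls:", naive_call_capacity(LINK_RATE_BPS, CODEC_RATE_BPS))
    print("one-way streams:", naive_call_capacity(LINK_RATE_BPS, CODEC_RATE_BPS, 1))
```

Comparing this bound with the roughly ten calls observed in practice shows how much of the channel is lost to something other than voice payload.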
A major cause for this gap between theory and practice is the considerable overhead associated with a WLAN packet's transmission. This overhead includes per-packet header bits, link-layer acknowledgment, back-off delay to avoid contention, retransmission due to channel interference and inter-frame spacing for synchronization.

IEEE e [5] is designed to provide QoS support for time-sensitive applications on wireless LANs. It supports two new channel access mechanisms: EDCA (Enhanced Distributed Channel Access) and HCCA (Hybrid Coordination Function (HCF) Controlled Channel Access). EDCA improves upon DCF by introducing four queues, each of which corresponds to a different priority of accessing the shared radio channel. More specifically, each queue is assigned a different combination of medium access parameters, including AIFS (Arbitration Inter-Frame Space), CWmin and CWmax. Although EDCA can effectively prioritize voice traffic over data traffic, it cannot guarantee the QoS of individual VoIP connections from multiple WLAN nodes when they compete with one another for a shared medium. In fact, when the number of voice connections increases, EDCA actually increases the probability of collision because of its aggressive medium access parameters such as smaller CWmin and CWmax. HCCA is an enhanced version of PCF (Point Coordination Function), and supports a centralized polling scheme that could schedule network connections according to their bandwidth demand and priority. Unfortunately, most commodity WLAN interface products do not implement HCCA or PCF. In summary, IEEE e takes the right first step toward solving the WLAN's QoS problem by supporting traffic prioritization, but it is ineffective in the face of a large number of voice connections. Nor does it solve the problem of substantial per-packet transmission overhead.

The research question we set out to answer in this project is the following: What is the maximal VoIP capacity of a commodity IEEE based WLAN? In other words, this project exclusively focuses on IEEE 's MAC protocols and their implementations in commercial WLAN hardware.
Towards that end, we develop a software-based Time Division Multiple Access (STDMA) protocol on top of DCF, the default operating mode of most IEEE WLANs, that is designed to simultaneously solve the QoS and per-packet transmission overhead problems associated with IEEE based WLANs. Using a fully operational STDMA prototype as the base, we conducted a comprehensive performance comparison among STDMA, IEEE and IEEE e, and found that STDMA substantially improves the VoIP capacity of an IEEE WLAN channel, because it minimizes the collision probability and largely eliminates the back-off overhead. Moreover, STDMA is capable of providing individual VoIP connections with QoS guarantees on packet loss, packet delay and delay jitter.

2. RELATED WORK

The Wireless Rether project [6] tried to provide hard QoS guarantees on based WLANs through a software-based token passing scheme. Unfortunately, software-based token passing incurs too much overhead to be practical.

The IEEE e standard [5] enhances a WLAN's QoS by supporting traffic prioritization. Its EDCA mechanism can give better QoS to voice traffic than to data traffic. However, this mechanism is not at all effective when all traffic sources carry voice traffic. Gu and Zhang [7] simulated the EDCA mechanism to evaluate its effectiveness in supporting the QoS of voice connections. They used 4 stations, each transmitting at a different priority, and found that EDCA can indeed provide higher throughput and lower latency to high-priority traffic. But they did not test EDCA under the scenario in which all 4 stations transmit at the highest priority. In addition to EDCA, Garg et al. [8] also evaluated HCCA, and found that HCCA can improve channel utilization and provide better QoS support. They also found that EDCA may require significant tuning to offer better QoS for high-priority traffic.
SpectraLink [9], a leading VoWLAN phone vendor, proposes a simple QoS enhancement mechanism called SpectraLink Voice Priority (SVP), which features two basic ideas: the AP transmits voice frames with a backoff value of zero, and voice frames are always queued at the head of the transmission queue. This solution improves the AP's priority in acquiring the WLAN channel for downstream voice traffic.

PCF and HCCA can be treated as variants of TDMA. In PCF and HCCA, the AP uses explicit polling to ensure contention-free access to the shared radio medium. Unfortunately, polling is quite expensive for workloads with small packets such as VoIP, because each VoIP packet incurs an extra polling packet overhead. Empirically, polling packets alone could decrease the effective network capacity by at least half. Furthermore, because most WLAN interfaces do not support PCF or HCCA, polling has to be implemented in software, which introduces even higher overhead because its inter-frame space cannot be as small as that of PCF [10, 11], and further decreases the network capacity. In contrast, STDMA uses implicit time slotting to achieve the same effect without incurring a per-STA polling frame overhead. In exchange, STDMA needs clock synchronization, and per-scheduling-cycle announcement frames to avoid wasting radio channel slots on STAs that are idle.

[Figure 1 layout, per 20-msec schedule cycle: A (schedule announcement frame); voice period with upstream voice slots 1 to N followed by VDown; data period with Up0, Up1, C (contention slot) and DDown; then the next cycle's A.]
Figure 1. The channel time allocation of the software-based TDMA (STDMA) protocol. Time is divided into cycles, each of which consists of a schedule announcement slot, which notifies all STAs of their channel time slots during a schedule cycle, a voice period, which is for voice frames, and a data period, which is for non-voice data frames.

Given the limited capacity of existing VoWLAN systems, it is not surprising that many researchers have tried to improve it. Wang et al. [12] propose to aggregate downstream voice traffic into a smaller number of larger frames that are then multicast. This scheme can nearly double the capacity of VoWLAN systems because it dramatically decreases the transmission overhead of downstream voice frames.

The Atheros chipsets [13] support several new features to improve the capacity of WLAN links. The bursting feature allows a WLAN card to transmit frames back to back without incurring the per-frame backoff overhead. The fast frame feature allows a WLAN card to aggregate two frames into one larger frame. The compression feature allows a WLAN card to compress data frames. But these techniques are not applicable to VoIP traffic, because consecutive frames within a voice connection are typically spaced 20 to 30 msec apart, which makes it difficult to aggregate them into larger frames or to exploit bursting. Compression is not that useful either because voice data is already compressed.

Neufeld et al. [10, 11], in their SoftMAC project, mentioned some techniques to override the MAC, for example, controlling ACK and backoff through software.
They also mentioned a TDMA MAC project on based WLAN NIC without providing concrete design and implementation details.

Rao and Stoica [14] proposed an overlay MAC layer on based WLAN NIC to mitigate the poor fairness and performance problems due to hidden nodes and to mitigate other MAC anomalies. It also uses TDMA to regulate frame transmission, but its design is not geared towards VoIP applications. Its scheduling slot size is 10 msec and it detects inactive nodes through a long timeout. In VoWLAN traffic, each station uses a time slot of less than 0.5 msec. At this time scale, timeout-based inactive node detection and coarse timing control are too inefficient to be reusable.

3. DESIGN AND IMPLEMENTATION OF STDMA

3.1 Design

As shown in Figure 1, the proposed STDMA protocol divides the channel time into schedule cycles, each of which is set to 20 msec in the current prototype and corresponds to the packetization interval of VoIP applications. Each schedule cycle in turn is broken down into three parts: a schedule announcement frame, a voice period, and a data period. The schedule announcement frame specifies how the channel time is allocated among WLAN nodes in the current schedule cycle. The voice period consists of multiple time slots, one for each active voice connection's upstream traffic. The AP transmits all downstream voice traffic in a single time slot, which is at the end of the voice period. The data period is also divided into time slots, each of which is used for transmission of non-voice data frames. Again upstream data traffic is scheduled first, then a contention slot is scheduled for non-active WLAN stations (STAs) to send their traffic request frames (explained later), and finally comes the downstream data time slot allocated to the AP.

For example, in Figure 1, the first frame in the schedule cycle is the schedule announcement frame. Then the voice period starts.
Each voice traffic-carrying STA uses one of the N voice time slots to transmit upstream voice frames. At the end of the voice period, the AP uses the VDown time slot to transmit downstream voice frames. In the data period, STAs transmit upstream non-voice data frames in time slots Up0, Up1, etc. After the upstream time slots, STAs that are not scheduled in the current cycle can send traffic request frames in the contention slot, C. Finally the AP uses the downstream data time slot, DDown, to transmit non-voice data frames to STAs.

Footnote (on the 20-msec schedule cycle): If a VoIP application's packetization interval is different from 20 msec, its voice samples should be grouped into 20-msec chunks before being sent out.

The AP broadcasts a schedule announcement frame at the beginning of each schedule cycle. This frame serves the dual purpose of synchronizing STAs and telling each STA its associated channel slot times for upstream voice and data transmission. Because each STA is guaranteed its own slots, upstream data transmission is largely free of collision. Consequently, the packet loss, delay and jitter can be strictly controlled.

In each voice period, every active STA transmits its upstream voice traffic in its own slot according to the schedule announcement frame. The AP aggregates and transmits all downstream voice frames in one voice slot. Ideally, if silence suppression works well, a voice connection only needs one channel slot for both upstream and downstream traffic because voice communication is half-duplex in nature. Unfortunately, several studies [15, 16] showed that state-of-the-art silence suppression cannot effectively suppress most of the background noise. As a result, VoIP traffic is largely full-duplex in practice and therefore the downstream and upstream traffic of a voice connection cannot be time-multiplexed on a single time slot.

One way to transport a voice connection's downstream traffic is to allocate a separate downstream time slot immediately after each STA's upstream time slot. This design saves power because each voice traffic-carrying STA in principle only needs to wake up during its associated time slots in each schedule cycle. However, a more efficient design, which the current STDMA prototype chooses, is to aggregate and transmit all downstream voice traffic to all STAs in one time slot.
This design allows an AP to exploit statistical multiplexing among multiple voice connections, to aggregate multiple voice frames into one physical frame to reduce per-packet transmission overhead [12], and to use e's TXOP limit mechanism [5] to transmit a batch of frames consecutively at the physical layer. With TXOP limit, once a WLAN device grabs a radio channel, it can transmit multiple frames with an inter-frame space of $T_{sifs}$ (10 µsec) until the TXOP limit is reached. The TXOP limit mechanism significantly improves a WLAN's channel utilization efficiency because less time is wasted on inter-frame spacing and channel contention.

STDMA varies the length of the voice period dynamically according to the number of active concurrent voice connections. Because the schedule cycle is 20 msec in the current STDMA prototype, the maximum size of the voice period is limited to 19 msec, with the remaining 1 msec serving as a contention slot (explained later) for low-priority data traffic. When the voice period is filled up, STDMA cannot assign a voice slot to a new voice connection request, and treats packets in this new voice connection as non-voice data traffic instead. Consequently, the new voice connection may experience poor network QoS because its voice packets do not have reserved voice slots. When existing voice connections terminate and their voice slots are reclaimed, STDMA can assign voice slots to new voice connections again. Unlike traditional circuit-switched telephone systems, the current STDMA prototype does not explicitly block new voice connections when a WLAN's VoIP capacity is exhausted; it serves them at the lower priority (in the data period) so as to maintain the quality of existing voice connections.

The data period is used for STAs and the AP to transmit non-voice data frames. Unlike voice traffic, which has a constant per-connection bandwidth requirement, the bandwidth demand of data traffic fluctuates considerably.
The key challenge in scheduling the data period is how to maximize its channel utilization efficiency without disrupting the QoS of active voice connections. Without proper control, too many STAs may try to transmit frames during a data period, stretch the data period into the next schedule cycle and disrupt the QoS guarantee of subsequent voice frames. To avoid this disruption, STDMA first schedules upstream data traffic from STAs, then a contention slot C and finally downstream data traffic from the AP in DDown. STAs that did not transmit data in previous schedule cycles submit a traffic request frame to the AP during the contention slot. Because traffic request frames are small, even though they could collide, they are not likely to stretch the contention slot significantly. When a contention slot is stretched, the downstream data slot DDown can be shortened to absorb the stretch.

In the initial state, as soon as an STA has data to transmit, it submits a traffic request frame in the next available contention slot. The traffic request frame includes traffic load information, i.e., the channel time required to empty the data currently in its queue. Differences in transmission rate among STAs are not a problem because the channel time in the traffic request frame is computed from both the size of the buffered data and the transmission rate. The AP determines the amount of channel time allocated to this STA and schedules a data slot for it in the following data period. After an STA starts transmitting upstream packets, whenever it transmits an upstream packet, it piggybacks a report of its new traffic load onto the packet. Based on the load information in the piggybacked reports, the AP can accurately keep track of each STA's load, and schedules their packets in a way similar to weighted fair queuing [17]. After an STA reports to the AP that its transmission queue is empty, the AP stops scheduling the STA until it receives a traffic request frame from the STA in some future contention slot.

Compared with polling, which is used in PCF and HCCA and requires the AP to constantly poll all STAs to determine their load and transmission schedule, TDMA is much more efficient, especially for software implementation, because the overhead associated with software-based polling on WLAN is quite high [6]. In addition, because data traffic load tends to be sparse and bursty, constantly polling STAs incurs a substantial channel scanning overhead.

3.2 Implementation

The STDMA prototype is designed as an event-driven system, and is layered between the WLAN device driver and Linux's TCP/IP protocol stack. It sets CWmin to zero in order to disable IEEE 's backoff-based medium access control mechanism. It replaces Linux's packet transmission queues with its own queues to exert strict control over each packet's physical-layer transmission timing.

In STDMA, every STA in a collision domain must agree on which node is scheduled to transmit at any point in time, and be able to transmit its data precisely in the assigned time slot. This requires all STAs to agree on a global clock, and to have precise control over the timing of their frame transmission. The STDMA prototype exploits the TSF (Time Synchronization Function) counter on the WLAN NIC [18] to achieve accurate clock synchronization. When an STA receives a beacon frame or a probe response frame from the AP, its WLAN NIC sets its TSF counter to the value in the received frame without software intervention.
Therefore the synchronization between an STA's TSF counter and that of its associated AP is done by hardware and thus is not affected even if NIC interrupts are disabled. Because it is the CPU that controls frame transmission timing, the STAs and the AP need to synchronize their CPU times. To do so, the AP periodically includes its TSF counter value in a schedule announcement frame broadcast to all STAs, and each STA waits on a timer that is set to expire several microseconds earlier than the synchronization period and then spins until its local TSF counter reaches the next multiple of the synchronization period. This method allows the STDMA prototype to achieve microsecond-level clock synchronization accuracy.

Even if an STA's clock is successfully synchronized with the AP's clock, it still needs a way to ensure that each of its packets is sent at the proper moment. On most Unix-like OSs including Linux, the jiffy update frequency determines the timer resolution, which is generally at millisecond-level granularity. Because STDMA needs to control frame transmission timing at microsecond-level accuracy, a high-resolution timer is required. The STDMA prototype modifies the Linux High Resolution POSIX timer project [19] to achieve microsecond-level timer resolution. We modify its API so as to make it accessible to the kernel. Under this timer implementation, 99.4% of the timer events expire within 10 µsec of the target timer value. The STDMA prototype uses this high-resolution timer to transmit data and voice frames according to the schedule announcement frames: when a timer expires, the timer handler transmits frames from the corresponding transmission queue to the device driver.

The global traffic scheduler running on an AP handles events from the AP's local packet scheduler and the device driver, and computes an updated schedule based on the events received.
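The wait-then-spin step used for clock synchronization earlier in this subsection can be sketched as follows. This is only an illustration in user-level Python, not the prototype's kernel code: `read_tsf` is a hypothetical callable standing in for reading the NIC's TSF counter, and the period and guard values in the example are assumptions chosen for readability.

```python
def next_boundary_us(tsf_us: int, period_us: int) -> int:
    """Smallest multiple of period_us that is strictly greater than tsf_us."""
    return (tsf_us // period_us + 1) * period_us


def timer_deadline_us(tsf_us: int, period_us: int, guard_us: int) -> int:
    """When the coarse timer should fire: guard_us before the next boundary,
    but never earlier than the current time."""
    return max(tsf_us, next_boundary_us(tsf_us, period_us) - guard_us)


def spin_until(read_tsf, target_us: int) -> int:
    """Busy-wait on the (hypothetical) TSF reader until target_us is reached.

    Returns the first counter value that is at or past the target.
    """
    while True:
        now = read_tsf()
        if now >= target_us:
            return now


if __name__ == "__main__":
    # Simulated TSF readings in place of a real NIC counter (illustrative).
    readings = iter([19_990, 19_994, 19_998, 20_001])
    target = next_boundary_us(12_345, 20_000)
    print("timer fires at", timer_deadline_us(12_345, 20_000, guard_us=10))
    print("spin ends at", spin_until(lambda: next(readings), target))
```

Splitting the wait into a coarse timer plus a short spin keeps the CPU free for most of the period while still landing on the boundary with the precision of the counter read loop.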
When the global traffic scheduler receives a new voice connection request from either the AP's local packet scheduler or the device driver, it checks if a time slot is already allocated to the requesting voice connection. If yes, the request is ignored; otherwise, a new time slot is allocated and added to the end of the voice period. The global traffic scheduler detects the termination of a voice connection by checking if the corresponding time slot lies idle for a period of time that exceeds a certain threshold. When a time slot is reclaimed, it may not be at the end of the voice period. Therefore, the global traffic scheduler needs to move all the following voice slots up in the voice period.

When the global traffic scheduler receives an STA's data traffic request frame in a contention slot, it allocates a data slot for the STA, and modifies this STA's data slot according to its traffic load reports. There are two ways for an STA to report its network load to the AP. First, an STA can report the total channel time required to empty its transmission queue, as is used in IEEE e. Second, an STA can report the size of each data packet in its queue. The second method allows the global traffic scheduler to perform more accurate allocation of channel time to individual STAs because it provides packet boundary information, but such traffic load reports take more space. The current STDMA prototype uses the first method for its implementation simplicity, and embeds the total channel time in the TXOP Duration Requested subfield of the newly added e QoS field. When an STA changes its transmission encoding rate, the STA reports the updated total channel time to the global traffic scheduler.

4. PERFORMANCE EVALUATION

[Figure 2 layout: one AP computer hosting the AP card and an RF monitor card; three STA computers (STA Computer1, STA Computer2, STA Computer3), each with 12 WLAN cards.]
Figure 2. The testbed used to compare the performance of STDMA, IEEE and IEEE e. There are 36 WLAN cards in total associated with one AP, each of which logically corresponds to a distinct STA. The 36 WLAN cards transmit voice frames to the AP and the QoS of each voice connection in terms of packet delay and delay jitter is measured and reported.

We used the testbed shown in Figure 2 to evaluate the performance of the STDMA prototype and compare it with the IEEE and IEEE e standards. There are four computers connected through a 100 Mbps Ethernet switch. All of them are DELL PowerEdge 400SC machines, each featuring a 2.26 GHz Pentium-4 CPU and 256 Mbytes of memory. One computer serves as an AP and the other three serve as STAs. Each STA computer hosts 12 Wistron NeWeb IEEE a/b/g mini-PCI cards (Model No. CM9), which are connected to the computer through three 4-port mini-PCI to PCI adaptors. The NeWeb mini-PCI card uses the AR5004X chipset, which consists of an AR5213 MAC controller chip supporting IEEE e EDCA, and an AR5112 dual-band radio. The AP computer hosts two Wistron NeWeb mini-PCI cards, one working as an AP and the other working as an RF monitor to verify that the STDMA prototype indeed works as expected.
All WLAN cards operate in the IEEE b mode with short preamble support enabled. ACK frames are encoded at 11 Mbps. These computers are located within a 5-meter range and thus always transmit frames at the rate of 11 Mbps. All computers run Linux , which includes the MadWifi driver madwifi-ng-r and the High Resolution POSIX timer i386-hrt patch [19], and use NTP to synchronize their clocks over the wired Ethernet link so that we can measure the one-way packet latency.

We use a UDP sender program to generate traffic with a specific packet rate (packets per second), per-packet payload size, sequence number, and time stamp. To emulate VoIP traffic, we set the UDP payload size to 32 bytes (12 bytes RTP + 20 bytes G.729 voice). A UDP receiver program measures the throughput and packet loss ratio based on the sequence number. The packet delay and delay jitter are computed based on the time stamps taken at the socket send/receive interface to reflect the end-to-end delay perceived by the VoIP application. Packet delay jitter is the difference between the maximum and minimum packet delay measurements.

4.1 Packet Transmission Overhead

The time required to transmit a WLAN frame can be broken down into the following components:

$T_{frame} = T_{difs} + T_{backoff} + T_{data} + T_{sifs} + T_{ack}$

Using the default parameters defined in the IEEE e standard [5], we can compute the average value of each delay component, and the results are shown in Table 1. In addition, we assume the voice codec is G.729, one of the most commonly used codecs for Internet telephony. G.729 captures 20 msec worth of voice samples and compresses them into a 20-byte voice frame.

Table 1. Detailed breakdown of the time (µsec) required to transmit a frame in an infrastructure-mode IEEE b WLAN. Backoff and ACK overheads are the dominant components for small WLAN frames such as voice packets.
Component, time in µsec, and detailed description:
$T_{difs}$ (50 µsec): As defined in the IEEE b standard.
$T_{backoff}$ (70 µsec): $T_{backoff} = R \times T_{slot}$. R is a random variable between 0 and CWmin (7), so its average is 3.5. $T_{slot}$ is 20 µsec.
$T_{data}$ (168 µsec): $T_{data} = T_{plcp} + L_{data} / R$, where $L_{data}$ is the length of the data and R here is the encoding rate. The data part includes the WLAN header (26 bytes with the QoS extension), Logical-Link Control (LLC) (8 bytes), IP header (20 bytes), UDP header (8 bytes), RTP [22] header (12 bytes), voice data (20 bytes for G.729), and FCS (4 bytes). So the total data length is 98 bytes. At an encoding rate R of 11 Mbps, this takes 72 µsec. With short preamble ($T_{plcp}$ of 96 µsec), the total is 168 µsec.
$T_{sifs} + T_{ack}$ (116 µsec): The ACK frame is similar to a data frame. It has 14 bytes of data and is encoded at 11 Mbps in our testbed, so the data portion takes 11 µsec. With short preamble (96 µsec), the total ACK time is 106 µsec. With $T_{sifs}$ of 10 µsec, the total ACK overhead is 116 µsec.
Total: 404 µsec.

[Table 2 columns: Test Configuration, Throughput (Kpps), Loss Ratio (%), Delay (msec), Jitter (msec). Rows: vanilla e; vanilla e with a modified TXOP Limit; AIFS=SIFS, CWmin=0 under two TXOP Limit settings. The per-row values are reported in the text of Section 4.2.]
Table 2. The baseline performance in terms of throughput, packet loss ratio, average packet delay and average packet delay jitter under different medium access control parameter settings. There is only one sender and one receiver in this experiment. TXOP Limit is in µsec. Throughput is measured in kilo packets per second (Kpps), where each packet is a 32-byte UDP packet. Headers of UDP, IP, Ethernet and IEEE are not counted toward these 32 bytes.
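The per-frame budget of Table 1 can be re-derived with a few lines of arithmetic. The sketch below is illustrative only: the data and ACK components use the values stated in Table 1, the backoff is computed from the stated average of 3.5 slots of 20 µsec, and the DIFS value of 50 µsec is an assumption taken from the standard 802.11b timing, which also makes the components add up to the 404 µsec total. All names are hypothetical.

```python
# Component times in microseconds for one G.729 voice frame.
T_DIFS_US = 50            # assumed: standard DIFS for this PHY (see lead-in)
T_SLOT_US = 20            # backoff slot length (from Table 1)
CW_MIN = 7                # contention window used in Table 1
T_BACKOFF_US = (CW_MIN / 2) * T_SLOT_US   # average backoff: 3.5 slots
T_DATA_US = 168           # preamble + 98-byte frame (from Table 1)
T_SIFS_ACK_US = 116       # SIFS + ACK (from Table 1)


def frame_time_us() -> float:
    """Average channel time consumed by one voice frame under contention."""
    return T_DIFS_US + T_BACKOFF_US + T_DATA_US + T_SIFS_ACK_US


def payload_airtime_us(nbytes: int, rate_mbps: float) -> float:
    """Unrounded time to send nbytes at rate_mbps, excluding the preamble."""
    return nbytes * 8 / rate_mbps


if __name__ == "__main__":
    total = frame_time_us()
    print("total per frame (usec):", total)
    print("share spent on T_data:", round(T_DATA_US / total, 3))
    print("98 bytes at 11 Mbps (usec):", round(payload_airtime_us(98, 11), 2))
```

The unrounded airtime of the 98-byte frame is slightly below the 72 µsec quoted in Table 1, which rounds it up.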
Of the 404-µsec transmission time for a G.729 voice frame, the voice data part $T_{data}$ accounts for only 41%. If one can remove $T_{backoff}$ and $T_{sifs} + T_{ack}$ completely, it is possible to improve a WLAN channel's capacity by a factor of at least 2. Other optimization techniques such as IP/UDP/RTP header compression are less effective because the part of a WLAN frame that corresponds to headers of protocol layers above the link layer accounts for less than 72 µsec.

4.2 Baseline Test: Single Sender and Single Receiver

The goal of the baseline test is to empirically establish an upper bound on what the STDMA prototype can deliver in the best case, when there is no contention. In this test, only one STA transmits UDP traffic to the AP, using the default IEEE e voice queue, which sets AIFS, CWmin, CWmax and TXOP limit to 50 µsec, 7 slots, 15 slots, and 3264 µsec, respectively. Each backoff slot is 20 µsec. The UDP sender program pumps traffic as fast as possible until the throughput as perceived by the receiver saturates. Each UDP packet carries a payload of 32 bytes to emulate a G.729 voice frame.

The measurement results of this test are shown in Table 2. With the default IEEE e voice queue, the maximum throughput that can be achieved is 3.1 Kpps (kilo packets per second) with a packet loss ratio of 0.39%, average packet delay of 1 msec and packet delay jitter of 17 msec. The 3.1 Kpps throughput means that each frame takes only 322 µsec on average. However, from Table 1, each data frame and its ACK together already take 284 µsec. So the average inter-frame spacing in this test is 38 µsec, which is even smaller than the AIFS used in IEEE e (50 µsec). This discrepancy arises because of the TXOP mechanism, which allows multiple frames to be transmitted with an inter-frame space of $T_{sifs}$ (10 µsec) until the TXOP limit is reached. To verify this theory, we disabled the TXOP mechanism, and the throughput of IEEE e indeed drops to 2.2 Kpps. This result shows that the TXOP mechanism can substantially improve a WLAN channel's throughput. We used this feature in the downstream voice time slot design to improve the channel utilization efficiency. Unfortunately, TXOP is not useful for upstream VoIP traffic because packets in a voice connection are spaced apart, i.e., after the N-th packet is transmitted, the (N+1)-th packet will not arrive until 20 msec later.

After disabling ACK, setting AIFS to SIFS and CWmin to 0, the throughput between an STA and an AP can reach 5.36 Kpps with a packet loss ratio of 0.43%, average packet delay of 1 msec, and delay jitter of 7 msec. The 5.36 Kpps throughput means each frame takes only 186 µsec. From Table 1, each data frame costs 168 µsec. So the average inter-frame spacing is 18 µsec. Since AIFS is already reduced to SIFS in this case, it is not clear if the TXOP mechanism has any performance impact. To answer this question, we set the TXOP limit to 0 and measured the throughput again. The maximum throughput in this case is 5 Kpps with a loss ratio of 0.4%, which suggests the TXOP mechanism still helps a little even when AIFS is set to SIFS. Theoretically the TXOP mechanism should not help because there is no backoff and AIFS is already reduced to SIFS.

[Figure 3 plot: loss ratio (%) versus time slot size (µsec).]
Figure 3. The impact of voice slot size on packet loss ratio. When the voice slot size is too small, multiple STAs may collide. Because link-layer retransmission is disabled, collision leads to packet loss.

[Figure 4 plot: loss ratio (%) versus timing delay (µsec).]
Figure 4. The impact of clock synchronization error on packet loss ratio. Any synchronization error smaller than 180 µsec has no impact on packet loss ratio.
Our conjecture is that the Atheros chipset may not faithfully implement AIFS as SIFS; that is, only some frames are transmitted with an inter-frame spacing of SIFS. When the TXOP mechanism is enabled, this implementation deficiency is mitigated. According to the IEEE 802.11e standard, the minimum AIFS values for the AP and the STAs are PIFS and DIFS, respectively, but the Atheros chipset allows one to set AIFS to SIFS. This conjecture is supported by the observation that the measured throughput decreases when AIFS is set to PIFS or DIFS.

4.3 STDMA Timing

The baseline test shows that removing the per-packet ACK and backoff overhead could increase the throughput of a single-sender-single-receiver WLAN link from 3.1 Kpps to 5 Kpps. However, when multiple senders share a WLAN link, its throughput is likely to decrease because of the additional overhead of arbitrating their accesses to the shared channel. The theoretical packet rate throughput of TDMA is inversely proportional to the time allocated to transmit one voice frame (the voice slot size). To empirically determine the minimum voice slot size, we set AIFS to SIFS, CWmin to 0, and the TXOP limit to 0, and vary the temporal distance between consecutively transmitted voice frames. There are three STA computers, each with 12 WLAN cards, and the i-th WLAN card (i = 0, 1, ..., 11) on the K-th STA computer (K = 0, 1, and 2) is assigned an ID of K + 3i. We transmit a voice frame from each WLAN card in a separate voice slot in increasing order of ID, i.e., from 0 to 35, then from 0 to 35 again, and so on. This card numbering scheme and the frame transmission order together ensure that no two consecutively transmitted voice frames come from the same STA computer, so that if the STA computers are not properly synchronized, a collision will occur every 2 or 3 frames.
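The card numbering scheme can be checked with a short sketch. It assumes only what the text states (three STA computers, 12 cards each, ID = K + 3i, transmission in increasing ID order); the function names are illustrative.

```python
NUM_COMPUTERS = 3
CARDS_PER_COMPUTER = 12


def card_id(k: int, i: int) -> int:
    """ID of the i-th WLAN card on the K-th STA computer (ID = K + 3i)."""
    return k + NUM_COMPUTERS * i


def owner_of(card: int) -> int:
    """STA computer that hosts a given card ID (inverse of the scheme above)."""
    return card % NUM_COMPUTERS


ids = sorted(card_id(k, i)
             for k in range(NUM_COMPUTERS)
             for i in range(CARDS_PER_COMPUTER))

# Two back-to-back rounds (0..35, 0..35) so the wrap-around from card 35
# to card 0 is covered as well.
schedule = ids * 2
owners = [owner_of(card) for card in schedule]

# True when no two consecutively transmitted frames share an STA computer.
no_back_to_back = all(a != b for a, b in zip(owners, owners[1:]))
print(ids[:6], no_back_to_back)
```

Because consecutive IDs always differ modulo 3, every pair of adjacent slots belongs to two different computers, which is what exposes synchronization errors between computers rather than within one.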
Table 3. Comparison among IEEE 802.11, IEEE 802.11e and STDMA in terms of the number of voice calls that they can support over a single channel of an IEEE 802.11b WLAN, assuming each call is a two-way 8Kbps voice connection. [Columns: MAC, Cards, Direction, Loss Ratio (%), Latency (msec), Jitter (msec). Rows: IEEE 802.11 (Up and Down, two card counts), IEEE 802.11e with 22 and 23 cards (Up and Down), and STDMA with 50 cards (Up and Down).]

Figure 3 shows the resulting overall packet loss ratio when we vary the voice slot size. When the voice slot size is smaller than 200 µsec, the observed packet loss ratio rises sharply because of excessive collision, which in turn leads to packet losses because link-layer retransmission is disabled. Increasing the voice slot size beyond 200 µsec decreases the packet loss ratio slightly, but at the cost of a substantial degradation in packet throughput, because the channel lies idle unnecessarily. Therefore we conclude that 200 µsec is the voice slot size that strikes the best tradeoff between packet loss and channel throughput, and use this value in all subsequent experiments.

Although the clock synchronization scheme in STDMA can achieve microsecond accuracy, it is not perfect, because the high-resolution timer may still misfire from time to time. An immediate question is the performance impact of this clock synchronization error. To answer this question, we intentionally delay the transmission time of odd-numbered WLAN cards by a fixed amount with respect to a perfect schedule, and measure the impact of this intentional delay on the resulting throughput. We vary this delay from 0 to 200 µsec, and the measured packet loss results are shown in Figure 4. When the delay is 0, each WLAN card with an odd-numbered ID starts frame transmission exactly at the beginning of its assigned slot. When the delay is 200 µsec, the WLAN cards with an odd-numbered ID start frame transmission at the same time as the next WLAN card.
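The timing of the intentional-delay experiment can be sketched as follows. The sketch assumes an idealized schedule in which card c nominally starts at c × 200 µsec within one round of 36 cards; it models only the intended start times, not carrier sensing or the actual driver, and all names are illustrative.

```python
SLOT_US = 200    # voice slot size chosen above
NUM_CARDS = 36   # WLAN cards transmitting in ID order


def start_time_us(card: int, odd_delay_us: int) -> int:
    """Intended start of a card's transmission within one round.

    Cards with an odd ID are shifted by odd_delay_us relative to the
    perfect schedule; even-numbered cards start on their slot boundary.
    """
    nominal = card * SLOT_US
    return nominal + (odd_delay_us if card % 2 == 1 else 0)


def coinciding_starts(odd_delay_us: int) -> list[tuple[int, int]]:
    """Pairs of adjacent cards whose intended start times are identical."""
    starts = [start_time_us(c, odd_delay_us) for c in range(NUM_CARDS)]
    return [(c, c + 1) for c in range(NUM_CARDS - 1)
            if starts[c] == starts[c + 1]]


# Theoretical TDMA packet rate is the reciprocal of the slot size.
slots_per_second = 1_000_000 // SLOT_US

for delay in (0, 180, 200):
    print(delay, len(coinciding_starts(delay)))
```

With a delay below the slot size, the shifted cards merely overlap the following slot; only at 200 µsec do the intended start times of an odd card and its successor coincide exactly.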
From Figure 4, it is clear that any clock synchronization error smaller than 180 µsec has no significant impact on the packet loss ratio. This is because IEEE 802.11's physical-layer carrier sensing mechanism can still effectively avoid collision when the voice slots of multiple WLAN cards overlap with each other. Only when multiple WLAN cards attempt frame transmission at exactly the same time will collisions occur and cause the packet loss ratio to shoot up abruptly. This result demonstrates that the small clock synchronization error in the STDMA prototype (99.4% of its timer events expire within 10 µsec of the target timer value) will not cause any performance problems in practice, because STDMA can leverage the hardware MAC control mechanism to make up for its lack of perfect transmission timing control.

4.4 Throughput Comparison Among STDMA, IEEE 802.11e and IEEE 802.11

The goal of this experiment is to compare the effective throughputs of STDMA, IEEE 802.11e and IEEE 802.11 in terms of the number of two-way 8Kbps voice connections they can support over a single channel of an IEEE 802.11b WLAN. In this experiment, each WLAN card on the STA computers transmits packets with a total size of 98 bytes to the AP at a rate of 50 packets per second (pps). At the same time, the AP transmits packets to each WLAN card on the STA computers with the same packet size and rate. This set-up emulates a constant-rate two-way voice communication. To test IEEE 802.11, we use the Atheros chipset's best-effort queue. To test IEEE 802.11e, we use the Atheros chipset's voice queue, configured as in the baseline test. For STDMA, we use the Atheros chipset's voice queue and a 200-µsec voice slot size. To emulate more than 36 STAs using 36 WLAN cards, some of the cards need to support two voice connections, i.e., transmit/receive every 10 msec rather than every 20 msec.
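The way more than 36 voice connections can be emulated on 36 cards is easy to illustrate. The sketch below assumes the extra connections are simply given to the lowest-numbered cards; the text does not say which cards carry two connections, so that choice and all names here are hypothetical.

```python
NUM_CARDS = 36
PACKET_INTERVAL_MS = 20.0  # one packet every 20 msec, i.e. 50 pps per connection


def calls_per_card(num_calls: int) -> list[int]:
    """Spread num_calls voice connections over the cards as evenly as possible."""
    if not 0 <= num_calls <= 2 * NUM_CARDS:
        raise ValueError("each card emulates at most two voice connections")
    base, extra = divmod(num_calls, NUM_CARDS)
    # Assumption: the first `extra` cards take one additional connection.
    return [base + (1 if card < extra else 0) for card in range(NUM_CARDS)]


def send_interval_ms(calls_on_card: int) -> float:
    """Interval between packets on a card carrying calls_on_card connections."""
    if calls_on_card < 1:
        raise ValueError("card carries no connection")
    return PACKET_INTERVAL_MS / calls_on_card


assignment = calls_per_card(50)
print(assignment.count(2), "cards send every", send_interval_ms(2), "msec")
print(assignment.count(1), "cards send every", send_interval_ms(1), "msec")
```

For 50 connections this yields 14 cards that transmit and receive every 10 msec and 22 cards that do so every 20 msec.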
Figure 5. (a) The measured and theoretical UDP data throughput (pps) of the STDMA prototype as the percentage of the data period in each schedule cycle is varied. (b) The measured and theoretical TCP data throughput (Mbps) of the STDMA prototype as the percentage of the data period in each schedule cycle is varied. The fact that the measured and theoretical throughputs are close to each other demonstrates that the STDMA prototype can effectively utilize the radio resource even under heavy loads.

Table 3 summarizes the number of concurrent 8Kbps voice connections that IEEE 802.11, IEEE 802.11e and STDMA can support on a single IEEE 802.11b channel whose transmission rate is 11 Mbps. Our measurement shows that the IEEE 802.11 standard can actually support up to 18 simultaneous voice connections. This result for IEEE 802.11 is somewhat surprising because it is quite different from previously reported test results [3, 4]. We believe this discrepancy arises mainly because our testbed has much lower interference than real-world wireless LANs. When the 19th voice connection is added, the packet loss ratio of downstream voice traffic from the AP goes up drastically, but the upstream voice traffic experiences far fewer losses. In IEEE 802.11, the AP uses the same AIFS and CWmin as the STAs, and therefore is not given a higher priority than the STAs in accessing the shared radio channel. When the input traffic load of a WLAN channel is close to its capacity, the AP is given a smaller share of channel time than it needs to transmit downstream packets and thus becomes the bottleneck. IEEE 802.11e can support 22 concurrent voice connections.
When the 23rd voice connection is added, unlike in IEEE 802.11, the packet loss ratio of upstream traffic from the STAs goes up while the downstream traffic experiences far fewer losses. In IEEE 802.11e, the AIFS of the AP's voice queue is PIFS, which is smaller than DIFS, the AIFS of the STAs' voice queue. In addition, the AP's voice queue uses a large TXOP limit to transmit multiple frames once it acquires the channel. This is why it is the STAs rather than the AP that become the bottleneck when an IEEE 802.11e WLAN channel is fully loaded. The performance improvement of IEEE 802.11e over IEEE 802.11 mainly comes from the fact that 802.11e gives the AP a higher priority over the STAs and thus significantly reduces the packet loss ratio of downstream voice traffic.

STDMA can support 50 voice calls successfully. The 50 upstream voice slots occupy 50 × 200 µsec = 10 msec of each 20-msec schedule cycle. Because downstream voice traffic can effectively leverage the TXOP mechanism, the STDMA prototype can transmit the downstream packets associated with 50 voice calls within 9.3 msec, leaving 0.7 msec for the schedule announcement frame and the contention slot. Compared with IEEE 802.11 (18 voice calls) and IEEE 802.11e (22 voice calls), STDMA (50 voice calls) improves a WLAN channel's VoIP capacity by a factor of 2.8 and 2.3, respectively, while offering comparable packet delay and smaller delay jitter. The performance difference between STDMA and the IEEE 802.11/802.11e systems is mainly attributed to careful reduction in per-packet transmission overhead and downstream voice traffic batching at the AP.

4.5 Data Throughput of STDMA

A key challenge of any TDMA-based protocol [23] is how to maximize the utilization efficiency of the shared channel by reducing the channel idle time that may arise when nodes with traffic do not get scheduled in time. The
goal of this test is to evaluate how efficiently STDMA can transport data traffic in the data period. In this test, we transmit upstream data traffic from 12 WLAN cards while varying the percentage of each schedule cycle allocated to the data period. For a given data period size, the total offered load from the 12 cards is fixed and corresponds to the theoretical throughput that can be supported by the data period. However, the load generated by each card changes randomly. A WLAN card sometimes has nothing to transmit, and submits a traffic request frame when new data arrives. The packet size used in all test runs is 1500 bytes.

Figure 5(a) shows the measured and theoretical UDP data throughputs when the percentage of each schedule cycle allocated to the data period is varied from 10% to 100%. The theoretical throughput is computed as follows. First, we measure the maximum throughput between an STA and an AP with STDMA disabled, and use it to approximate the theoretical throughput when the entire schedule cycle is assigned to the data period. The theoretical throughput for a lower-than-100% data period is then computed as the product of the theoretical throughput at 100% and the data period percentage. The measured UDP throughput under STDMA is very close to its theoretical counterpart, which shows that STDMA can effectively and efficiently shift the channel resource among WLAN nodes that happen to have data to send at any point in time. The difference between the theoretical and measured throughputs mainly comes from fragmentation. Because the packet size used in the test is 1500 bytes, at the 11 Mbps encoding rate each data packet consumes 1.3 msec of channel time. When the remaining time in a data period is smaller than 1.3 msec, this time is wasted. However, each data period cannot waste more than one frame's worth of channel time.
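The theoretical-throughput rule and the fragmentation bound can be sketched numerically. The sketch assumes, purely for illustration, that the data period is the stated percentage of a 20-msec schedule cycle and is filled back-to-back with 1.3-msec frames; the full-cycle throughput is left as a parameter because its measured value is not quoted in the text, and all names are hypothetical.

```python
CYCLE_US = 20_000  # schedule cycle length (20 msec)
FRAME_US = 1_300   # channel time of one 1500-byte packet at 11 Mbps


def theoretical_throughput(full_cycle_throughput: float, data_percent: int) -> float:
    """Throughput at 100% data period, scaled by the data period percentage."""
    return full_cycle_throughput * data_percent / 100


def fragmentation_waste_us(data_percent: int) -> int:
    """Channel time left over after packing whole frames into one data period."""
    period_us = CYCLE_US * data_percent // 100
    whole_frames = period_us // FRAME_US
    return period_us - whole_frames * FRAME_US


for percent in (10, 50, 100):
    print(percent, fragmentation_waste_us(percent))
```

Under these assumptions the leftover time is always less than one frame time (1.3 msec), matching the bound stated above; integer microseconds are used to avoid floating-point rounding at slot boundaries.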
Figure 5(b) shows the measured and theoretical TCP data throughputs when the percentage of each schedule cycle assigned to the data period is varied from 10% to 100%. The TCP throughputs are measured at the socket layer, and increase almost linearly with the data period percentage. The discrepancy between measurements and theoretical predictions is due to ACK packets, which travel downstream. Unlike UDP, TCP requires transport-layer ACKs. As a result, the TCP test does not have UDP's fragmentation problem because the downstream TCP ACK traffic almost always uses up the remaining time in each data period left by the upstream TCP data traffic. In both cases, the measured throughput closely tracks the theoretical throughput, even when the offered load is close to the channel's capacity. This demonstrates that STDMA can even improve the overall throughput of an IEEE 802.11 WLAN channel under heavy non-voice data traffic loads.

To evaluate the impact of data traffic on the QoS of voice traffic, we measure the packet loss ratio of voice frames when all the voice slots in the voice period are occupied. The impact of data traffic on the average packet delay and packet delay jitter of voice traffic is negligible, because the data period typically stops well before the next schedule cycle's voice period. Consequently, data frames rarely have a chance to disrupt the timings of voice frames in the next schedule cycle.

5. CONCLUSION

The number of VoIP calls that existing VoWLAN products can support on an IEEE 802.11 WLAN channel is disappointingly small [3, 4] because of lack of QoS support and inefficient WLAN frame transmission. To gain a full understanding of the VoIP capacity limit of commodity WLANs, we performed a comprehensive performance comparison among IEEE 802.11, IEEE 802.11e, and a software-based TDMA protocol that is specifically designed to maximize an IEEE 802.11-based WLAN's VoIP capacity.
Using our VoIP over WLAN testbed, we found that the measured VoIP capacity of an IEEE 802.11b WLAN is 18 G.729 voice calls, which is much better than previously reported test results for commercial VoWLAN products [3, 4], presumably because there is minimal radio interference in our experiments. The main performance bottleneck for IEEE 802.11 arises from the fact that the AP does not have a higher transmission priority than the STAs. IEEE 802.11e successfully removes the AP's downstream transmission bottleneck via careful choice of such MAC parameters as AIFS, CWmin and TXOP limit, and increases the VoIP capacity to 22 G.729 calls. Its throughput improvement over IEEE 802.11b is lower than expected when the input workload is voice-only traffic, but is expected to be more significant when the input workload consists of both data and voice traffic. STDMA can increase the number of concurrent G.729 VoIP calls on an IEEE 802.11b WLAN channel to 50,
which is 2.3 times better than IEEE 802.11e, because STDMA effectively eliminates all performance overheads due to collision. In addition to improved capacity, STDMA also provides better QoS in terms of packet loss, delay and delay jitter. In summary, the main contributions of this work include:

• A software-based TDMA protocol that provides both guaranteed QoS for time-sensitive traffic such as VoIP connections and efficient channel utilization for non-time-sensitive traffic such as FTP connections.
• The first known TDMA implementation on commodity WLAN hardware that demonstrates the best known VoIP throughput for a single IEEE 802.11b channel, 50 concurrent G.729-based VoIP calls.
• A comprehensive empirical performance comparison among vanilla IEEE 802.11b, IEEE 802.11e and STDMA, and detailed analysis of the sources of their performance differences.

REFERENCES

1. V. Wiki, "VOIP Qos Requirements."
2. I. T. Union, "G.729: Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)," [Online]. Available:
3. D. Newman, "Review: Voice over Wireless LAN," Network World, 01/10/05, [Online]. Available:
4. ---, "Aruba conquers challenge of Wi-Fi scalability," Network World, 11/06/06, [Online]. Available:
5. IEEE, "IEEE Standard 802.11e-2005."
6. S. Sharma, K. Gopalan, N. Zhu, G. Peng, P. De, and T.-c. Chiueh, "Implementation Experiences of Bandwidth Guarantee on a Wireless LAN," in Proceedings of ACM/SPIE Multimedia Computing and Networking (MMCN 2002).
7. D. Gu and J. Zhang, "Evaluation of EDCF Mechanism for QoS in IEEE 802.11 Wireless Networks," in Proceedings of World Wireless Congress (WWC).
8. P. Garg, R. Doshi, R. Greene, M. Baker, M. Malek, and X. Cheng, "Using IEEE 802.11e MAC for QoS over Wireless," in Proceedings of International Performance Computing and Communications Conference.
9. SpectraLink Inc., "SpectraLink Voice Priority," white paper.pdf
10. M. Neufeld, J. Fifield, C. Doerr, A. Sheth, and D. Grunwald, "SoftMAC - Flexible Wireless Research Platform," in Proceedings of Fourth Workshop on Hot Topics in Networks (HotNets) 05.
11. C. Doerr, M. Neufeld, J. Fifield, T. Weingart, D. C. Sicker, and D. Grunwald, "MultiMAC - An Adaptive MAC Framework for Dynamic Radio Networking," in First IEEE Symposium on New Frontiers in Dynamic Spectrum Networks (DySPAN) 05.
12. W. Wang, S. C. Liew, Q. X. Pang, and V. O. Li, "A Multiplex-Multicast Scheme that Improves System Capacity of Voice-over-IP on Wireless LAN by 100%," in Proceedings of the Ninth IEEE Symposium on Computers and Communications (ISCC).
13. Atheros Communications Inc., "Super G: Maximizing Wireless Performance," superg whitepaper.pdf
14. A. Rao and I. Stoica, "An Overlay MAC layer for 802.11 networks," in Proceedings of ACM Mobisys Conference 2005.
15. Cisco Systems Inc., "Voice Over IP - Per Call Bandwidth Consumption," consume.html
16. Yiying Zhou and others, "Performance Analysis for VoIP System," acpang/course/voip 2005 fall/presentation/b2.ppt
17. H. Zhang, "Service disciplines for guaranteed performance service in packet-switching networks," Proceedings of the IEEE, vol. 83, no. 10, pp , January
18. IEEE, "IEEE Standard 802.11, 1999 Edition."
19. M. S. Inc., "High Resolution POSIX timers."
20. M. Heusse, F. Rousseau, G. Berger-Sabbatel, and A. Duda, "Performance Anomaly of 802.11b," in Proceedings of IEEE INFOCOM 2003.
21. M. project, "MadWifi - Multiband Atheros Driver for Wireless Fidelity (WiFi)."
22. H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, "IETF RFC3550 RTP: A Transport Protocol for Real-Time Applications," [Online]. Available:
23. M. Narbutt and M. Davis, "Effect of free bandwidth on VoIP Performance in 802.11b WLAN networks," in Proceedings of IEE Irish Signals and Systems Conference, 2006.