Experimental evaluation of two open source solutions for wireless mesh routing at layer two

Rosario G. Garroppo, Stefano Giordano, Luca Tavanti
Dip. Ingegneria dell'Informazione, Università di Pisa
Via G. Caruso, 16, Pisa, I-56122, Italy
Email: {name.surname}@iet.unipi.it

Abstract
The paper reports the outcome of an experimental evaluation of two open source solutions for realising a Wireless Mesh Network (WMN). Both work at layer two of the ISO/OSI stack and are transparent to the IP layer, i.e. they allow the existing TCP/IP stack to be kept unchanged and avoid the complex IP configuration and management tasks. The first solution is the upcoming IEEE 802.11s standard, as implemented by the open80211s project; the other is the B.A.T.M.A.N. routing protocol, in its layer-2 version. We compared them in a small experimental testbed, focusing mainly on their behaviour in typical mesh situations. We found that both have strengths and shortcomings, but neither can be claimed to be completely mature.

I. INTRODUCTION
The technology backing Wireless Mesh Networks (WMNs) is in most cases common to all kinds of deployments, be they academic, hobbyist, or commercial. The vast majority of them use the IEEE 802.11 family of standards [1] to implement the point-to-point links (medium access control (MAC) and physical layer). The mesh topology itself is realised through IP layer solutions, such as the Optimised Link State Routing (OLSR) protocol [2], the Better Approach To Mobile Ad-hoc Networking (B.A.T.M.A.N., or just Batman) routing protocol [3], and other proprietary protocols (see e.g. [4] and the references therein). Lately, however, much interest has been put into routing solutions working below the IP layer. This approach offers several interesting features. It reduces the complexity of mesh platforms, since there is no need to load and run the (cumbersome) IP stack.
It provides faster and more complete access to MAC and physical layer information, which can be used to improve routing decisions and shorten the time needed to react to unexpected events. It allows synergies between mechanisms that are common to both the routing and MAC protocols (such as beacons and other periodic messages). It is not bound to a specific layer-three addressing scheme (for example, a change from IPv4 to IPv6 would be completely painless, and even non-IP schemes, such as IPX, would work). On the flip side, routing at layer two cannot take advantage of the information on the network architecture carried by the IP addressing scheme, since MAC addresses are flat. Hence, scalability represents a serious hurdle, and in fact the majority of current layer-2 meshing solutions are targeted at small to medium-sized networks only. Clearly, for an implementation to be completely transparent to the IP layer, mechanisms must be devised so that the network layer can work while being completely unaware of the multi-hop nature of the underlying (wireless) network. In other terms, the mesh must be entirely managed at layer two, so that all mesh devices, even the farthest ones, are seen by the upper layer as directly connected through a one-hop link, as happens in a single Ethernet broadcast domain. Presently, there are very few working mesh solutions that are transparent to the IP layer. A couple of them have been developed by the IEEE: 802.16j [5] and 802.11s [6]. The former is in fact a trivial multi-hop relay mechanism, in which Relay Stations (RSs) have been introduced with the primary goal of extending the coverage area of the base station (BS). Therefore, they do not build a proper WMN, since RSs lack self-configuring capabilities and are used just to form a BS-rooted tree. Conversely, IEEE 802.11s is a proper WMN built within the MAC layer. Its standardisation is still in progress [7].
Nonetheless, thanks to the open80211s project, there is already a working (though incomplete) implementation [8]. Among the transparent meshing solutions developed outside standardisation bodies, it is worth mentioning the Microsoft Mesh Connectivity Layer (MCL) [9], the Roofnet project [10], and Batman Advanced, a version of Batman for layer-two routing [3]. Actually, both MCL and Roofnet are layer-2.5 solutions, in the sense that they are inserted into the protocol stack as a layer of their own, between the network and MAC layers. The advantage is that they are transparent to both neighbouring layers; on the other hand, they introduce extra overhead, in the form of additional management frames, additional frame headers and extra processing time. The paper presents an experimental assessment of two of the cited protocols: IEEE 802.11s and Batman Advanced. The goal is to verify their performance in terms of route stability and recovery times, two aspects that are essential for setting up reliable deployments. Hence we did not focus on gross figures such as the overall network throughput, but on the response of the protocol procedures in specific situations and in a controlled environment. The features of the two protocols are presented in Sections II and III, respectively. Then, Section IV describes the testbed we set up for the evaluation and reports the outcome of the tests.
II. IEEE 802.11s
In this Section we outline just the main concepts of IEEE 802.11s according to one of the latest draft standards. Further details can be found in [11] or directly in [7]. We then describe the features of open80211s, the software that provides the current implementation of 802.11s.
A. The IEEE 802.11s Draft Standard
IEEE 802.11s conceives a new network architecture that is deployed on top of the existing 802.11 standard layer. The goal is to provide a protocol for the auto-configuration of paths between stations in the Wireless Distribution System (WDS). In other terms, stations are called to create and exploit a multi-hop communication network to transport data across the WDS. The routing functions are thus brought to layer two of the ISO/OSI stack and renamed path selection and frame forwarding procedures. Clearly, since path selection works at layer two, frame forwarding is performed on the basis of MAC addresses. The basic IEEE 802.11s network entity is the mesh station (MS, for short). In addition to having all the characteristics of any legacy 802.11 station (STA), every MS is also called to relay the traffic generated by other MSs. The set of MSs and the connections among them form the wireless backbone of the mesh, or Mesh Basic Service Set (MBSS). To properly execute the new mesh services and algorithms, and specifically to build the set of paths that forms the mesh backbone and to allow frame forwarding, the draft introduces new mechanisms and adds new frame formats and Information Elements (IEs). The most meaningful are the Path Selection, Peer Link Management, and Link Metric frames for the establishment and management of the paths across the MBSS. The mandatory path selection algorithm that all MSs must implement is called the Hybrid Wireless Mesh Protocol (HWMP). HWMP combines a reactive technique with an optional proactively-built tree topology.
The tree connects all MSs to a root MS so that a path is always available between any two MSs. In the reactive mode, MSs communicate by means of paths created on demand. This mode is used when there is no configured root, or when it can provide a better path to the destination. Both techniques use common messages and processing rules: the Path Request (PREQ), Path Reply (PREP), Path Error (PERR) and Root Announcement (RANN) information elements are flexibly structured to accommodate the needs of both. In the HWMP on-demand path selection algorithm, a source MS $s$ wanting to send data to a destination MS $t$ broadcasts a PREQ frame indicating the MAC address of $t$. All MSs receiving the PREQ create or update their path to $s$, but only if the PREQ contains a sequence number (DSN) greater than that of the current path, or the same DSN and a better metric. Every MS, before re-broadcasting the PREQ, must update the metric field to reflect the cumulative metric of the path to $s$. Once $t$ (or any allowed intermediate MS) receives the PREQ, it sends $s$ a unicast PREP. If $t$ receives further PREQs with a better metric (and the same or a greater sequence number), it sends a new PREP along the updated path. Intermediate MSs then forward the PREP(s) to $s$ along the best path (stored during the PREQ flooding phase), and, when the PREP reaches $s$, the path is set up and can be used for a bi-directional exchange of data. If more than one PREP is received, the PREPs following the first are processed only if their information is not stale and they announce a better metric (the same rules as for PREQs apply). As for the proactive HWMP tree, it can be set up either by using PREQ/PREP messages or by means of RANN frames. The former technique aims at creating and maintaining a set of paths towards the root from all MSs, whereas the latter just disseminates information about the metrics for reaching the root, but leaves each MS free to set up the path whenever it needs it.
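The path-update rule shared by PREQ and PREP processing fits in a few lines. The Python below is only an illustrative sketch: `PathEntry`, `should_update` and `select_path` are hypothetical names, a lower metric is assumed to be better (the airtime metric is a cost), and sequence-number wrap-around is ignored. It processes PREPs in arrival order, as a source MS would.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class PathEntry:
    dsn: int     # sequence number carried by the PREQ/PREP
    metric: int  # cumulative path metric (lower is better)


def should_update(current: Optional[PathEntry], dsn: int, metric: int) -> bool:
    """HWMP-style rule: accept fresher info, or equally fresh info with a better metric."""
    if current is None:
        return True
    if dsn > current.dsn:
        return True
    return dsn == current.dsn and metric < current.metric


def select_path(preps: Iterable[Tuple[str, int, int]]) -> Tuple[Optional[str], Optional[PathEntry]]:
    """Process (next_hop, dsn, metric) PREPs in arrival order; return the path kept at the source."""
    best: Optional[PathEntry] = None
    next_hop: Optional[str] = None
    for hop, dsn, metric in preps:
        if should_update(best, dsn, metric):
            best = PathEntry(dsn, metric)
            next_hop = hop
    return next_hop, best


# Same DSN: the later PREP wins because it announces a better metric.
print(select_path([("A", 10, 500), ("B", 10, 300)]))  # ('B', PathEntry(dsn=10, metric=300))
# Lower DSN: the later PREP is treated as stale and ignored, despite its better metric.
print(select_path([("A", 11, 500), ("B", 10, 300)]))  # ('A', PathEntry(dsn=11, metric=500))
```

The second case is worth keeping in mind: whenever a later reply carries a lower sequence number, its metric is never even compared.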
HWMP sets up the mesh paths according to the announced metric. The 802.11s draft defines a mandatory metric, called the Airtime Link Metric, which estimates the channel resource consumption as a function of the loss rate and link bandwidth. It is defined as

$$c_a = \left[ O + \frac{B_t}{r} \right] \frac{1}{1 - e_f} \tag{1}$$

In the formula, $O$ is a constant that quantifies the channel access and protocol overhead (in microseconds), $B_t$ is the test frame length (8192 bits), $r$ is the data rate (in Mbps), and $e_f$ is the test frame loss ratio (the probability that, when a frame of size $B_t$ is transmitted at the current bit rate $r$, the frame is corrupted due to transmission errors). The estimation of both $r$ and $e_f$ is left to the implementer of the network interface card driver.
B. The open80211s implementation
The goal of the open80211s project [8] is to develop a reference implementation of the IEEE 802.11s draft standard. Accordingly, it tries to follow the dictates of the draft as closely as possible. However, since the project is still in progress, the implementation is not complete: for example, HWMP currently works in the reactive mode only, there are no Link Metric frames, and so on. Apart from these shortcomings, open80211s is indeed a valuable (and fairly up-to-date) implementation of the IEEE draft, and it has recently been included in the Linux kernel. An interesting part is the computation of the airtime metric. The general formula is the same as (1), with the constants $O$ and $B_t$ set to 1 and 8192, respectively. Yet it is worth seeing how the terms $r$ and $e_f$, whose definition is left as a local implementation choice, are computed. The open80211s implementation uses these formulae:

$$e_f[k] = \frac{p_f[k] + 8\,p_f[k-1]}{9}, \tag{2a}$$
$$p_f = \frac{n_f}{n_{tx}}, \tag{2b}$$

where $k$ indicates a generic time instant, and $n_f$ and $n_{tx}$ count the numbers of failed frame transmissions and transmitted frames, respectively. Both are recorded between two consecutive computations of $e_f$ (i.e. between $k-1$ and $k$). As for $r$, this is the rate of the very last transmission, which depends on the policy employed by the rate control algorithm (RCA). The RCA is also the entity that computes $e_f$. The exact time instants $k$ are handled internally by the RCA; however, two successive computations must be separated by at least 13 ms. HWMP samples $e_f$ every time a path in the node cache is added or updated, i.e. every time a PREQ or PREP is received.
III. B.A.T.M.A.N. ADVANCED
The development of B.A.T.M.A.N. (Better Approach To Mobile Ad-hoc Networking [3]) started as an attempt to overcome the shortcomings of OLSR. According to the developers of Batman, the OLSR protocol as specified in RFC 3626 was not completely functional in practical scenarios, particularly in large deployments and lossy environments. Some of these problems were ascribed to the link-state nature of OLSR, which involves a large amount of protocol and computational overhead and requires an accurate synchronisation of the information over all nodes, tasks that might be difficult to accomplish in large wireless networks. The approach of Batman is to spread the knowledge about the best end-to-end paths to all participating nodes. Every node perceives and maintains only the information about the best next hop towards all other nodes. Therefore there is no need for the global dissemination of local topology information. The basic working principle is rather simple. Every node periodically broadcasts so-called originator messages (OGMs) to inform the neighbouring nodes (and the network in general) about itself. OGMs are small packets (52 bytes including IP and UDP headers) that contain the address of the originator, the address of the node transmitting the packet, a Time To Live (TTL) and a sequence number (SQ). OGMs are selectively re-broadcast to flood the whole network.
Each node is allowed to re-broadcast an OGM only once, and only if it has been received from the neighbour that is currently held as the best next hop towards the originator of the OGM. The best next-hop neighbour towards a given destination node D is chosen as the neighbour through which the OGMs from D are received with the smallest delay and packet loss. The idea is that if OGMs are propagated through paths having links with poor quality or excessive (saturating) load, they will suffer from packet loss and/or delay, whereas OGMs that travel along good paths will propagate faster. It can be noted that path computation is performed on a hop-by-hop basis, and the OGMs carry no metric information. The quality of the next hop to the destination is simply a (local) function of the number of OGMs received through that link. The SQ field is used to check the bi-directionality of the links and is also used in ranking the potential next hops. Every node keeps a sliding window holding the SQs received for each originator and each neighbour. The node stores the SQs from the latest one (say $n$) down to $n - (W - 1)$, where $W$ is the size of the window. When a new SQ (say $m$) is received, the window is moved forward and the SQs older than $m - (W - 1)$ are dropped. If the window associated with the neighbour from which the last OGM was received is the one that contains the most SQs, then this neighbour becomes the current best next hop towards the originator. Batman was first realised as a classic layer-three routing protocol, using UDP packets to exchange routing information. Later on, an extension called Batman Advanced (batman-adv, for short) was developed to work at layer two. This version, similarly to IEEE 802.11s, emulates an Ethernet bridge, so that all nodes appear to be attached to a direct link and all protocols operating on top of it are unaware of the multi-hop nature of the underlying network.
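The sliding-window ranking and the re-broadcast rule described in this Section can be sketched as follows. This is a simplified Python model built only from the description above, not from the batman-adv sources: the class name `OgmRanking`, the window size used in the example, and the tie-breaking choice (a neighbour replaces the current best next hop only if its window holds strictly more SQs) are our assumptions, while TTL handling, the bi-directionality check and sequence-number wrap-around are omitted.

```python
from collections import defaultdict


class OgmRanking:
    """Simplified per-node model of Batman's OGM sliding-window ranking."""

    def __init__(self, window_size):
        self.window_size = window_size  # W
        self.newest_sq = {}  # originator -> newest SQ seen so far
        # originator -> neighbour -> SQs from that originator received via that neighbour
        self.windows = defaultdict(lambda: defaultdict(set))
        self.best_hop = {}  # originator -> current best next hop
        self.forwarded = set()  # (originator, SQ) pairs already re-broadcast

    def receive(self, originator, neighbour, sq):
        """Process one OGM; return True if this node should re-broadcast it."""
        newest = max(self.newest_sq.get(originator, sq), sq)
        self.newest_sq[originator] = newest
        oldest_kept = newest - (self.window_size - 1)
        per_neighbour = self.windows[originator]
        # Slide every window: keep only SQs in [newest - (W - 1), newest].
        for sqs in per_neighbour.values():
            sqs.intersection_update(range(oldest_kept, newest + 1))
        window = per_neighbour[neighbour]
        if sq >= oldest_kept:
            window.add(sq)
        # The sender becomes the best next hop if its window now holds more SQs
        # than the current best's window (ties keep the incumbent: an assumption).
        current = self.best_hop.get(originator)
        if current is None or len(window) > len(per_neighbour[current]):
            self.best_hop[originator] = neighbour
        # Re-broadcast at most once per SQ, and only if heard from the best next hop.
        key = (originator, sq)
        if self.best_hop[originator] == neighbour and key not in self.forwarded:
            self.forwarded.add(key)
            return True
        return False


ranking = OgmRanking(window_size=3)
events = [("D", "A", 1), ("D", "B", 1), ("D", "A", 2), ("D", "B", 2),
          ("D", "B", 3), ("D", "A", 3), ("D", "B", 5)]
decisions = [ranking.receive(*event) for event in events]
print(decisions)              # [True, False, True, False, True, False, True]
print(ranking.best_hop["D"])  # B
```

In the example, neighbour B overtakes A as soon as its window holds more SQs for originator D, after which only the OGMs heard via B are re-broadcast.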
The working principles of batman-adv are the same as those of classic Batman, with the obvious adaptations for handling layer-2 addresses instead of IP addresses.
IV. EXPERIMENTAL ASSESSMENT OF THE PROTOCOLS
The testbed used to assess the behaviour of the two meshing schemes was composed of four mesh nodes, as depicted in Fig. 1. The reason for having such a small testbed is that it makes it possible, to some extent, to control the various environmental parameters. This allows for a much finer investigation than on a large-scale testbed, where complex propagation conditions, node anomalies and external factors are often out of control, and thus only gross figures can be collected.
Fig. 1. The experimental testbed.
The nodes are laptop PCs running the Linux operating system (specifically, Ubuntu with kernel 2.6.27.7) and equipped with Atheros cards supported by the new ath5k driver. In all tests, the source and destination nodes are the laptops S and D, while the intermediate nodes are A and B. The direct link between S and D was effectively torn down by placing the laptops sufficiently far apart and by setting a lower transmission power, so that the quality of link S-D became poor enough to push both batman-adv and open80211s to always prefer the multi-hop path to the direct link. Both open80211s and batman-adv were run with the default parameter values. Traffic was generated with the D-ITG software [12]. For all tests, unless otherwise specified, we generated UDP traffic with 512-byte packets at 100 packets per second. The tests can be divided into two sets, according to the feature under test: path and throughput stability, and recovery after a node failure. As already outlined, we reckon these two aspects are the most critical in a real scenario. The outcome of each test is described in the following subsections.
Fig. 2. Throughput of batman-adv along the two paths.
A. Path and throughput stability
The goal of this test is to verify whether the protocols use a stable path from S to D and, in case of path changes, whether these result in reduced transfer rates. Although stability cannot be regarded as an absolute value, in mostly static deployments, such as the WMN backbone, frequent path swapping comes with an unavoidable overhead and a possible performance reduction. Therefore mesh protocols that favour stability should be preferred. The test consisted of running a 40-second traffic flow over a static network. Figs. 2 and 3 report the throughput of batman-adv and open80211s along the paths S-A-D (black) and S-B-D (red) in three different runs. Throughput is averaged over 1 s intervals and refers to physical layer throughput, i.e. it also includes MAC retransmissions. Hence the closer the line is to the 100 pkt/s value, the better the channel (fewer retransmissions). We can note that batman-adv is generally more stable, whereas open80211s exhibits considerable path swapping. A deeper analysis of the behaviour of open80211s allowed us to discover the reason for this oscillation between the two paths. open80211s implements a feature that was defined as optional in one of the early IEEE 802.11s drafts and that is illustrated in Fig. 4 (a graphical representation of an actual traffic trace sniffed from the testbed). When the destination node of the path discovery procedure receives the first PREQ, it shall increase the destination sequence number (DSN) that it is going to put in the PREP. However, the node may decide not to increase the DSN it puts in the successive PREPs for a certain time interval (called HWMP_RT_NETDIAMETER_TRAVERSAL_TIME).
As a result, when the originator of the path discovery procedure receives more than one PREP, since it must consider only the one(s) with the highest DSN, it will discard all the others, even if they announce a better metric. The example in Fig. 4 shows two PREPs that are generated within the same HWMP_RT_NETDIAMETER_TRAVERSAL_TIME interval, but only the DSN of the first PREP is incremented. Hence S discards the second PREP (with DSN=169), even though it carries a better metric (152 instead of 8193). This option, intended to improve route stability, in fact produces the opposite effect. The choice of the path depends on the first received PREP, which in turn is produced in response to the first received PREQ. However, the order of reception is determined essentially by the medium access policy, which, in IEEE 802.11-based networks, relies on random number generation (the backoff counter). As a result, path selection in turn becomes a randomised process. This explains the oscillations even in a rather controlled environment such as our testbed.
Fig. 3. Throughput of open80211s along the two paths.
Fig. 4. Time-space graph of PREQ/PREP frames with DSN and metric (M) values.
We fixed the problem by removing the check on the HWMP_RT_NETDIAMETER_TRAVERSAL_TIME parameter from the code. Thus we let the destination node use the same (increased) DSN for all PREPs it generates in response to PREQs belonging to the same path discovery procedure. We then verified that the path selected by S corresponds to the airtime metric values it received. The outcome of the tests run with the amended code is reported in Fig. 5. Surprisingly, the graphs show no apparent change in the protocol behaviour, as path alternation is still present. However, there is a fundamental difference: the oscillation is now driven by the metric values registered along the two paths. Unfortunately, the metric does not always reflect the present channel quality of the paths. This can be seen from Fig. 6, which reports the values of the Metric field of the PREP messages sent from A to S during one of the tests.
Fig. 5. Throughput of the modified open80211s along the two paths.
Fig. 6. Values (in logarithmic scale) of the airtime metric registered over the small testbed network.
From a quick analysis, we found that the values are a function of the current transmission rate $r$ only, and that the computed $e_f[k]$ is always zero (apart from very rare exceptions). This latter point can be explained by the fact that the computation of $p_f$ is not based on single frame transmissions, but on the whole frame exchange sequence. Therefore, unless the channel is very unfavourable, it is quite unlikely that a frame is not correctly delivered after the MAX_RETRY_LIMIT attempts allowed by the IEEE 802.11 MAC layer. In addition, the MAC can also take advantage of rate adaptation procedures, which usually select a less efficient but more robust modulation scheme after every failed frame transmission, thus continuously increasing the success probability of the next attempts. As a result, all registered values can be traced back to the formula $c_a = 1 + 8192/r$, which is the direct simplification of (1) when $p_f = 0$ (and hence $e_f = 0$). With reference to Fig. 6, we can distinguish four sets of values: 228, 342, 456, and 683. It is easy to verify that they correspond to the 36, 24, 18 and 12 Mbps data rates, respectively.
Fig. 7. The physical data rate registered over one link of the test network.
The next graph (Fig. 7) plots the physical data rate we registered over the link A-D. The variability of the data rate is apparent: some plateaux can be detected, but there are also several peaks and notches. A fair relationship with the metric values of Fig. 6 can also be seen.
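The arithmetic behind these observations can be reproduced with a short sketch of equations (1), (2a) and (2b). The Python below is illustrative only: the function names are hypothetical, and rounding the metric down to an integer is our assumption, chosen because it matches the four values observed in Fig. 6; it is not taken from the open80211s code. With no failed exchange sequences ($n_f = 0$), $e_f$ is zero and the metric depends on $r$ alone.

```python
def failure_ratio(n_failed: int, n_tx: int) -> float:
    """Eq. (2b): p_f = n_f / n_tx over one computation interval (0 if nothing was sent)."""
    return n_failed / n_tx if n_tx else 0.0


def error_estimate(p_f_now: float, p_f_prev: float) -> float:
    """Eq. (2a): e_f[k] = (p_f[k] + 8 p_f[k-1]) / 9."""
    return (p_f_now + 8 * p_f_prev) / 9


def airtime_metric(rate_mbps: float, e_f: float, overhead: float = 1, test_bits: int = 8192) -> int:
    """Eq. (1) with O = 1 and B_t = 8192, truncated to an integer (our assumption)."""
    if not 0 <= e_f < 1:
        raise ValueError("e_f must lie in [0, 1)")
    return int((overhead + test_bits / rate_mbps) / (1 - e_f))


# Every frame exchange eventually succeeds, so n_f = 0 and e_f = 0 in both intervals.
e_f = error_estimate(failure_ratio(0, 100), failure_ratio(0, 100))
for rate in (36, 24, 18, 12):
    print(rate, airtime_metric(rate, e_f))
```

The loop prints 228, 342, 456 and 683 for 36, 24, 18 and 12 Mbps, i.e. the four levels of Fig. 6. Only a failed exchange sequence makes $e_f$ grow: for instance, one failure out of ten transmissions in the current interval (and none in the previous one) would move the 36 Mbps value only from 228 to 231, which helps explain why the metric tracks the data rate almost exclusively.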
Recall that the higher the data rate, the smaller the airtime metric; hence the trends of the two graphs should be complementary (and, for the most part, they are).
B. Recovery after a node failure
This test aims at verifying the ability of the protocol to restore communication after a node failure. In practice this event may occur in a number of circumstances, e.g. an actual node failure, or a temporary node unavailability due to interference and/or obstructions. For this purpose we abruptly turned off the wireless interface of the intermediate node that, at that moment, was relaying the traffic from S to D. We then measured how long the protocol takes to find the alternative path to D. Finally, we turned the failed node on again and observed whether and when this node was included again in the forwarding process. The results are in Figs. 8 and 9. The vertical bars show the time instants of the node failure (off) and restart (on).
Fig. 8. Throughput of batman-adv for the recovery test.
Fig. 9. Throughput of open80211s for the recovery test.
During our tests, batman-adv never succeeded in resuming the communication after the node failure. A slight improvement was registered when the intermediate node was turned on again, but the throughput never reached the same level and stability it had before the failure. We investigated this behaviour further and discovered that batman-adv declares an originator as lost only if a PURGE_TIMEOUT interval elapses from the last received OGM without hearing any more OGMs from that node. The current default value of PURGE_TIMEOUT is 200 s. Therefore batman-adv is not able to react promptly to sudden topology changes, as it defers all path updates to the expiration of the PURGE_TIMEOUT timer. It is also easy to see that recovery times smaller than the OGM creation interval cannot be achieved, since setting PURGE_TIMEOUT to less than ORIGINATOR_INTERVAL would produce continuous path breakages and re-establishments. How to correctly tune the two parameters in order to achieve a good trade-off among stability, overhead and fast recovery will be the focus of future work. As for open80211s, the failure was almost unnoticeable from the application perspective, as the traffic was immediately routed along the other path. Similarly, the re-activation of the failed node had no impact on either the protocol behaviour or the network performance. Sometimes, the re-entering node was immediately chosen to forward the frames (second and third case).
V. CONCLUSION
From the experimental analysis we can draw the following conclusions. The current implementation of the IEEE 802.11s draft standard provided by the open80211s project does not seem to be mature enough. It lacks some notable features (such as proactive path selection and the portal), and it reveals some instability. Conversely, in static scenarios, batman-adv shows much more reliable performance. In case of node failure, open80211s recovers quite rapidly, sometimes so fast that the service appears to be seamless. On the other hand, batman-adv has serious problems in resuming communication after an abrupt interruption. As a result, it is hard to find a winner in this comparison, as the two solutions both have strengths and weaknesses. Rather, we could even say that both implementations are losers, since neither of them is mature enough to handle situations that are typical of any realistic scenario.
ACKNOWLEDGMENT
This work was supported by the Italian Ministry of Instruction, University and Research (MIUR) under the PRIN 2007 research project SESAME (Scalable Efficient Secure Autonomic MEsh networks) and under the FIRB project InSyEme (Integrated System for Emergency, grant number RBIP063BPH). The authors would like to thank Alessia Moschini and Francesco Cappuccio for their help in setting up the testbed and running the trials.
REFERENCES
[1] IEEE Standard for Information technology - Telecommunications and information exchange between systems - Local and metropolitan area networks - Specific requirements - Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, Dec. 2007.
[2] T. Clausen and P. Jacquet, "Optimized Link State Routing Protocol (OLSR)," IETF RFC 3626, Oct. 2003.
[3] A. Neumann, C. Aichele, M. Lindner, and S. Wunderlich, "Better approach to mobile ad-hoc networking (B.A.T.M.A.N.)," IETF Internet-Draft (expired October 2008), April 2008. [Online]. Available: http://www.open-mesh.net/
[4] I. F. Akyildiz, X. Wang, and W. Wang, "Wireless mesh networks: a survey," Computer Networks, vol. 47, pp. 445-487, 2005.
[5] IEEE Std 802.16j-2009, IEEE Standard for Local and metropolitan area networks Part 16: Air Interface for Broadband Wireless Access Systems Amendment 1: Multiple Relay Specification, May 2009.
[6] IEEE P802.11 Task Group s, "IEEE Unapproved draft standard P802.11s/D2.06," January 2009.
[7] "Status of Project IEEE 802.11s Mesh Networking." [Online]. Available: http://grouper.ieee.org/groups/802/11/reports/tgs_update.htm
[8] The open80211s project. [Online]. Available: http://www.open80211s.org/
[9] Microsoft Research, "Self organizing wireless mesh networks." [Online]. Available: http://research.microsoft.com/en-us/projects/mesh/
[10] J. Bicket, D. Aguayo, S. Biswas, and R. Morris, "Architecture and evaluation of an unplanned 802.11b mesh network," in ACM Annual International Conference on Mobile Computing and Networking (MobiCom), August 2005. [Online]. Available: http://pdos.csail.mit.edu/roofnet
[11] R. G. Garroppo, S. Giordano, and L. Tavanti, "Implementation frameworks for IEEE 802.11s systems," Computer Communications, vol. 33, no. 3, pp. 336-349, February 2010.
[12] Distributed Internet Traffic Generator (D-ITG). [Online]. Available: http://www.grid.unina.it/software/itg/