Path Optimization in Computer Networks

Roman Ciloci

Abstract. The main idea behind path optimization is to find the path that takes the shortest time to transmit data from a host A to a host B in a multi-hop environment. Path optimization has seen increased interest over the last decade. Finding the optimal path between two hosts involves several steps. The first and most complicated step is estimating the available bandwidth of a given path. Many tools are designed for available bandwidth estimation. In this report we test three of them for accuracy: Spruce, Pathload, and pathchirp. The first set of tests was done in a simple environment where the input data is known and the expected output is easy to calculate. The second set of tests was done in a more realistic environment using PlanetLab nodes. After determining which tool gives more accurate results, we give a simple formula for estimating the transfer time of a given path. Once the time of each path in the network topology is known, finding the optimal path is a straightforward application of one of the graph algorithms.

Keywords: testing, networks, available bandwidth, measurement.

1 Introduction

With the Internet becoming a mature and standardized communication medium that serves as a backbone for many commercial and research fields, path optimization has seen increased interest over the last decade. The need for path optimization comes with the advancement of network topologies. Fig. 1 shows how finding the optimal path could be useful. Finding the optimal path between two nodes could be a valuable asset for a wide range of network applications. Some of the applications that can benefit from path optimization are streaming applications, route selection in overlay networks [5], QoS verification [6], and network traffic monitoring.
Today, obtaining routing information from routers to help with path optimization is often not possible due to technical and privacy issues [3], and implementing the techniques at the hardware level would require major hardware changes, which would be very costly and impractical at the moment. For now, the only option is to implement such techniques at the application layer, which requires few or no changes for the end user. Much interest has been given to path optimization in recent years. Although the field is still in a research phase, many path optimization techniques have been developed [5].

Fig. 1. A generalized and simplified view of Internet communication. We can see how path optimization would help in speeding up data transfer.

2 Methodology

Generally, finding the optimal path involves three steps. The first and most complicated one is finding the available bandwidth of a path, that is, the part of the total path capacity that is not in use [2], [3], [4], [5], [6], [7]. Available bandwidth estimation is discussed in this section of the report. The second step is finding the time it would take to transmit the required data over each path in the network topology. A formula for estimating the time of a path is given in a later section of the report. After the time for each path has been determined and an appropriate map of the network has been established, the third step is to determine the optimal path. This last step is a straightforward application of one of the graph algorithms for path finding.
At present there are many tools for available bandwidth estimation, each of them using a different technique.

Table 1. Some available bandwidth estimation tools.

  No.  Tool Name
  1    Spruce [2]
  2    pathchirp [3]
  3    Pathload [4]
  4    IGI
  5    pathchar [8]

According to [11] there are two techniques for available bandwidth estimation: single packet and packet-pair. Since the estimation of a path consists of multiple probes, the names of the techniques refer to the number of packets used in each probe. Because different tools use different methods, one question arises: which one is better? To answer this question we decided to test some of the tools. The purpose of the test was to find which technique gives more accurate results. We chose three tools for testing: Spruce [2], which employs the packet-pair technique, pathchirp [3], which uses a packet train, and Pathload [4], which also uses a packet train. It is important to note that although Pathload and pathchirp use the same method, they are not the same. They differ in how they send their packet trains: Pathload spaces the packets of a train uniformly, while pathchirp spaces them exponentially. We performed two sets of tests to get an idea of the accuracy of each tool.

2.1 First Test on Tools Accuracy (Simplistic model)

We chose a simple setup for the first test on tool accuracy. The reason for choosing such a testing environment was that if we know all the input data, the expected output can be easily calculated. The test was done using two Pentium 4 machines with 2 GB of RAM running Ubuntu 8.04, connected by a 100 Mbps/xT Cat5e crossover cable. The test consists of two parts, which are described in the following sections.

2.1.1. Without additional load on the path

The first set of tests was done with no load on the path, so the available bandwidth should equal the maximum path capacity plus or minus an error e that is small enough to be ignored. To make sure there was no load on the path at any time, we watched the operating system's system monitoring tool (Fig. 2), which confirms that the results reflect an idle path.
Fig. 2. Screenshot of Ubuntu's system monitor tool showing no network traffic.

For consistency we ran each tool three times, recording the results of each run. The results for the first test are displayed in Table 2. Calculations of accuracy are described in a later section.

Table 2. Results of the first part of the test with no additional load on the path.

  Tool       Run #1            Run #2        Run #3
  Spruce     99.5 Mbps         100.4 Mbps    100.4 Mbps
  Pathload   81.9-100.9 Mbps   15-106 Mbps   95-98 Mbps
  pathchirp  87.6 Mbps         99.1 Mbps     91.7 Mbps

2.1.2. Test with additional load on the path

This test was performed using the same settings as the first one: two Pentium 4 machines connected by a 100 Mbps/xT Cat5e crossover cable. The only difference between this test and the previous one is that this time we kept a constant 1 MBps load on the path. To accomplish this, a special program was written. Based on the client/server paradigm, its purpose was to send data from the client machine to the server at a constant rate of 1 MBps. The data transfer was performed over UDP. In Fig. 3 we can see how the system monitor displays a constant rate of network traffic.
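The load generator itself is not listed in this report. As a rough illustration only, a constant-rate sender of the kind described above could be sketched in Python as follows. The function names, the 1000-byte payload, and the assumption that 1 MBps means 10^6 bytes per second are ours and are not taken from the original program.

```python
import socket
import time


def send_constant_rate(send, rate_bytes_per_s, payload_size, duration_s,
                       clock=time.monotonic, sleep=time.sleep):
    """Call send(payload) at evenly spaced instants so that the average
    rate is rate_bytes_per_s. Returns the number of bytes handed to send."""
    if rate_bytes_per_s <= 0 or payload_size <= 0 or duration_s < 0:
        raise ValueError("rate and payload size must be positive")
    payload = b"\x00" * payload_size
    interval = payload_size / rate_bytes_per_s          # seconds per packet
    total_packets = int(rate_bytes_per_s * duration_s // payload_size)
    start = clock()
    for n in range(total_packets):
        # Schedule against the start time so timing errors do not accumulate.
        delay = start + n * interval - clock()
        if delay > 0:
            sleep(delay)
        send(payload)
    return total_packets * payload_size


def run_udp_client(host, port, rate_bytes_per_s=1_000_000,
                   payload_size=1000, duration_s=10.0):
    """Hypothetical client: pushes UDP datagrams to (host, port) at a
    constant rate. Assumes 1 MBps = 10**6 bytes per second."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return send_constant_rate(lambda p: sock.sendto(p, (host, port)),
                                  rate_bytes_per_s, payload_size, duration_s)
```

A call such as run_udp_client(server_address, 9000), with a receiver listening on the other machine, would then keep the link loaded for ten seconds. Pacing against the start time rather than sleeping a fixed interval after each packet keeps the long-run rate close to the target even when individual sleeps overshoot.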
Fig. 3. System monitor showing a constant 1 MBps data transfer.

Each tool was run three times, allowing us to verify accuracy and check for consistency. The results obtained from the test are presented in Table 3.

Table 3. Results of the second part of the test with 1 MBps of constant data transfer.

  Tool       Run #1            Run #2           Run #3
  Spruce     99.9 Mbps         98.6 Mbps        97.5 Mbps
  Pathload   18.4-107.9 Mbps   88.6-97.8 Mbps   94.4-97.3 Mbps
  pathchirp  92.6 Mbps         92.6 Mbps        91.1 Mbps

Although we can already make some observations about the tools' accuracy from the results above, a more detailed discussion of accuracy and other properties is presented in a later section. Using a simple test has advantages and disadvantages. The advantage is that, by having total control of the environment, we can make accurate observations and calculations about the accuracy and performance of the tools. On the other hand, the test does not represent or simulate a real networking environment, because real-world networks are composed of hundreds of machines with hundreds of paths between them, each with a different capacity. That is why we chose to perform an additional test in an environment that more accurately reflects how today's Internet works, which is described in the next section.

2.2 Second Test on Tools Accuracy (Realistic Environment)

The second test was performed in an environment very close to a realistic one. For this test we used the PlanetLab [1] testbed. We chose to test the tools between two machines with a considerable number of hops between them. We used one machine from the University of Texas, USA, and a second from the University of Warsaw, Poland, both running the Linux operating system. Using the tracert system tool we were able to find out the number of routers between the two machines.
The trace showed a total of 19 routers between the two machines, only two of which it failed to identify, with an average RTT of 160 ms. We again ran each program three times to verify consistency. The results are presented in the following table.

Table 4. Results for the PlanetLab test.

  Tool       Run #1          Run #2      Run #3
  Spruce     89.6 Mbps       86.1 Mbps   81.3 Mbps
  Pathload   9.9-10.2 Mbps   10-15 Mbps  1.22-2.02 Mbps
  pathchirp  26.5 Mbps       28.3 Mbps   26.6 Mbps
The test results are discussed in the next section. The only drawback of this test is that we don't have anything to compare the results to. Because we don't know the capacity of the links between each pair of routers, we can only make observations based on the results from the previous tests.

3 Results

In this section we analyze the results obtained from the tests we performed on the available bandwidth estimation tools. We go through the tests in the order they were presented. The first test was performed between two machines connected by a 100 Mbps Cat5e crossover cable. In this scenario we know that each tool should output results close to the path's capacity, within an acceptable error e, which the system monitor tool showed can be ignored. From Table 2 we can see that Spruce's output is very accurate and consistent across runs. Pathload, instead of giving us one estimate, gives us a range. Judging by the results in Table 2, the interval Pathload outputs is too wide, and we definitely cannot consider it consistent. pathchirp's results are more accurate than Pathload's, but the error is too large considering that there was no traffic on the path. In order to define an acceptable deviation for each tool, we calculated the standard deviation for each test, given by the following formula:

$$\sigma_x = \left( \frac{1}{M-1} \sum_{i=1}^{M} (X_i - \bar{X})^2 \right)^{1/2} \tag{1}$$

where $M$ is the number of runs and $\bar{X}$ is the sample mean of the test, which can be calculated using the following formula:

$$\bar{X} = \frac{\sum_{i=1}^{M} X_i}{M} \tag{2}$$

The standard deviations calculated for each of the tools are presented in the following table.
Table 5. Standard deviation for the first part of the first test.

  Tool Name   Standard Deviation (Mbps)
  Spruce      0.5
  Pathload    3.3
  pathchirp   5.8

After analyzing the results of the first test, Spruce looks to be the most accurate of the three. The second part of the first test, in which we put a 1 MBps load on the path, changes our opinion a little. First, we know that the available bandwidth output should be 92 Mbps. This is easily calculated by subtracting the load, 1 MBps = 8 Mbps, from the total capacity of the cable, which is 100 Mbps. Looking at Table 3, we can see that Spruce and Pathload both output almost the same results as without load, which calls into question the accuracy of Spruce in the first test. At the same time, pathchirp shows very good accuracy and consistency. Considering that pathchirp did not perform too badly in the first part of the test either, we can consider pathchirp more accurate than the other two. The standard deviation of the tools for the second part of the first test is given in Table 6.

Table 6. Standard deviation for the second part of the first test (1 MBps load on the path).

  Tool Name   Standard Deviation (Mbps)
  Spruce      1.2
  Pathload    5.6
  pathchirp   1.5

Unfortunately, we have to limit our comments on the PlanetLab test and base our reasoning on a comparison with the first test and on assumptions about the capacity of the links between the testing machines. Looking at the results in Table 4, we can see that Pathload's output does not look realistic, given that the time interval between runs was on average 5 minutes. A closer look at Spruce's output raises doubt that, even in today's modern networks, there is a minimum path capacity of 80 Mbps across all 19 routers. At the same time, pathchirp's output indicates that somewhere along the way between the 19 routers there is a 30 Mbps link that might have some load on it, which in today's networks is a more realistic number.
But yet again, we cannot let the above assumptions decide a tool's accuracy: we shall rely only on the data that is known or can be calculated.
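The figures used in this section can be checked with a few lines of Python. The sketch below implements Eqs. (1) and (2) and the subtraction that gives the expected available bandwidth under load. The function names are ours, and only the single-value outputs of Spruce and pathchirp from Table 2 are used, since the report does not state how Pathload's ranges were reduced to one number.

```python
import math


def sample_mean(xs):
    """Eq. (2): arithmetic mean of the M runs."""
    return sum(xs) / len(xs)


def sample_std(xs):
    """Eq. (1): sample standard deviation with the M - 1 denominator."""
    m = len(xs)
    if m < 2:
        raise ValueError("need at least two runs")
    mean = sample_mean(xs)
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (m - 1))


def expected_available_mbps(capacity_mbps, load_mbytes_per_s):
    """Capacity minus load, converting megabytes/s to megabits/s (x8)."""
    return capacity_mbps - 8 * load_mbytes_per_s


# Runs from Table 2 (no additional load), in Mbps.
spruce_no_load = [99.5, 100.4, 100.4]
pathchirp_no_load = [87.6, 99.1, 91.7]

print(round(sample_std(spruce_no_load), 1))     # 0.5
print(round(sample_std(pathchirp_no_load), 1))  # 5.8
print(expected_available_mbps(100, 1))          # 92
```

The first two values agree with the Spruce and pathchirp entries of Table 5, and the last one is the 92 Mbps reference value used for the test with load.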
4 Timing

After determining the available bandwidth of each path that is part of the network map between two nodes, finding the time it will take a given path to transfer our data is a simple matter of applying the following formula:

$$T = \frac{\text{SizeOfData}}{\text{Avail. Bandwidth}} + e \tag{3}$$

where $e$ stands for the propagation error, which most often can be ignored. After labeling each path with the time it will take to transfer our data, we represent the network map using a graph data structure in which the edges represent the network paths. This makes it much easier to find the optimal path in a later step.

5 Finding the Optimal Path

Once the network map between two nodes (n1, n2) is represented as a graph, with n1 and n2 being two nodes in the graph (Fig. 4), finding the optimal path (the path that takes the least time) is a straightforward application of Dijkstra's algorithm, with the condition that the search stops once it reaches the target node, in our case n2.

[Figure 4: graph with nodes N1, N2, N3, N4 and edge weights 89 ms, 25 ms, 35 ms, 45 ms]

Fig. 4. Simple model of a graph representing nodes on a network.
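The two steps above, labeling edges with Eq. (3) and running Dijkstra's algorithm with an early stop at the target, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are ours, and the example graph reuses the edge weights of Fig. 4 but the assignment of weights to particular edges is assumed, not taken from the figure.

```python
import heapq


def transfer_time(size_bits, avail_bw_bps, e=0.0):
    """Eq. (3): T = size / available bandwidth + e, in seconds."""
    if avail_bw_bps <= 0:
        raise ValueError("available bandwidth must be positive")
    return size_bits / avail_bw_bps + e


def shortest_path(graph, source, target):
    """Dijkstra's algorithm that stops as soon as target is settled.

    graph maps each node to a dict {neighbour: non-negative edge time}.
    Returns (total_time, path); (inf, []) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    settled = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue                      # stale heap entry
        settled.add(u)
        if u == target:                   # early stop: target reached
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), []


# Hypothetical layout using the Fig. 4 weights (milliseconds).
graph = {
    "N1": {"N3": 25, "N4": 45},
    "N3": {"N1": 25, "N2": 89},
    "N4": {"N1": 45, "N2": 35},
    "N2": {"N3": 89, "N4": 35},
}

cost, path = shortest_path(graph, "N1", "N2")
print(cost, path)  # 80.0 ['N1', 'N4', 'N2']
```

In practice each edge weight would come from transfer_time applied to the data size and the estimated available bandwidth of that path; for example, transfer_time(8e6, 92e6) gives roughly 0.087 s for 1 MB over 92 Mbps.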
6 Conclusions

This report presented, in a simplified form, the theory and steps involved in finding the optimal path in computer networks, which can be useful for a wide range of applications. A more in-depth study of path optimization techniques is required in the future. There is especially room for research on the available bandwidth estimation step, considering that a tool should provide an accurate estimate of a path's available bandwidth in as short a time as possible while putting as light a load as possible on the network. For future work we would like to test a larger number of tools in different environments where the accuracy and performance can be calculated even in a realistic setup. Another area that deserves more attention in the future is the optimization of path finding algorithms.

Acknowledgments

References

1. PlanetLab. http://www.planet-lab.org
2. Strauss, J., Katabi, D., Kaashoek, F.: A Measurement Study of Available Bandwidth Estimation Tools. ACM 2003.
3. Ribeiro, V.J., Riedi, R.H., Baraniuk, R.G., Navratil, J., Cottrell, L.: pathchirp: Efficient Available Bandwidth Estimation for Network Paths. Rice University, SLAC/SCS-Network Monitoring, Stanford Univ.
4. Jain, M., Dovrolis, C.: Pathload: a measurement tool for end-to-end available bandwidth. CIS, University of Delaware.
5. Andersen, D., Balakrishnan, H., Kaashoek, F., Morris, R.: Resilient Overlay Networks. In Proc. ACM SOSP, 2001.
6. Marsic, I., Cheng, L.: Java-based tools for accurate bandwidth measurement of DSL Networks. ICAE 2002, IOS Press.
7. Shriram, A., Kaur, J.: Empirical Evaluation of Techniques for Measuring Available Bandwidth. INFOCOM 2007. 26th IEEE International Conference on Computer Communications. IEEE.
8. Jacobson, V.: pathchar, a tool to infer characteristics of Internet paths. ftp://ftp.ee.lbl.gov/pathchar.
9. Prasad, R., Dovrolis, C., Murray, M., Claffy, K.: Bandwidth Estimation: Metrics, Measurement Techniques, and Tools. MNET.2003.1248658, IEEE.
10. Hu, N., Steenkiste, P.: Estimating Available Bandwidth Using Packet Pair Probing. CMU-CS-02-166, September 9, 2002.
11. Curtis, J., McGregor, T.: Review of Bandwidth Estimation Techniques. CS Univ. of Waikato.