Denial-of-Service Shrew Attacks

Bhuvana Mahalingam
mbhuvana@cs.bu.edu

1. Introduction

A Denial of Service (DoS) attack is defined as "an incident in which a user or organization is deprived of the services of a resource they would normally expect to have." Broadly, there are two types of DoS attacks: operating system attacks, which target bugs in specific operating systems and can be fixed with patches, and networking attacks, which exploit inherent limitations of networking and may require firewall protection. Examples of networking DoS attacks include TCP SYN attacks, which consume protocol data structures in the server's operating system; ICMP directed broadcasts, which send requests to a broadcast address so that the hosts on that network flood a target host with ICMP replies, thereby overwhelming it; and DNS flood attacks, which exploit specific weaknesses in DNS protocols to generate high volumes of traffic directed at a victim. All of these are high-rate attacks that can be detected.

The focus of this project is a class of low-rate DoS attacks called shrew attacks, which attempt to deny bandwidth to TCP flows while sending at an average rate low enough to elude detection by counter-DoS mechanisms. Shrew attacks were defined in [KK03]. The basic idea behind shrew attacks is to exploit TCP's retransmission timeout mechanism. In particular, [AP99] suggested that the minimum retransmission timeout (minrto) be set to at least 1 sec. Since the vast majority of TCP flows have RTTs in the range of 10 to 100 ms, their initial timeout values tend to be the same, namely 1 sec. The shrew attack sends a square wave with a burst duration of roughly 1 to 2 RTTs and a period greater than 1 sec. [KK03] argue that TCP flows synchronize with the attacker, repeatedly incur timeouts with a period of 1 sec, and thus obtain almost zero throughput.
Since the burst length of the shrew attack is considerably less than its period, its average rate is low, which potentially allows it to elude detection.

2. Motivation

DoS attacks are common these days, and they are very damaging. Even the sites of big companies such as Yahoo and Amazon have been attacked. Recently, MyDoom was launched to cripple the SCO Group's web site and caused damage worth billions of dollars. Fortunately, such high-rate attacks are difficult to launch, since they require a huge client base, on the order of several tens of thousands of zombie clients. Furthermore, they are easy to detect, because the attack rate measured at the routers is noticeably higher than that of regular traffic, which makes it easy to take countermeasures. Security is extremely important now that the Internet has become part and parcel of our lives. Finding countermeasures is challenging, but identifying new kinds of attacks is equally important for making the infrastructure robust.
In this project, we study low-rate attacks that are difficult to detect. There are no known mechanisms to counter such attacks. As a result, the potential for damage is significant, which makes the problem all the more important to study.

3. Previous Work

In this section, we discuss in detail the results of [KK03], which form the basis of our project.

RTT calculation

TCP detects loss either through a timeout, when ACKs are not received, or through the receipt of three duplicate ACKs. When a packet has not been acknowledged within the timeout period (RTO) and fewer than three duplicate ACKs have been received, TCP times out. TCP periodically measures the RTT of packets and sets its RTO accordingly. More precisely, RTO is calculated as follows:

$$\mathrm{RTTVAR} = (1-\beta)\,\mathrm{RTTVAR} + \beta\,\lvert \mathrm{SRTT} - R \rvert$$
$$\mathrm{SRTT} = (1-\alpha)\,\mathrm{SRTT} + \alpha R$$
$$\mathrm{RTO} = \max\bigl(\mathrm{minrto},\ \mathrm{SRTT} + \max(G,\ 4\,\mathrm{RTTVAR})\bigr)$$

Here R is the measured RTT, G is the clock granularity, and α and β are typically set to 1/8 and 1/4, respectively. When the first measurement R is made, SRTT = R, RTTVAR = R/2 and RTO = SRTT + max(G, 4 RTTVAR). [AP99] recommended that minrto be set to 1 sec. As a result, whenever the second term, SRTT + max(G, 4 RTTVAR), is less than minrto, RTO is set to 1 sec. This is usually the case, since RTTs tend to range from tens of ms to a few hundred ms. The shrew attack exploits precisely this aspect of the RTO calculation.

Model and analysis of shrew attacks

A shrew attack is modeled as a square wave in which the attacker transmits bursts of duration L at rate R in a deterministic on-off pattern with period T. When the rate R, combined with the existing traffic, exceeds the link capacity, losses occur. By setting the duration L to more than the RTT of the flows and the period T slightly greater than minrto, the attacker can force TCP flows to time out repeatedly and obtain virtually zero throughput. Since L is typically much smaller than T, the average rate of the shrew attack, RL/T, is very low.
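The RTO rule above can be sketched in Python as follows. This is a minimal illustration of the formulas as written here (α = 1/8, β = 1/4, minrto = 1 sec), not the ns-2 code or any operating system's TCP implementation. The class name `RtoEstimator`, its methods, and the optional `jitter` argument are hypothetical; `jitter` corresponds to the randomized-RTO idea of Section 6.5 and is not part of standard TCP. The usage at the bottom shows that for steady RTTs between 10 and 100 ms the computed RTO collapses to minrto, which is what lets an attacker with a period slightly above 1 sec line up with the flow's retransmissions.

```python
import random


class RtoEstimator:
    """Sketch of the RTO computation described above (RFC 2988 style).

    Hypothetical helper for illustration; not taken from ns-2 or any TCP stack.
    All times are in seconds.
    """

    def __init__(self, minrto=1.0, granularity=0.01, alpha=1 / 8, beta=1 / 4):
        self.minrto = minrto       # lower bound on the RTO
        self.g = granularity       # clock granularity G
        self.alpha = alpha         # gain used for SRTT
        self.beta = beta           # gain used for RTTVAR
        self.srtt = None
        self.rttvar = None
        self.backoff = 1           # doubled on every consecutive timeout

    def sample(self, r):
        """Fold one RTT measurement r into SRTT and RTTVAR."""
        if self.srtt is None:
            # First measurement: SRTT = R, RTTVAR = R / 2.
            self.srtt, self.rttvar = r, r / 2
        else:
            # RTTVAR is updated using the old SRTT, then SRTT is updated.
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - r)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * r
        self.backoff = 1           # a fresh sample ends any backoff

    def base_rto(self):
        """RTO = max(minrto, SRTT + max(G, 4 * RTTVAR))."""
        return max(self.minrto, self.srtt + max(self.g, 4 * self.rttvar))

    def on_timeout(self):
        """Exponential backoff: each further timeout doubles the timer."""
        self.backoff *= 2

    def timer(self, jitter=None, rng=random):
        """Current timer value. jitter=(lo, hi), e.g. (0.8, 1.2), scales it by a
        uniform random factor (hypothetical randomized-RTO variant)."""
        t = self.base_rto() * self.backoff
        if jitter is not None:
            t *= rng.uniform(*jitter)
        return t


if __name__ == "__main__":
    for rtt in (0.010, 0.050, 0.100):
        est = RtoEstimator()
        for _ in range(20):
            est.sample(rtt)
        # Prints an RTO of 1.00 s for each RTT: minrto dominates.
        print(f"RTT {rtt * 1000:.0f} ms -> RTO {est.base_rto():.2f} s")
```

Calling `on_timeout()` twice after a sample raises the timer to 2 and then 4 sec, the same doubling pattern as the exponential backoff discussed for TCP Vegas in Section 6.2.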
In [KK03], the authors give an upper bound on the normalized throughput in terms of minrto and T, under certain assumptions. For the shrew attack to be effective against a particular TCP flow, the burst duration L must exceed the RTT of the flow. Therefore, the shrew attack is in general more effective against low-RTT flows than against high-RTT flows; [KK03] give a formula that quantifies this difference in impact.
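To make the roles of L, T and minrto concrete, here is a back-of-the-envelope sketch under idealized assumptions that are ours, not necessarily those of [KK03]: every burst at least as long as the flow's RTT forces a timeout, the retransmission sent minrto later succeeds (so there is no backoff), and the flow uses the full link whenever it is not timed out. Under these assumptions the flow is idle for minrto after each timeout and then sends until the next burst, which arrives ⌈minrto/T⌉·T after the burst that caused the timeout. The same sketch computes the attack's average rate RL/T and the peak rate a monitor would see when averaging over a window W, which is the timescale issue behind RED-PD discussed below under counter-DoS techniques. All function names are hypothetical; times are integer milliseconds to keep the ceiling exact.

```python
def idealized_throughput(period_ms, burst_ms, rtt_ms, minrto_ms=1000):
    """Fraction of time a single flow can send under the idealized model.

    Returns None when the burst is shorter than the RTT, since the model
    (every burst forces a timeout) does not apply then.
    """
    if burst_ms < rtt_ms:
        return None
    k = -(-minrto_ms // period_ms)      # ceil(minrto / T) for positive ints
    cycle = k * period_ms               # time from one fatal burst to the next
    return (cycle - minrto_ms) / cycle  # idle for minrto, then send until the next burst


def average_rate(rate, burst_ms, period_ms):
    """Long-run average rate R * L / T of the square wave."""
    return rate * burst_ms / period_ms


def max_windowed_rate(rate, burst_ms, period_ms, window_ms, horizon_ms=20000):
    """Highest rate a monitor averaging over window_ms would measure."""
    # 1 ms resolution on/off trace of the square wave.
    on = [1 if t % period_ms < burst_ms else 0 for t in range(horizon_ms)]
    cur = best = sum(on[:window_ms])
    for t in range(window_ms, horizon_ms):   # slide the window 1 ms at a time
        cur += on[t] - on[t - window_ms]
        best = max(best, cur)
    return rate * best / window_ms


if __name__ == "__main__":
    # 4 Mbps bursts of 100 ms every 1.1 s, against a flow with a 40 ms RTT.
    print(idealized_throughput(1100, 100, 40))   # about 0.09
    print(average_rate(4.0, 100, 1100))          # about 0.36 Mbps
    for w in (100, 500, 1100):
        print(w, max_windowed_rate(4.0, 100, 1100, w))  # 4.0, 0.8, then about 0.36
```

In this model a period of exactly minrto (T = 1000 ms) gives zero throughput, and short averaging windows expose the full 4 Mbps peak while a window of one period or more sees only RL/T.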
Other results

By analyzing the instantaneous behavior of the bottleneck queue, the authors derive that the optimal shrew attack is a double-rate wave, a variant of the square wave. However, a square wave approximates the double-rate attack fairly well.

One could ask whether Active Queue Management can help thwart shrew attacks. Unfortunately, the authors show experimentally that although RED works well at avoiding global synchronization, it is helpless under DoS attacks, since all TCP flows with low RTTs synchronize with the period of the attacker. Experimental evidence also shows that effective shrew attacks can come from remote sites as well as from nearby LANs.

Counter-DoS techniques

[KK03] explore two kinds of mitigation against DoS attacks: router-assisted mechanisms and end-point minrto randomization. For the router-assisted mechanism, they study whether low-rate DoS attacks can be detected. RED-PD is used to detect flows with high rates and drop packets belonging to these flows. Unfortunately, if rates are measured over small timescales, even normal TCP flows may be falsely classified as malicious; on the other hand, if rates are measured over large timescales, shrew attacks may be missed. So RED-PD does not appear to be an effective countermeasure against shrew attacks. For end-point minrto randomization, the authors consider selecting minrto uniformly at random from the range (a, b). They prove that the normalized throughput is at most $\frac{n}{n+1}\cdot\frac{b-a}{b}$. Thus, by widening the range (a, b), one can potentially counter the shrew attack. However, the authors note that decreasing a could significantly degrade TCP throughput during periods of heavy congestion, while increasing b could degrade the throughput of short-lived flows.

4. Proposed Work

Our plan was to reexamine the experiments conducted in [KK03] in the following ways:
1. Study the impact of clock granularity.
2. Study the impact of shrew attacks on TCP Vegas.
3. Study the impact of reducing minrto to less than 1 sec.
4. Study whether randomizing packet scheduling and buffer management counters the attacks. In particular, can fair queueing at the router alleviate the problem?
5. Randomize the initial timeout at different TCP sources.
6. Study the effect of the distance between the attacker and the attacked router.

5. Simulation Setup

I used ns-2.1b9a for all my experiments. The network topology is shown below.
[Figure: simulation topology. Source nodes 0 and 1 connect to router 2, which connects to destination node 3; all links are 4 Mbps.]

For most of the experiments, I used one TCP flow and one DoS flow. The TCP flow has source 0 and destination 3, while the DoS flow has source 1 and destination 3. As a result, the DoS attack targets the queue at router 2, and the bottleneck link is between nodes 2 and 3. Each link has a capacity of 4 Mbps and a latency of 10 ms. The TCP flow was driven by an FTP application and used a payload size of 460 bytes and a maximum window size of 40 packets. Note that the bandwidth-delay product of the TCP connection equals 40 packets, the maximum window size; thus, if there were no other flows, the TCP flow would achieve close to full throughput.

The DoS attack is modeled as a square wave in which the attacker transmits bursts of duration L at rate R in a deterministic on-off pattern with period T. We implemented this in ns using a constant bit rate traffic generator that was started and stopped periodically: the burst duration was set to L and the inter-burst (idle) period to T - L. The packet size was 46 bytes.

6. Findings

6.1. Impact of clock granularities

The default clock granularity in ns-2.1b9a is 10 ms. We studied whether choosing this value carefully would reduce the impact of the shrew attack. The DoS burst length was set to 100 ms and the inter-burst period to 1 sec. Figure 1 depicts the normalized throughput as the clock granularity varies. We notice that the throughput actually increases at a granularity of 0.4 sec. One possible explanation is the following. The DoS attack has a period of 1.1 sec. Since the clock granularity is 0.4 sec, a TCP flow recovering from a timeout can only restart at multiples of 0.4 sec. In this case, the smallest multiple that is not less than the minrto of 1 sec is 1.2 sec.
Thus, while the DoS burst that began at 1.1 sec ends at 1.2 sec, the TCP flow restarts only at 1.2 sec, escapes the burst, and reaches a peak normalized throughput of approximately 0.5. Since this happens only every other second, the averaged throughput turns out to be about 0.11. Figure 2 illustrates this behavior.
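Two pieces of arithmetic from Sections 5 and 6.1 can be checked with the short sketch below. First, the bandwidth-delay product: assuming a 40 ms round trip (two 10 ms links in each direction, ignoring queueing and transmission delay) and 500-byte packets (the 460-byte payload plus an assumed 40 bytes of TCP/IP headers), a 4 Mbps link holds 40 packets. Second, the granularity effect: assuming the retransmission timer is armed at the start of a burst and its expiry is rounded up to the next multiple of the clock granularity G, the sketch reports how many milliseconds of attack-free sending the flow gets after it restarts. All function names are hypothetical, and this simplified model ignores backoff and the details of ns-2's timer code.

```python
def bdp_packets(link_bps, rtt_ms, packet_bytes):
    """Bandwidth-delay product expressed in packets."""
    return link_bps * rtt_ms / 1000 / 8 / packet_bytes


def restart_time_ms(minrto_ms, granularity_ms):
    """Timer expiry rounded up to the next clock tick (modeling assumption)."""
    return -(-minrto_ms // granularity_ms) * granularity_ms


def clear_time_after_restart(restart_ms, period_ms, burst_ms):
    """Attack-free sending time after a restart, before the next burst hits.

    Bursts occupy [k*T, k*T + L); the timeout started at t = 0 (a burst start).
    Returns 0 if the flow restarts inside a burst.
    """
    phase = restart_ms % period_ms
    if phase < burst_ms:
        return 0
    return period_ms - phase


if __name__ == "__main__":
    print(bdp_packets(4_000_000, 40, 460 + 40))   # 40.0 packets
    # DoS bursts of 100 ms every 1.1 s, minrto = 1 s.
    for g in (10, 100, 400):
        r = restart_time_ms(1000, g)
        # Prints "10 1000 100", "100 1000 100", "400 1200 1000".
        print(g, r, clear_time_after_restart(r, 1100, 100))
```

With G = 10 ms the flow restarts at 1.0 sec and has only 100 ms before the burst at 1.1 sec; with G = 400 ms it restarts at 1.2 sec, just as that burst ends, consistent with the explanation above.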
6.2. Impact of shrew attacks on TCP Vegas

The authors of [KK03] did not study the impact of shrew attacks on TCP Vegas. We studied this issue and found that Vegas is affected even more adversely than Reno. As Figure 3 shows, the throughput of Vegas is significantly lower than that of Reno. Surprisingly, even when the inter-burst period is as large as 5 sec, Vegas obtains only slightly more than 10% of the link capacity. To investigate this further, we studied the instantaneous throughput of Reno and Vegas, as illustrated in Figure 4. The parameters for Figure 4 are: burst length 0.1 sec, minrto at its default of 1 sec, and DoS inter-burst period 3 sec.
We notice that the timeout for Vegas after the first attack is 2 sec. Subsequently, it follows an exponential backoff pattern, with timeouts of 4 and 8 sec. At time 20 sec, the Vegas flow regains some throughput, but then enters a similar timeout sequence again. We do not yet understand why this happens.

6.3. Impact of reducing minrto to less than 1 sec

The attack in [KK03] hinges on minrto being set to 1 sec, as suggested by [AP99]. Since a minrto of 1 sec has been consistently recommended as a sound setting for TCP congestion control, it is unlikely that TCP stacks will change it any time soon. Nevertheless, in this project we explored whether changing minrto would have any effect against the shrew attack. Figure 5 compares two values of minrto, 500 msec and 1 sec, with a burst length of 0.1 sec and the inter-burst period varied as shown in the graph. The two curves are very similar in shape: as expected, throughput increases with the inter-burst period. Reducing minrto yields higher throughput for a given shrew attack configuration. This is not surprising, since a smaller minrto means that the flow recovers from a timeout sooner. We did not run further experiments with other minrto values, since we did not expect them to yield additional insight.

6.4. Fair Queuing and shrew attacks

Figure 6 shows a simple case of one TCP flow and one DoS flow with fair queueing implemented using Deficit Round Robin (DRR). The TCP flow reaches a high normalized throughput of approximately 0.88. Intuitively, this happens because fair queuing allocates the link
capacity equally among all flows (two flows in this case). As a result, whenever a DoS burst takes place, the TCP flow still has about half of the link capacity available instead of all of it. Some packets of the TCP flow are dropped, but not enough to cause a timeout, and the TCP flow continues to get nearly full throughput while the DoS attack is idle. Since fair queueing divides the capacity evenly among flows, it is interesting to ask whether splitting the attack into more DoS flows (with the same total rate as before) would reduce the TCP throughput. With 10 DoS flows and 1 TCP flow, we observed that the throughput did not change much; it was on the same scale as with 1 TCP flow and 1 DoS flow. The likely reason is that the ns implementation of DRR appears to treat all flows from a particular node on a link as a single flow rather than as multiple flows, so this case reduces to the previous one.

[Figure: topology for the distributed attack, with TCP source 0, DoS sources 1, 4, 5 and 6, router 2 and destination 3; links are 4 Mbps.]
We ran DRR on the above topology, in which multiple source nodes carry out the DoS attack. There is one TCP flow and there are four DoS flows. The TCP flow has source 0 and destination 3. The DoS flows have sources 1, 4, 5 and 6 and the same destination, node 3. Each DoS flow had a rate of 1 Mbps during bursts (with identical burst and inter-burst periods); thus, the total DoS rate during bursts was 4 Mbps, exactly the same as with the single flow in the previous experiment. In this case, the router did treat the four flows as distinct, so one would expect the DoS attack to be more successful. This is indeed what we observed: the normalized throughput achieved was only approximately 0.43. The instantaneous throughput graph is plotted below. The throughput drops to 0.2 when the DoS attacks take place, as expected with 5 flows in all (1 TCP + 4 DoS). Sometimes this reduced throughput triggers a timeout; other times it does not. The overall effect is an average throughput of 0.43, meaning that the DoS attack was partially successful.

In summary, fair queueing seems to be an effective mechanism for thwarting the basic DoS attack. However, the attacker can retaliate by using multiple DoS flows with the same low total rate and still cause damage. While this damage was not as severe as with DropTail queues, it was still significant. Note that in the above experiment the multiple DoS flows originated from different sources, thus constituting a distributed low-rate DoS attack. DDoS attacks are harder to carry out, because they require access to a large pool of resources. On the other hand, they may be harder to detect, especially if they are low rate, as in the above experiment. Our main reason for using multiple sources was that the DRR implementation in ns seemed to define flows according to source IDs.
If there were a way to set up different flows from the same source, then multiple DoS flows from a single source might inflict the same damage. We have not verified this, and it is an issue worth investigating further.
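The fair-share reasoning in this section can be illustrated with a simplified Deficit Round Robin scheduler. The sketch assumes every flow is permanently backlogged, as during a DoS burst, and lets a `classify` function decide which flows share one DRR queue; classifying by source node is a hypothetical model of the ns behavior inferred above, not a description of ns-2's code. Flow names, packet sizes (500-byte TCP packets, 50-byte DoS packets) and the quantum are illustrative assumptions. Under these assumptions the TCP flow's share of the link is 1/2 when ten DoS flows from one node are merged into a single queue (the same as one DoS flow), 1/11 if those ten flows were classified separately, and 1/5 with four DoS sources, in line with the 0.5 and 0.2 levels reported above.

```python
from collections import defaultdict, deque


def drr_shares(flows, classify, quantum=500, rounds=2000):
    """Byte share of each always-backlogged flow under Deficit Round Robin.

    flows: dict mapping flow name -> (source node, packet size in bytes)
    classify: function (name, source) -> key of the DRR queue the flow uses
    """
    queues = defaultdict(deque)
    for name, (src, _size) in flows.items():
        queues[classify(name, src)].append(name)
    deficit = dict.fromkeys(queues, 0)
    served = dict.fromkeys(flows, 0)
    for _ in range(rounds):
        for key, q in queues.items():
            deficit[key] += quantum
            # Send head-of-line packets while the deficit counter covers them.
            while flows[q[0]][1] <= deficit[key]:
                head = q[0]
                deficit[key] -= flows[head][1]
                served[head] += flows[head][1]
                q.rotate(-1)  # flows merged into one queue take turns (assumption)
    total = sum(served.values())
    return {name: b / total for name, b in served.items()}


def by_node(name, src):
    return src       # all flows from one node share a queue (hypothetical)


def by_flow(name, src):
    return name      # every flow gets its own queue


if __name__ == "__main__":
    one_node = {"tcp": (0, 500), **{f"dos{i}": (1, 50) for i in range(10)}}
    four_nodes = {"tcp": (0, 500), **{f"dos{n}": (n, 50) for n in (1, 4, 5, 6)}}
    print(drr_shares(one_node, by_node)["tcp"])    # 0.5
    print(drr_shares(one_node, by_flow)["tcp"])    # about 0.0909 (1/11)
    print(drr_shares(four_nodes, by_node)["tcp"])  # 0.2
```

These shares describe only the periods when every flow has packets queued; a real TCP flow backs off after losses, so its measured throughput also depends on whether the reduced share triggers a timeout.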
6.5. Randomizing the timeout at different TCP sources

There are two ways of randomizing timeouts. One is to randomize minrto, which has already been studied in [KK03]. As noted in my first report, the authors consider selecting minrto uniformly at random from the range (a, b) and prove that the normalized throughput is at most $\frac{n}{n+1}\cdot\frac{b-a}{b}$. Thus, by widening the range (a, b), one can potentially counter the shrew attack. However, the authors note that decreasing a could significantly degrade TCP throughput during periods of heavy congestion, while increasing b could degrade the throughput of short-lived flows.

Another approach is to modify the TCP implementation as follows. Instead of setting RTO = max(minrto, SRTT + max(G, 4 RTTVAR)), RTO could be chosen uniformly at random from a range that depends on minrto and SRTT + max(G, 4 RTTVAR). For instance, RTO could be chosen between 80% and 120% of max(minrto, SRTT + max(G, 4 RTTVAR)). The timeouts at different points in a TCP session would then differ, which could prevent the DoS attacker from ever synchronizing with the TCP flow. Note that this is fundamentally different from the first approach, in which the randomly chosen timeout remains the same for the entire duration of the connection. Nevertheless, we are unsure whether this approach is an effective countermeasure. As far as we know, testing it would require modifying the ns source files, and we leave this for future work.

6.6. Effect of distance between attacker and attacked router

[Figure: topology for the distance experiments, with nodes 0 to 10; all links are 4 Mbps with a one-way latency of 10 ms.]
To study the effect of the distance between the attacker and the attacked router, we ran two simple experiments using the above topology. The one-way latency of every link is 10 ms and the capacity of every link is 4 Mbps. In the first experiment there were three TCP flows: (1) from node 0 to node 3, (2) from node 6 to node 5, and (3) from node 6 to node 7. There is one DoS flow, from node 4 to node 3. As a result, two routers, nodes 1 and 2, are attacked. The DoS attacker is at hop distance 2 from node 2. We study the impact of this attack on TCP flow 1 (node 0 to node 3); the traffic from node 6 to node 5 can be viewed as cross traffic. The TCP flows are driven by FTP applications, and the DoS flow has a peak rate of 4 Mbps, a burst length of 100 ms, and an inter-burst period ranging from 0.5 to 5 sec. In the second experiment there were two more TCP flows, one from node 9 to node 8 and another from node 9 to node 7. The DoS flow was from node 10 to node 3, with the same settings as before, so the DoS attacker is at hop distance 3 from node 2. Figure 8 compares the throughput of TCP flow 1 in these two experiments with that achieved when the attacker was at hop distance 1. The figure shows that the DoS attack is equally potent when the attacker is farther from the attacked router. However, this experiment is too simplified to support firm conclusions; a much more thorough study with more complex topologies and different kinds of traffic flows is needed.
7. Conclusions and Future Work

In this project, we studied how various system and protocol settings affect the impact of low-rate DoS attacks. Our experiments indicate that changing the clock granularity, using a different TCP variant such as TCP Vegas, or modifying minrto does not mitigate the effect of the low-rate DoS attack outlined in [KK03]. One promising countermeasure appears to be fair queueing in the routers: our experiments suggest that a TCP flow can regain as much as 90% of the available bandwidth in the presence of a single low-rate DoS flow. However, we also observed that the attacker can circumvent this countermeasure by using a set of very low-rate attack flows with the same total rate as before and still cause substantial damage. This interaction between fair queueing and multiple DoS flows deserves further study. It is worth mentioning that [SLY04] also verified experimentally that a fair resource allocation mechanism can be used to minimize the number of TCP flows that are affected. Another potential countermeasure is to randomize the timeout, as discussed in Section 6.5; studying this approach appears to require modifying the ns source code. Finally, our study of the distance between the attacker and the attacked router indicates that the DoS attack may be equally effective when the attacker is farther away from the router.

8. Bibliography

[AP99] M. Allman and V. Paxson. On estimating end-to-end network path properties. In Proceedings of ACM SIGCOMM '99, Vancouver, British Columbia, September 1999.
[KK03] A. Kuzmanovic and E. W. Knightly. Low-Rate TCP-Targeted Denial of Service Attacks. In Proceedings of ACM SIGCOMM '03.
[SLY04] H. Sun, J. C. S. Lui and D. K. Y. Yau. Defending Against Low-rate TCP Attacks: Dynamic Detection and Protection. In Proceedings of the 12th IEEE International Conference on Network Protocols (ICNP '04).