Denial of Service Attacks and Resilient Overlay Networks Angelos D. Keromytis Network Security Lab Computer Science Department, Columbia University
Motivation: Network Service Availability
Motivation: Network Service Availability
Why Does It Matter? We are increasingly relying on Internet Services Financial services, Voice over IP (VoIP), e- Government, news, Cloud Computing,... But Internet Services are not dependable... Denial of Service attacks can disrupt online service DDoS attack on Estonia (2007) 2 Weeks, 1M computers, 5,000 clicks per second DDoS attacks against Georgia (2008) Storm Worm: 1.7M infected machines used for DDoS (typically extortion) Ease of assembling and controlling botnets means the problem will persist
Defenses End-users/sites: Bandwidth over-provisioning Multi-hosting/multi-homing Use of Content Delivery Networks ISPs: Blackhole routing Anomaly detection & blocking Centralized vs. distributed
Research Activity IP traceback (attribution) IP Pushback (reactive blocking) Collaborative filtering (reactive blocking) Router/receiver capabilities (proactive blocking) Improve host-based protection
Impediments to deployment Few economic incentives for deployment Most schemes require global adoption & deployment End-users lack the means to react DDoS is mostly an externality for ISPs no market opportunity for router manufacturers Cross-ISP collaboration not always feasible Competition concerns
Overlay Networks A different term of distributed system Collection of systems Connected over a wide-area network, such as the Internet Route traffic amongst them without considering physical topology Addressing, neighborhood, other properties may differ from those of the actual network fabric Good way of introducing new functionality into the network without changing routers/protocols (and, sometimes, end-hosts)
Using Overlay Networks Distribute logical function of a firewall across the Internet Allow users to contact any overlay node Any overlay node can validate a legitimate user Once admitted into overlay, user s traffic is treated preferentially Allowed to reach attacked site All other traffic dropped/rate-limited
Advantages of Overlay Networks Difficult to attack with a DDoS due to distributed nature Assumes large enough overlay Does not rely on ISP co-operation or goodwill Can take advantage of such, where it exists A single overlay can provide protection service to different users Commercialization model similar to CDN A large enough distributed organization can create its own overlay
Issues with Overlay Networks How do users discover (accessible) overlay nodes? Largely static content, users (software) can access any node Overlay network becomes obvious target of attack Dedicated nodes, easier to harden Performance issues Higher latency, lower throughput due to non-direct routing How can we tell who is a legitimate user, vs. a bot? How do we effectively discriminate overlay vs. non-overlay traffic?
WebSOS: Protection for Web Services
WebSOS: Protection for Web Services
WebSOS: Protection for Web Services
WebSOS: Protection for Web Services
WebSOS: Protection for Web Services Can we remove Packet Filtering?
Move: An End-to-End Solution for DDoS
Move: An End-to-End Solution for DDoS Attack
Move: An End-to-End Solution for DDoS
Move: An End-to-End Solution for DDoS
Prototype in Planet-Lab
Migration Performance Round Trip time
Limitations of WebSOS & MOVE Time (in seconds) Latency Overhead for various SSL Services 8 7 6 5 4 3 2 Direct Original Request Cached Requests 1 0 Yahoo! Verisign Columbia Columbia (2nd) Latency increase by a factor of 2 when using indirection
Limitations of WebSOS & MOVE Time (in seconds) Latency Overhead for various SSL Services 8 7 6 5 4 3 2 Direct Original Request Cached Requests 1 0 Yahoo! Verisign Columbia Columbia (2nd) Latency increase by a factor of 2 when using indirection Also vulnerable to some more intelligent attacks...
New Attack: Sweeping Attack
New Attack: Sweeping Attack
New Attack: Sweeping Attack
New Attack: Stalker Attack
New Attack: Stalker Attack
New Attack: Stalker Attack
New Attack: Stalker Attack
EXAMPLE: TCP assumes low-latency, stable, fixed capacity, high-quality link Resilience Results: Throughput Throughput vs Degradation Error Rate of in SACK regular with TCP PER 10 9 C7,06;6/8*+-*?-%&"'-B8/9-5!!-I6+;-#F5K RTT: 6ms RTT: 24ms RTT: 104ms Goodput (Mbps) 8 7 6 5 4 3 Performance region of wired networks Performance region of MANETs!"#$%&"'( )*+,-.*+/0*1-1**234-56/7-.*+/0*1-83-/97-* 6;62/6/8*+- <7.96+83<4 2 1 0 0 2 4 6 8 10 Packet Error Rate(%) %*=0.7(-5*>=3/-!"#-?*0-)60,7@A6+;B8;/9-C716:D-#6.E7/-F063=07-6+;-G=1/8@#6/9-F+H80*+<7+/34 %98HE=<60 '61:6+606<6+ I5#JKD-'4'4-56<6E0839+6+ I&!L!-)6>3-573760.9K
Fix attempt: use many entry points But this solution increases the state stored!!!
Ticket-based mechanism to the rescue Move state to the ticket Ticket is issued by the Overlay using a shared key Ticket becomes a contract between the user and the overlay Use of a shared key guarantees honor of the agreement
Key & Ticket Establishment protocol
Ticket Design Random spreading sequence protects against stalker attacks Packet sequence range guarantees traffic control Ticket design and issue protocol prevent replay, spoofing and computational attacks
Client Connection Initiation
Spread Spectrum Architecture - Replication Multi-Path + Spreading + Ticket allows Packet Replication
A2M: Access Assured Mobile Desktop Computing
Performance Results: Latency End-to-End Latency with Client Packet Replication 2.75 8 nodes 16 nodes 32 nodes 76 nodes 2.50 2.25 2.00 Overlay / Direct 1.75 1.50 1.25 1.00 0.75 0.50 No Repl. 1.5x 2x 3x Client Packet Replication
Resilience Results: Throughput Throughput vs Node Failure 600 Direct No Repl. 1.5x 2x 3x 500 400 KB/Sec 300 200 100 0 1% 3% 4% 9% 17% 21% 27% 35% 43% 46% 55% % Node Failure
Resilience Results: Latency (Web) End-to-End Latency vs Node Failure (Web) No Repl. 1.5x 2x 3x
Performance Results: Latency (Web) Web Latency vs Packet Replication Video Quality vs Packet Replication 0.50 80 hosts 8 hosts 100% Size: 352 x 240 80 hosts 8 hosts 0.40 80% Latency (s) 0.30 0.20 Video Quality 60% 40% 0.10 20% 0.00 Direct 0% 50% Replication 100% 200% 0% Direct 0% 50% Replication 100% 200%
Ticket Generation Benchmark
Resilience Results: Video Streaming 100% Video Quality vs Node Failure Video Quality 80% 60% 40% 20% 0% 0% 50% 100% 200% 0 10 20 30 40 50 % Node Failures
Resilience Results: Video Streaming Video Quality vs Node Failure for Wireless 100% 80% Video Quality 60% 40% 20% 0% 0% 50% 100% 200% 0 10 20 30 40 50 % Node Failures
Resilience Results: Video Streaming Video Quality vs Node Failure 100% 80% Video Quality 60% 40% 20% 0% 5 clients - 0% 5 clients - 50% 5 clients - 100% 5 clients - 200% 8 clients - 0% 8 clients - 50% 8 clients - 100% 8 clients - 200% 0 10 20 30 40 50 % Node Failures
TCP Friendliness of Approach Initial implementation non-tcp friendly provided the worst case scenario (use of non-responsive channels) Current implementation encodes path in the TCP options field for acknowledgments generating a different TCP-window for each path Works for regular TCP, UDP, and UDP-encapsulated TCP Existence of multiple paths makes attacks against TCP more difficult
Conclusion Recent events have demonstrated the continued and real threat of DDoS as an effectve instrument of both cyber-warfare and cyber-crime Overlay-based mechanisms can mitigate the impact of large DDoS attacks Topology- and provider-independent deployment at relatively low cost Performance impact low (< 10%), only incurred during attack periods A pan-european DDoS Protection Network? Leverage PlanetLab/GRID sites as seeds
What is the underlying problem? How clients connect to the overlay: Connection to a single indirection node (entry point) Client s state is stored to this entry point End-to-End connection depends on a small but static set of overlay nodes
What is the underlying problem (II)? How the overlay sees the client: User can establish multiple connections to an overlay node An authenticated client can inject any amount of traffic to the overlay network Even if there is access control in the entry point the user can reset that by attacking the entry point