Tempesta FW
Alexander Krizhanovsky
NatSys Lab.
ak@natsys-lab.com
What Is Tempesta FW?
- FireWall: layer 3 (IP) to layer 7 (HTTP) filter
- FrameWork: high-performance and flexible platform for building intelligent DDoS mitigation systems and Web Application Firewalls (WAF)
- First and only hybrid of an HTTP accelerator and a firewall
- Directly embedded into the Linux TCP/IP stack
- JIT Domain Specific Language (DSL) for traffic processing
- Open Source (GPLv2)
Challenges
Mostly about application-layer (HTTP) DDoS:
- small HTTP requests and short-lived TCP connections
- requests prevail over responses
- a lot of concurrent connections
- fine-grained filtration rules at all network layers!
- per-request resource consumption!
- drop early or die!
- high concurrency
Existing Solutions: How To Filter HTTP Requests?
- Modules on application HTTP servers
- Firewalls
- Deep Packet Inspection (DPI)
Existing Solutions
- Deep Packet Inspection (DPI):
  - not an active TCP participant
  - cannot accelerate content to relieve the defended Web resource under DDoS
  - SSL termination is hard
- User-space HTTP accelerators:
  - too slow due to context switches and copies
  - designed for old hardware
- Firewalls:
  - low layers only (IP and partially TCP)
  - rule generation for the application layer is messy (fail2ban etc.)
  - no persistence for dynamic rules
L7 DDoS Is About Performance: How To Accelerate a Web Application
[Diagram labels: DDoS mitigation CDN; Filter; DPI; FireWall + HTTP accelerator; Accelerator; HTTP server]
- Extra communications
- Hard to manage
Web Application Firewall (WAF)
Modern WAF:
- Heavy buzzwords: XHTML, WSDL, ...
- Machine learning
- Tons of regexps
- Runs on top of a common Web server
WAF accelerator! (~ Web accelerator)
What's Wrong With Traditional Web Servers & Firewalls
- User space & monolithic OS kernel (the exokernel approach helps much):
  - context switches
  - copies
  - no uniform access to information on all network layers
- No flexibility to analyze and filter traffic on all layers
- Designed for old hardware and/or oblivious to hardware features
Tempesta FW Architecture
Synchronous Sockets
Reading from a socket in any context other than the deferred interrupt (softirq) context is asynchronous to the arrival of TCP segments.
Synchronous sockets:
- process packets while they're hot in CPU caches
- no queues
- do work when data is ready
Faster HTTP Parser
- Switch-driven (widespread): poor instruction-cache (I-cache) usage & CPU intensive
- Table-driven (with possible compression): poor D-cache usage
- Hybrid State Machine (combination of the two previous)
- Direct jumps (Ragel)
- PCMPSTR (~strspn(3), very limited)

Switch-driven parser (pseudocode):

    for (; *str_ptr; ++str_ptr)
        switch (state) {
        case 1:
            switch (*str_ptr) {
            case 'a':
                ...
                state = 1;
            case 'b':
                ...
                state = 2;
            }
        case 2:
            ...
        }
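To make the difference between the switch-driven and direct-jump styles concrete, here is a minimal sketch in C. It is not Tempesta FW's parser: the function names, the `METH_*` codes and the tiny grammar (only `GET ` and `POST ` are recognized) are assumptions chosen for illustration. The switch-driven variant re-dispatches on a `state` variable for every input byte; the goto-driven variant encodes the state in the program counter, so a transition is a single jump.

```c
/* Hypothetical result codes for this sketch. */
enum { METH_UNKNOWN = 0, METH_GET = 1, METH_POST = 2 };

/* Switch-driven: the outer switch on `state` runs once per input byte. */
int parse_method_switch(const char *p)
{
    enum { S_START, S_G, S_GE, S_GET, S_P, S_PO, S_POS, S_POST } state = S_START;

    for (;; ++p) {
        switch (state) {
        case S_START:
            if (*p == 'G') state = S_G;
            else if (*p == 'P') state = S_P;
            else return METH_UNKNOWN;
            break;
        case S_G:
            if (*p != 'E') return METH_UNKNOWN;
            state = S_GE;
            break;
        case S_GE:
            if (*p != 'T') return METH_UNKNOWN;
            state = S_GET;
            break;
        case S_GET:
            return *p == ' ' ? METH_GET : METH_UNKNOWN;
        case S_P:
            if (*p != 'O') return METH_UNKNOWN;
            state = S_PO;
            break;
        case S_PO:
            if (*p != 'S') return METH_UNKNOWN;
            state = S_POS;
            break;
        case S_POS:
            if (*p != 'T') return METH_UNKNOWN;
            state = S_POST;
            break;
        case S_POST:
            return *p == ' ' ? METH_POST : METH_UNKNOWN;
        }
    }
}

/* Goto-driven: each label is a state, each transition a direct jump. */
int parse_method_goto(const char *p)
{
    switch (*p++) {              /* dispatch once, on the first byte only */
    case 'G': goto st_g;
    case 'P': goto st_p;
    default:  return METH_UNKNOWN;
    }
st_g:
    if (*p++ == 'E') goto st_ge;
    return METH_UNKNOWN;
st_ge:
    if (*p++ == 'T') goto st_get;
    return METH_UNKNOWN;
st_get:
    return *p == ' ' ? METH_GET : METH_UNKNOWN;
st_p:
    if (*p++ == 'O') goto st_po;
    return METH_UNKNOWN;
st_po:
    if (*p++ == 'S') goto st_pos;
    return METH_UNKNOWN;
st_pos:
    if (*p++ == 'T') goto st_post;
    return METH_UNKNOWN;
st_post:
    return *p == ' ' ? METH_POST : METH_UNKNOWN;
}
```

Both functions stop at the first mismatching byte, so a terminating NUL is never read past. For example, `parse_method_goto("GET / HTTP/1.1")` yields `METH_GET`.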
HTTP benchmark

                                  Core      I7 (BPU!)
Classic HTTP parser:
  ngx_request_line               909ms      730ms
  ngx_header_line                583ms      422ms
  ngx_lw_header_line             661ms      428ms
  ngx_big_header_line           1983ms     1725ms
HTTP Hybrid State Machine:
  hsm_header_line                433ms      553ms
Table-driven Automaton:
  tbl_header_line                562ms      473ms
  tbl_big_header_line           1570ms      840ms
Goto-driven Automaton:
  goto_request_line              747ms      470ms
  goto_opt_request_line          736ms      458ms
  goto_header_line               375ms      237ms
  goto_big_header_line           975ms      589ms
Generic Finite State Machine (GFSM)
Protocol FSMs context switch for ICAP etc.:
(1) HTTP FSM: receive & process the HTTP request;
(2) ICAP FSM: the callback is called at a particular HTTP state, and the current HTTP FSM state is push()'ed to the stack;
(3) ICAP FSM: send the request to the ICAP server and get the results;
(4) HTTP FSM: the callback is called at a particular ICAP state, and the stored HTTP FSM state is pop()'ed back.
Foundation for TL program execution (~coroutine)
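The push()/pop() switch between protocol FSMs described above can be sketched as a small state stack. This is an illustration only, not the GFSM code from Tempesta FW: the type names, the function names and the fixed stack depth are all hypothetical.

```c
#include <stddef.h>

#define GFSM_STACK_DEPTH 4          /* hypothetical nesting limit */

enum fsm_id { FSM_HTTP, FSM_ICAP }; /* protocol FSMs named on the slide */

struct fsm_frame {
    enum fsm_id fsm;                /* which protocol FSM is running */
    int state;                      /* its current state number */
};

struct gfsm {
    struct fsm_frame cur;           /* the FSM currently being executed */
    struct fsm_frame stack[GFSM_STACK_DEPTH];
    size_t depth;                   /* number of suspended FSMs */
};

void gfsm_init(struct gfsm *g, enum fsm_id fsm, int state)
{
    g->cur.fsm = fsm;
    g->cur.state = state;
    g->depth = 0;
}

/* Suspend the current FSM and enter another one; -1 if the stack is full. */
int gfsm_push(struct gfsm *g, enum fsm_id fsm, int state)
{
    if (g->depth == GFSM_STACK_DEPTH)
        return -1;
    g->stack[g->depth++] = g->cur;
    g->cur.fsm = fsm;
    g->cur.state = state;
    return 0;
}

/* Resume the most recently suspended FSM; -1 if nothing was suspended. */
int gfsm_pop(struct gfsm *g)
{
    if (g->depth == 0)
        return -1;
    g->cur = g->stack[--g->depth];
    return 0;
}
```

In the four-step flow, step (2) corresponds to `gfsm_push(&g, FSM_ICAP, 0)` and step (4) to `gfsm_pop(&g)`, after which the HTTP FSM continues from exactly the state where it was suspended.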
Tempesta DB: Web-cache & Filter
- mmap()'ed & mlock()'ed in-memory persistent database
- no disk IO (size is limited, but can be processed in softirq)
- Cache-conscious Burst Hash Trie:
  - NUMA-aware: independent databases for each node (retrieved by the least significant bits);
  - can be made lock-free
- Almost zero-copy (only NIC disk)
- Suitable for storing fixed- and variable-size records
- Quick for large string keys (e.g. URI) as well as for integer keys
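Picking a per-node database by the low-order bits of a key can be sketched as follows. The slide does not say which hash Tempesta DB uses or how the bits are extracted, so everything here is an assumption: FNV-1a stands in as the hash, the function names are hypothetical, and the node count is required to be a power of two so that a bit mask can select the node.

```c
#include <stddef.h>
#include <stdint.h>

/* 32-bit FNV-1a; a stand-in hash, not necessarily what Tempesta DB uses. */
uint32_t fnv1a32(const void *key, size_t len)
{
    const unsigned char *p = key;
    uint32_t h = 2166136261u;       /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;             /* FNV prime */
    }
    return h;
}

/*
 * Choose the NUMA node (and thus the independent database) for a key
 * from the least significant bits of its hash.
 * Assumption: n_nodes is a power of two, so (n_nodes - 1) is a bit mask.
 */
unsigned node_for_key(const void *key, size_t len, unsigned n_nodes)
{
    return fnv1a32(key, len) & (n_nodes - 1);
}
```

Because the node is derived from the key alone, every CPU computes the same node for the same URI without consulting shared state, and the remaining hash bits stay available for indexing inside that node's database.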
Filtering
- Dynamic persistent rules with eviction (Tempesta DB)
- Set of callbacks on all network layers:
  - classify_ipv{4,6}: called for each received IPv4/IPv6 client packet
  - classify_tcp: called for each received TCP segment
  - classify_conn_{estab,close}: a client connection is established/closed
  - classify_tcp_timer_retrans: called on retransmissions to the client
  - and other TCP stuff
  - and surely HTTP processing phases
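A set of per-layer callbacks like the one listed above is naturally expressed in C as a table of function pointers. The sketch below is not the Tempesta FW API: the hook names `classify_ipv4` and `classify_tcp` come from the slide, but their signatures, the `classify_http_req` hook, the verdict codes and the in-memory block list (standing in for Tempesta DB) are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

enum verdict { PASS = 0, DROP = 1 };     /* hypothetical verdict codes */

/* Hook table; a NULL hook means "no opinion" at that layer. */
struct classifier {
    enum verdict (*classify_ipv4)(uint32_t src);
    enum verdict (*classify_tcp)(uint32_t src, uint16_t dst_port);
    enum verdict (*classify_http_req)(uint32_t src, size_t uri_len);
};

/* Tiny fixed-size block list standing in for a Tempesta DB table. */
#define BLOCKED_MAX 16
static uint32_t blocked[BLOCKED_MAX];
static size_t n_blocked;

static int is_blocked(uint32_t addr)
{
    for (size_t i = 0; i < n_blocked; i++)
        if (blocked[i] == addr)
            return 1;
    return 0;
}

static void block_addr(uint32_t addr)
{
    if (!is_blocked(addr) && n_blocked < BLOCKED_MAX)
        blocked[n_blocked++] = addr;
}

/* IP layer: drop packets from clients that were blocked earlier. */
static enum verdict sample_ipv4(uint32_t src)
{
    return is_blocked(src) ? DROP : PASS;
}

/* HTTP layer: an overly long URI blocks the client at the IP layer. */
static enum verdict sample_http_req(uint32_t src, size_t uri_len)
{
    if (uri_len > 256) {
        block_addr(src);
        return DROP;
    }
    return PASS;
}

const struct classifier sample_classifier = {
    .classify_ipv4 = sample_ipv4,
    .classify_tcp = NULL,                /* not interested in TCP segments */
    .classify_http_req = sample_http_req,
};

/* Dispatch helpers: call the hook if it is registered, otherwise pass. */
enum verdict run_ipv4(const struct classifier *c, uint32_t src)
{
    return c->classify_ipv4 ? c->classify_ipv4(src) : PASS;
}

enum verdict run_tcp(const struct classifier *c, uint32_t src, uint16_t port)
{
    return c->classify_tcp ? c->classify_tcp(src, port) : PASS;
}

enum verdict run_http_req(const struct classifier *c, uint32_t src, size_t len)
{
    return c->classify_http_req ? c->classify_http_req(src, len) : PASS;
}
```

The point of the design is that a decision made at the HTTP layer is enforced at the cheapest layer afterwards: once `run_http_req` has blocked an address, later packets from it are rejected by `run_ipv4` before any TCP or HTTP work is done.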
Tempesta Language

    # One-shot function to be called at ingress IPv4 packet
    if (tdb.select("ip_filter", pkt.src))
        filter(pkt, DROP);

    # Sample senseless multi-layer rule
    if ((req.user_agent =~ /firefox/i && client.addr == 1.1.1.1)
        || length(req.uri) > 256)
        # Block the client at IP layer, so it will be filtered
        # efficiently w/o further HTTP processing.
        tdb.insert("ip_filter", client.addr);
Benchmark (very outdated)
- 10-core Intel Xeon E7-4850 2.4GHz, 64GB RAM (one CPU with 10 cores)
- NIC RX and TX queues bound to CPU cores
- RFS enabled
- Nginx: 10 workers, multi_accept, sendfile, epoll, tcp_nopush and tcp_nodelay
Features & TODO (by Oct 2015)
- Simple HTTP proxy, GFSM, classification hooks
- Load balancing
- Simple rate limiting module
- Cluster failover: in progress
- Web cache: in progress
- Filtering: in progress
- SSL/TLS (libressl): in progress
- Tempesta Language (advanced traffic processing): in progress
Thanks!
Availability: https://github.com/natsys/tempesta
Contact: ak@natsys-lab.com