High Availability in Linux Firewalls using VRRP

Translated document (from the Spanish original)
Original by Sancho Lerena <slerena@iname.com>, 15 April 2002
Translated by Ben Terry, 10 June 2002

It is prohibited to modify this document without reference to the author of the original work. This document is published under the GPL. Any use of this document for commercial purposes is outside the scope of this document. The author is not responsible or liable for any problems that could be caused by actions taken after reading this document.
0. Introduction to High Availability (HA) in firewalls

Not written yet.

1. Introduction to VRRP

VRRP is a standard protocol for router redundancy, in effect a generic redundancy protocol, specified in RFC 2338. The idea is very simple and can be implemented in practically every device in a network environment. VRRP can be found in production today on almost all platforms. Many types of network hardware, such as routers or load balancers, implement VRRP and can take part in it. The protocol is very similar to Cisco's HSRP, although it is an open standard and is not tied to any particular vendor or manufacturer.

Its operation is based on IP multicast and MAC multicast, so both must be supported by the TCP/IP implementation of the operating system in use. In the case of Linux, these options must be enabled in the kernel. It is also worth emphasizing that the protocol was designed to work only with IPv4, although proposals for a similar implementation for IPv6 do exist.

The concept is simple and is based on the need for a reserve machine that can act as the destination in a route. If we have a router and it fails, all the routes that use that gateway as their destination are lost. If we have a reserve machine that takes the place of the one that has failed, we can avoid the failure automatically and intelligently. This is the concept of failover redundancy, the first redundancy model for routers and, equally, for firewalls.

As we will see further on, we can extend this model so that, instead of having a machine on standby that does nothing, we distribute the load between two machines. If one of the two fails, the other takes over the traffic bound for the failed device, and everything happens transparently and automatically. This advanced model can also be implemented with VRRP.
We will also see that VRRP can be applied to hosts and not only to gateways, which makes it possible to build an extremely simple form of clustering with load balancing and HA for any type of network service: HTTP, FTP, telnet, or any ordinary TCP/UDP service.
2. Foundations of VRRP

We begin with the initial concept: a multi-homed machine that acts as a gateway, simply routing packets from one interface to the other, transparently, either as a plain router or as a filtering router. We will even see that we can also modify packets, doing NAT transparently, in the face of a network failure.

[Figure: two nodes sharing VRID 1 and the VRRP IP 10.0.0.254; their real IPs are 10.0.0.1 and 10.0.0.2.]

VRRP has several key concepts that are worth knowing, since we will use this terminology in the explanations that follow. When we speak of machines we mean gateways, routers, firewalls, or hosts, that is, whatever plays the role for which we want to implement redundancy.

Virtual Router (VR): one of the machines that participates in the HA configuration. As we said, this can be a router, a firewall, or a host. The only requirement is that it has a configured VRRP daemon running on at least one interface.

Virtual IP: the IP that is shared among several machines and is the basis of the HA implementation. This Virtual IP is the one we use to refer to the whole set of machines from an external point of view, that is to say, it is the next hop in the route for all hosts. It has nothing to do (in principle) with the physical IP of the adapter.

VRID (Virtual Router ID): the identifier (an 8-bit integer) of the Virtual Router, that is, of the set of machines that share the Virtual IP. This number must be unique and can only be used by the machines that share that Virtual IP. If the same number is reused for different Virtual IPs, it is necessary to make sure that the cards using the same VRID are on different physical networks, or are logically separated with a port-based VLAN.
VR Priority: an 8-bit integer, the weight assigned to a machine within a VRID. With it we specify the behavior of the HA configuration, since we can establish a hierarchy based on priority. The highest value is 255. We will see that the node with the highest priority acts as VRMaster and the remaining nodes on the network with the same VRID act as VRBackup.

VRMaster and VRBackup: the way we refer to a VR according to the function it currently has in the HA configuration. A VR in Backup status does not receive traffic for the Virtual IP (although of course it can receive traffic for its dedicated IP, or for another VRID in which it is the VRMaster).
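The role assignment just described (the node with the highest priority acts as VRMaster, the others as VRBackup) can be sketched in a few lines of shell. This is only an illustration of the rule, not how a VRRP daemon is implemented: the function name elect_master and the "name priority" input format are assumptions made for this example, and a tie is resolved arbitrarily in favor of the first entry seen.

```shell
#!/bin/bash
# elect_master: read "name priority" pairs on stdin, one per line, and
# print the name with the highest priority.
# Hypothetical helper for illustration only; ties go to the first entry
# read, which is an arbitrary simplification.
elect_master() {
    awk '
        NR == 1 || $2 + 0 > best { best = $2 + 0; name = $1 }
        END { if (NR > 0) print name }
    '
}
```

For example, printf 'A 50\nB 100\n' | elect_master selects B, the node with priority 100.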
    +-------+-------+---------------+---------------+---------------+
    |Version| Type  |     VRID      |   Priority    | Count IP Addr |
    +-------+-------+---------------+---------------+---------------+
    |   Auth Type   | Advertising   |           Checksum            |
    |               | Interval      |                               |
    +---------------+---------------+-------------------------------+
    |                        IP Address (1)                         |
    +---------------------------------------------------------------+
    |                              ...                              |
    +---------------------------------------------------------------+
    |                        IP Address (n)                         |
    +---------------------------------------------------------------+
    |                    Authentication Data (1)                    |
    +---------------------------------------------------------------+
    |                              ...                              |
    +---------------------------------------------------------------+
    |                    Authentication Data (n)                    |
    +---------------------------------------------------------------+

The VRRP protocol runs directly over IP, and its IANA-assigned protocol number is 112. Above we can see the protocol header as specified by RFC 2338. For more details on the implementation of this protocol, refer to the IETF documentation, which is concise and specific on this point.

The priority field deserves emphasis, since its value determines a machine's behavior within a VRRP group. Priority 0 means that the node has stopped participating in the VRRP group. This is sometimes not implemented, so we will assume that the effective way to stop participating as a Virtual Router is to stop the VRRP daemon that announces that VRID. Priority 255 means that the VR has the status of Master and acts as such. In practice, the VRMaster is the VR with the highest priority. If two VRs have the same priority we will usually have a problem, since the outcome depends on the implementation. In any case, it is easy to avoid that situation by following a strategy when deploying VRRP: simply stagger the priorities in steps of 10, 20, or 50.

Let us see an example:

    A: VRBackup, real IP 10.0.0.2, VRRP IP 10.0.0.254, VRID 1, priority 50
    B: VRMaster, real IP 10.0.0.1, VRRP IP 10.0.0.254, VRID 1, priority 100

In this case the Master is machine B (here represented as a generic router). Machine A has a priority of 50 whereas B has 100, so the selection of the master is clear. At this point there is a flow of IP multicast traffic on that network for VRID 1: a simple exchange of packets between the members of VRID 1 announcing which members exist, with which IP, and with which priority.

For this exchange there is a "virtual" Ethernet interface with a statically defined MAC of the form 00:00:5E:00:01:XX, where XX is the VRID in hexadecimal, so each VR has a different MAC for each VRID. VRRP uses 224.0.0.18 as its multicast IP. This detail only matters when we consider the filter applied on the firewall: we must allow that traffic between the machines in the group. If we ran tcpdump on that cable segment we would see the following:

[Example with other IPs; 192.168.6.100 is the VRRP address]

    07:57:42.491568 arp who-has 192.168.6.100 tell 192.168.6.3
    07:57:42.491885 arp reply 192.168.6.100 is-at 0:0:5e:0:1:6a
    07:57:44.508415 192.168.6.3 > 192.168.6.100: icmp: echo request
    07:57:44.508693 192.168.6.100 > 192.168.6.3: icmp: echo reply
    07:57:45.321438 192.168.6.10 > 224.0.0.18: ip-proto-112 20
    07:57:45.321922 192.168.5.10 > 224.0.0.18: ip-proto-112 20
    07:57:46.331429 192.168.6.10 > 224.0.0.18: ip-proto-112 20
    07:57:46.331904 192.168.5.10 > 224.0.0.18: ip-proto-112 20
    07:57:47.341432 192.168.6.10 > 224.0.0.18: ip-proto-112 20
    07:57:47.341928 192.168.5.10 > 224.0.0.18: ip-proto-112 20
    07:57:47.491230 arp who-has 192.168.6.3 tell 192.168.6.10
    07:57:47.491442 arp reply 192.168.6.3 is-at 0:c0:26:70:12:34

We can see that the Master announces itself on the multicast IP, while the backup remains listening on the VRRP channel. If a member stops receiving the packets of the others, the heartbeat (so called because it indicates whether the participants in the VRRP group are alive), it promotes itself to VRMaster and adopts the IP of the Virtual Router, also assigning the virtual MAC to its interface.
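The virtual MAC format 00:00:5E:00:01:XX can be computed directly from the VRID. The following is a minimal sketch; the function name vrid_to_mac is an assumption made for this example and is not part of any VRRP daemon. It accepts VRIDs from 1 to 255, the range that fits the 8-bit field.

```shell
#!/bin/bash
# vrid_to_mac: print the VRRP virtual MAC for a given VRID.
# Hypothetical helper for illustration; not part of vrrpd.
vrid_to_mac() {
    local vrid=$1
    # Accept decimal digits only.
    case $vrid in
        ''|*[!0-9]*) return 1 ;;
    esac
    vrid=$((10#$vrid))               # force base 10 so "08" is not read as octal
    [ "$vrid" -ge 1 ] && [ "$vrid" -le 255 ] || return 1
    # The last octet of the MAC is the VRID in hexadecimal.
    printf '00:00:5E:00:01:%02X\n' "$vrid"
}
```

For instance, vrid_to_mac 106 yields the same address as the "is-at 0:0:5e:0:1:6a" reply in the capture, written with zero-padded octets.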
Let us assume this happens because the master has had a problem and its VRRP packets do not reach its companion, either because the VRRP service has stopped or because the machine has failed. In either case, we can assume that it needs attention. If it suddenly came back in good condition, it would listen on the VRRP channel and, if it saw that its priority is higher than that of the VR currently acting as master, it would announce its VRID and priority and take over as Master.

The basic idea is that a heartbeat takes place over IP protocol 112 (VRRP), and that the state of the cluster is propagated through a specific multicast IP, through which the Master/Backup role is decided.

Configuration of VRRP in Linux

There is a basic VRRP daemon, currently in a quite stable version, that we can use for this task. First we must obtain VRRPD and compile it, which is not complex; in fact, it is quite simple. An extension of the original VRRPD, by Alexandre Cassen <acassen@linux-vs.org>, can be found at http://www.linuxvirtualserver.org/~acassen/. The original implementation, by Jerome Etienne, can be found at http://w3.arobas.net/~jetienne/vrrpd/vrrpd-0.4.tgz

Once this is done, let us look at its syntax:

    vrrpd -v <vr_id> -p <prio> -i <interface> <virtual_ip>

The parameters are self-explanatory, since they are the ones we have worked with in the previous examples and explanations.

What we must now take into account is that a firewall (in real production cases) works as a machine that forwards traffic: traffic enters through one interface and leaves through another, and it does not originate on the local machine. In other words, two network interfaces are always involved in the process, so it is logical that failover must include running VRRP on every interface that carries traffic. The conclusion is to run one VRRP daemon on each interface where we want to implement HA.

The figure shows a scheme of this idea as a generic routing device (represented as two routers), forming an Active/Passive HA configuration in which Router B acts as the master:

    Node  Role      Prio  Network A (VRID 1)                  Network B (VRID 101)
    A     VRBackup  50    real 10.0.0.2, VRRP 10.0.0.254      real 10.0.10.2, VRRP 10.0.10.254
    B     VRMaster  100   real 10.0.0.1, VRRP 10.0.0.254      real 10.0.10.1, VRRP 10.0.10.254

    The figure also shows the addresses 10.0.0.250 and 10.0.10.250.

First let us see how nodes A and B are configured:
Castor (Node B)

Interfaces

    eth0  Link encap:ethernet  HWaddr 00:C0:DF:E2:50:AF
          inet addr:10.0.0.1  Bcast:10.0.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:79 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:9 Base address:0xffe0

    eth1  Link encap:ethernet  HWaddr 00:A0:C9:4C:F8:CF
          inet addr:10.0.10.1  Bcast:10.0.10.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:145 errors:0 dropped:0 overruns:0 frame:0
          TX packets:73 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:9 Base address:0x2000

Routes

    Kernel IP routing table
    Destination  Gateway     Genmask        Flags  MSS  Window  irtt  Iface
    10.0.0.0     0.0.0.0     255.255.255.0  U      40   0       0     eth0
    10.0.10.0    0.0.0.0     255.255.255.0  U      40   0       0     eth1
    0.0.0.0      10.0.0.250  0.0.0.0        UG     40   0       0     eth1

Pollux (Node A)

Interfaces

    eth0  Link encap:ethernet  HWaddr 00:00:5E:00:01:6A
          inet addr:10.0.0.2  Bcast:10.0.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:908 errors:0 dropped:0 overruns:0 frame:0
          TX packets:838 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:9 Base address:0xfca0

    eth1  Link encap:ethernet  HWaddr 00:00:5E:00:01:69
          inet addr:10.0.10.2  Bcast:10.0.10.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3269 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2541 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:11 Base address:0x2000

Routes

    Kernel IP routing table
    Destination  Gateway     Genmask        Flags  MSS  Window  irtt  Iface
    10.0.0.0     0.0.0.0     255.255.255.0  U      40   0       0     eth0
    10.0.10.0    0.0.0.0     255.255.255.0  U      40   0       0     eth1
    0.0.0.0      10.0.0.250  0.0.0.0        UG     40   0       0     eth1
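As noted earlier, the filter on each node has to let the VRRP advertisements through (IP protocol 112, sent to 224.0.0.18) on every interface that runs a daemon. The sketch below only prints candidate rules so that they can be reviewed before use. It assumes that iptables is the packet filter and that appending to the INPUT and OUTPUT chains suits the existing rule set; neither assumption comes from the original text, and the function name vrrp_filter_rules is hypothetical.

```shell
#!/bin/bash
# vrrp_filter_rules: print (do not execute) iptables rules that accept
# VRRP advertisements on each interface given as an argument.
# Assumption: iptables is the filter in use; adapt to your own rule set.
vrrp_filter_rules() {
    local iface
    for iface in "$@"; do
        # Advertisements received from the other members of the group.
        echo "iptables -A INPUT -i $iface -p 112 -d 224.0.0.18 -j ACCEPT"
        # Advertisements this node sends while it is the master.
        echo "iptables -A OUTPUT -o $iface -p 112 -d 224.0.0.18 -j ACCEPT"
    done
}
```

On Castor or Pollux this would be called as vrrp_filter_rules eth0 eth1, and the printed lines could then be placed in the firewall script at a suitable position.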
We could try to load the daemons by hand as follows: two for node A and two for node B, each daemon listening on a different interface.

For node B (Castor):

    vrrpd -v 1 -p 100 -i eth0 10.0.0.254
    vrrpd -v 101 -p 100 -i eth1 10.0.10.254

For node A (Pollux):

    vrrpd -v 1 -p 50 -i eth0 10.0.0.254
    vrrpd -v 101 -p 50 -i eth1 10.0.10.254

The problem with all this is that firewalls generally have more than two interfaces and, as we will see further on, we may want the firewalls in HA to balance traffic at the same time, which means doubling the number of interfaces. In short, we will have to launch many processes by hand on each host.

The main problem is that we will have a good number of VRRP daemons running on the machine, and when we need to stop one of them we have no way of determining which process to stop, since the output of ps -A does not show which interface and VRID each one belongs to. To avoid that problem, the vrrpd-start and vrrpd-stop scripts are proposed; they keep track of the PIDs of the daemons by VRID and interface. The syntax to start and stop an instance of the VRRP daemon is as follows:

    vrrpd-start <vrid> <prio> <iface> <virtual_ip>
    vrrpd-stop <vrid> <iface>

The scripts mentioned above are the following:

vrrpd-start

    #!/bin/bash
    #
    # VRRP Daemon Start, 01/03/02
    # Sancho Lerena, slerena@gnusec.com

    VRRPD=/usr/sbin/vrrpd
    INIC="VRRP Daemon Start, Sancho Lerena <slerena@gnusec.com>"
    VER="v2.0, 15/04/02"
    PIDFILE="/var/run/vrrpd.pid"
    PIDFILE_TMP="/var/run/vrrpd.pid.tmp"

    echo "$INIC $VER"

    if [ $# -lt 4 ]
    then
        echo " Syntax: "
        echo " "
        echo " vrrpd-start <vrid> <prio> <iface> <virtual_ip>"
        echo " "
        exit 1
    fi

    VRID=$1
    PRIO=$2
    IFACE=$3
    VIRTUAL_IP=$4

    # We do not verify that the parameters passed are correct or make sense

    if [ -e "$PIDFILE" ]    # If it exists we continue verifying
    then
        # If the file exists, we verify that there is not
        # a VR already installed on the same interface.
        RES=`grep "$IFACE:$VRID:" "$PIDFILE"`
        if [ -n "$RES" ]    # If it exists
        then
            echo "ERROR: A VRID already exists on the interface."
            exit 1
        fi
    fi

    # We start the daemon
    /sbin/start-stop-daemon --start -m --pidfile $PIDFILE_TMP --background \
        --verbose --exec $VRRPD -- -i $IFACE -v $VRID -p $PRIO $VIRTUAL_IP

    # We wait until the daemon starts
    while [ ! -e $PIDFILE_TMP ]
    do
        sleep 1
    done

    # We obtain the PID of this daemon
    PID=`cat $PIDFILE_TMP`
    echo "Starting VRRP Daemon, with PID $PID"
    echo "VRRP Data: $VIRTUAL_IP ($IFACE) with VRID $VRID and Priority $PRIO"

    # We write this information into the daemon's information file
    echo $IFACE:$VRID:$PID >> "$PIDFILE"
    rm $PIDFILE_TMP

    echo "Waiting for VRRP Daemon"
    sleep 10
    echo "Restoring IP Routing"
    # Here you must put your IP routes, because when VRRP changes the MAC in your
    # system, IP routes have been deleted automatically. Please be warned about
    # this and check this issue with care.

vrrpd-stop

    #!/bin/bash
    #
    # VRRP Daemon Stop, 01/03/02
    # Sancho Lerena, slerena@gnusec.com

    VRRPD=/usr/sbin/vrrpd
    INIC="VRRP Daemon Stop, Sancho Lerena <slerena@gnusec.com>"
    VER="v2.0, 15/04/02"
    PIDFILE="/var/run/vrrpd.pid"
    PIDFILE_TMP="/var/run/vrrpd.tmp"

    echo "$INIC $VER"

    if [ $# -lt 2 ]
    then
        echo " Syntax: "
        echo " "
        echo " vrrpd-stop <vrid> <iface>"
        echo " "
        exit 1
    fi

    VRID=$1
    IFACE=$2

    # We do not verify that the parameters passed are correct or make sense

    if [ -e "$PIDFILE" ]    # If it exists we continue verifying
    then
        # If the file exists, we verify that a VR is
        # registered for this interface and VRID.
        RES=`grep "$IFACE:$VRID:" "$PIDFILE"`
        if [ -z "$RES" ]    # If no entry with this data exists
        then
            echo "ERROR: No existing VRID on this interface."
            exit 1
        fi
    else
        echo "No existing $PIDFILE, no VRRPD process running."
        exit 1
    fi

    # We obtain the PID
    PID=`echo $RES | cut -f 3 -d ":"`
    echo "Stopping VRRP Daemon, with PID $PID"
    echo "VRRP Data: ($IFACE) with VRID $VRID"
    kill $PID

    # We erase this information from the daemon's information file.
    # The trailing ":" keeps VRID 1 from also matching VRID 10, 101, etc.
    grep -v "$IFACE:$VRID:" $PIDFILE > $PIDFILE_TMP
    rm $PIDFILE
    mv $PIDFILE_TMP $PIDFILE
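Both scripts rely on the same bookkeeping file, /var/run/vrrpd.pid, which holds one IFACE:VRID:PID entry per daemon. A lookup in that file can be sketched as follows. The function name vrrp_pid_for and the optional third argument (an alternative file path, handy for testing) are assumptions made for this example. Unlike a plain grep, it compares each field exactly, so VRID 1 is never confused with VRID 10 or 101.

```shell
#!/bin/bash
# vrrp_pid_for: print the PID recorded for <iface> <vrid> in the
# IFACE:VRID:PID registry file. Returns non-zero if no entry matches.
# Hypothetical helper for illustration; the third argument overrides
# the default path used by the start and stop scripts.
vrrp_pid_for() {
    local iface=$1 vrid=$2 file=${3:-/var/run/vrrpd.pid}
    [ -r "$file" ] || return 1
    awk -F: -v i="$iface" -v v="$vrid" '
        $1 == i && $2 == v { print $3; found = 1 }
        END { exit !found }
    ' "$file"
}
```

With such a helper, stopping a daemon reduces to something like kill "$(vrrp_pid_for eth0 1)", after checking that the lookup succeeded.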
3. Switch Over with VRRP

A Switch Over takes place when a failure is detected in a member of the cluster and that member goes from being Master to being Backup, or to being a node disconnected from the VRRP group. We can consider three events that justify a Switch Over:

- Manual shutdown (for maintenance, for example)
- Physical problems (network disconnected, power loss, etc.)
- Failure detected on a single interface.

[Figure: Firewall HA Hot-StandBy. fw1 <Active> and fw2 <Passive> between Data Network "A" and Data Network "B", with a Control and HeartBeat network.]

Global events, such as a total physical shutdown of the machine or a loss of connectivity (for example, a failure of the power supply, of the network, or of the operating system with a kernel panic), mean that VRRP stops working and that the companion in the VRRP group will notice that the master has gone down. In that case the Switch Over is automatic. But what happens if there is a partial failure, or a failure that is not detected by the VRRP mechanism?

For example, just one of the firewall's networks may go down. In that case, packets would continue entering through the live interface and could not be routed out through the failed interface. This problem is typically known as a Black Hole, and VRRP v2 does not cover it. Various manufacturers (Nokia, Cisco) have implemented mechanisms to solve this problem, although we will approach it in an extremely simple way. If we lose connectivity, we stop all the VRRP daemons. This way the machine loses the status of VRRP master, since it will no longer send VRRP heartbeat packets.

The consequence is that when a firewall on standby notices that the Master no longer sends VRRP heartbeat packets, it will send its own to the VRRP group, and the one with the highest priority becomes the Master. Obviously in this case, where there are only two elements, the backup firewall is now the master.

[Figure: Firewall HA Hot-StandBy after the failure. fw1 <Down> and fw2 <Active> between Data Network "A" and Data Network "B", with a Control and HeartBeat network.]

The way to implement "connectivity" control is by means of a ping test. It consists of sending a ping to a host that responds and is trustworthy (not a remote host on the Internet, which is unsuitable, but a host on the LAN that cannot be affected by ping delays). In the proposed configuration, it would be simple to run this script from cron every minute, monitoring the IPs of the switches on the two networks of the firewalls, in this case 10.0.0.250 and 10.0.10.250, the switch IPs from the previous examples. If the ping fails, VRRP on the host is aborted, including all running VRRP daemons. We could improve this script by adding some type of alert via syslog (or via SNMP, email, etc.), since the host deactivates VRRP but its local IPs continue working (in the case that the network is not the problem).

vrrp-check

    #!/bin/bash
    #
    # Checking connectivity with ICMP Ping, VRRPD Companion Script

    VER="11/03/2002 - v1.0"
    PIDFILE="/var/run/vrrpd.pid"

    if [ -z "$1" ]
    then
        echo " ping check $VER"
        echo " "
        echo " params :"
        echo " pingcheck <ip_dest> [ <check_time> ]"
        echo " "
        exit 1
    fi

    # Time between checks, in seconds
    # If not specified, the check runs every 5 seconds
    SLEEP_TIME=$2
    if [ -z "$2" ]
    then
        SLEEP_TIME=5
    fi

    # Obtain the PIDs of the VRRPD processes in memory
    # (awk takes the first column, whatever the padding of the PID)
    LISTA_PROCESOS=`ps -A | grep "vrrpd" | awk '{ print $1 }'`
    if [ -z "$LISTA_PROCESOS" ]
    then
        echo " No VRRP Daemon running, aborting. "
        exit 1
    fi

    IP_DESTINO=$1    # IP to verify, passed as parameter #1
    RES=0

    while [ "$RES" -eq 0 ]
    do
        COMANDO="`ping -c 1 "$IP_DESTINO" | grep '100% packet loss'`"
        if [ ! -z "$COMANDO" ]
        then
            echo " Ping fail "
            echo " Shutting down VRRP daemons "
            kill -s 9 $LISTA_PROCESOS
            rm $PIDFILE
            RES=1
        else
            # echo " Debug: Ping ok"
            sleep $SLEEP_TIME
        fi
    done

4. Example of Operation

How does a host see all this from outside? A host behind the cluster of firewalls, in this case called hercules, sees only the single IP of the firewall cluster. Let's take a look at the diagram, which tries to represent the view that a host behind the cluster has. It sees the IP of the cluster, and it does not matter which of the cluster's members is the Master. All that matters is that we have one IP where packets go in and another IP where packets come out; the rest is irrelevant. This, of course, means abstracting away information that, from the point of view of a user of the cluster, should be opaque.

[Figure: Firewall HA Hot-StandBy between Data Network "A" and Data Network "B", with a Control and HeartBeat network. Labels in the figure: Internet; Remote Router 201.34.12.94; Local Router 212.1.102.45; 10.0.0.250; two L2 switches; Network A 10.0.0.254; Network B 10.0.10.254; Hercules 10.0.10.3.]

The Hercules configuration is simple: it has a default route to 10.0.10.254, the Virtual IP on network B of the firewall cluster. We can ping the IP of the cluster:

    C:\>ping 10.0.10.254
    Pinging 10.0.10.254 with 32 bytes of data:
    Reply from 10.0.10.254: bytes=32 time<10ms TTL=255
    Reply from 10.0.10.254: bytes=32 time<10ms TTL=255
    Reply from 10.0.10.254: bytes=32 time<10ms TTL=255
    Reply from 10.0.10.254: bytes=32 time<10ms TTL=255

    C:\>arp -a
    Interface: 10.0.10.3 on Interface 0x2
      Internet Address      Physical Address      Type
      10.0.10.254           00-00-5e-00-01-6a     dynamic
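The physical address in the ARP table is a VRRP virtual MAC (prefix 00-00-5e-00-01) rather than the hardware address of either firewall, which is what makes the switch over invisible to the host. A quick way to confirm this from a script is to decode the last octet back into a VRID. The sketch below is only an illustration: the function name mac_to_vrid is an assumption, and it accepts both the dash-separated form shown above and the unpadded colon form that tcpdump prints.

```shell
#!/bin/bash
# mac_to_vrid: print the VRID encoded in a VRRP virtual MAC address.
# Returns non-zero when the address is not of the form 00:00:5E:00:01:XX.
# Hypothetical helper for illustration; not part of vrrpd.
mac_to_vrid() {
    local mac=${1//-/:}              # accept 00-00-5e-... as well as 00:00:5e:...
    local re='^([0-9A-Fa-f]{1,2}:){5}[0-9A-Fa-f]{1,2}$'
    local -a o
    # Reject anything that is not six colon-separated hex octets.
    [[ $mac =~ $re ]] || return 1
    IFS=: read -r -a o <<< "$mac"
    # Compare numerically so that unpadded forms such as 0:0:5e:0:1:6a match.
    [ $((16#${o[0]})) -eq 0 ]  || return 1
    [ $((16#${o[1]})) -eq 0 ]  || return 1
    [ $((16#${o[2]})) -eq 94 ] || return 1    # 0x5E
    [ $((16#${o[3]})) -eq 0 ]  || return 1
    [ $((16#${o[4]})) -eq 1 ]  || return 1
    # The last octet is the VRID.
    echo $((16#${o[5]}))
}
```

For the address in the ARP table above, mac_to_vrid 00-00-5e-00-01-6a prints 106, while an ordinary hardware address makes the function fail.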