Deploying IP Anycast Core DNS Services for the University of Minnesota: Introduction and General Discussion
Agenda
Deploying IPv4 anycast DNS
What is ANYCAST?
Why is ANYCAST important?
Monitoring and using the ANYCAST DNS service
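What is ANYCAST?
Anycast describes how traffic is distributed among hosts:
Unicast refers to 1-to-1 conversations.
Multicast refers to 1-to-many conversations.
Anycast refers to 1-to-any conversations: the same address is announced from several hosts, and the network delivers each request to one of them.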
What is ANYCAST? (continued)
Consider an automatic call distributor (ACD), where all agents offer the same product set or solution set.
All agents of the call center are created equal.
Your needs can be met by any agent.
The DNS service model resembles a call center: you really don't care who answers your request, just that you get the right answer.
Why is ANYCAST important?
Anycast is designed for short queries; connectionless protocols are ideal.
Anycast provides a level of redundancy that DNS round-robin cannot provide, since round-robin keeps handing out the address of a dead server until the records are changed.
The overall service can be insulated somewhat from denial-of-service (DoS) attacks.
The service scales very well.
Traffic is routed to the closest server (the best routing path).
Why is ANYCAST important? (continued)
The service leverages the backbone routing infrastructure.
When a service host is down, its route is withdrawn and another route (another anycast server) is selected.
The service is relatively easy to clone once set up.
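The withdraw-and-reselect behavior can be modeled in a few lines. This is a minimal sketch, not how any router actually implements it: it assumes each anycast server's announcement reduces to a single hypothetical path cost, and the server names and costs are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Announcement:
    server: str  # host announcing the anycast address
    cost: int    # hypothetical routing metric from this router to that host


class AnycastRoutes:
    """Toy view of one router's routes to a single anycast prefix."""

    def __init__(self) -> None:
        self._routes: Dict[str, Announcement] = {}

    def announce(self, server: str, cost: int) -> None:
        self._routes[server] = Announcement(server, cost)

    def withdraw(self, server: str) -> None:
        # A host that is down stops announcing, so its route disappears.
        self._routes.pop(server, None)

    def best(self) -> Optional[str]:
        # Lowest cost wins; ties are broken by name so the choice is deterministic.
        if not self._routes:
            return None
        return min(self._routes.values(), key=lambda a: (a.cost, a.server)).server


routes = AnycastRoutes()
routes.announce("InfoTech-SV-01", cost=10)
routes.announce("ScottH-SV-01", cost=20)
print(routes.best())  # InfoTech-SV-01
routes.withdraw("InfoTech-SV-01")
print(routes.best())  # ScottH-SV-01
```

Clients keep using the same address throughout; only the router's choice of destination changes.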
Caching DNS
Problems:
Hosts respond poorly when the caching nameserver is unreachable.
Caching nameservers are hard to re-IP (clients use static configurations).
Goal: always have caching DNS service on the first client-configured IP.
Solution: use anycast servers; configure the anycast IPs on clients.
Caching DNS (continued)
Two caching server IPs: 128.101.101.101 and 134.84.84.84.
Using BIND 9: much better behavior in outage situations, with better queueing, traffic throttling, etc.
Core services are to be configured on 17 servers (17 interfaces).
External services are to be configured on 2 servers (2 interfaces).
The addresses are well known.
Anycast DNS service
Each host announces its route to the IGP cloud.
DNS response without ANYCAST
Modern stub resolvers still have to work through a list of servers to get name resolution.
A failure of one server is experienced by every query until the problem is fixed.
Perceived connectivity and throughput problems are exacerbated by slow DNS responses.
Failover delay per query:
Windows XP SP1         1 sec
Windows 2000 SP3       1 sec
Mac OS X               5 sec
FreeBSD 5.1            5 sec
Linux 2.4.20 kernel    1 sec
OpenBSD 3.3            1 sec
Solaris 8              1 sec
Anycast DNS Address Use: Caching DNS Service
Note the poor behavior of OS stub resolvers:
The first configured DNS server is tried on every query.
This can result in multi-second delays for many queries.
A perfect opportunity for anycast service.
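The cost of a dead first server is simple arithmetic. A minimal sketch, assuming the stub resolver always tries the first configured server, waits the full timeout (the 1 sec and 5 sec values listed for the stub resolvers), then moves on, and that the second server answers immediately; the count of 20 lookups is hypothetical.

```python
def added_delay_s(queries: int, timeout_s: float, first_server_answers: bool) -> float:
    """Extra delay caused by the first configured nameserver, under the stated assumptions."""
    if first_server_answers:
        # With anycast, some live server always answers on the first address.
        return 0.0
    # Every query waits out the full timeout before the next server is tried.
    return queries * timeout_s


# Hypothetical workload: 20 sequential lookups while the first server is down.
for timeout_s in (1.0, 5.0):
    print(timeout_s, added_delay_s(20, timeout_s, first_server_answers=False))
# 1.0 20.0
# 5.0 100.0
```

Because the penalty is paid per query rather than once, anycast removes it entirely by keeping the first address answerable.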
Anycast DNS Address Use: Caching DNS Service (continued)
Use of 128.101.101.101 and 134.84.84.84 is ingrained.
While these addresses are considered well known, the delegations from the roots have changed to use 128.101.101.1 and 128.101.101.9 as the authoritative servers.
Using 128.101.101.101 and 134.84.84.84 to answer external requests is intended as a transition measure.
Internal and External Cache Servers
Issues to consider:
DNS cache poisoning for external users. Is this much of a threat? Could a poisoned server incorrectly redirect traffic to spoofed non-UMN hosts?
The use of well-connected UMN servers as DDoS attack tools. As we are currently configured, this should not represent much of an issue.
Once we separate the authoritative and cache servers, can we effectively mediate the use of the cache for recursion, or do we need to go back to allowing public recursion?
How do we mediate external recursion?
[Figure: an A query for foo.nts.umn.edu, shown twice, the second marked X]
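One way to mediate recursion is to decide by client address, which is what BIND 9's allow-recursion option (an address match list) does. The sketch below only models that decision; the two /16 ranges are assumed "internal" networks for illustration, and the function names are hypothetical.

```python
import ipaddress
from typing import Sequence

# Assumed internal ranges; replace with the networks that should be
# allowed to recurse.
INTERNAL_NETS = (
    ipaddress.ip_network("128.101.0.0/16"),
    ipaddress.ip_network("134.84.0.0/16"),
)


def recursion_allowed(client_ip: str, nets: Sequence = INTERNAL_NETS) -> bool:
    """True if a recursive query from client_ip should be served."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when address and network families differ (v4 vs v6).
    return any(addr in net for net in nets)


def recursion_decision(client_ip: str) -> str:
    # Hypothetical labels: serve the recursive lookup, or refuse it.
    return "RECURSE" if recursion_allowed(client_ip) else "REFUSED"


print(recursion_decision("128.101.1.5"))  # RECURSE
print(recursion_decision("192.0.2.10"))   # REFUSED
```

Note that an address check alone does not stop spoofed sources, so it addresses cache abuse by outsiders more than DDoS reflection.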
Anycast DNS failover
With the DNS servers appropriately attached to the routing infrastructure, a new server is selected as soon as the failure is detected and the route is withdrawn.
All subsequent DNS traffic flows via the new path.
Anycast DNS service selection
Route insertion: path selection is deterministic.
If the path specifies a next hop that is inaccessible, drop the update.
Prefer the path that was originated by this router.
Next, prefer the path with the lowest IP address, as specified by the router ID.
Name              Address
TelecomB-CN-02    128.101.101.17
ScottH            128.101.101.25
ComH              128.101.101.33
CenH              128.101.101.41
KoltH             128.101.101.49
PWB               128.101.101.57
PeikH             128.101.101.65
TelecomB          128.101.101.73
MCB               128.101.101.81
HellerH           128.101.101.89
InfoTech          128.101.101.129
RegisCtr          128.101.101.137
PSB               128.101.101.145
WBOB              128.101.101.153
BioAgEng          128.101.101.161
GrnH              128.101.101.169
BioSci            128.101.101.177
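The three selection rules above can be sketched as a filter-then-minimum. This is a simplified model, not the full BGP decision process (it omits local preference, AS-path length, MED and the other intermediate steps, as the slide does); using two addresses from the table as router IDs and next hops is an illustrative assumption.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Path:
    next_hop: str
    router_id: str
    locally_originated: bool = False


def select_best(paths: Iterable[Path], reachable: Set[str]) -> Optional[Path]:
    # 1. Drop paths whose next hop is inaccessible.
    usable = [p for p in paths if p.next_hop in reachable]
    if not usable:
        return None
    # 2. Prefer a path originated by this router.
    local = [p for p in usable if p.locally_originated]
    if local:
        usable = local
    # 3. Lowest router ID wins; compare as addresses, not strings
    #    ("128.101.101.129" sorts before "128.101.101.25" as text).
    return min(usable, key=lambda p: ipaddress.ip_address(p.router_id))


scott = Path("128.101.101.25", "128.101.101.25")
infotech = Path("128.101.101.129", "128.101.101.129")
print(select_best([scott, infotech], {scott.next_hop, infotech.next_hop}).router_id)
# 128.101.101.25
```

Because every step is a deterministic comparison, every router with the same inputs picks the same server, which is what makes the selection predictable.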
Dynamic failover
Host up doesn't imply service up.
We want a mechanism for withdrawing routes automatically when the service is unusable.
The current method uses a home-brew script (named-cron) that periodically (once every 5 minutes) probes to check that the service is running well.
Auto restart in the event of failure.
Trap to the network management platform.
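The probe, withdraw, trap and restart cycle can be expressed as a small watchdog. This is a hypothetical sketch, not the named-cron script: the probe, route control, trap and restart actions are injected callbacks whose real implementations are assumptions left to the site.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Watchdog:
    probe: Callable[[], bool]           # True if named answered a test query correctly
    restart: Callable[[], None]         # restart named
    withdraw_route: Callable[[], None]  # stop announcing the anycast address
    announce_route: Callable[[], None]  # start announcing it again
    send_trap: Callable[[str], None]    # notify network management
    announced: bool = True

    def check_once(self) -> None:
        if self.probe():
            if not self.announced:
                # Service has recovered: attract anycast traffic again.
                self.announce_route()
                self.announced = True
            return
        if self.announced:
            # Service is unusable: withdraw first so clients fail over.
            self.withdraw_route()
            self.announced = False
        self.send_trap("DNS probe failed; route withdrawn, restarting named")
        self.restart()

    def run_forever(self, interval_s: float = 300.0) -> None:
        # 300 s matches the 5-minute probe interval described above.
        while True:
            self.check_once()
            time.sleep(interval_s)
```

Withdrawing before restarting means clients are steered elsewhere while the restart happens; the route is only re-announced after a later probe succeeds.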
Monitoring the Anycast service
Checking who is servicing cache requests is easy. If the server answering is not local, there is a problem.
Unix% dig hostname.bind chaos txt
;; ANSWER SECTION:
HOSTNAME.BIND.   0   CH   TXT   InfoTech-SV-01
It is expected that we will leverage Entuity to receive traps and handle resulting events:
Entuity bulletin-board events.
Service Center automation of tickets.
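The dig check can also be scripted for automated monitoring. A minimal sketch of the same CHAOS-class TXT query for hostname.bind, built and parsed by hand over UDP; the function names are hypothetical, the parser handles only the simple single-string TXT answer shown above, and the site-from-hostname rule is an assumption based on names like InfoTech-SV-01.

```python
import random
import socket
import struct

TYPE_TXT = 16
CLASS_CH = 3


def build_chaos_txt_query(name: str = "hostname.bind", qid: int = 0) -> bytes:
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", TYPE_TXT, CLASS_CH)


def _skip_name(msg: bytes, off: int) -> int:
    # Walk a (possibly compressed) domain name; return the offset just past it.
    while True:
        length = msg[off]
        if length == 0:
            return off + 1
        if length & 0xC0 == 0xC0:  # compression pointer ends the name
            return off + 2
        off += 1 + length


def parse_first_txt(msg: bytes) -> str:
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    if ancount == 0:
        raise ValueError("response has no answer records")
    off = 12
    for _ in range(qdcount):
        off = _skip_name(msg, off) + 4  # skip QTYPE and QCLASS
    off = _skip_name(msg, off)          # owner name of the first answer
    rtype, _rclass, _ttl, _rdlen = struct.unpack("!HHIH", msg[off:off + 10])
    if rtype != TYPE_TXT:
        raise ValueError("first answer is not a TXT record")
    off += 10
    strlen = msg[off]                   # TXT rdata: length-prefixed string
    return msg[off + 1:off + 1 + strlen].decode("ascii")


def query_server_id(server: str, timeout_s: float = 2.0) -> str:
    query = build_chaos_txt_query(qid=random.randint(0, 0xFFFF))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        sock.sendto(query, (server, 53))
        resp, _ = sock.recvfrom(512)
    if resp[:2] != query[:2]:
        raise ValueError("response ID does not match query")
    return parse_first_txt(resp)


def answered_locally(server_id: str, expected_site: str) -> bool:
    # Assumed naming rule: the site is the part before the first "-",
    # e.g. "InfoTech" in "InfoTech-SV-01".
    return server_id.split("-", 1)[0].lower() == expected_site.lower()
```

For example, a monitor on the InfoTech segment could call query_server_id("128.101.101.101") and raise an event when answered_locally(result, "InfoTech") is false.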
Monitoring the Anycast service (continued)
[Figure: routes for 128.101.101.101/32; hosts labeled INFOTECH-SV-01, INFOTECH-SV-03, SCOTTH-SV-01, SCOTTH-SV-03, TELECOMB-SV-21]
Manual Switchover of Anycast DNS
Identify the server to be disabled: BioAgeng-SV-01.
Locate the peering point:
  ssh BioAgeng-CN-01
  show ip bgp vpnv4 vrf cserv neighbor | inc BGP neighbor
  BGP neighbor is 128.101.101.181, vrf cserv, remote AS 65230
Manually shut down the BGP session:
  ssh BioAgeng-CN-01
  configure t
  router bgp 65217
  address-family ipv4 vrf cserv
  neighbor 128.101.101.181 shutdown
  end
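The two manual steps can be scripted: extract the neighbor from the show output, then emit the configuration lines. A hypothetical helper sketch; it only produces text for an operator (or an automation tool) to apply, and the regex assumes the neighbor line format shown above.

```python
import re
from typing import List, Tuple

# Matches lines like "BGP neighbor is 128.101.101.181, vrf cserv, remote AS 65230".
NEIGHBOR_RE = re.compile(
    r"BGP neighbor is (\d{1,3}(?:\.\d{1,3}){3}),\s+vrf ([^,\s]+),\s+remote AS (\d+)"
)


def parse_neighbors(show_output: str) -> List[Tuple[str, str, int]]:
    """Return (neighbor address, VRF, remote AS) for each neighbor line."""
    return [(m.group(1), m.group(2), int(m.group(3))) for m in NEIGHBOR_RE.finditer(show_output)]


def session_commands(local_as: int, vrf: str, neighbor: str, shutdown: bool = True) -> List[str]:
    """IOS configuration lines to shut down (or re-enable) one VRF BGP session."""
    action = f"neighbor {neighbor} shutdown" if shutdown else f"no neighbor {neighbor} shutdown"
    return [
        "configure terminal",
        f"router bgp {local_as}",
        f" address-family ipv4 vrf {vrf}",
        f"  {action}",
        "end",
    ]


output = "BGP neighbor is 128.101.101.181, vrf cserv, remote AS 65230"
for neighbor, vrf, _remote_as in parse_neighbors(output):
    print("\n".join(session_commands(65217, vrf, neighbor)))
```

Passing shutdown=False produces the matching "no neighbor ... shutdown" line to restore the server after maintenance.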
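TCP-Based Anycast Services
It is unwise to use anycast for long-lived TCP services, because a route change can send the rest of a session to a different server.
Experience shows that routes are generally stable, though.
Equal-cost load balancing would cause problems, since packets of one connection could reach different servers.
But routers often do flow/path caching, which keeps a flow on one path.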
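The difference between per-packet and per-flow equal-cost balancing can be shown with a toy model. In this sketch SHA-256 stands in for whatever hash a router actually uses, the 5-tuple is hypothetical, and the two next hops are labeled by server name for readability.

```python
import hashlib
from typing import List, Sequence, Tuple

# (source IP, source port, destination IP, destination port, protocol)
Flow = Tuple[str, int, str, int, str]


def per_flow_next_hop(flow: Flow, next_hops: Sequence[str]) -> str:
    # Hash the 5-tuple so every packet of a flow maps to the same next hop.
    # SHA-256 is a stand-in; routers use their own (usually cheaper) hashes.
    key = "|".join(str(part) for part in flow).encode()
    digest = hashlib.sha256(key).digest()
    return next_hops[int.from_bytes(digest[:4], "big") % len(next_hops)]


def per_packet_next_hops(n_packets: int, next_hops: Sequence[str]) -> List[str]:
    # Round-robin per packet: consecutive packets alternate between paths.
    return [next_hops[i % len(next_hops)] for i in range(n_packets)]


hops = ["InfoTech-SV-01", "ScottH-SV-01"]
flow = ("192.0.2.10", 40000, "128.101.101.101", 53, "tcp")
print({per_flow_next_hop(flow, hops) for _ in range(5)})  # a single server
print(per_packet_next_hops(4, hops))
# ['InfoTech-SV-01', 'ScottH-SV-01', 'InfoTech-SV-01', 'ScottH-SV-01']
```

With per-packet balancing a TCP connection's segments land on two servers that share no connection state; per-flow hashing keeps the whole connection on one server as long as the routes stay put.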
Other (Potential) Uses
NTP/Time
Syslog
RADIUS
Kerberos
Single-packet request/response UDP protocols are easy.