The topic of this post comes from the time I have spent hardening and growing CounterPath’s infrastructure in Boston. As those of you who work with CounterPath know, we run a "near-production" grade facility in Downtown Boston. We are located in the neutral colocation room of the top carrier hotel in Boston. The beauty of this location is that every provider who has any kind of fiber, telephony, or Internet service in Boston has a suite in the building. For CounterPath, this means that any service we are looking to utilize is a cross connect away. The facility is connected to multiple city grids, stores more diesel in the basement than I'll use in a lifetime, and has a [crazy] flywheel system that powers one of our redundant feeds until the generators kick in (the other feed is on a more traditional UPS + generator configuration). If you think your colocation or hosting provider is neutral and does things the right way, you need to stop by One Summer Street.
In Boston we have had a nascent Domain Name System (DNS) service, with our own DNS server feeding our IP providers' hosted and redundant DNS servers. It has worked just fine, but it was not the kind of configuration that I would write a post about. In the past we have used this DNS very lightly: most of our demo accounts and clients have gone out the door using the public IP addresses of our server infrastructure. This has worked well over time, but as we expose some production services in Boston to our customer base, we need to start looking at the benefits DNS provides beyond a nice name (some additional levels of resiliency and failover in particular). We have always viewed the Boston Pop as a place where we learn, a place where we try things in the wild with our customers, and a place where we can show our customers how they can deploy services based upon our client and server technology. That has not changed. I wanted to use this post to share a bit more learning, this time about IP Anycast.
[Blogger Note: I'm two paragraphs in and haven't gotten to the point yet - I know everyone is having flashbacks of emails I send on similar technical topics - I'll stop turning this into a soliloquy and get on target.]
IP Anycast has always been a network capability that I have understood, but I have not had a direct hand in setting it up or using it. Perhaps my earliest exposure was using AS1's DNS servers: 18.104.22.168 and 22.214.171.124. These have been around forever, and are rock solid, because they are deployed inside the (now Level(3)) backbone using IP Anycast. They are available, pingable, easy to remember, and quick to type, so they serve as great resolvers for servers with public network access and as a great test tool.
Here is the overview of IP Anycast functionality: IP Anycast leverages Border Gateway Protocol (BGP) to announce the same set of IP addresses from multiple places, in a way that lets clients on other networks connect to the closest instance of an IP address. The great part here is that the servers responding to these clients can be deployed at multiple locations within a provider’s infrastructure. This offers huge benefits when clients connect from different geographic locations. It also provides built-in reliability and resiliency that is hard to match. If a connection or BGP session is lost, that path goes away and the next best path to the service is selected.
When we started to look at what we would do with our DNS configuration for some new services that will be based in Boston, it became clear there was a notable performance and redundancy advantage to using an IP Anycast based DNS service. We could easily set up BIND and run our own servers, but we would never match the performance and resiliency of an IP Anycast based DNS service. With these services, the DNS provider sets up many physical name servers, typically deployed in different geographic locations and connected to different IP backbones, and allows customers to use the provider's name servers or their own custom-named servers.
We started a trial last month with one provider that has an Anycast network deployed. We looked at how their tools worked, chatted with their support staff, checked out the documentation, and ran some offhand tests to see that what they said they were doing was really what we were experiencing. Everything checked out, but watching this service and tracking its uptime and response time is relatively hard to do unless you are everywhere.
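To make the failover behavior concrete, here is a minimal sketch of the idea. It is a toy model, not how any real router or provider works: the site names and AS path lengths are hypothetical, and real BGP best-path selection weighs several attributes (local preference, MED, and so on) that I have collapsed into AS path length alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announcement:
    """One site announcing the shared anycast prefix (all values hypothetical)."""
    site: str
    as_path_len: int  # shorter AS path stands in for "closer" in this toy model


def best_site(announcements):
    """Return the site a client network would prefer, or None if none remain."""
    if not announcements:
        return None
    # Prefer the shortest AS path; break ties by site name so the result is stable.
    return min(announcements, key=lambda a: (a.as_path_len, a.site)).site


# Hypothetical view from one client network: three sites announce the same prefix.
routes = [
    Announcement("boston", 2),
    Announcement("chicago", 3),
    Announcement("london", 5),
]
print(best_site(routes))  # boston

# The Boston BGP session drops, so its announcement is withdrawn.
routes = [a for a in routes if a.site != "boston"]
print(best_site(routes))  # chicago
```

The client never changes the address it talks to. Only the route behind that address changes, which is why the failover needs no client-side configuration.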
CounterPath recently added an external monitoring service to watch our services from outside our network. The service provides a simple check of the major externally facing components of our network and gives us peace of mind that our systems are up from someone else's network vantage point. Importantly, the tool is easy to use and configure, allows short polling intervals (one minute), supports checks of IP, DNS, HTTP, and other protocols, and provides a rich set of alarming methods as well as reporting. We have been using this tool in a very basic way for a few weeks now to monitor our IP connectivity. The wealth of information provided by the reporting tools is impressive. The other important aspect, from the perspective of checking out our new trial DNS provider, was that the network monitoring service is distributed. We're currently using 25 locations, with nine of these in Europe, one in Canada, and the remainder spread across the US. Perfect for what we are looking to test with respect to Anycast.
Perfect, except things went horribly wrong:
Check out those response times. Wow. This is not a service I want to be using. Even worse than the horrible response times, we were seeing outages. Something felt wrong. After some investigation, it appeared that when we looked only at the results from the US (and Canada) and excluded Europe, things looked better:
This looked much better but still was not carrier class. There is a clear European problem, but there is more than that: something is choking our results. All of the failures were when we had a European Monitor (and then a secondary monitor server) check in on the DNS service. The failures were also when we were seeing the crazy high response times and were during business hours US East Coast time. Either the monitoring network was doing something or the DNS service had some systemic problems in Europe.
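The kind of per-region breakdown described above is easy to reproduce from exported check results. The sketch below is only an illustration: the `(region, response_ms)` sample format is my own assumption rather than the monitoring service's export format, and the numbers are made-up placeholders, not our measurements.

```python
from collections import defaultdict
from statistics import median


def summarize_by_region(samples):
    """Group (region, response_ms) samples; response_ms of None marks a failed check."""
    grouped = defaultdict(list)
    for region, response_ms in samples:
        grouped[region].append(response_ms)

    summary = {}
    for region, values in grouped.items():
        ok = [v for v in values if v is not None]
        summary[region] = {
            "checks": len(values),
            "failures": len(values) - len(ok),
            "median_ms": median(ok) if ok else None,
        }
    return summary


# Made-up placeholder samples, NOT measurements from our monitoring service.
samples = [
    ("us", 40), ("us", 55), ("ca", 60),
    ("eu", 900), ("eu", None), ("eu", 1200),
]

for region, stats in sorted(summarize_by_region(samples).items()):
    print(region, stats)
```

Splitting failures and response times by region like this is what pointed to Europe as the place where the trouble was concentrated.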
After looking at the results a few times, I decided to really dig in and email questions to the support teams at the monitoring provider and the DNS hosting provider. My email to the DNS provider let them know that I was seeing something strange, particularly in Europe. I asked about their European infrastructure, what they were doing with respect to Anycast there, and whether my trial account included servers that were Anycast in Europe. The email to the monitoring provider contained careful questions about the operation of the monitoring servers, their secondary testing, how they used DNS, and the impact of IP Anycast on their DNS monitoring services.
Then came the “Eureka” moment, or perhaps the “DOH!” moment - it all came back to 126.96.36.199. Literally. I had set up 188.8.131.52 as the DNS server to which the monitoring network would send resolution requests. That seems perfectly normal, but it defeats the goodness of an IP Anycast deployment: the monitors were timing a trip through that resolver rather than a direct query to the nearest instance of our DNS provider's name servers. All of the monitoring provider's servers were hitting 184.108.40.206, which apparently does not have physical servers deployed in Europe or advertised via BGP in the EU. The traffic was coming back to Level(3)’s backbone, here in the US.
I quickly set up a second DNS monitor that used one of our name servers directly. No more failures from the European servers (so far), and all the resolution times were looking much better, especially within Europe, but even when isolating monitoring to the US and Canada. I am going to keep a close eye on things now, but the difference is night and day.
If there is a lesson here, it may be that not all IP Anycast deployments are the same. The lesson certainly cannot be that old habits are hard to break (example: using 220.127.116.11). Certainly Genuity and now Level(3) have had a historic focus on traffic and networks within North America, so their lack of DNS servers in Europe is not a shock, though the focus looks to be changing.
We have clearly learned that IP Anycast forms a solid basis for a DNS service. There really is a case for outsourcing this component of your infrastructure, unless you are able to run an IP Anycast DNS service yourself. That is of course possible with two or more locations, but perhaps not optimal until you grow it in terms of both geography and network connectivity. CounterPath is looking at additional DNS capabilities, including DNS monitoring and failover features, to protect the uptime and reachability of our services for our customers.
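For anyone who wants to try the same comparison by hand, here is a rough sketch of timing a single DNS query against a specific server. It is not what the monitoring service runs. The function names `build_query` and `time_query` are my own, it uses only the Python standard library, and it does not parse or validate the response, so treat it as a stopwatch rather than a health check.

```python
import socket
import struct
import time


def build_query(hostname, query_id=0x1234, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record, class IN)."""
    # Header: id, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def time_query(server_ip, hostname, timeout=2.0):
    """Send one UDP query to server_ip; return elapsed ms, or None on failure."""
    packet = build_query(hostname)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(packet, (server_ip, 53))
            sock.recvfrom(512)  # reply is not parsed; we only time its arrival
        except OSError:  # covers timeouts and unreachable-network errors
            return None
        return (time.monotonic() - start) * 1000.0
```

Calling `time_query` once with the address of a public resolver and once with the address of one of your own authoritative name servers, from the same vantage point, shows the difference described above. The addresses and hostname to pass in are yours to supply; none are assumed here.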
Hopefully this helps our customers when they are thinking about rolling out their own SIP and mobility solutions. We are working on a number of other infrastructure areas for hardening our Boston Pop. Hopefully we will find some time soon to tell everyone a bit more.