I recently ran into an interesting issue with my home Kubernetes environment that runs my blog. As I mentioned in a previous post, I run my blog on k3s and I use cert-manager to manage my SSL certificates provided by Let’s Encrypt. I had temporarily changed my Internet provider and, along with it, my router, and the new router does not appear to support NAT Loopback. The cert-manager documentation acknowledges the issue but doesn’t provide much of a solution. Cert-manager couldn’t renew my blog’s certificate because its self-check kept failing. I managed to solve the issue through a fairly simple CoreDNS change. Let’s take a look.
It took me a little while to figure out what the issue was. I saw the certificate was pending, and looking at the Challenge object, I saw this:
$ kubectl -n blog describe challenges
...
Status:
  Presented:   true
  Processing:  true
  Reason:      Waiting for http-01 challenge propagation: failed to perform self check GET request 'http://therubyist.org/.well-known/acme-challenge/...': connect: connection timed out
  State:       pending
I could curl the URL in question from an external machine, so I knew it was up and responding all the way through to the cert-manager ACME responder Ingress and Pod. From the host Linux server, curl also worked. I narrowed down the problem by trying curl from within the blog’s Pod, which had the same timeout problem. After some searching, I managed to locate the cert-manager documentation about issues with external load balancers. While it wasn’t an exact match, it helped me home in on the actual issue. I went back to trying the Linux host, but this time added -v to the curl command. It was connecting to the Linux server’s private IP address, while the Challenge timeout indicated it was trying to connect to the public IP.
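As a rough illustration (the addresses here are made up), the difference was right there in the verbose output:

# On the Linux host: the name resolves to the server's private address
$ curl -v http://therubyist.org/
*   Trying 192.168.1.50:80...
* Connected to therubyist.org (192.168.1.50) port 80

The Challenge, meanwhile, was resolving the same name to the public address and timing out there.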
I tried a curl to the public IP (the same IP the Challenge was trying to self-check) and it had the same timeout problem. This was helpful because now I could test the problem independently of Kubernetes. The reason it worked for the actual name (rather than the IP) is the /etc/hosts entry for the domain on that host.
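For reference, that entry is nothing more exotic than a line like this in the host’s /etc/hosts (the private IP is illustrative):

192.168.1.50    therubyist.org

That’s why name-based requests from the host short-circuited the public IP entirely.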
Potential Solutions
My first thought was to disable the self-check. Surely this isn’t the only reason that someone would want to disable this self-check, but this seems like as good a reason as any. Sadly, despite many people suggesting the option on similar issues, the maintainers of cert-manager really like self-checks. Short of forking cert-manager, I couldn’t find a way to disable self-checks.
Other people with the issue went as far as creating specialized proxies to work around it. I wasn’t keen on the idea of running another service to solve this problem.
Another solution I saw someone use involved modifying the ingress resource to trick Kubernetes into DNATing traffic to the external IP. This seemed like too much of a hack to me, plus it would have required adjusting it whenever the ISP decided to give me a new IP.
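As an illustration of the general idea (the Service name and IP are placeholders for a k3s/Traefik setup like mine): adding the public IP to the ingress controller’s Service as an externalIP makes kube-proxy DNAT traffic addressed to that IP back into the cluster, so the self-check never has to make it out through the router and back:

kubectl -n kube-system patch service traefik \
  --type merge \
  -p '{"spec":{"externalIPs":["203.0.113.10"]}}'

Every ISP-issued address change would mean re-patching, which is exactly the kind of maintenance I wanted to avoid.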
I saw people mentioning CoreDNS and how it could be made to rewrite lookups. This idea seemed like it would work: if I rewrote the DNS queries so they pointed to my Ingress controller’s Service, it should do the trick. Note that messing around in the kube-system namespace tends to be a bad idea. I knew what I was doing and this is a pretty benign change, but you’ve been warned. Here’s what I changed:
kubectl -n kube-system edit configmap/coredns
I edited the Corefile: section to add rewrites for my domains. I found the line that said health and added this below it:
rewrite name therubyist.org traefik.kube-system.svc.cluster.local
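For context, the top of the edited Corefile ends up looking roughly like this (your stock k3s Corefile may differ slightly depending on the version; everything other than the rewrite stays as it was):

.:53 {
    errors
    health
    rewrite name therubyist.org traefik.kube-system.svc.cluster.local
    # one rewrite line per additional domain goes here
    ready
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    # ... the rest of the stock entries (forward, cache, loop, reload, loadbalance) stay untouched ...
}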
Once I added lines like that for all the domains I needed, I saved the file then killed the CoreDNS Pod.
$ kubectl -n kube-system delete pods/coredns-66c464876b-lflpg
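If you’d rather not look up the Pod name, restarting the Deployment accomplishes the same thing:

kubectl -n kube-system rollout restart deployment coredns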
This triggered a new Pod to be created, and the cluster’s DNS now pointed therubyist.org to the right place. I tested my curl from within the blog’s Pod and it worked! By the time I checked on the Challenge resource again, it had already completed successfully. My new cert was out there and usable:
$ kubectl -n blog get certificates
NAME                  READY   SECRET                AGE
blog-gnagy-info-tls   True    blog-gnagy-info-tls   15m
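For the record, the in-Pod test earlier was nothing fancy, just an exec into the blog’s Pod (substitute whatever your Pod is actually named):

$ kubectl -n blog get pods
$ kubectl -n blog exec -it <blog-pod> -- curl -v http://therubyist.org/

With the rewrite in place, the name resolves to the Traefik Service inside the cluster rather than the public IP, so the request goes through.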
Additional Notes
While this works, whenever k3s is restarted (e.g., the server restarts or I upgrade k3s), this ConfigMap is reset to the original contents (per this GitHub issue). I haven’t yet found the perfect workaround for this, but I’ll update the post when I find something I like.
Comments
Another option is to use DNS validation instead of HTTP for cert-manager.
Absolutely, though I currently have my DNS for my home stuff (including this blog) through GoDaddy… so there’s no easy integration to let cert-manager do its thing for DNS. Unless you know of a way to do that, in which case I’d gladly switch. I’ve considered transferring my domains somewhere else like Route53 to make it easier, but I never get around to it.