Using cert-manager with Azure AKS and AGIC

Obtain a TLS certificate with cert-manager on AKS, using the Azure Application Gateway Ingress Controller, a ClusterIssuer, the http01 solver, multiple hosts and a non-default namespace


Motivation

There are many guides on how to use cert-manager, starting with cert-manager documentation. However, I haven't found a clear tutorial for my case, where I wanted to deploy cert-manager on AKS, with the following characteristics:

  • Service pods (the hosts) are deployed in a non-default namespace

  • Using the Azure Application Gateway Ingress Controller (AGIC)

  • Using the http01 solver (instead of dns01)

  • Using a ClusterIssuer instead of a regular (namespaced) Issuer

  • Getting a TLS certificate for multiple subdomains.

There's a somewhat outdated tutorial from Microsoft, a more up-to-date one here, and a third one here, as well as a few articles here and there, but they either don't tick all the requirements mentioned above or are outdated.

As a result, I decided to write down my experience with deploying it successfully.

Prerequisites

It is assumed that you are using Linux or Mac (not a big deal if not: the Bash scripts below can easily be converted to Batch scripts, or used as yaml files with kubectl), and have:

  • Deployed your services/web apps on AKS, in a non-default namespace, e.g. prod;

  • Bought a domain name, and configured it to point to the Application Gateway's public Frontend IP with an A Record, or to its DNS name using a CNAME Record. In my case, I've pointed two subdomains to the Application Gateway's DNS name;

  • Configured AGIC as your ingress controller to enable access to your services over http, and verified successful access to them with a K8s Ingress object with proper rules. In my case, the Ingress forwards each subdomain to a different container, each installed with a separate deployment on AKS;

  • Azure CLI installed;

  • Your kubeconfig available;

  • kubectl installed;

  • Helm installed, and used for deploying to your cluster.
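
A quick way to confirm that the command-line tools listed above are on your PATH is sketched below. This is only an illustration: the function name missing_tools is made up for this example and is not part of any of these tools.

```shell
#!/bin/bash
# Print the name of every given command that is not found on the PATH.
# Prints nothing when all of them are available.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
    fi
  done
}

# The tools this guide relies on: Azure CLI, kubectl and Helm.
missing="$(missing_tools az kubectl helm)"
if [ -n "$missing" ]; then
  # Left unquoted on purpose so the names are printed on one line.
  echo "Missing tools:" $missing
else
  echo "All required tools are installed."
fi
```

The check only tells you that the binaries exist. It does not verify that your kubeconfig points at the right cluster.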

Note: You might be a console guy who's familiar with executing kubectl commands, but otherwise I'd recommend using Lens to look around your K8s cluster easily.

Let's start...

Install cert-manager

There are a few ways to install cert-manager on AKS using Helm. I've used the following script, and named it cert-manager.sh (remember to grant it execute permission):

#!/bin/bash

# Add the Jetstack Helm repository
helm repo add jetstack https://charts.jetstack.io

# Update your local Helm chart repository cache
helm repo update

# Install the cert-manager Helm chart
# Helm v3+
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.10.0 \
  --set installCRDs=true \
  --set clusterResourceNamespace=prod

# https://cert-manager.io/docs/installation/helm/#installing-with-helm

Note the last --set: it tells cert-manager to keep the secrets that belong to cluster-scoped resources such as a ClusterIssuer (for example, the ACME account key) in the prod namespace. Otherwise, they default to the cert-manager namespace. See here. Creating the secret in your services' namespace solves issues where the secret can't be accessed by your resources.

To use the script, execute: ./cert-manager.sh

After the successful installation of cert-manager, it's time to create the ClusterIssuer resource.
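
Before creating anything else, it may be worth confirming that cert-manager is actually up. The following is a minimal sketch rather than part of the original setup. The function name verify_cert_manager is made up for this example, and it assumes the chart was installed into the cert-manager namespace, as in the script above.

```shell
#!/bin/bash
# Wait until every Deployment in the namespace reports Available,
# then list the pods. The namespace defaults to cert-manager.
verify_cert_manager() {
  local namespace="${1:-cert-manager}"
  kubectl wait --for=condition=Available deployment --all \
    --namespace "$namespace" --timeout=120s &&
    kubectl get pods --namespace "$namespace"
}
```

Run it with verify_cert_manager. With a default chart installation you would normally expect to see three running pods: cert-manager, cert-manager-cainjector and cert-manager-webhook.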

Create ClusterIssuer

A ClusterIssuer, like a regular Issuer, is a resource that represents a certificate authority (CA) able to sign certificates in response to certificate signing requests. The difference between a ClusterIssuer and a regular Issuer is explained here. Again, I created the ClusterIssuer using a Bash script, named clusterIssuer.sh:

#!/bin/bash
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:

    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: <set your email here>

    # ACME server URL for Let’s Encrypt’s staging environment.
    # The staging environment will not issue trusted certificates but is
    # used to ensure that the verification process is working properly
    # before moving to production 
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # After verifying with the staging environment the ability to properly
    # get certificates, you can use the production env. with the following URL:
    # server: https://acme-v02.api.letsencrypt.org/directory

    privateKeySecretRef:
      # Secret resource used to store the account's private key. 
      name: letsencrypt-staging

    # Enable the HTTP-01 challenge provider
    # you prove ownership of a domain by ensuring that a particular
    # file is present at the domain
    solvers:
    - http01:
        ingress:
            class: azure/application-gateway
EOF

As mentioned in the file, this ClusterIssuer is set to work with Let's Encrypt's staging environment. The staging environment issues an untrusted certificate, which your browser will warn you about. But since the production environment of Let's Encrypt allows only a limited number of certificate requests, it's better to start with the staging environment, and switch to production after verifying that you can properly get a staging certificate for your hosts.
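
Once the script has been applied, the ClusterIssuer should register an ACME account and become Ready. A minimal sketch for checking that is shown below. The function name wait_for_issuer is made up for this example, and the 60 second timeout is an arbitrary choice.

```shell
#!/bin/bash
# Block until the named ClusterIssuer reports Ready, then show its status.
# ClusterIssuers are cluster-scoped, so no namespace flag is needed.
wait_for_issuer() {
  local issuer="${1:-letsencrypt-staging}"
  kubectl wait --for=condition=Ready "clusterissuer/$issuer" --timeout=60s &&
    kubectl get clusterissuer "$issuer" -o wide
}
```

Run it with wait_for_issuer, or pass another issuer name as the first argument. If the wait times out, kubectl describe clusterissuer letsencrypt-staging usually shows the reason in its status and events.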

Create a secret to be filled by cert-manager

Create an empty TLS secret with this secret.yaml file:

apiVersion: v1
kind: Secret
metadata:
  name: my-services-tls
type: kubernetes.io/tls
stringData:
  tls.key: ""
  tls.crt: ""

and apply it to your namespace:
kubectl apply -f secret.yaml -n prod

Note: This secret can be part of your services' Helm chart, and be installed earlier. That's why it doesn't appear here as a Bash script. Eventually, cert-manager will fill the tls.key and tls.crt values of this secret.

Update your Ingress

My initial Ingress looks something like this:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: prod
  annotations:
    kubernetes.io/ingress.class: azure/application-gateway
    appgw.ingress.kubernetes.io/backend-path-prefix: "/"
spec:
  rules:
  - host: service-one.mydomain.xyz
    http:
      paths:
      - path: "/"
        backend:
          service:
            name: service-one-image
            port:
              number: 80
        pathType: Exact
  - host: service-two.mydomain.xyz
    http:
      paths:
      - path: "/"
        backend:
          service:
            name: service-two-image
            port:
              number: 80
        pathType: Exact

And now it should be updated to look like this:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: prod
  annotations:
    kubernetes.io/ingress.class: azure/application-gateway
    appgw.ingress.kubernetes.io/backend-path-prefix: "/"
    appgw.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: letsencrypt-staging
spec:
  tls:
  - hosts:
    - service-one.mydomain.xyz
    - service-two.mydomain.xyz    
    secretName: my-services-tls
  rules:
  - host: service-one.mydomain.xyz 
    http:
      paths:
      - path: /
        backend:
          service:
            name: service-one-image
            port:
              number: 80
        pathType: Exact
  - host: service-two.mydomain.xyz
    http:
      paths:
      - path: /
        backend:
          service:
            name: service-two-image
            port:
              number: 80
        pathType: Exact

The main additions are:

  1. Referencing the ClusterIssuer in the annotations section

  2. Adding a TLS block, with the hosts (my subdomains) and the secret.

Note that I've set only one secret for both hosts. The generated certificate is issued in the name of the first host, and lists the second host in the Certificate Subject Alternative Name property, so that one certificate takes care of both hosts. Setting multiple secrets has proven problematic for me.

Upgrade your Ingress using Helm:
helm upgrade my-chart ./my-chart -n prod
or, if you don't use Helm:
kubectl apply -f my-ingress.yaml -n prod

Verify results

This triggers a chain of reactions, which is explained in cert-manager's troubleshooting guide. The methods mentioned there to troubleshoot the resources using kubectl describe helped me a lot.

If you're using Lens, you'll be able to notice a new Ingress being created, as well as a new Certificate and CertificateRequest (CRDs of cert-manager.io), and a new Challenge and Order (CRDs of acme.cert-manager.io).
After the challenge is accepted, the new Ingress will disappear, and so will the Challenge.
The Certificate is expected to have a status of Ready: True, and so is the ClusterIssuer. The Order should be in the valid state.
And most importantly: the my-services-tls secret should get its tls.key and tls.crt values filled by cert-manager.
In Azure -> Application Gateway, you can expect to see Listeners on ports 80 and 443. If all is set correctly, you should be able to navigate to your services using your host URLs and, after ignoring the browser's warnings, see them served with a TLS certificate (untrusted, as it's staging only).
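
If you prefer the console over Lens, the same resources can be listed with kubectl. This is a minimal sketch: the function name show_issuance_status is made up for this example, and it assumes the prod namespace and the my-services-tls secret used throughout this article.

```shell
#!/bin/bash
# List the cert-manager resources involved in issuing the certificate,
# followed by the TLS secret that cert-manager is expected to fill.
# Challenges are removed once solved, so an empty list there is normal.
show_issuance_status() {
  local namespace="${1:-prod}"
  kubectl get certificate,certificaterequest,order,challenge --namespace "$namespace"
  kubectl get secret my-services-tls --namespace "$namespace"
}
```

Run it with show_issuance_status. When something is stuck, running kubectl describe on the resource that is not progressing (Certificate, then CertificateRequest, then Order, then Challenge) is the approach the troubleshooting guide walks through.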

Inspect the certificates. Verify that both your hosts appear in the Certificate Subject Alternative Name of the certificate.
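
The certificate can also be inspected without a browser, straight from the secret. The sketch below assumes openssl and base64 are installed locally. The function names fetch_cert_pem and show_sans are made up for this example.

```shell
#!/bin/bash
# Print the PEM-encoded certificate stored in a TLS secret.
# The dot in the key name tls.crt has to be escaped in the jsonpath expression.
fetch_cert_pem() {
  local secret="$1" namespace="$2"
  kubectl get secret "$secret" --namespace "$namespace" \
    -o jsonpath='{.data.tls\.crt}' | base64 --decode
}

# Read a PEM certificate from stdin and print its Subject Alternative Name
# header together with the line that follows it, which lists the DNS names.
show_sans() {
  openssl x509 -noout -text | grep -A1 'Subject Alternative Name'
}
```

Run it as fetch_cert_pem my-services-tls prod | show_sans. Both subdomains should appear as DNS entries in the output.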
If everything is OK, it's time to move to the production environment of Let's Encrypt.

Move to Production Env. of Let's Encrypt

1. Update your Ingress

Change the line:
cert-manager.io/cluster-issuer: letsencrypt-staging
to:
cert-manager.io/cluster-issuer: letsencrypt-prod

2. Replace ClusterIssuer

Delete your ClusterIssuer, and create a new one, with:

metadata:
  name: letsencrypt-prod
...
    server: https://acme-v02.api.letsencrypt.org/directory  
...
    privateKeySecretRef:
      # Secret resource used to store the account's private key. 
      name: letsencrypt-prod

Full updated ClusterIssuer:

#!/bin/bash
kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:

    # You must replace this email address with your own.
    # Let's Encrypt will use this to contact you about expiring
    # certificates, and issues related to your account.
    email: <set your email here>

    # ACME server URL for Let’s Encrypt’s production environment.
    # The staging URL used earlier is kept here, commented out, for reference:
    # server: https://acme-staging-v02.api.letsencrypt.org/directory
    server: https://acme-v02.api.letsencrypt.org/directory

    privateKeySecretRef:
      # Secret resource used to store the account's private key. 
      name: letsencrypt-prod

    # Enable the HTTP-01 challenge provider
    # you prove ownership of a domain by ensuring that a particular
    # file is present at the domain
    solvers:
    - http01:
        ingress:
            class: azure/application-gateway
EOF

You can delete the letsencrypt-staging secret from your namespace.
A new secret named letsencrypt-prod will be created after successfully moving to the production environment.
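
The cleanup of the staging issuer can be sketched as follows. The function name remove_staging_issuer is made up for this example. It assumes the staging account secret lives in prod, which is where it ends up when clusterResourceNamespace=prod was set at install time.

```shell
#!/bin/bash
# Remove the staging ClusterIssuer and its ACME account key secret.
# --ignore-not-found keeps the commands from failing if either is already gone.
remove_staging_issuer() {
  local namespace="${1:-prod}"
  kubectl delete clusterissuer letsencrypt-staging --ignore-not-found
  kubectl delete secret letsencrypt-staging --namespace "$namespace" --ignore-not-found
}
```

Run it with remove_staging_issuer once the Ingress references letsencrypt-prod, so that nothing still points at the issuer being deleted.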

Note: If you encounter issues with getting rid of the remnants of the staging environment, just remove cert-manager and all its resources, as explained here.
Also remove and re-create the my-services-tls secret, and remove the letsencrypt-staging secret from your namespace.
Then re-install cert-manager as described above, but this time configure the ClusterIssuer and Ingress to work with the production environment from the get-go.

Helpful resources

The following resources helped me during this setup:

  • This answer helped me get rid of a stuck challenge, which got stuck due to an incorrect setup;

  • As mentioned above, the cert-manager troubleshooting guide is a great source for understanding what's going on;

  • In general, the cert-manager documentation is worth reading.