Notes to Self

Alex Sokolsky's Notes on Computers and Programming

Backing up k8s cluster

Use Velero:

Listing Backups

kubectl get backup --all-namespaces --show-labels=true

On-Demand Backup

Identify the velero pod:

kubectl get pod -n velero
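Alternatively, you can skip copying the generated pod name entirely: `kubectl exec` accepts a deployment reference and resolves a ready pod for you (assuming the deployment is named `velero`, as in a default install):

```shell
# Run the velero CLI through the deployment; kubectl picks a ready pod.
kubectl exec -n velero deploy/velero -- /velero backup get
```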

Start the backup by passing the velero command to the pod:

kubectl exec -it velero-7f8d8d6bf9-86v9c --container velero -n velero \
    -- /velero backup create vault-backup --include-namespaces vault

To see the details of the created backup:

kubectl exec -it velero-7f8d8d6bf9-86v9c --container velero -n velero \
    -- /velero backup describe vault-backup
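Velero also records each backup as a `Backup` custom resource, so you can poll its phase directly without exec'ing into the pod (field path from Velero's Backup CRD):

```shell
# Prints Completed, InProgress, PartiallyFailed, or Failed
kubectl -n velero get backup vault-backup -o jsonpath='{.status.phase}'
```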

Backup Verification - Restore Backup to a Different Namespace

To avoid having to shut down vault to test the backup, you can verify it by restoring into a different namespace, e.g. vault1.

Before you attempt to restore

A naive attempt to restore into a different namespace will fail, because the new namespace lacks the service account that vault uses.

Begin by creating the new namespace, the target for the restore:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Namespace
metadata:
  labels:
    istio.io/dataplane-mode: ambient
    kustomize.toolkit.fluxcd.io/prune: disabled
    name: vault1
  name: vault1
EOF

Prepare trust-policy.json (Attention: cluster-specific, so you will need to edit it!) for the new namespace vault1:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/E7281548B3E48DB5B68C580B64F46EC1"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-east-1.amazonaws.com/id/E7281548B3E48DB5B68C580B64F46EC1:sub": "system:serviceaccount:vault1:vault"
        }
      }
    }
  ]
}

Create a role for the new service account to use and attach the trust policy to it:

aws iam create-role --role-name test-vault1-kms-unseal --assume-role-policy-document file://trust-policy.json

Attach the existing (cluster-specific!) policy to the newly created role:

aws iam attach-role-policy --role-name test-vault1-kms-unseal \
    --policy-arn arn:aws:iam::123456789:policy/test-vault-kms-unseal-20231003173000386500000001
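Before moving on, it can be worth double-checking the role: confirm the trust policy took and the permissions policy is attached (both are standard `aws iam` read commands):

```shell
# Show the trust policy attached to the role
aws iam get-role --role-name test-vault1-kms-unseal \
    --query 'Role.AssumeRolePolicyDocument'
# List the attached permissions policies
aws iam list-attached-role-policies --role-name test-vault1-kms-unseal
```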

Create a new service account vault in the newly created namespace vault1 and associate it with the newly created role:

kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789:role/test-vault1-kms-unseal
  name: vault
  namespace: vault1
EOF
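Verify that the service account carries the IRSA annotation before restoring (the jsonpath escapes the dots in the annotation key):

```shell
kubectl -n vault1 get serviceaccount vault \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
```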

Create a secret required for vault OIDC login. Pull the secret from: Okta Admin Dashboard -> Applications -> Applications -> Vault -> General -> Client Secret

kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: okta-oidc
  namespace: vault1
stringData:
  OKTA_OIDC_CLIENT_SECRET: <VALUE_FROM_OKTA>
EOF

Restore the backup into a new namespace

velero restore create test-restore --from-backup vault-20250327010027 --namespace-mappings vault:vault1

Verification

There should be no errors in velero logs:

kubectl logs deployment/velero -n velero

or

velero restore logs test-restore

Verify that the namespace vault1 is restored and is active:

kubectl get namespace vault1
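Beyond the namespace itself, it is worth confirming that the workloads, volumes, and secrets came back:

```shell
kubectl -n vault1 get pods,pvc,secrets
```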

Cleanup

Delete the restore:

velero restore delete test-restore

Clean the namespace vault1:

kubectl delete namespaces vault1

Delete the role you created:

aws iam detach-role-policy --role-name test-vault1-kms-unseal \
    --policy-arn arn:aws:iam::123456789:policy/test-vault-kms-unseal-20231003173000386500000001
aws iam delete-role --role-name test-vault1-kms-unseal

Restore from a Backup

Follow these steps.

Verify the backup

velero describe backup <name> --details

Prepare for the Vault shutdown

Scale down the Vault configurer:

kubectl -n vault scale deployment vault-configurer --replicas=0

Verify that the scale down actually happened with:

kubectl -n vault get deployment

Expected output: the vault-configurer deployment should show 0/0 ready replicas.

Delete the Vault instance. This is done to minimize the risk of Vault being in a partially restored state. Vault will be inaccessible (if not already) after running the following commands.

kubectl -n vault delete vault --all
kubectl -n vault delete pvc --all
kubectl -n vault delete secret --all

Finally, wipe the namespace clean:

kubectl delete namespaces vault

Restore from the existing backup

Pass the velero restore create --from-backup command to the velero pod:

kubectl exec -it velero-6c474ffbb-xg8ht -n velero \
    -- /velero restore create --from-backup vault-20240112010005
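As with backups, each restore is tracked as a `Restore` custom resource, so you can watch its progress without exec'ing into the pod:

```shell
kubectl -n velero get restore \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase
```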

Re-initialize vault Namespace

With the backup restored, re-initialize the namespace:

flux resume kustomization vault
flux reconcile kustomization vault

Create the vault secret

Manually create the secret required for vault OIDC login.

You can pull the secret from: Okta Admin -> Applications -> Applications -> Vault -> General -> Client Secret

kubectl apply -f - <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: okta-oidc
  namespace: vault
stringData:
  OKTA_OIDC_CLIENT_SECRET: <VALUE_FROM_OKTA>
EOF

Scale the Vault configurer back to one replica

kubectl -n vault scale deployment vault-configurer --replicas=1

Verify that the changes were successfully made

kubectl -n vault get deployment

If necessary, delete the Vault pods and allow the configurer to restart them

If the Vault statefulset pods are in CrashLoopBackOff after 5-10 minutes of performing the restore operation, then the pods need to be manually restarted.

The Vault statefulset pods normally go through a number of restarts on a clean start-up.

Check that the pods are in a CrashLoopBackOff state:

kubectl -n vault get pod -o wide
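To see exactly which pods are stuck and why, a jsonpath sketch that prints each pod's name alongside its containers' waiting reason (empty when the container is running):

```shell
kubectl -n vault get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[*].state.waiting.reason}{"\n"}{end}'
```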

Delete the pods to restart them:

kubectl delete pod -n vault --all

Finally, verify that all the pods are in a Running state.

kubectl get pod -n vault

Expected output:

NAME                                    READY   STATUS    RESTARTS      AGE
vault-0                                 4/4     Running   0             19m
vault-1                                 4/4     Running   0             17m
vault-2                                 4/4     Running   1 (18m ago)   18m
vault-configurer-7f7b996c95-h4lxz       1/1     Running   0             28m
vault-csi-provider-csi-provider-k2l87   2/2     Running   5 (25m ago)   28m
vault-csi-provider-csi-provider-mg2ll   2/2     Running   5 (25m ago)   28m
vault-csi-provider-csi-provider-nf7m7   2/2     Running   0             26m