Troubleshooting with Crossplane and FluxCD… could be better

I’m gaining valuable experience with Crossplane and FluxCD at my current job. While the GitOps approach promises seamless operations, my journey has been a mix of discovery and challenges. If you are considering adopting these tools, I would hesitate to recommend them, because I have found them error-prone and difficult to troubleshoot. While I appreciate the innovation, part of me still misses the (relative) simplicity of Terraform.

Obscure Errors

Well, for starters, I’ve posted on several of these already:

https://faulttolerant.work/2023/08/15/flux-helm-upgrade-failed-another-operation-install-upgrade-rollback-is-in-progress/

https://faulttolerant.work/2023/08/15/flux-error-timed-out-waiting-for-the-condition/

https://faulttolerant.work/2023/07/11/flux-kustomization-wont-sync-but-theres-no-error-message/

https://faulttolerant.work/2023/06/16/flux-reconcile-helmrelease-doesnt-do-what-it-sounds-like/

Dependency Resolution Creates Duplicates (and more obscure errors)

This is at least the second time in a couple of months I’ve run into this. Crossplane Upbound Provider B depends on Crossplane Upbound Provider A, so when B is installed, Crossplane’s package manager automatically creates A as well. If I also explicitly create Provider A afterward, I get an error (at least when the name I gave it differs from the auto-generated one).

I believe this is what caused the two errors below. At the time, it wasn’t clear to me that they pointed to a duplicate Crossplane Provider.

#On the Provider
Warning  InstallPackageRevision  33m (x6 over 39m)  packages/provider.pkg.crossplane.io  cannot apply package revision: cannot patch object: Operation cannot be fulfilled on providerrevisions.pkg.crossplane.io "upbound-provider-azure-managedidentity-xxxx": the object has been modified; please apply your changes to the latest version and try again

#On the ProviderRevision
Warning  ResolveDependencies  2m1s (x31 over 27m)  packages/providerrevision.pkg.crossplane.io  cannot resolve package dependencies: node already exists
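Since neither error names the duplicate outright, a quick check for the same package installed under two Provider names could save some head-scratching. Here is a minimal Python sketch of that check. It assumes you feed it the JSON from `kubectl get providers.pkg.crossplane.io -o json` and that each Provider’s package reference lives in `spec.package`. The function names and the sample data are hypothetical, and it groups by package repository only, ignoring version tags.

```python
import json
import sys
from collections import defaultdict


def package_repository(package: str) -> str:
    """Strip the tag or digest from a package reference, keeping registry and path."""
    # Drop a digest such as "@sha256:...".
    package = package.split("@", 1)[0]
    # A ":" after the last "/" is a tag; one before it would be a registry port.
    if package.rfind(":") > package.rfind("/"):
        package = package[: package.rfind(":")]
    return package


def find_duplicate_providers(provider_list: dict) -> dict:
    """Return {repository: [Provider names]} for packages installed under more than one name."""
    names_by_repo = defaultdict(list)
    for item in provider_list.get("items", []):
        repo = package_repository(item["spec"]["package"])
        names_by_repo[repo].append(item["metadata"]["name"])
    return {repo: sorted(names) for repo, names in names_by_repo.items() if len(names) > 1}


# Hypothetical data shaped like `kubectl get providers.pkg.crossplane.io -o json`:
# one Provider created automatically as a dependency, plus an explicitly created
# Provider that points at the same package under a different name.
SAMPLE_PROVIDERS = {
    "items": [
        {
            "metadata": {"name": "upbound-provider-azure-managedidentity"},
            "spec": {"package": "xpkg.upbound.io/upbound/provider-azure-managedidentity:v0.38.0"},
        },
        {
            "metadata": {"name": "provider-azure-managedidentity"},
            "spec": {"package": "xpkg.upbound.io/upbound/provider-azure-managedidentity:v0.38.0"},
        },
        {
            "metadata": {"name": "upbound-provider-family-azure"},
            "spec": {"package": "xpkg.upbound.io/upbound/provider-family-azure:v0.38.0"},
        },
    ]
}

if __name__ == "__main__":
    # Usage: kubectl get providers.pkg.crossplane.io -o json > providers.json
    #        python find_duplicate_providers.py providers.json
    # With no argument, the hypothetical sample above is used.
    data = SAMPLE_PROVIDERS
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            data = json.load(f)
    for repo, names in find_duplicate_providers(data).items():
        print(f"{repo} is installed as: {', '.join(names)}")
```

Ignoring the tag is deliberate: the dependency-installed Provider and the explicit one may not pin the same version, but they still collide.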

Why Did The Finalizer Block Deletion?

Crossplane managed resources have a finalizer, which at times has been a hurdle when I’ve tried to delete a resource. A finalizer is generally there for a reason, so you should always investigate before removing it manually.

So how do you investigate a finalizer? By looking at the documentation, code, and logs for that Crossplane managed resource. According to https://docs.crossplane.io/latest/concepts/managed-resources/#finalizers:

When Crossplane deletes a managed resource the Provider begins deleting the external resource, but the managed resource remains until the external resource is fully deleted.

When the external resource is fully deleted Crossplane removes the Finalizer and deletes the managed resource object.

In other words, you should track down the external resource and find out why it is failing to delete, however tempting it is to just remove the finalizer.
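When a deletion hangs, a useful first step is a summary of the managed resource’s finalizers and any status conditions that aren’t True, since that is usually where the reason the external resource won’t delete surfaces. Below is a minimal Python sketch, assuming you’ve saved the resource with `kubectl get <kind> <name> -o json`. The function name, the sample resource, and its error message are hypothetical. The finalizer name and the `Ready`/`Synced` conditions in the sample reflect my understanding of how Crossplane reports deletion progress, so check them against your own resources.

```python
import json
import sys


def explain_stuck_deletion(resource: dict) -> list:
    """Summarize why a managed resource might still exist after a delete was requested."""
    meta = resource.get("metadata", {})
    notes = []
    if "deletionTimestamp" in meta:
        notes.append(f"Deletion requested at {meta['deletionTimestamp']}.")
    else:
        notes.append("No deletionTimestamp: deletion has not been requested yet.")
    for finalizer in meta.get("finalizers", []):
        notes.append(f"Blocked by finalizer: {finalizer}")
    # Conditions that are not True are where the reason (and often the provider's
    # error from the external API) tends to show up.
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("status") != "True":
            text = f"{cond.get('type')}={cond.get('status')} ({cond.get('reason', 'no reason')})"
            if cond.get("message"):
                text += f": {cond['message']}"
            notes.append(text)
    return notes


# Hypothetical managed resource stuck in deletion; the names and message are made up.
SAMPLE_RESOURCE = {
    "apiVersion": "managedidentity.azure.upbound.io/v1beta1",
    "kind": "UserAssignedIdentity",
    "metadata": {
        "name": "example-identity",
        "deletionTimestamp": "2023-08-20T12:00:00Z",
        "finalizers": ["finalizer.managedresource.crossplane.io"],
    },
    "status": {
        "conditions": [
            {"type": "Ready", "status": "False", "reason": "Deleting"},
            {
                "type": "Synced",
                "status": "False",
                "reason": "ReconcileError",
                "message": "hypothetical error returned by the cloud API on delete",
            },
        ]
    },
}

if __name__ == "__main__":
    # Usage: kubectl get <kind> <name> -o json > resource.json
    #        python explain_stuck_deletion.py resource.json
    # With no argument, the hypothetical sample above is used.
    data = SAMPLE_RESOURCE
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            data = json.load(f)
    for note in explain_stuck_deletion(data):
        print(note)
```

If the summary shows only the finalizer and no failing condition, the Provider’s pod log is the next place to look before reaching for `kubectl patch` to strip the finalizer.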

Logging

Logs can be invaluable when troubleshooting, but with Crossplane and FluxCD it’s often hard to work out which log to consult. Is it the Crossplane pod logs, the Provider logs, or one of the many Flux controller logs? Crossplane and Flux both provide troubleshooting guides, but a consolidated guide might be a topic for another day.
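Short of a consolidated guide, here is a rough rule of thumb written as a small Python sketch: map the API group of the object you’re troubleshooting to the controller whose log is most likely to explain it. It assumes default install namespaces (`flux-system` for Flux, `crossplane-system` for Crossplane and its Providers), a Crossplane core Deployment named `crossplane`, and that managed resources from Upbound Providers use API groups ending in `.upbound.io`. The function name and the mapping are my own heuristic, not official guidance from either project.

```python
# Assumed default install locations; adjust these to match your cluster.
FLUX_NAMESPACE = "flux-system"
CROSSPLANE_NAMESPACE = "crossplane-system"

# Flux API groups and the controller that reconciles each one.
FLUX_CONTROLLERS = {
    "source.toolkit.fluxcd.io": "source-controller",
    "kustomize.toolkit.fluxcd.io": "kustomize-controller",
    "helm.toolkit.fluxcd.io": "helm-controller",
    "notification.toolkit.fluxcd.io": "notification-controller",
}

# Groups handled by Crossplane core (package manager, XRDs and Compositions).
CROSSPLANE_CORE_GROUPS = {"pkg.crossplane.io", "apiextensions.crossplane.io"}


def first_log_to_check(api_version: str) -> str:
    """Suggest which log to read first for an object with the given apiVersion."""
    # Core Kubernetes objects use a bare version such as "v1" (empty group).
    group = api_version.split("/")[0] if "/" in api_version else ""
    if group in FLUX_CONTROLLERS:
        return f"kubectl logs -n {FLUX_NAMESPACE} deploy/{FLUX_CONTROLLERS[group]}"
    if group in CROSSPLANE_CORE_GROUPS:
        return f"kubectl logs -n {CROSSPLANE_NAMESPACE} deploy/crossplane"
    if group.endswith(".upbound.io"):
        # Provider Deployment names are generated, so list the pods to find the right one.
        return f"Provider pod log: find it with kubectl get pods -n {CROSSPLANE_NAMESPACE}"
    return "Unknown group: may be a composite resource or claim; try the Crossplane pod log"


if __name__ == "__main__":
    for api_version in (
        "kustomize.toolkit.fluxcd.io/v1",
        "pkg.crossplane.io/v1",
        "managedidentity.azure.upbound.io/v1beta1",
    ):
        print(f"{api_version}: {first_log_to_check(api_version)}")
```

Composite resources and claims use whatever API group you defined in your XRD, which is why the sketch falls back to the Crossplane pod log for groups it doesn’t recognize.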

How do you decide which log to consult first? Share your experiences in the comments below.