ChatGPT: do not trust but do verify

Posted by Emily Zall

As a DevOps Engineer, I know not to trust ChatGPT: I always fact-check its answers and make sure they make sense. With that major caveat, I still find it a very helpful tool in many scenarios.

Here are some things I like to ask it to do:

Explain a concept
Give me troubleshooting ideas
Summarize a long document (especially yaml/kubernetes)
Answer a question I have about a specific document. For example, you can give it a long yaml file and ask it which parent a certain field is nested under. Now that I have a yaml plugin for my IDE, that’s an easier way to see this. I may write a post on JetBrains plugins too.
Evaluate my work to provide any suggestions
Improve my documentation
Identify issues with Kubernetes resources and yaml files. It doesn’t always work, but you can paste a resource in and ask what’s wrong with it. It’s the most intelligent diff tool I’ve seen: it compares your resource to known examples and, rather than showing you every difference, uses context to pick out the differences that are relevant. You can give it a specific document to refer to or let it use everything in its corpus.
Variable naming. If it’s a challenging one, you can explain what the variable does and get suggestions. I found the suggested names were descriptive yet fairly concise and I like how it explained the reasoning behind them.
Ask it for reassurance when you’re feeling discouraged! ^_^
Just for fun. Behold the Dapr Kafka ^

“the object has been modified; please apply your changes to the latest version and try again”

Posted by Emily Zall

I don’t know why this error sometimes comes up on a resource managed by FluxCD but it’s useful to know that it seems to usually resolve itself with time.

Edited to add: per https://github.com/crossplane/crossplane/issues/2114 this error is “almost never a cause for concern” and “frequently a red herring”.

Troubleshooting with Crossplane and FluxCD… could be better

Posted by Emily Zall

I’m gaining valuable experience with Crossplane and FluxCD at my current job. While the GitOps approach promises seamless operations, my journey has been a mix of discovery and challenges. If you are considering adopting these tools, I would be hesitant to recommend them because I have found them error-prone and difficult to troubleshoot. While I appreciate the innovations, there’s a part of me that still misses the (relative) simplicity of Terraform.

Obscure Errors

Well, for starters, I’ve posted on several of these already:

https://faulttolerant.work/2023/08/15/flux-helm-upgrade-failed-another-operation-install-upgrade-rollback-is-in-progress/

https://faulttolerant.work/2023/08/15/flux-error-timed-out-waiting-for-the-condition/

https://faulttolerant.work/2023/07/11/flux-kustomization-wont-sync-but-theres-no-error-message/

https://faulttolerant.work/2023/06/16/flux-reconcile-helmrelease-doesnt-do-what-it-sounds-like/

Dependency Resolution Creates Duplicates (and more obscure errors)

This is at least the second time in a couple months I’ve run into this. Crossplane Upbound Provider B depends on Crossplane Upbound Provider A so both Providers are created. Then if I also have an explicit creation of Crossplane Upbound Provider A that happens afterward, I get an error (at least if the name I gave it was different than the auto-generated name).

I think that was what caused the two errors below. It was not clear to me at the time that these meant a duplicate Crossplane Provider.

# On the Provider
Warning  InstallPackageRevision  33m (x6 over 39m)  packages/provider.pkg.crossplane.io  cannot apply package revision: cannot patch object: Operation cannot be fulfilled on providerrevisions.pkg.crossplane.io "upbound-provider-azure-managedidentity-xxxx": the object has been modified; please apply your changes to the latest version and try again

# On the ProviderRevision
Warning  ResolveDependencies  2m1s (x31 over 27m)  packages/providerrevision.pkg.crossplane.io  cannot resolve package dependencies: node already exists
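One way to check for this situation is to list the installed Provider packages and look for any package that shows up more than once. The sketch below is a minimal example, assuming kubectl access to the cluster where Crossplane runs; the function name is hypothetical, and stripping the version tag with sed is a simplification that does not handle packages pinned by digest.

```shell
# List provider packages that appear more than once, ignoring the version tag.
# Hypothetical helper; assumes kubectl points at the Crossplane cluster.
duplicate_provider_packages() {
  kubectl get providers.pkg.crossplane.io \
    -o jsonpath='{range .items[*]}{.spec.package}{"\n"}{end}' |
    sed 's/:[^:/]*$//' |   # drop a trailing ":tag" so versions do not hide duplicates
    sort |
    uniq -d                # print only packages listed more than once
}

# Usage:
# duplicate_provider_packages
```

If this prints a package, two Provider objects point at it, which matches the duplicate scenario described above.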

Why Did The Finalizer Block Deletion?

Crossplane managed objects have a Finalizer, which at times has been a hurdle when trying to delete a resource. Generally, there is a reason for adding a finalizer into the code, so you should always investigate before manually deleting it.

So how does one investigate the finalizer? By looking at the documentation, code, and logs for that Crossplane managed resource. According to https://docs.crossplane.io/latest/concepts/managed-resources/#finalizers:

When Crossplane deletes a managed resource the Provider begins deleting the external resource, but the managed resource remains until the external resource is fully deleted.

When the external resource is fully deleted Crossplane removes the Finalizer and deletes the managed resource object.

You should track down the external resource and see why that is failing to delete. But it gets very tempting to just remove the finalizer.
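As a starting point for that investigation, it can help to look at the managed resource itself before touching the finalizer. The sketch below is only an assumption about a reasonable first step: the function name is hypothetical, and it simply prints the deletion timestamp, the finalizers, and the status conditions, where a Provider would typically report why the external delete is failing.

```shell
# Show the deletion timestamp, finalizers, and status conditions of a resource.
# Hypothetical helper. Arguments: resource kind and resource name.
inspect_stuck_deletion() {
  kind=$1
  name=$2
  kubectl get "$kind" "$name" \
    -o jsonpath='{.metadata.deletionTimestamp}{"\n"}{.metadata.finalizers}{"\n"}{range .status.conditions[*]}{.type}={.status} {.reason}: {.message}{"\n"}{end}'
}

# Usage (placeholder kind and name):
# inspect_stuck_deletion <kind> <name>
```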

Logging

Logs can be invaluable when troubleshooting, but with Crossplane and FluxCD it’s often hard to determine which log to consult. Is it the Crossplane pod logs, the Provider logs, or one of the many Flux logs? Crossplane and Flux both provide guides, but a consolidated guide might be a topic for another day.

How do you decide which log you consult first? Share your experiences in the comments below.

Flux: Helm upgrade failed: another operation (install/upgrade/rollback) is in progress

Posted by Emily Zall

Delete the secret sh.helm.release.v1.<release-name>.v<revision> for the stuck revision, then retrigger the reconciliation loop (flux suspend, then flux resume, on the HelmRelease in question).

It may also be useful to know that helm ls only lists deployed and failed releases by default, so a release stuck in a pending state will not appear. If you want to see them all, you need to run helm ls --all.
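The steps above can be sketched as a small script. Everything named here is a placeholder or an assumption: RELEASE and NAMESPACE are made-up values, the two functions are hypothetical helpers, and the script assumes the Helm release has the same name and namespace as the Flux HelmRelease, which is not always the case.

```shell
# Placeholder values: replace with your own release name and namespace.
RELEASE="my-release"
NAMESPACE="my-namespace"

# Build the name of the Helm release secret for a given revision number.
helm_secret_name() {
  printf 'sh.helm.release.v1.%s.v%s\n' "$1" "$2"
}

# Delete the secret for the stuck revision, then retrigger reconciliation.
# Argument: the revision number of the stuck release.
unstick_helmrelease() {
  revision=$1
  kubectl delete secret "$(helm_secret_name "$RELEASE" "$revision")" -n "$NAMESPACE"
  flux suspend helmrelease "$RELEASE" -n "$NAMESPACE"
  flux resume helmrelease "$RELEASE" -n "$NAMESPACE"
}

# Find the stuck revision first, then pass its number to the helper:
# helm ls --all -n "$NAMESPACE"
# unstick_helmrelease <revision>
```

Nothing runs until the last two lines are uncommented, so the secret name can be checked before anything is deleted.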

Flux Kustomization won’t sync but there’s no error message

Posted by Emily Zall

I added a resource and it’s in a file that is listed in my Flux Kustomization. The gitrepository has synced but my change is not showing up. I did kubectl describe to see the events on the Kustomization but I don’t see any messages about my resource. What’s wrong?

Flux processes changes in a batch and must have a successful dry-run before applying changes. This means that any Warnings on the Kustomization will prevent changes from syncing. So run kubectl describe and look for any other warnings on the Kustomization and resolve them, even if they are on a different resource than the one you are concerned with.
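If the describe output is long, filtering the events down to warnings on that one Kustomization can make them easier to spot. This is a minimal sketch: the function name is hypothetical, and the flux-system default is an assumption about where the Kustomization lives.

```shell
# List only the Warning events recorded against one Flux Kustomization.
# Hypothetical helper. Arguments: Kustomization name, namespace (default flux-system).
kustomization_warnings() {
  name=$1
  namespace=${2:-flux-system}
  kubectl get events -n "$namespace" \
    --field-selector "type=Warning,involvedObject.kind=Kustomization,involvedObject.name=$name"
}

# Usage (placeholder name):
# kustomization_warnings <kustomization-name>
```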

CLI command to show status of an AKS upgrade

Posted by Emily Zall

The other day I was monitoring an AKS Kubernetes version upgrade, but the notifications in the Azure Portal had stopped updating. I found out that I can check the status from the command line.

emilyzall@Emilys-MBP ~ % az aks show -g my-rg -n my-cluster --query 'provisioningState'
"Upgrading"
emilyzall@Emilys-MBP ~ % az aks show -g my-rg -n my-cluster --query 'provisioningState'
"Succeeded"
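Rather than rerunning the command by hand, the same query can be wrapped in a polling loop. This is a sketch under a few assumptions: the function name is hypothetical, any state other than "Upgrading" is treated as the end of the upgrade, and -o tsv is used so the state comes back without the surrounding quotes.

```shell
# Poll the cluster's provisioningState until it is no longer "Upgrading".
# Hypothetical helper. Arguments: resource group, cluster name,
# seconds between polls (default 30).
wait_for_aks_upgrade() {
  rg=$1
  cluster=$2
  interval=${3:-30}
  while :; do
    state=$(az aks show -g "$rg" -n "$cluster" --query provisioningState -o tsv) || return 1
    echo "$(date '+%H:%M:%S') $state"
    [ "$state" = "Upgrading" ] || break
    sleep "$interval"
  done
  # Succeed only if the final state is "Succeeded".
  [ "$state" = "Succeeded" ]
}

# Usage (same placeholder names as above):
# wait_for_aks_upgrade my-rg my-cluster 60
```

The exit status makes it easy to chain a follow-up step, for example a notification, once the upgrade finishes.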

“|” and “|-” in YAML and Kubernetes

Posted by Emily Zall

How this came up

I was looking into an issue with a Flux Kustomization patch and I noticed that in the example given, one of the patches started with “|” and the other started with “|-”.

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: podinfo
  namespace: flux-system
spec:
  # ...omitted for brevity
  patches:
    - patch: |-
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: not-used
        spec:
          template:
            metadata:
              annotations:
                cluster-autoscaler.kubernetes.io/safe-to-evict: "true"        
      target:
        kind: Deployment
        labelSelector: "app.kubernetes.io/part-of=my-app"
    - patch: |
        - op: add
          path: /spec/template/spec/securityContext
          value:
            runAsUser: 10000
            fsGroup: 1337
        - op: add
          path: /spec/template/spec/containers/0/securityContext
          value:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            runAsNonRoot: true
            capabilities:
              drop:
                - ALL        
      target:
        kind: Deployment
        name: podinfo
        namespace: apps

What are “|” and “|-” in this context?

These are YAML syntax components.

A patch field in a Kustomization expects a YAML or JSON formatted string. You could write it as a single-line string with newline escapes (\n) and spaces for indentation, but you usually wouldn’t because it is awkward to read.

patch: "apiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: not-used\nspec:\n  template:\n    metadata:\n      annotations:\n        cluster-autoscaler.kubernetes.io/safe-to-evict: \"true\""

You would rather write this as multiple lines with the proper indentation. However:

# NO, the value for patch is YAML but not a string
    - patch:
        - op: add
          path: /spec/template/spec/securityContext
          value:
            runAsUser: 10000
            fsGroup: 1337

# NO, this does not preserve newlines and is not recommended
    - patch:
        "- op: add
          path: /spec/template/spec/securityContext
          value:
            runAsUser: 10000
            fsGroup: 1337"

So how do I write a valid YAML string across multiple lines? Use the Literal Block Style Indicator: |

This is how you indicate in YAML that everything nested below this line should be interpreted as a multiline string with internal newlines and indentation preserved.

What about the minus sign then?

It only matters if there are trailing newlines.

Block Chomping Indicators: The chomping indicator in YAML determines what should be done with trailing newlines in a block scalar. It can be one of three values:

No indicator (clip): This means that a single final newline will be kept in the value, but any additional trailing blank lines will be excluded.
The ‘+’ indicator (keep): This means all trailing newlines will be included in the value.
The ‘-’ indicator (strip): This means that all trailing newlines will be excluded from the value.
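To make the three behaviours concrete, here is a small shell sketch that mimics them on a piece of text followed by two blank lines. It is not a YAML parser: the chomp function is a hypothetical helper that only imitates the trailing-newline rules, and it relies on the fact that shell command substitution drops every trailing newline.

```shell
# Mimic YAML block chomping on a raw block of text. Not a YAML parser.
# Usage: chomp <strip|clip|keep> <raw text>
chomp() {
  mode=$1
  raw=$2
  stripped=$(printf '%s' "$raw")   # command substitution drops every trailing newline
  case $mode in
    strip) printf '%s' "$stripped" ;;    # like |-  (no trailing newline)
    clip)  printf '%s\n' "$stripped" ;;  # like |   (exactly one final newline)
    keep)  printf '%s' "$raw" ;;         # like |+  (all trailing newlines)
    *)     return 1 ;;
  esac
}

# One line of text followed by two blank lines (three newlines in total).
raw='runAsUser: 10000


'
for mode in strip clip keep; do
  printf '%s: %s bytes\n' "$mode" "$(chomp "$mode" "$raw" | wc -c | tr -d ' ')"
done
# prints:
# strip: 16 bytes
# clip: 17 bytes
# keep: 19 bytes
```

The text itself is 16 bytes, so the byte counts show how many newlines each mode leaves at the end: none, one, or all three.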

How do I tell if there are trailing newlines?

You can use cat -e <filename> to see EOL (end-of-line) characters. This displays Unix line endings (\n or LF) as $ and Windows line endings (\r\n or CRLF) as ^M$. A line that shows only $ is an empty line.
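For a quick look at what that output means, the example below pipes a made-up snippet through cat -e instead of reading a file. The snippet is only an illustration: a block scalar with one Windows line ending mixed in, followed by two blank lines.

```shell
# A block scalar with one CRLF line ending, followed by two blank lines.
printf 'patch: |\n  - op: add\r\n\n\n' | cat -e
# prints:
# patch: |$
#   - op: add^M$
# $
# $
```

The two lines that show only $ are the trailing blank lines that the chomping indicator decides whether to keep.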

Do trailing newlines matter?

It really depends. I suggest adhering closely to the style used in the documentation you are referring to in order to be on the safe side.

Fault Tolerant

Faults happen – build resiliently
