Flux error: “timed out waiting for the condition”

Flux has an unfortunate habit of failing a HelmRelease with the error “timed out waiting for the condition” without saying which condition it was waiting for.

Here are a few things to try:

  • Look for unhealthy pods and check their logs. This often works, but not always: Flux might be waiting on a resource other than a pod, or it might be timing out while deleting something.
  • Check the Flux logs for errors with flux logs --all-namespaces --level=error. There should be information here, but I haven’t found it all that useful for troubleshooting. Don’t get too caught up in an error that could be a red herring.
  • Delete the HelmRelease. This is what I had to do today. I had removed several Kustomizations, which should have led to the HelmRelease being deleted, but for some reason Flux was timing out on that. When I deleted the HelmRelease with kubectl delete, though, it stayed gone and seems to have cleaned up the resources it had been using.
  • Disable wait on the HelmRelease. You can disable the health checks to see which resources are failing. This is recommended on https://fluxcd.io/flux/cheatsheets/troubleshooting/ but I would only use it as a last resort. The health check is usually there for a reason, so removing it could create a bigger mess.
# USE WITH CAUTION, may lead to instability
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: podinfo
  namespace: default
spec:
  install:
    disableWait: true
  upgrade:
    disableWait: true
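
For the first suggestion, here is a minimal sketch of a filter for spotting unhealthy pods. It assumes the default column layout of kubectl get pods --all-namespaces (NAMESPACE, NAME, READY, STATUS, ...), and unhealthy_pods is a made-up helper name, not part of kubectl or Flux.

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on
# stdin and prints pods that are not Running, or are Running but not fully
# ready. Completed pods (finished jobs) are skipped.
unhealthy_pods() {
  awk 'NR > 1 {
    # READY looks like "1/2": containers ready / containers total
    split($3, ready, "/")
    if ($4 == "Completed") next
    if ($4 != "Running" || ready[1] != ready[2]) print $1 "/" $2, $3, $4
  }'
}
```

Pipe the real listing into it with kubectl get pods --all-namespaces | unhealthy_pods. It only looks at pods, so it will not help when the stuck resource is something else.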

flux reconcile helmrelease doesn’t do what it sounds like

I was faced with a failed Helm release.

emilyzall@Emilys-MBP datadog-agent % helm history datadog-agent -n datadog
REVISION	UPDATED                 	STATUS  	CHART         	APP VERSION	DESCRIPTION
1       	Thu May 25 09:45:40 2023	deployed	datadog-3.25.1	7          	Install complete
2       	Tue Jun 13 09:10:18 2023	failed  	datadog-3.25.1	7          	Upgrade "datadog-agent" failed: timed out waiting for the condition
3       	Tue Jun 13 09:30:23 2023	failed  	datadog-3.25.1	7          	Release "datadog-agent" failed: timed out waiting for the condition

I wanted to see if this error was still happening. Being new to Flux CD, at first I thought it might retry the failed Helm release automatically as part of the sync. Judging by the last date in the history above, that didn’t seem to be the case.

Maybe I could force it to retry the Helm Release using the flux reconcile command. It sounded good at the time!

emilyzall@Emilys-MBP datadog-agent % flux reconcile
The reconcile sub-commands trigger a reconciliation of sources and resources.

So…

emilyzall@Emilys-MBP datadog-agent % flux reconcile helmrelease datadog-agent -n datadog --verbose
► annotating HelmRelease datadog-agent in datadog namespace
✔ HelmRelease annotated
◎ waiting for HelmRelease reconciliation
✗ HelmRelease reconciliation failed: Helm rollback failed: release datadog-agent failed: timed out waiting for the condition

Oh, I guess it’s still failing. But wait, why is it timing out within seconds? That seems mighty quick. The default timeout is 5 minutes, and my HelmRelease resource was configured with a 20-minute timeout, so a genuinely new attempt should not have failed that fast. I searched for something like “flux reconcile helmrelease” and found https://stackoverflow.com/questions/65677606/is-there-a-way-to-manually-retry-a-helmrelease-for-fluxcd-helmoperator.

There is a way to manually retry a Helm release with Flux, but it is not flux reconcile helmrelease. You have to run flux suspend and then flux resume.

emilyzall@Emilys-MBP datadog-agent % flux suspend helmrelease datadog-agent -n datadog --verbose
► suspending helmrelease datadog-agent in datadog namespace
✔ helmrelease suspended
emilyzall@Emilys-MBP datadog-agent % flux resume helmrelease datadog-agent -n datadog --verbose
► resuming helmrelease datadog-agent in datadog namespace
✔ helmrelease resumed
◎ waiting for HelmRelease reconciliation
✔ HelmRelease reconciliation completed
✔ applied revision 3.25.1
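
If you end up doing this often, the two commands can be wrapped in a small function. This is only a sketch: retry_helmrelease and the FLUX_BIN override are names I made up for illustration, while the flux suspend and flux resume invocations are the same ones shown in the session above.

```shell
# Hypothetical wrapper: retry a failed HelmRelease by suspending it and then
# resuming it. FLUX_BIN is an assumed override (not a Flux setting) so the
# sequence can be dry-run, for example with FLUX_BIN=echo.
retry_helmrelease() {
  if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
    echo "usage: retry_helmrelease NAME NAMESPACE" >&2
    return 2
  fi
  # Only resume if the suspend succeeded.
  "${FLUX_BIN:-flux}" suspend helmrelease "$1" -n "$2" &&
    "${FLUX_BIN:-flux}" resume helmrelease "$1" -n "$2"
}
```

For the release above, the call would be retry_helmrelease datadog-agent datadog.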

Core Concepts for Code Review

Throughout my career, I’ve had the privilege of having my code reviewed by many talented developers, which has helped me gain a deeper understanding of code review. To further refine my skills, I also delved into various books and articles on the subject. Based on my experiences as both a reviewer and a reviewee, as well as my research, I’ve identified several core concepts that I believe are essential for effective code review. In this post, I’ll share my insights and personal take on these concepts.

Understand the problem being solved

Before starting the review, refer to any documentation you have about what the code is supposed to do and make sure it’s clear. Having precise requirements is crucial for assessing whether the code performs its intended function.

Start with what you know

If you have a Merge Request with numerous modified files, check whether any of them are simple to review, and start with those. GitLab has checkboxes in the interface that let you mark the files you have already reviewed, and other tools may have comparable features.

Go in chronological order

Starting from the chronological beginning of the code can help you comprehend the sequence of events. However, in configuration code such as Terraform, this may not be as straightforward.

Make sure the code is doing what it says it does

Ensure that the code performs the tasks that its names imply. Are the classes, functions, and variables functioning as intended? Does the overall code align with the specifications, designs, or acceptance criteria? If you are uncertain about what the code is doing or if it adheres to the standards, do not hesitate to ask for clarification. If the code is perplexing to you, it is probable that others will also find it baffling.

Could these changes break existing functionality?

Where does this code get invoked, or how is this infrastructure used? Could these changes have adverse effects?

What will happen for edge cases?

For example, if a function accepts a null value, will it handle that properly?
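
As a small illustration, here is the kind of check a reviewer might look for. In shell, the closest thing to a null value is an unset or empty argument. The namespace_flag helper is hypothetical and exists only for this example.

```shell
# Hypothetical helper: builds a --namespace flag for a kubectl-style command.
# Without the guard, an empty argument would silently produce "--namespace=".
namespace_flag() {
  if [ -z "${1:-}" ]; then
    echo "namespace_flag: namespace must not be empty" >&2
    return 1
  fi
  printf '%s\n' "--namespace=$1"
}
```

Whether the right behavior is to fail, as here, or to fall back to a default is a design choice. The point for the reviewer is that the empty case is handled deliberately.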

Try it out if you can

Running code locally while reviewing a Merge Request offers multiple advantages. It helps to detect conflicts with the existing codebase, identify potential issues, assess the code’s functionality, and provide more comprehensive feedback to the developer. Additionally, testing code locally can reveal performance concerns or resource utilization issues, ensuring that the code meets the required quality standards before merging it into the main branch.

Build a custom code review checklist

Creating a custom code review checklist can significantly improve the code review process’s efficiency and effectiveness. Here are some steps to build a custom code review checklist:

  1. Analyze previous code reviews or production issues to identify common patterns or recurring issues.
  2. Prioritize the most critical issues or those that have had the most significant impact on the system or product.
  3. Develop a checklist that includes these issues or patterns, and any specific areas of concern for the project, team, or technology stack.
  4. Refine the checklist based on feedback from the team, incorporating any additional items or removing any that are redundant or not applicable.
  5. Continuously iterate and improve the checklist as new issues emerge and the team gains experience and expertise.

By following these steps, you can create a concise, focused, and practical checklist that addresses the most critical issues and areas of concern for your project, team, or technology stack.

Take the time to read and understand every line

Devote sufficient time to scrutinize and comprehend each line of code. While a high-quality code review can be time-consuming, the outcome is more maintainable and less error-prone code. If you encounter a statement that is unclear, research it, and then inquire with the developer for clarification if it remains confusing.

Keep calm and stay fault tolerant

Hi there! My name is Emily and I am excited to share my experiences and thoughts on the world of DevOps and tech through this blog. With over a decade of experience in the tech industry, and in DevOps specifically since 2018, I have a lot to share about this ever-evolving field. When I’m not working, I love singing barbershop, playing complicated board games, and all things cats. I have two cats (Puppy and Cation) and a foster cat, and I also volunteer doing Trap Neuter Return. I live in Rhode Island. I’m the organizer of https://www.meetup.com/rhode-island-codes/ and I’m excited to connect with readers from all over and build a community centered around all things DevOps.