In the spirit of https://jvns.ca/blog/2023/10/06/new-talk--making-hard-things-easy/ and Matt Surabian’s DevOps Days Boston talk “Teaching and Learning When Words No Good Sense Make” – why is Kubernetes so hard?
In a word: complexity. Let’s break that complexity out into a few dimensions.
Overloading of terms
Overloading is when a single term has multiple meanings. For example, “What The Heck Is Ingress?” is a whole episode of the Kubernetes Unpacked podcast devoted to that one question.
Very long, very nested resources defined in YAML
We have a single Kubernetes Composition in our codebase that is 529 lines long and has 5 levels of nesting. That makes its cognitive complexity high, which puts a strain on working memory.
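To get a feel for how quickly nesting piles up, here is a rough sketch that measures the depth of a manifest. The manifest is a made-up, pared-down Deployment written as a Python dict (not our 529-line Composition), and `nesting_depth` is a hypothetical helper that counts every mapping or list as one level.

```python
# A pared-down, illustrative Deployment manifest expressed as a Python dict.
# The names and values are placeholders, not taken from a real codebase.
manifest = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {
                "containers": [
                    {
                        "name": "web",
                        "image": "nginx:1.25",
                        "ports": [{"containerPort": 80}],
                    }
                ]
            },
        },
    },
}


def nesting_depth(node):
    """Count how many levels of dicts/lists are nested in node (scalars are 0)."""
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return 0
    # An empty dict or list still counts as one level.
    return 1 + max((nesting_depth(child) for child in children), default=0)


print(nesting_depth(manifest))
```

Even this tiny manifest comes out 8 levels deep by that measure, because the container port sits inside a list inside a container inside another list, and so on up. How you count levels is a judgment call; the point is that a reader has to hold that whole path in their head.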
Distributed system
Kubernetes is a distributed system because it spreads workloads, storage, and operations across multiple machines, coordinating them to appear as one cohesive system from the user’s perspective. This distributed nature allows Kubernetes to provide its core benefits of scalability, resilience, and flexibility. However, as the AWS Builders’ Library article linked below puts it:
Developing distributed utility computing services, such as reliable long-distance telephone networks, or Amazon Web Services (AWS) services, is hard. Distributed computing is also weirder and less intuitive than other forms of computing because of two interrelated problems. Independent failures and nondeterminism cause the most impactful issues in distributed systems. In addition to the typical computing failures most engineers are used to, failures in distributed systems can occur in many other ways. What’s worse, it’s impossible always to know whether something failed.
https://aws.amazon.com/builders-library/challenges-with-distributed-systems/
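The “impossible always to know whether something failed” part is easier to see in a toy simulation. The sketch below is not how any real network or Kubernetes component behaves; `call_remote` and the 20% loss rates are assumptions made up for illustration. It shows that when a reply never arrives, the caller sees the same timeout whether the request was lost (nothing happened) or only the reply was lost (the work was done).

```python
import random


def call_remote(rng):
    """Simulate one request/reply round trip over an unreliable network.

    Returns (reply_seen_by_client, work_actually_done_on_server).
    A reply of None stands for "the client timed out".
    """
    if rng.random() < 0.2:   # request lost before it reached the server
        return None, False
    # The server received the request and did the work.
    if rng.random() < 0.2:   # reply lost on the way back
        return None, True
    return "ok", True


rng = random.Random(42)
outcomes = [call_remote(rng) for _ in range(1000)]

# From the client's side, every entry here looks identical: a timeout.
timeouts = [done for reply, done in outcomes if reply is None]
print(f"timeouts: {len(timeouts)}")
print(f"  work was NOT done: {timeouts.count(False)}")
print(f"  work WAS done:     {timeouts.count(True)}")
```

The client only ever sees the first element of each pair. The second element, which is what it actually cares about, is invisible to it.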
Declarative syntax
- Indirectness of Action: Unlike imperative languages where code flows sequentially, in Kubernetes, you specify a desired state. The system’s controllers then work to realize it. This indirect approach can make it difficult to pinpoint problems since you’re not commanding each step but relying on Kubernetes’ logic to interpret and act on your intent.
- Asynchronous Behavior: While imperative code executes in a predictable sequence, Kubernetes’ actions often run asynchronously. After declaring a desired state, the system might not immediately reflect that outcome. And it can be hard to tell if something has failed or whether it’s still processing.
- Error Reporting: In imperative programming, errors are usually tied to a specific action or code line. In Kubernetes, however, errors may be symptomatic of deeper issues. For example, a pod in a “CrashLoopBackOff” state might be due to various reasons, demanding a more in-depth examination of logs, events, and configurations to trace the root cause.
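The desired-state idea in the points above can be sketched as a tiny reconcile loop. This is a toy model under heavy assumptions, not Kubernetes’ actual controller code: `Cluster` and `reconcile` are hypothetical names, and real controllers react to watch events rather than being polled in a loop like this.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy stand-in for cluster state; not a real Kubernetes client."""
    desired_replicas: int = 0
    running: list = field(default_factory=list)
    next_id: int = 0


def reconcile(cluster):
    """Take at most one step toward the desired state; return True once converged."""
    diff = cluster.desired_replicas - len(cluster.running)
    if diff > 0:
        # Too few pods: start one more.
        cluster.running.append(f"pod-{cluster.next_id}")
        cluster.next_id += 1
    elif diff < 0:
        # Too many pods: stop one.
        cluster.running.pop()
    return len(cluster.running) == cluster.desired_replicas


cluster = Cluster()
cluster.desired_replicas = 3  # the "declaration": what you want, not how to get there

# Right after declaring, the observed state has not caught up yet.
print("just declared:", cluster.running)

steps = 0
while not reconcile(cluster):
    steps += 1
    print(f"step {steps}: {cluster.running}")
print("converged:", cluster.running)
```

Notice that nothing in the “declaration” line says how or when pods get created. If `reconcile` got stuck partway, the only symptom would be that the observed state never matches the desired one, which is the “failed or still processing?” question in miniature.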
How do we make it easier?
There’s no one simple answer to this. But to start, I recommend:
- COMMENT your Kubernetes!
- Learn the fundamentals: https://jvns.ca/blog/2017/06/04/learning-about-kubernetes/
- When you get discouraged, take a break but don’t give up. You’re not alone in your struggles!