How To Gain Back Your Velocity When Working With Kubernetes
Speaker: Lukas Gentele
Summary
Lukas Gentele, CEO of Loft Labs, discusses the challenges engineers face with Kubernetes and how it slows them down. He emphasizes the importance of virtual clusters and their role in enhancing Kubernetes' efficiency. Lukas highlights the success of Loft Labs' open-source projects, particularly vCluster, which has seen significant adoption. He contrasts the simplicity of past deployment methods, like Ruby on Rails with Heroku, to the complexity of Kubernetes. Lukas also touches upon the cyclical nature of Platform as a Service (PaaS) adoption, where initial enthusiasm often gives way to frustration due to its limitations. He concludes by advocating for transparent abstractions in Kubernetes and demonstrates the capabilities of DevSpace, a tool developed by Loft Labs.
Transcription
Hello everyone. Welcome to my 30-minute rant about why Kubernetes is slowing engineers down. Hopefully, you'll get some hints on what we can do about it and understand the root of what is actually slowing folks down.
I'm Lukas, the CEO at Loft Labs. We started the company to bring virtual clusters to the masses. My colleagues Rich and Carl both gave a talk yesterday about virtual clusters. If you missed that one, definitely make sure you watch the recording. Everything we do on the commercial side is based on our work in the open-source space and on the great work we do with our open-source users and our community.
Two of our projects have been especially popular. The first is vCluster, the foundational technology for creating virtual clusters. We've seen amazing adoption in the community. We launched the project two years ago, and in the first nine months we saw about a million virtual clusters created. That was an insane number for me at the time. Then, last year, we saw 25 million virtual clusters created. Just last month, we saw five million virtual clusters created, so we're very likely going to cross the 100 million virtual clusters mark sometime this year. It's super exciting.
If you don't know what a virtual cluster is, it allows you to spin up a Kubernetes cluster inside a namespace of another Kubernetes cluster. It might sound weird, but it really makes things easier in terms of sharing a Kubernetes cluster, multi-tenancy, and sharding the Kubernetes API server if you're running it at scale, and it saves a ton of cost. If you're spinning up many Kubernetes clusters, it's preferable to consolidate things into fewer clusters because you can share resources. Just think of things like Istio, storage, monitoring, logging, OPA. All of these are repeated in each cluster. So, having fewer clusters and creating virtual clusters inside of them is much more efficient. All the virtual clusters can use the underlying platform stack, monitoring, logging, etc. Developers and engineers work in the virtual clusters. It's kind of like a VM: inside, you're the king, and outside, you're restricted. That's what vCluster is about.
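To make that workflow concrete, here's a rough sketch of the vcluster CLI flow (the cluster and namespace names are placeholders, and exact flags can vary between vcluster versions):

```bash
# Create a virtual cluster inside a namespace of the host cluster
vcluster create team-a --namespace team-a

# The kube-context now points at the virtual cluster; work in it as usual
kubectl get namespaces

# Disconnect and remove it again; the host cluster is left untouched
vcluster disconnect
vcluster delete team-a --namespace team-a
```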
DevSpace is another project we started a couple of years beforehand. It's an open-source project and a developer tool for Kubernetes. That's more of where this talk is heading towards.
When we talk about Kubernetes and developer efficiency, it's helpful to go back in time. What did we do before Kubernetes? About 10 to 15 years ago, everyone was on the Ruby on Rails hype train. Companies like GitLab, GitHub, and even those outside of the tech space like Gusto got started with Ruby on Rails deployed on Heroku. It was quick and easy. But the sentiment 10 years later is that monoliths are not efficient. We need microservices, Kubernetes, containers, and to split things up.
When you think about a monolithic Ruby on Rails application, it's tightly coupled code. Everything is very closely knit together, leading to dependency hell inside the application. Systems like GitLab did an amazing job keeping their release speed up, but many companies struggle with such a tightly knit codebase. Centralized applications like these may run into scaling concerns: they need to be distributed, the database needs sharding, and more. Upgrades become complex, and releasing new code is challenging because the risk of something going wrong affects the entire system rather than just a part of it.
So, essentially, we're slowing down the release cycle in many cases. Microservices and Kubernetes are for the win, right? We need that. But the problem is it's really slowing us down, and a lot of people are seeing that today. If you're thinking about engineers directly working with Kubernetes, deploying applications to Kubernetes, there's a lot involved.
It essentially starts with images. The first thing you need to do is build images, tag images, and push the images to a registry, so now you need credentials for some registry to push to. Then you deploy. Besides having a Dockerfile, you also need a Helm chart or a Kustomization. You need to know what a Kubernetes Deployment is and what a Kubernetes Service is, and you need to wire everything up. Engineers need to know how helm upgrade works or how to apply a Kustomization. And then, ultimately, we want to expose it. With Heroku it was just "heroku open", two words to remember. Now we've got to actually find out what we deployed here. We need to get the pod name, because Kubernetes doesn't work like "I deployed this API, so I can say kubectl logs api". Kubernetes doesn't know what "api" is. The Helm chart may be called api or authentication-service or whatever, but it's going to create a Deployment, a Service, a StatefulSet, etc., and those create ReplicaSets, which then create the actual pods in Kubernetes that host the containers. So we need to find out what those are called. We can do that with "kubectl get pods", and then we can finally do port forwarding. Another reason I drew this arrow here is that when we make a change and want to see if things work now, we have to do everything all over again. It's a pretty annoying process, and that really slows a lot of folks down.
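To make that loop concrete, here's a hedged sketch of what the manual cycle typically looks like on the command line (the image name, chart path, labels, and ports are placeholders for whatever your project uses):

```bash
# 1. Build, tag, and push the image (needs registry credentials)
docker build -t registry.example.com/team/api:dev-42 .
docker push registry.example.com/team/api:dev-42

# 2. Deploy with Helm (or `kubectl apply -k` for a Kustomization)
helm upgrade --install api ./chart --set image.tag=dev-42

# 3. Find out which pods were actually created
kubectl get pods -l app.kubernetes.io/name=api

# 4. Forward a port so you can open the app locally
kubectl port-forward deploy/api 8080:8080

# ...make a code change, then repeat from step 1
```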
Of course, you can automate these kinds of things in CI. You can say, "We'll just bake this all into GitHub Actions." But then what we see a lot of engineers do is what I call the abuse of the CI pipeline. It's kind of like, "I can't really test it locally anymore. I can't run these integration tests locally anymore." With my Heroku Rails application, I could. I could deploy to Heroku myself very quickly from the CLI, run tests against it, and see what's wrong. If I'm now baking all of this into GitHub Actions or GitLab CI, the engineer is going to continuously commit and push. That's a very slow workflow, because you also now need to optimize your pipelines. By default, your pipelines are not going to cache anything in between runs, so it's very slow. And CI minutes are pretty expensive compared to building an image on your laptop and running it there. So resorting to this workflow is not great. Automating the same thing with CI is not great. There's really not a great solution for this dilemma.
So, how about PaaS? PaaS is the solution for everything, right? It was the solution in 2007, and it worked great for these companies. Essentially, there's an interesting cycle that repeats about every four years. In 2007, everything was about Heroku. That was the developer's favorite. Then Google launched App Engine. Cloud Foundry came along a couple of years later, trying to standardize things for the enterprise and doing it open source. Then AWS launched Elastic Beanstalk as another PaaS alternative. Then the container wave started happening, and the first approaches towards running containers were actually PaaS as well.
Companies like dotCloud and Tutum were among the earliest of those, and their teams ended up as some of the first engineers working at Docker, building those amazing technologies down the road. It's very interesting that we keep resorting to this: "We need to build a platform." As engineers, we see the problems, so we're like, "We need to build a platform. That's the answer." And then, last but not least, as a more recent, Kubernetes-era example: Google launched Cloud Run, essentially another PaaS, tied to Google Cloud. It goes back to a proprietary solution again, but it builds on open source and is container-native, unlike a lot of other solutions. It's very interesting. We see that cadence.
We see a lot of these PaaS coming out. And what people usually see after the initial hype of a PaaS is it's not the silver bullet. One or two years later, everybody is like, "This PaaS is super inflexible." In the beginning, everybody loves it, but then one or two years later, everybody starts to hate that PaaS. Everybody wanted to migrate off Heroku; it was inflexible, it was a lock-in. It's very interesting to see. It's a cycle. I don't know how long it's going to repeat, but we've definitely seen it a couple of times happening at this point. It's very interesting. I think one of the strengths of a PaaS is that it's opinionated. The person or the group or whoever is building that PaaS is coming together to decide, "We have certain ideas and best practices on how to do things. How can we make that accessible to a broader audience?" So it's inherently very opinionated, which is a strength because that's what makes a PaaS so simple and so attractive. But ultimately, you're limiting users' choices.
And even if there's an option to tweak these things, it gets similarly complex again once you do, assuming you're even able to; a lot of PaaS don't go that far. So being opinionated is a pretty big advantage and, at the same time, a challenge for a PaaS. PaaS also kind of means magic. I'm running "heroku push", and I don't know what's going on. Some kind of magic happens on the Heroku side to actually get my application up and running. I run "heroku open". Do I know if there's a load balancer provisioned, or DNS entries? I don't know. A PaaS is a lot of magic, and that's a huge benefit as well, because it makes things easier for the developer. They don't need to think about what's happening; it just happens. And if that works, it's great. The problem is that if it doesn't work, it really obstructs people from digging deeper. Sometimes your application is just not suitable for that particular PaaS and may not run very well there, and you may need to change something. But you can't, and you don't really know what's going on. That's the big problem with a PaaS that uses too much magic.
Last but not least, PaaS are ultimately abstractions. It's layering upon layering, and that creates lock-in, because we get some opinionated way to do things on top rather than deeper-level access to the containers or whatever the PaaS is spinning up under the hood. That makes it so hard to move away from something like Heroku. A lot of companies obviously had to go through that cycle.
When we're looking at Kubernetes now, it's complicated. PaaS is obviously not a great solution, but still, Kubernetes is pretty difficult. It's not a solution for deploying applications simply; that's not really what it was intended for. But Kubernetes has some interesting properties that are benefits and, at the same time, downsides. It gives you endless choices. I don't need to pull up that landscape; you're going to see it in every second talk about Kubernetes. There are a bazillion tools on there, and that's great because it gives you choice. But at the same time, there's a lot of FOMO. There's always a hype; we see the same thing with vCluster. Everybody wants to do everything with vCluster right now. Obviously, you should, because it's awesome, but it's definitely a hype cycle. There are cases where vCluster makes a ton of sense, and there are probably cases where it doesn't. You should use it where it does make sense. That fear of missing out is actually keeping us from doing something productive a lot of the time. At the same time, you're reluctant to actually make a bet on a technology if there are like 20 options and you don't really know. You're looking at them, and you're like, "Which one are we supposed to pick?" Ideally, we want to use this for the next five-plus years. But maybe this project's not going to be around in a year. I don't know. Which one is going to win? Is Linkerd going to win? A lot of people are looking at it that way, and I don't think that's the way to look at it. I think there's a coexistence for a lot of these projects, and there are very valid reasons why there are multiple alternatives. But in a lot of cases, people are holding off on just rolling with something because they're reluctant to make that decision. I tried to find a fancy word for this and started Googling, and apparently there is "decidophobia", which means you're too afraid to make a decision, so you keep holding off on it. Decidophobia and FOMO are definitely real things in the Kubernetes space. Having endless choices is great, but it can lead to some problems as well.
Kubernetes is very explicit, and I consider that a big strength. Kubernetes doesn't really have a ton of magic. There are readiness probes and liveness probes, and everything is very explicit. You define everything specifically. I don't think anyone has done that for Heroku. What is the readiness probe there? I don't know. There's something in place that they probably put there, but that's the magic part. Kubernetes is incredibly explicit. There's still a lot of stuff that happens under the hood, but almost everything is exposed and almost everything needs to be specified. Not many things are on by default and assumed. There's no readiness probe by default; you have to set that up and define it in your YAML files. That's a big strength, in my opinion. But it also leads to that YAML hell. Everybody knows a thousand lines of YAML are hard to read and maintain. When you think about an individual developer joining a new team and going through that codebase, they're like, "Gosh, what is this doing?" There are so many new Kubernetes options coming out as well that it's super hard to keep up with all of the explicitness.
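As a small illustration of that explicitness, nothing health-checks your container unless you declare it yourself in the manifest (the names, image, and endpoints below are placeholders):

```bash
# Kubernetes assumes no health checks; you declare probes explicitly
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 1
  selector:
    matchLabels: { app: api }
  template:
    metadata:
      labels: { app: api }
    spec:
      containers:
        - name: api
          image: registry.example.com/team/api:dev-42
          ports:
            - containerPort: 8080
          readinessProbe:            # no default readiness probe exists
            httpGet: { path: /healthz, port: 8080 }
          livenessProbe:             # same for liveness
            httpGet: { path: /healthz, port: 8080 }
EOF
```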
Kubernetes is very composable, and that's great. Your application may be a simple three-tier application with a backend, a frontend, and some kind of data layer, or it may be more complicated. You can design all of that in Kubernetes however you want. Services, StatefulSets, the decoupling of things makes a ton of sense in Kubernetes. But that composability is also complex. It's more complicated than it needs to be in a lot of cases, and that flexibility actually hurts quite a bit.
So, if you ask me, what is the right answer to overcome these issues? In my opinion, it's transparent abstractions. Transparent abstractions allow us to create something on top of Kubernetes without actually hiding Kubernetes from the end user. In my opinion, that's different from a PaaS, because a PaaS creates a layer that is not really see-through, that is not really accessible. I argue that we should have transparent layers on top of Kubernetes and not take Kubernetes away from everybody. In my opinion, you can achieve this by creating something client-only and CLI-first. Working in the terminal is super straightforward; you see that in the success of Heroku as well. Being client-only means there is no server-side magic going on: everything I do is translated to something that I could also do with kubectl. By being client-only, I achieve that. If I have some kind of server in the middle that I send a GraphQL request to, and that somehow ends up in Kubernetes, there's magic happening there. If I'm in a CLI, there's a lot less surface area for magic to happen, and at the same time, you can still create a very efficient experience. In my opinion, you also need to make sure that whatever this layer is, it's very customizable and very extensible. When you're building an abstraction layer within your company, you may be using different technologies across teams. There may be three different teams working on different things, and nobody can define the golden path for everybody at the same time. So there needs to be some common ground and then a ton of flexibility so everybody can optimize for their specific scenario. Last but not least, I think being based on the Kubernetes API is very important. Again, not having a server-side component that does any kind of translation and magic, but rather saying, "Hey, like I showed earlier, we need five kubectl commands to actually do what we want to do. How can we just automate that?" How can we bake it into one command that is easier and skips a couple of steps but essentially translates to Kubernetes API server requests? I think that is super important.
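To illustrate what a client-only, transparent layer means in practice, here's a hypothetical wrapper script (not an actual tool; all names are placeholders): every step is an ordinary docker/helm/kubectl call you could also run by hand, just baked into one command.

```bash
#!/usr/bin/env bash
# dev-up.sh — hypothetical client-only wrapper; no server component, no magic
set -euo pipefail

IMAGE="registry.example.com/team/api:dev-$(git rev-parse --short HEAD)"

docker build -t "$IMAGE" .                      # same commands you'd type yourself
docker push "$IMAGE"
helm upgrade --install api ./chart \
  --set image.repository="${IMAGE%:*}" \
  --set image.tag="${IMAGE##*:}"
kubectl rollout status deploy/api               # wait until the rollout is ready
kubectl port-forward deploy/api 8080:8080       # expose it on localhost
```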
So, the question is obviously, what is the best tool? As we discussed earlier, for each category there's supposedly always one best tool that's going to win and succeed; obviously, that's not the case for this category either. That's why I'm showing you a variety of tools here. And since we're building DevSpace, I'm going to demo that in a couple of minutes to walk you through how the experience looks and explain a little bit about what happens under the hood. But any of the tools in this space will do. There may be tools that are better suited to your case than DevSpace, so I just urge you to check all of them out. There are essentially two buckets, and I'm glad we made the jump from one bucket to the other very recently with DevSpace. The first bucket is CNCF projects. There are two projects in the CNCF Sandbox that are pushing developer velocity and developer experience with this kind of transparent-layer approach rather than creating a PaaS offering.
The first one is Telepresence. They've been around forever. I think they started four or five years ago. The idea is you deploy your application somehow, via GitOps, via Helm CLI, it's not very specified. But then you start part of the application locally. Let's say you have a microservice system, and you have four microservices. They're deployed via ArgoCD, and now, as a developer, on every branch, you're essentially deploying into a different namespace. As a developer, I now get access to that dev cluster. What I want to do is skip all of this image pushing, having to push to GitHub, going through the CI pipeline. I just want to debug one of these microservices. I want to see, "What if I make a change here? Does it still work?" I want to run this integration test. I want to be as close to Kubernetes as possible, as close to the real EKS environment as possible. So, I'm deploying that application, then I spin up Telepresence to do some network magic. What they're doing is they're deploying something very lightweight in that namespace.
And then they connect the local environment to the other services in that namespace. This allows me to run one or potentially two services locally. I can use hot reloading and my debugger since they run locally on my laptop; I can debug them with my IDE just like a regular process. But when I make a network request to services B and C in my microservice system, I communicate with the real services running in production mode inside Kubernetes. This is obviously very useful and provides a quick cycle, because we don't even need to build images for any of this. We're running that one service locally, so we can just start it in dev mode, allowing for different settings.
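The rough flow with Telepresence looks something like this (the service name, port mapping, and run command are placeholders, and exact flags depend on your Telepresence version):

```bash
# Connect your laptop to the cluster network
telepresence connect

# Intercept traffic for one service and route it to a process on your laptop
telepresence intercept service-a --port 8080:http

# Run service A locally with hot reload and your debugger; calls to
# services B and C still hit the real workloads inside the cluster
go run ./cmd/service-a

# Clean up when you're done
telepresence leave service-a
telepresence quit
```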
The tricky part about Telepresence arises when you need to mount a config map or attach a volume to the current service. A local laptop can't easily mount a volume from your remote Kubernetes cluster and pull the data. There are workarounds for these issues, and Telepresence offers some solutions. However, there are significant constraints in terms of operating systems. Everything works great on Linux, especially on popular distros. On Mac, you can make it work, but on Windows, it's challenging, especially when talking about volume mounts.
Another solution in CNCF is DevSpace. I'm obviously biased, having been one of the maintainers of DevSpace since 2018. We recognized this problem early on. At the time, not many people considered Kubernetes for individual developers and Dev teams. The first talk I gave about DevSpace in 2018 had only four attendees. The mindset then was that Kubernetes was for production. However, this mindset has changed since, and it's great to see the progress. Many now realize the importance of shifting things left, starting earlier with access to Kubernetes, and ensuring parity between Dev, staging, and production environments.
Unlike Telepresence, with DevSpace nothing runs locally. The idea is to move the runtime entirely into Kubernetes while skipping the image-building part. We connect your local file system and ports with the remote container. By launching a modified container, we can swap out the production image for a dev image. We then sync the file system, so either source code or even compiled binaries are re-synced, allowing the container to hot-reload without rebuilding images. This means you can code in your local IDE, attach a remote debugger, and access things on localhost thanks to port forwarding, similar to Telepresence. Every time you change a line of code, your application updates inside the remote container. This creates an efficient dev experience.
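Conceptually, the dev configuration maps to something like the snippet below, printed here just as a sketch (based on the v2beta1 devspace.yaml schema; the field names, dev image, and paths are assumptions and can differ by version and language):

```bash
cat <<'EOF'
dev:
  app:
    imageSelector: registry.example.com/team/api                  # container to swap out
    devImage: ghcr.io/loft-sh/devspace-containers/go:1.20-alpine  # replaces the prod image
    sync:
      - path: ./          # keep local files and the container in sync
    ports:
      - port: "8080"      # forward localhost:8080 to the container
    terminal:
      command: ./devspace_start.sh   # drop into a terminal instead of starting the app
EOF
```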
There are other tools as well, like Skaffold from Google, Tilt (recently acquired by Docker), Tether, and Garden. There are many more tools, but I've listed the most popular ones with active user bases and Slack channels. Whichever tool you pick, test them with your team, see which one works, and then run with the one that works best. Improve your workflow. Since it's CLI only, it's straightforward to switch tools.
That being said, this was the last slide. I can walk you through a DevSpace tutorial quickly to give you a glimpse of how the project works. Then, maybe we'll have a few minutes for questions.
Let's switch to VS Code. I've already checked out our Golang quickstart project. It's simple and has a Dockerfile in there. Essentially, it sets the working directory and builds a Go binary, then runs it with go run. We have a single main.go file here, which outputs some basic HTML.
If I want to run this, I have Docker Desktop running in the background, which allows me to run commands like kubectl get pods and kubectl get namespaces. I can use vCluster to get a fresh virtual cluster, because I don't want to mess up my Docker Desktop Kubernetes cluster. Sometimes I have to reset it after trying an operator that doesn't work and messes up the cluster. So, I'll run vcluster create demo, which spins up a virtual cluster. This virtual cluster is essentially a container running in a namespace. Inside that namespace, there's a control plane being launched with a separate data store from the underlying cluster. Whatever I do in a virtual cluster doesn't affect the underlying cluster, except for the pods, which are visible in the underlying cluster.
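As an aside, you could check that split from the host cluster's side, roughly like this (the docker-desktop context name and the vcluster-demo namespace follow common defaults, but both are assumptions here):

```bash
# From the host cluster's kube-context: the virtual cluster's control plane
# and any pods it schedules show up inside a single namespace
kubectl --context docker-desktop get pods -n vcluster-demo
```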
Now, we're inside this project, and I want to test DevSpace. I'll run devspace init to initialize it. DevSpace will ask me about the programming language, and it detects there's a Golang file. It will also ask how to deploy this. I'll choose Helm for this demonstration. DevSpace detected that I'm signed in with Docker Hub, so I'll confirm that as the image registry.
We now have a devspace.yaml file which defines how this project is deployed. It's straightforward and very readable. If I want to fire this up, I'll tell DevSpace which namespace to use. I'll call this namespace "demo" and run devspace dev. DevSpace will show us what it's doing under the hood. It's transparent, and everything is translated to kubectl commands.
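For reference, the generated file looks roughly like this, and firing it up is two commands (a sketch based on the v2beta1 schema and the Go quickstart; the chart, image, and exact fields may differ):

```bash
# Roughly what devspace init generates (printed here as a sketch)
cat <<'EOF'
version: v2beta1
name: devspace-quickstart-golang
images:
  app:
    image: docker.io/<your-user>/devspace-quickstart-golang
deployments:
  app:
    helm:
      values:
        containers:
          - image: docker.io/<your-user>/devspace-quickstart-golang
        service:
          ports:
            - port: 8080
EOF

# Pick a namespace and start dev mode
devspace use namespace demo
devspace dev
```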
Now, I'm inside the remote container. I can run go run main.go to start the application. Our dev image typically overrides the container's entrypoint (PID 1), so the application isn't running yet. I can start it myself, which lets me see the logs immediately and restart it whenever I need to. If I make changes to my main.go file and save it, I can rerun the application and see the changes immediately in my browser. Everything feels like localhost, but it's all happening remotely. There's also a mode where you can connect your editor directly to the container, so everything is remote: we connect into the container via SSH and VS Code's Remote extension. That even allows setting breakpoints for debugging.
Since this is a virtual cluster, I can exit the container and run vcluster disconnect to leave the virtual cluster's context. I can then run vcluster delete demo to delete my virtual cluster, and it's as if nothing ever happened.
That concludes the talk and demo. I hope this was helpful and interesting. If you have any questions, I'll be here for another five or ten minutes. Thank you.
Do we have any questions?
We're just wondering about the advantages of using a virtual cluster versus just having a separate namespace. Is it just the sub-namespacing?
The advantage of a virtual cluster is that you can clean it up just like a namespace. In this localhost scenario, if I were using my Docker Desktop cluster with a more complicated application, a lot could go wrong and mess up the cluster. Do you really want to mess up the cluster, or would you rather work in a virtual cluster? Another thing is that you can run the command "vcluster pause". This pauses the entire virtual cluster and everything in it. The state remains, but no containers are running. This means your local Docker won't run any containers, and on EKS, your cluster will hopefully scale down when you pause the virtual cluster.
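For reference, that is roughly (the exact resume command name has varied across vcluster versions):

```bash
vcluster pause demo    # scale everything in the virtual cluster down; state is kept
vcluster resume demo   # bring it back up later right where you left off
```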
I'll also add that sometimes you have cluster-wide resources. Also, deleting a namespace doesn't always work for some reason. I have a question: do you have to exit the vcluster first before deleting it? If you try to just delete it, does it not do both?
No, it wouldn't do anything. When you're inside the virtual cluster, you're talking to the virtual cluster's API server, so you have no access to the underlying cluster.
Got it. And if you make changes to the Dockerfile, like adding a new port, do you need to exit and restart?
That is unfortunately the case, but it's quick to do. When you abort "devspace dev", change the file, and run "devspace dev" again, it's immediate. But we have had requests to do hot reloading by watching the devspace.yaml file. We're thinking about it, but it's a lot of work.
You mentioned breakpoints earlier. Were you talking about Telepresence or DevSpace?
If you have VS Code or IntelliJ, you can connect to a remote container via SSH. DevSpace injects an SSH server for you if you set ssh to true in the devspace.yaml file. Then you can connect your VS Code to that SSH server, and your files will be the remote files. Your terminal will be the remote terminal without running any devspace command, and you can set breakpoints immediately.
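The relevant devspace.yaml snippet is roughly this (a sketch assuming the v2beta1 schema; the dev entry name is whatever your config uses):

```bash
cat <<'EOF'
dev:
  app:
    ssh: true   # DevSpace injects an SSH server into the dev container
EOF
```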
Cool, thank you so much. Enjoy the rest of the conference.