In the early days, people debated how safe it was to store their money in the bank; now, we debate running databases on Kubernetes. Over the years, Kubernetes has evolved significantly, transforming into a capable platform for handling various workloads, including stateful ones.

It's an interesting topic with varying answers. In this post, I will consolidate some of the best arguments from both sides and give you some points to go over with your team lead in your next conversation.

Why was Kubernetes considered bad for databases?

Before PetSets arrived in Kubernetes 1.3 (they were renamed StatefulSets in 1.5), there was no native way to run stateful workloads such as databases, which fed the fear of losing all your data whenever a pod restarted. Keep in mind that Kubernetes has been around since 2014, so that concern had plenty of time to build.

Many argue that StatefulSets and persistent volumes didn't necessarily fix the problem, yet recent data suggests that more people are running their databases on Kubernetes than ever before. In the DOK report from 2022, 70% of the 501 respondents said they believe Kubernetes is ready for stateful workloads.

The data is great; however, this doesn't directly translate into a yes or no. Next, let's look at some things to consider when running databases on Kubernetes.

Considerations for Running Databases on Kubernetes

StatefulSets and Friends

StatefulSets gave databases a fighting chance of running on Kubernetes. They provide a way to manage stateful applications, giving each pod a stable network identity and its own persistent storage. This was a crucial step in making Kubernetes a viable platform for databases, which require consistent naming, stable network identities (i.e., an address you can use to connect), and storage that survives restarts.
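To make "stable network identity" concrete, here is a minimal sketch of the naming scheme Kubernetes applies to StatefulSet pods and their volume claims. The names `postgres`, `postgres-headless`, `databases`, and `data` are made up for illustration, and the sketch assumes the default `cluster.local` cluster domain and a headless Service in front of the StatefulSet.

```python
def stateful_pod_dns_names(statefulset, service, namespace, replicas,
                           cluster_domain="cluster.local"):
    """Return the per-pod DNS names of a StatefulSet behind a headless Service.

    Pods are named <statefulset>-<ordinal>, and each one is reachable at
    <pod>.<service>.<namespace>.svc.<cluster_domain>.
    """
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


def stateful_pvc_names(claim_template, statefulset, replicas):
    """Return the PersistentVolumeClaim names created from a volumeClaimTemplate.

    Each claim is named <claim_template>-<statefulset>-<ordinal>, so a
    rescheduled pod reattaches to the same claim.
    """
    return [f"{claim_template}-{statefulset}-{ordinal}" for ordinal in range(replicas)]


if __name__ == "__main__":
    # Hypothetical names, chosen only for this example.
    for dns in stateful_pod_dns_names("postgres", "postgres-headless", "databases", 3):
        print(dns)
    for pvc in stateful_pvc_names("data", "postgres", 3):
        print(pvc)
```

Because these names are derived from the ordinal rather than generated at random, a replica can keep pointing at the same primary address after that pod is rescheduled.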

Container Storage Interface (CSI) support, which reached general availability in Kubernetes 1.13 (released in December 2018), further enhanced this capability. CSI gives storage providers a standardized way to expose their storage systems to Kubernetes. It's important to note that while StatefulSets, Persistent Volumes, and CSI drivers solved many problems, they are also now things you need to understand and worry about.

Team Experience and Skills

Speaking of concerns, the next one is talent. Depending on your team's expertise, you may or may not have a great time. If a good portion of your team is already comfortable with Kubernetes, adopting databases on this platform will be significantly easier. They'll be familiar with the concepts discussed previously, as opposed to spending valuable hours on the Kubernetes learning curve or, worse still, spending weeks hiring someone who already knows it.

Assess your team's current Kubernetes expertise and consider whether it is a worthwhile investment to hire new talent or have existing members of the team learn it.

Node Selection

When running databases on Kubernetes, node selection becomes a crucial consideration. Unlike stateless applications that can run on any available node, databases often have specific requirements that influence node choice.

You'll often want to choose specific node sizes and dedicate them to your database workloads. This ensures that your database has consistent access to the resources it needs without competing with other applications.
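One common way to dedicate nodes to a database is to label and taint them, then give the database pods a matching `nodeSelector` and toleration. The sketch below builds that fragment of a pod spec as a plain dictionary. The label and taint `workload=database` is a hypothetical convention for this example, not something Kubernetes defines.

```python
import json


def dedicated_node_scheduling(key, value):
    """Build the pod-spec fields that pin a pod to labelled, tainted nodes.

    Assumes the target nodes carry the label key=value and a NoSchedule taint
    with the same key and value (both are choices made for this example).
    """
    return {
        # Only schedule onto nodes that carry the label.
        "nodeSelector": {key: value},
        # Allow scheduling onto nodes whose taint keeps other workloads away.
        "tolerations": [
            {
                "key": key,
                "operator": "Equal",
                "value": value,
                "effect": "NoSchedule",
            }
        ],
    }


if __name__ == "__main__":
    # Merge this fragment into spec.template.spec of the database StatefulSet.
    print(json.dumps(dedicated_node_scheduling("workload", "database"), indent=2))
```

The label attracts the database to the right nodes and the taint repels everything else. You need both halves if you want the nodes to be truly dedicated.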

For many standard database workloads, managed database services already handle this complexity for you. Their teams have selected and tested hardware configurations for these workloads, taking the guesswork out of node selection. If you don't have specialized requirements, managed services can provide equally reliable performance without the overhead of managing infrastructure choices. Consider whether you truly need custom hardware configurations for your database; if your requirements align with standard database workloads, a managed service will likely serve you just as well.

Why You Should Run Your Database on Kubernetes

So far, we have largely discussed the challenges of running databases on Kubernetes. These might not be much of a challenge if you have been around the cloud native block; if so, here are some good reasons to consider it:

Operators are awesome

Database operators have come a long way, offering the ability to create, back up, and monitor databases, and the experience on Kubernetes has become significantly easier as a result. A good example is the PostgreSQL Operator by Zalando. It offers automatic failover and backups, and it provides custom resources that let you define your database cluster declaratively.

Another area where operators shine is in handling upgrades. Upgrading the PostgreSQL version of a large cluster becomes much more manageable when using an operator, as opposed to manually upgrading each node in your self-hosted cluster. The process is often as simple as updating a few configuration parameters.
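As a rough illustration of what "declarative" means here, the sketch below builds a minimal `postgresql` custom resource in the shape used by the Zalando operator's example manifests. The cluster name, team ID, sizes, and versions are placeholders, and a real cluster would normally also declare users and databases. Whether the operator performs a major version upgrade when `version` changes depends on how the operator itself is configured, so treat the second manifest as a statement of intent rather than a guaranteed upgrade.

```python
import json


def zalando_postgres_cluster(name, team_id, instances, volume_size, pg_version):
    """Build a minimal `postgresql` custom resource as a plain dictionary.

    All argument values are supplied by the caller; nothing here talks to a
    cluster. The output can be saved as JSON and applied with kubectl.
    """
    return {
        "apiVersion": "acid.zalan.do/v1",
        "kind": "postgresql",
        "metadata": {"name": name},
        "spec": {
            "teamId": team_id,
            "numberOfInstances": instances,
            "volume": {"size": volume_size},
            "postgresql": {"version": pg_version},
        },
    }


if __name__ == "__main__":
    # Placeholder values for illustration only.
    current = zalando_postgres_cluster("acid-orders-db", "acid", 2, "10Gi", "15")
    upgraded = zalando_postgres_cluster("acid-orders-db", "acid", 2, "10Gi", "16")
    print(json.dumps(current, indent=2))
    # The only difference between the two manifests is the requested version.
    changed = [k for k in current["spec"] if current["spec"][k] != upgraded["spec"][k]]
    print("changed spec fields:", changed)
```

The point of the comparison at the end is that the desired state changes in one field, and the operator is responsible for working out the steps to get there.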

The important thing to note here is that database operators significantly simplify database management on Kubernetes, but you should also consider how sustainable this is in the long term by factoring in your team size and expertise.

Portability

One key aspect of the Kubernetes ecosystem is its portability. This becomes particularly relevant when weighing the trade-offs between the different ways to deploy your database.

While managed services offer convenience, running your database on Kubernetes provides a degree of flexibility. Your database setup becomes a set of Kubernetes manifests, which can be deployed across different environments or cloud providers with minimal changes.
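To sketch what "minimal changes" can look like in practice, the example below keeps one base StatefulSet definition and varies only the storage class per environment. The manifest is deliberately incomplete (no credentials, probes, or resource limits), and the storage class names `fast-ssd` and `standard` are hypothetical; real names depend on what each cluster defines.

```python
import copy
import json

# A stripped-down base manifest; names and sizes are placeholders.
BASE_STATEFULSET = {
    "apiVersion": "apps/v1",
    "kind": "StatefulSet",
    "metadata": {"name": "postgres"},
    "spec": {
        "serviceName": "postgres-headless",
        "replicas": 3,
        "selector": {"matchLabels": {"app": "postgres"}},
        "template": {
            "metadata": {"labels": {"app": "postgres"}},
            "spec": {
                "containers": [
                    {
                        "name": "postgres",
                        "image": "postgres:16",
                        "volumeMounts": [
                            {"name": "data", "mountPath": "/var/lib/postgresql/data"}
                        ],
                    }
                ]
            },
        },
        "volumeClaimTemplates": [
            {
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "resources": {"requests": {"storage": "20Gi"}},
                },
            }
        ],
    },
}


def for_environment(manifest, storage_class):
    """Return a copy of the manifest with the storage class set on every claim."""
    rendered = copy.deepcopy(manifest)  # leave the shared base untouched
    for claim in rendered["spec"]["volumeClaimTemplates"]:
        claim["spec"]["storageClassName"] = storage_class
    return rendered


if __name__ == "__main__":
    # Hypothetical storage class names for two different clusters.
    for env, storage_class in {"cloud-a": "fast-ssd", "cloud-b": "standard"}.items():
        print(env, json.dumps(for_environment(BASE_STATEFULSET, storage_class))[:80], "...")
```

In a real setup this kind of per-environment override is usually handled by a templating or overlay tool, but the idea is the same: the database definition stays constant and only the provider-specific details change.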

If you are looking to run a database in Kubernetes, consider whether this portability is a worthwhile benefit for your specific environment.

When Not to Run Your Database on Kubernetes

Operational Simplicity

So far, a lot of the pros of running your database on Kubernetes assume that you or someone on your team is familiar with and comfortable running applications on Kubernetes. Running a stateful workload such as a database on Kubernetes is no small feat. Managed services can save a huge amount of time and operational effort, as it is often trivial to set up a database cluster with them.

Scale

Scaling databases on Kubernetes presents unique challenges that differ from scaling stateless applications. While Kubernetes offers robust autoscaling for stateless applications, it lacks the fine-grained controls and predictability often required for stateful workloads like databases. Managed services, on the other hand, typically include purpose-built autoscaling solutions that ensure performance consistency and reliability. The key takeaway here is that if your database requires precise and predictable scaling, managed services might be the better option. On the other hand, if you need the flexibility to customize your scaling strategy and have the expertise to manage it, Kubernetes can be a viable choice.

Cost

Running a database in Kubernetes requires a broader perspective on costs. Unlike managed services, where pricing is often straightforward, Kubernetes introduces additional cost considerations. First, it's rare to deploy a Kubernetes cluster solely to host a database. Instead, your database typically coexists with other workloads. This raises questions about capacity planning—how much of your cluster's resources should be reserved for the database versus other applications?

Next, you need to account for node costs, which vary based on your cloud provider and the instance types you choose. These costs can escalate quickly if you need high-performance nodes to support a resource-intensive database. On top of this are persistent volume costs, which depend on storage types (e.g., SSDs vs. HDDs).
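A back-of-the-envelope estimate helps make these cost components comparable with a managed service quote. The sketch below adds up compute and storage for the nodes reserved for the database. Every price and quantity in the usage example is invented for illustration, and the 730 hours per month figure is a common approximation, not a billing rule from any particular provider.

```python
def monthly_database_cost(node_hourly_price, node_count, storage_gb,
                          storage_price_per_gb_month, hours_per_month=730):
    """Estimate the monthly cost of self-hosting a database on Kubernetes.

    Covers only node and persistent volume costs; it ignores network egress,
    backups, control plane fees, and the engineering time needed to run it.
    """
    compute = node_hourly_price * node_count * hours_per_month
    storage = storage_gb * storage_price_per_gb_month
    return {
        "compute": round(compute, 2),
        "storage": round(storage, 2),
        "total": round(compute + storage, 2),
    }


if __name__ == "__main__":
    # Made-up numbers: three nodes at 0.10 per hour, 500 GB of SSD at 0.08 per GB-month.
    estimate = monthly_database_cost(0.10, 3, 500, 0.08)
    for item, amount in estimate.items():
        print(f"{item}: {amount}")
```

If the database shares nodes with other workloads, pass the fraction of node capacity you reserve for it as `node_count` (for example `1.5`) to approximate its share.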

In contrast, managed database providers like Civo offer predictable pricing models from the get-go. As such, you can reliably predict your costs month to month and scale as your project grows.

Summary

Running stateful workloads on Kubernetes has come a long way, and the narrative that you cannot or shouldn't run your stateful workloads on it is certainly not true. However, it's important to carefully consider the points highlighted in this post to make an informed decision.

Simplicity remains key when running stateful workloads such as a database. If your use case is not complex and will not benefit from some of the features highlighted in this post, then a managed service is the way to go. For more environments with more unique constraints like

If you’re looking to learn more about running your database on Kubernetes, check out these resources:
