For many years, system administrators have used Cron to automate recurring tasks on Unix systems. By comparison, CronJobs in Kubernetes are the new kid on the block, having reached general availability in April 2021. A CronJob is a resource that lets you schedule recurring tasks, and each CronJob is similar to a crontab (cron table) file on a Unix system.
In this tutorial, we’ll look at how to automate database backups using Kubernetes CronJobs. To store the backups, we will be leveraging Civo’s object storage.
Prerequisites
This article assumes some working knowledge of Kubernetes. In addition, you will need the following installed:
- The Civo CLI
- kubectl, configured against a Kubernetes cluster
- Docker
- psql, the PostgreSQL command-line client
Creating a Database
We’ll begin by creating a database using the Civo CLI:
civo db create backup-labs -m PostgreSQL
This creates a one-node database cluster. Using the -m
flag, we supply the type of database we want to create. At the time of writing, Civo supports PostgreSQL and MySQL.
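The next steps need the database’s host IP and the password for the civo user. You can usually retrieve these with the Civo CLI; the subcommands below are an assumption and may differ between CLI versions, so check civo db --help if they don’t match:
# list your databases and their status
civo db ls
# show the connection details (host, username, password) for our cluster
civo db show backup-labs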
From this, you should see the following output:
Seeding the Database
Before we can back anything up, we’ll need a database inside the cluster we just created. In your terminal, run the following command to create one:
psql -U civo -h [HOST_IP] -W -c 'create database customers;'
Next, let’s create a schema and populate the database with some mock data. In a directory of your choice, create a file named schema.sql
. Add the following code to define the Customers
table:
Creating a Table
CREATE TABLE Customers (
  ID serial,
  Name varchar(50) NOT NULL,
  Phone varchar(15) NOT NULL,
  Address varchar(50),
  Birthday date NOT NULL,
  CustomerEmail varchar(50) NOT NULL,
  PRIMARY KEY (ID)
);
Apply the schema changes
psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -f schema.sql
Adding Mock Data
Begin by creating a new file called data.sql
within your editor of choice, and add the following code:
INSERT INTO Customers (Name, Phone, Address, Birthday, CustomerEmail)
SELECT
md5(random()::text || clock_timestamp()::text)::uuid::varchar(50) as Name,
substring(md5(random()::text || clock_timestamp()::text)::uuid::varchar(50), 1, 15) as Phone,
md5(random()::text || clock_timestamp()::text)::uuid::varchar(50) as Address,
current_date - interval '18 years' - random() * interval '50 years' as Birthday,
md5(random()::text || clock_timestamp()::text)::uuid::varchar(50) || '@example.com' as CustomerEmail
FROM generate_series(1, 100); -- Adjust the number of rows as needed
Load the mock data
psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -f data.sql
You can verify the mock data was indeed generated by running the following command:
psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -c 'select * from customers;'
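Selecting every row can be noisy, so if you would rather just confirm how many rows were inserted, a quick count works too. This assumes the same connection details as above:
psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -c 'select count(*) from customers;'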
Output should be similar to:
Creating a Backup Script
With a database created, we can shift our attention to the backups. For this demonstration, we will be using a bash script to perform the backup operations. Create a file named backup.sh
and add the following code:
#!/bin/bash
DB_HOST=$DB_HOST
DB_NAME=$DB_NAME
S3_BUCKET=$S3_BUCKET
BACKUP_PREFIX=cronjob
# Create a timestamped backup filename
BACKUP_FILENAME="${BACKUP_PREFIX}_$(date +%Y%m%d_%H%M%S).sql"
# Create the database backup
PGPASSWORD="$DB_PASSWORD" pg_dump -U civo -h $DB_HOST $DB_NAME > ./$BACKUP_FILENAME
# configure aws cli
aws configure set aws_access_key_id $AWS_ACCESS_KEY_ID
aws configure set aws_secret_access_key $AWS_SECRET_ACCESS_KEY
aws configure set default.region LON1
# Upload the backup to S3
aws --endpoint-url https://objectstore.lon1.civo.com s3 cp $BACKUP_FILENAME s3://$S3_BUCKET
# Cleanup (optional)
rm $BACKUP_FILENAME
The script starts by defining several environment variables that will be used later:
- DB_HOST: The hostname or IP address of the PostgreSQL database server
- DB_NAME: The name of the PostgreSQL database to back up
- S3_BUCKET: The name of the bucket to upload backups to
- BACKUP_PREFIX: A prefix that will be added to backup filenames
It then constructs a backup filename using BACKUP_PREFIX
, the current date/time, and a .sql
extension. This ensures each backup has a unique name.
The pg_dump
command then creates a plain SQL dump of the database. It connects to the database server using the configured credentials and database name, and writes the output to the backup filename generated earlier.
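Because the dump is plain SQL, restoring it is simply a matter of replaying the file against a database with psql. The filename below is only an example of what the script would generate:
psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -f cronjob_20240101_020000.sql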
The object store we will be using resides in the LON1 region on Civo, so the endpoint URL is https://objectstore.lon1.civo.com.
The AWS CLI is configured using the access key and secret access key environment variables. This allows uploading the backup file to S3.
Civo's object storage is S3 compatible, which means it can be accessed and managed using the same tools and APIs as Amazon S3. Therefore, we can utilize the AWS CLI, a command-line interface tool for interacting with AWS services, to upload backups to Civo's object storage.
Finally, the backup is uploaded to the specified S3 bucket and then deleted locally. The upload location in S3 will be s3://$S3_BUCKET/$BACKUP_FILENAME.
Containerizing the Backup
Next up, we need to create a container image we can deploy to our Kubernetes cluster. Create a file named Dockerfile
and add the following directives:
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
    curl \
    openssl \
    postgresql-client \
    python3-pip \
    libsasl2-modules \
    libssl-dev \
    postgresql-client-common \
    libpq-dev
RUN pip3 install awscli
RUN mkdir /scripts
COPY backup.sh /scripts
WORKDIR /scripts
RUN chmod +x backup.sh
ENTRYPOINT [ "./backup.sh" ]
Next, we need to build and push the image to a container registry. In this demo, we will be using ttl.sh, an ephemeral container registry that doesn’t require authentication, which makes it easy to use in demos such as this one. In production, you’d probably want to use an internal registry or something like Docker Hub to store your images.
Build and push the image
export IMAGE_NAME=k8s-db-backup
docker build --push -t ttl.sh/${IMAGE_NAME}:1h .
Notice we used 1h
as the image tag? This tells ttl.sh that we want to store our image for an hour.
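Before wiring the image into Kubernetes, you can optionally test it locally with docker run by passing the same environment variables the script expects. The values below are placeholders for your own database and object store details:
docker run --rm \
  -e DB_HOST=<YOUR-DATABASE-IP> \
  -e DB_NAME=customers \
  -e DB_PASSWORD=<YOUR-DB-PASSWORD> \
  -e AWS_ACCESS_KEY_ID=<YOUR-ACCESS-KEY> \
  -e AWS_SECRET_ACCESS_KEY=<YOUR-SECRET-KEY> \
  -e S3_BUCKET=k8s-db-backup \
  ttl.sh/k8s-db-backup:1h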
Creating an Object Store
Before we begin scheduling backups, the last resource we need to provision is an object store. This can be any S3-compatible storage; for this demonstration, we will be using Civo’s object storage. To create an object store using the CLI, run the following commands:
Generate object store credentials
civo objectstore credentials create k8s-backup
Create the object store
civo objectstore create --region LON1 k8s-db-backup --owner-access-key k8s-backup --wait
Finally, you’ll need to obtain your object store credentials. To do this using the CLI, run:
civo objectstore credential secret --access-key=[your access key]
Scheduling Backups
With all the moving parts in place, we can finally schedule a backup using the CronJob resource. Create a file named backup.yaml
and follow along with the code below:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: db-backup
              image: ttl.sh/k8s-db-backup:1h
              env:
                - name: DB_HOST
                  value: "[YOUR DB IP ADDRESS]"
                - name: DB_NAME
                  value: "customers"
                - name: DB_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: db-password
                      key: password
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: civo-credentials
                      key: access-key-id
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: civo-credentials
                      key: secret-access-key
                - name: S3_BUCKET
                  value: "k8s-db-backup"
          restartPolicy: OnFailure
In the manifest above, we created a new CronJob resource named db-backup
. The schedule "*/5 * * * *" runs the job every five minutes. The job template specifies the Docker image containing the backup script we created earlier and configures the appropriate environment variables.
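Running every five minutes is convenient for testing, but in production you would typically back up less frequently. For example, a nightly run at 02:00 only requires changing the schedule field; this is a sketch, and the rest of the manifest stays the same:
spec:
  schedule: "0 2 * * *" # every day at 02:00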
Before we can apply this manifest, we need to supply the secrets. To do this, create a file named secrets.yaml
and add the following code:
---
apiVersion: v1
kind: Secret
metadata:
  name: db-password
type: Opaque
data:
  password: <BASE64_ENCODED_DB_PASSWORD>
---
apiVersion: v1
kind: Secret
metadata:
  name: civo-credentials
type: Opaque
data:
  access-key-id: <BASE64_ENCODED_AWS_ACCESS_KEY_ID>
  secret-access-key: <BASE64_ENCODED_AWS_SECRET_ACCESS_KEY>
The values under data must be base64 encoded. You can generate each one by piping the credential through base64:
echo -n "credential" | base64
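Alternatively, if you would rather not base64 encode values by hand, kubectl can create equivalent secrets directly from literals. This is a sketch that assumes the same secret and key names used in the manifest above, with placeholder values:
kubectl create secret generic db-password \
  --from-literal=password='<YOUR-DB-PASSWORD>'
kubectl create secret generic civo-credentials \
  --from-literal=access-key-id='<YOUR-ACCESS-KEY>' \
  --from-literal=secret-access-key='<YOUR-SECRET-KEY>'
If you go this route, you only need to apply backup.yaml in the next step.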
Apply the manifests:
# create the secret
kubectl apply -f secrets.yaml
# create the cronJob
kubectl apply -f backup.yaml
Within five minutes, the CronJob should kick in, and Kubernetes will spawn a new pod using the image we provided to perform the backup. To verify that the pods are being created, run the following:
kubectl get pods
Your output should be similar to:
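If a run fails, or you simply want to confirm the upload succeeded, you can inspect the CronJob, its Jobs, and the logs of an individual backup pod. The pod name below is a placeholder for one of the pods returned by kubectl get pods:
# list the CronJob and the Jobs it has spawned
kubectl get cronjob db-backup
kubectl get jobs
# view the logs of a completed backup pod
kubectl logs <pod-name>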
Viewing Backups
To view the backup that has been created, head over to your Civo dashboard. Click on the object store tab and select the bucket you created. You should see the following:
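If you prefer the command line, you can also list the bucket contents with the same AWS CLI and endpoint the backup script uses. This assumes your object store credentials are already configured (for example via aws configure or the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables):
aws --endpoint-url https://objectstore.lon1.civo.com s3 ls s3://k8s-db-backup/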
Summary
Whether you run your database inside or outside Kubernetes, backups will always remain an essential part of your disaster recovery plan. In this tutorial, we covered one of many ways to back up your database using CronJobs.
Looking to learn more about backups? Here are a few ideas:
- Learn how to secure your file backups using Minio and Restic
- Check out this guide on using Velero for Postgres Backups