For many years, system administrators have used Cron to automate recurring tasks on Unix systems. By comparison, CronJobs in Kubernetes are the new kid on the block, having reached general availability in April 2021. A CronJob is a resource that lets you schedule recurring tasks, and each CronJob is similar to a crontab (cron table) file on a Unix system.
In this tutorial, we’ll look at how to automate database backups using Kubernetes CronJobs. To store the backups, we will be leveraging Civo’s object storage.
Prerequisites
This article assumes some working knowledge of Kubernetes. In addition, you will need the following installed:
- The Civo CLI
- kubectl, configured against a Kubernetes cluster
- Docker
- psql, the PostgreSQL command-line client
Creating a Database
We’ll begin by creating a database using the Civo CLI:
civo db create backup-labs -m PostgreSQL
This creates a one-node database cluster. Using the -m
flag, we supply the type of database we want to create. At the time of writing, Civo supports PostgreSQL and MySQL.
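The next steps need the database’s host IP and the password for the civo user. You can usually retrieve these with the Civo CLI; the subcommands below are an assumption and may differ between CLI versions, so check civo db --help if they don’t match:
# list your databases and their status
civo db ls
# show the connection details (host, username, password) for our cluster
civo db show backup-labs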
From this, you should see the following output:
Seeding the Database
Before we can back anything up, we’ll need a database inside the cluster we just created. In your terminal, run the following command to create one:
psql -U civo -h [HOST_IP] -W -c 'create database customers;'
Next, let’s create a schema and populate the database with some mock data. In a directory of your choice, create a file named schema.sql
. Add the following code to define the Customers
table:
Creating a Table
CREATE TABLE Customers (
  ID serial,
  Name varchar(50) NOT NULL,
  Phone varchar(15) NOT NULL,
  Address varchar(50),
  Birthday date NOT NULL,
  CustomerEmail varchar(50) NOT NULL,
  PRIMARY KEY (ID)
);
Apply the schema changes
psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -f schema.sql
Adding Mock Data
Begin by creating a new file called data.sql
within your editor of choice, and add the following code:
INSERT INTO Customers (Name, Phone, Address, Birthday, CustomerEmail)
SELECT
md5(random()::text || clock_timestamp()::text)::uuid::varchar(50) as Name,
substring(md5(random()::text || clock_timestamp()::text)::uuid::varchar(50), 1, 15) as Phone,
md5(random()::text || clock_timestamp()::text)::uuid::varchar(50) as Address,
current_date - interval '18 years' - random() * interval '50 years' as Birthday,
md5(random()::text || clock_timestamp()::text)::uuid::varchar(50) || '@example.com' as CustomerEmail
FROM generate_series(1, 100); -- Adjust the number of rows as needed
Load the mock data
psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -f data.sql
You can verify the mock data was indeed generated by running the following command:
psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -c 'select * from customers;'
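Selecting every row can be noisy, so if you would rather just confirm how many rows were inserted, a quick count works too. This assumes the same connection details as above:
psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -c 'select count(*) from customers;'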
Output should be similar to:
Creating a Backup Script
With a database created, we can shift our attention to the backups. For this demonstration, we will be using a bash script to perform the backup operations. Create a file named backup.sh
and add the following code:
#!/bin/bash
DB_HOST=$DB_HOST
DB_NAME=$DB_NAME
S3_BUCKET=$S3_BUCKET
BACKUP_PREFIX=cronjob
# Create a timestamped backup filename
BACKUP_FILENAME="${BACKUP_PREFIX}_$(date +%Y%m%d_%H%M%S).sql"
# Create the database backup
PGPASSWORD="$DB_PASSWORD" pg_dump -U civo -h $DB_HOST $DB_NAME > ./$BACKUP_FILENAME
# configure aws cli
aws configure set aws_access_key_id $AWS_ACCESS_KEY_ID
aws configure set aws_secret_access_key $AWS_SECRET_ACCESS_KEY
aws configure set default.region LON1
# Upload the backup to S3
aws --endpoint-url https://objectstore.lon1.civo.com s3 cp $BACKUP_FILENAME s3://$S3_BUCKET
# Cleanup (optional)
rm $BACKUP_FILENAME
The script starts by defining several environment variables that will be used later:
- DB_HOST: The hostname or IP address of the PostgreSQL database server
- DB_NAME: The name of the PostgreSQL database to back up
- S3_BUCKET: The name of the bucket to upload backups to
- BACKUP_PREFIX: A prefix that will be added to backup filenames
It then constructs a backup filename using BACKUP_PREFIX
, the current date/time, and a .sql
extension. This ensures each backup has a unique name.
The pg_dump
command then creates a plain SQL dump of the database. It connects to the database server using the configured credentials and database name, and writes the output to the backup filename generated earlier.
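Because the dump is plain SQL, restoring it is simply a matter of replaying the file against a database with psql. The filename below is only an example of what the script would generate:
psql -U civo -d customers -h <YOUR-DATABASE-IP> -W -f cronjob_20240101_020000.sql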
The object store we will be using resides in the LON1 region on Civo, so the endpoint URL is https://objectstore.lon1.civo.com.
The AWS CLI is configured using the access key and secret access key environment variables. This allows uploading the backup file to S3.
Civo's object storage is S3 compatible, which means it can be accessed and managed using the same tools and APIs as Amazon S3. Therefore, we can utilize the AWS CLI, a command-line interface tool for interacting with AWS services, to upload backups to Civo's object storage.
Finally, the backup is uploaded to the specified S3 bucket and then deleted locally. The upload location in S3 will be s3://$S3_BUCKET/$BACKUP_FILENAME.
Containerizing the Backup
Next up, we need to create a container image we can deploy to our Kubernetes cluster. Create a file named Dockerfile
and add the following directives:
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y \
    curl \
    openssl \
    postgresql-client \
    python3-pip \
    libsasl2-modules \
    libssl-dev \
    postgresql-client-common \
    libpq-dev
RUN pip3 install awscli
RUN mkdir /scripts
COPY backup.sh /scripts
WORKDIR /scripts
RUN chmod +x backup.sh
ENTRYPOINT [ "./backup.sh" ]
Next, we need to build and push the image to a container registry. In this demo, we will be using ttl.sh, an ephemeral container registry that doesn’t require authentication, which makes it easy to use in demos such as this one. In production, you’d probably want to use an internal registry or something like Docker Hub to store your images.
Build and push the image
export IMAGE_NAME=k8s-db-backup
docker build --push -t ttl.sh/${IMAGE_NAME}:1h .
Notice we used 1h
as the image tag? This tells ttl.sh that we want to store our image for an hour.
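Before wiring the image into Kubernetes, you can optionally test it locally with docker run by passing the same environment variables the script expects. The values below are placeholders for your own database and object store details:
docker run --rm \
  -e DB_HOST=<YOUR-DATABASE-IP> \
  -e DB_NAME=customers \
  -e DB_PASSWORD=<YOUR-DB-PASSWORD> \
  -e AWS_ACCESS_KEY_ID=<YOUR-ACCESS-KEY> \
  -e AWS_SECRET_ACCESS_KEY=<YOUR-SECRET-KEY> \
  -e S3_BUCKET=k8s-db-backup \
  ttl.sh/k8s-db-backup:1h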
Creating an Object Store
Before we begin scheduling backups, the last resource we need to provision is an object store. This can be any S3-compatible storage; for this demonstration, we will be using Civo’s object storage. To create an object store using the CLI, run the following commands:
Generate object store credentials
civo objectstore credentials create k8s-backup
Create the object store
civo objectstore create --region LON1 k8s-db-backup --owner-access-key k8s-backup --wait
Finally, you’ll need to obtain your object store credentials. To do this using the CLI, run:
civo objectstore credential secret --access-key=[your access key]
Scheduling Backups
With all the moving parts in place, we can finally schedule a backup using the CronJob resource. Create a file named backup.yaml
and follow along with the code below:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: db-backup
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: db-backup
              image: ttl.sh/k8s-db-backup:1h
              env:
                - name: DB_HOST
                  value: "[YOUR DB IP ADDRESS]"
                - name: DB_NAME
                  value: "customers"
                - name: DB_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: db-password
                      key: password
                - name: AWS_ACCESS_KEY_ID
                  valueFrom:
                    secretKeyRef:
                      name: civo-credentials
                      key: access-key-id
                - name: AWS_SECRET_ACCESS_KEY
                  valueFrom:
                    secretKeyRef:
                      name: civo-credentials
                      key: secret-access-key
                - name: S3_BUCKET
                  value: "k8s-db-backup"
          restartPolicy: OnFailure
In the manifest above, we created a new CronJob resource named db-backup
. The schedule "*/5 * * * *" runs the job every five minutes. The job template specifies the Docker image containing the backup script we created earlier and configures the appropriate environment variables.
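Running every five minutes is convenient for testing, but in production you would typically back up less frequently. For example, a nightly run at 02:00 only requires changing the schedule field; this is a sketch, and the rest of the manifest stays the same:
spec:
  schedule: "0 2 * * *" # every day at 02:00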
Before we can apply this manifest, we need to supply the secrets. To do this, create a file named secrets.yaml
and add the following code:
---
apiVersion: v1
kind: Secret
metadata:
  name: db-password
type: Opaque
data:
  password: <BASE64_ENCODED_DB_PASSWORD>
---
apiVersion: v1
kind: Secret
metadata:
  name: civo-credentials
type: Opaque
data:
  access-key-id: <BASE64_ENCODED_AWS_ACCESS_KEY_ID>
  secret-access-key: <BASE64_ENCODED_AWS_SECRET_ACCESS_KEY>
The values under data must be base64 encoded. You can generate each one by piping the credential through base64:
echo -n "credential" | base64
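Alternatively, if you would rather not base64 encode values by hand, kubectl can create equivalent secrets directly from literals. This is a sketch that assumes the same secret and key names used in the manifest above, with placeholder values:
kubectl create secret generic db-password \
  --from-literal=password='<YOUR-DB-PASSWORD>'
kubectl create secret generic civo-credentials \
  --from-literal=access-key-id='<YOUR-ACCESS-KEY>' \
  --from-literal=secret-access-key='<YOUR-SECRET-KEY>'
If you go this route, you only need to apply backup.yaml in the next step.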
Apply the manifests:
# create the secret
kubectl apply -f secrets.yaml
# create the cronJob
kubectl apply -f backup.yaml
Within five minutes, the CronJob should kick in, and Kubernetes will spawn a new pod using the image we provided to perform the backup. To verify that the pods are being created, run the following:
kubectl get pods
Your output should be similar to:
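If a run fails, or you simply want to confirm the upload succeeded, you can inspect the CronJob, its Jobs, and the logs of an individual backup pod. The pod name below is a placeholder for one of the pods returned by kubectl get pods:
# list the CronJob and the Jobs it has spawned
kubectl get cronjob db-backup
kubectl get jobs
# view the logs of a completed backup pod
kubectl logs <pod-name>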
Viewing Backups
To view the backup that has been created, head over to your Civo dashboard. Click on the object store tab and select the bucket you created. You should see the following:
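If you prefer the command line, you can also list the bucket contents with the same AWS CLI and endpoint the backup script uses. This assumes your object store credentials are already configured (for example via aws configure or the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables):
aws --endpoint-url https://objectstore.lon1.civo.com s3 ls s3://k8s-db-backup/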
Summary
Whether you run your database inside or outside Kubernetes, backups will always remain an essential part of your disaster recovery plan. In this tutorial, we covered one of many ways to back up your database using CronJobs.
Looking to learn more about backups? Here are a few ideas:
- Learn how to secure your file backups using Minio and Restic
- Check out this guide on using Velero for Postgres Backups