Kubernetes has become the de facto standard for deploying cloud-native applications. Its rise has been driven by the increasing shift to microservices architectures, in which systems are built from lightweight, loosely coupled, autonomous web services that are well-suited to distributed deployment.
However, managing distributed applications at scale is challenging, especially when multiple components are involved. The situation becomes even harder when we want to guarantee the system's overall correctness in the face of partial failures.
In this tutorial, we will explore the three Kubernetes probes that enable the construction of highly available, robust, and self-healing distributed applications. As our demo application, we will use NGINX. We will begin by examining the limitations of a basic deployment and then improve it incrementally using Kubernetes probes.
By the conclusion of this guide, we will have established a reliable methodology that can be employed for deploying any other web application in a production environment. So, let’s dive into the tutorial!
Prerequisites
To get the most out of this guide, you will need access to the following resources and tools:
- A Civo account
- A Kubernetes cluster
- The latest kubectl utility to interact with the Kubernetes cluster you have created
- The KUBECONFIG file pointing kubectl to your cluster, downloadable from the cluster page on your Civo dashboard
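For example, assuming the downloaded file is saved as civo-cluster-kubeconfig in your working directory (the actual file name will differ), you can point kubectl at the cluster and confirm connectivity like this:
$ export KUBECONFIG=$PWD/civo-cluster-kubeconfig
$ kubectl get nodes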
Setting up core components
Before we begin, you will need to ensure that you have a new namespace and a basic NGINX deployment created. Let’s run over how you can do that:
Creating a new namespace
Kubernetes namespaces provide a way to isolate the resources within a cluster. This is completely optional, but for this tutorial, we are creating the namespace to ensure that it doesn't conflict with other existing resources you may have running on a cluster.
So, let's create a new namespace with the name probe-demo:
$ kubectl create ns probe-demo
Next, let's set the probe-demo namespace as the current context:
$ kubectl config set-context --current --namespace=probe-demo
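If you want to confirm that the context switch worked, the following command prints the namespace of the currently active context; it should show probe-demo:
$ kubectl config view --minify | grep namespace: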
Creating a basic NGINX deployment
Now, let's create a basic NGINX deployment without any health checks. Save the following as basic-deployment.yml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: probe-demo
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
Next, let's deploy this configuration using kubectl apply -f basic-deployment.yml and verify that all pods are in a healthy state:
$ kubectl apply -f basic-deployment.yml
deployment.apps/nginx created
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-ff6774dc6-4lxcz 1/1 Running 0 23s
nginx-ff6774dc6-wrsjj 1/1 Running 0 23s
As we can see, now all the pods are in a running state.
Understanding the need for Kubernetes probes
In the previous stage, we created a basic NGINX deployment without any health checks. By default, however, Kubernetes performs a process health check, which verifies whether the main process of the container is running; if it is not, Kubernetes restarts that container.
In addition, we are using Kubernetes' deployment object to run multiple instances of the Pod, in this case, 2. So can we say that this deployment is robust and resilient to failures? Not really. Let's see why this is so.
First, let's verify that the NGINX web server is healthy and able to render the welcome page. Make sure to swap in one of the pod names from your cluster, as they will be different:
$ kubectl exec -it nginx-ff6774dc6-4lxcz -- curl http://localhost
If everything is fine, then the above command will display the default NGINX HTML welcome page on the terminal.
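If you prefer to check just the HTTP status code instead of reading the full HTML, a variation like the one below uses curl's standard -o and -w flags to print only the status; a healthy server should return 200:
$ kubectl exec -it nginx-ff6774dc6-4lxcz -- curl -s -o /dev/null -w '%{http_code}\n' http://localhost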
The NGINX web server renders its welcome page from the /usr/share/nginx/html/index.html file. Now, to simulate an error scenario, let's delete this file in the container and execute the same HTTP request using the curl command:
$ kubectl exec -it nginx-ff6774dc6-4lxcz -- rm /usr/share/nginx/html/index.html
$ kubectl exec -it nginx-ff6774dc6-4lxcz -- curl http://localhost
When we run the curl command again, we now get a 403 Forbidden error.
Notice that even though the NGINX daemon is still running, it is no longer serving any functional purpose: because it cannot render the required page, it returns the HTTP status code 403.
It is quite possible for any other web application to end up in a similar situation. One such scenario is when the Java Virtual Machine (JVM) throws an OutOfMemoryError but the JVM process is still running. This is problematic because the application cannot serve any requests, yet the process health check still considers the application healthy.
In such scenarios, the quick and short-term fix is to restart the Pod, as shown manually below. Wouldn't it be great if this could happen automatically? In fact, we can achieve this using Kubernetes probes, so let's learn more about them.
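For reference, a manual restart might look something like this, using the pod and Deployment names from this tutorial (either command works, since the Deployment's ReplicaSet immediately creates a replacement for a deleted pod):
$ kubectl delete pod nginx-ff6774dc6-4lxcz
$ kubectl rollout restart deployment/nginx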
Types of Kubernetes probes
Monitoring the health of an application is an essential task. However, monitoring alone is not sufficient; we must also take corrective action in case of failures to maintain the overall availability of the system. Kubernetes provides a reliable way to achieve this using probes. It offers the following three types:
- Liveness probe: This probe constantly checks whether or not the container is healthy and functional. If it detects an issue, then by default, it restarts the container
- Readiness probe: This probe checks whether or not the container is ready to accept incoming requests. If yes, then the requests are sent to the container for further processing
- Startup probe: This probe determines whether a container has started or not
Each probe provides three different methods for checking the application's health:
- Command: This method executes the provided command inside the container. A return value of 0 (i.e., no error) indicates success
- TCP: This method attempts to establish a TCP connection with the container. A successfully established connection indicates success
- HTTP request: This method executes an HTTP request against the container. A response HTTP status code between 200 and 399 (both inclusive) indicates success
We have just discussed the probe types and their methods, but which ones should you use? There is no single answer; it depends on your application, which is exactly why the different probe types and methods exist. Choose whichever is most suitable for your application.
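To make the three methods concrete, here is an illustrative sketch of what each handler looks like in a pod spec; the command, port, and path values below are placeholders rather than part of our NGINX setup:
# Command (exec) handler: succeeds while the command exits with 0
livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy

# TCP handler: succeeds if a TCP connection to the port can be opened
livenessProbe:
  tcpSocket:
    port: 80

# HTTP handler: succeeds on a status code between 200 and 399
livenessProbe:
  httpGet:
    path: /healthz
    port: 80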
Liveness probe
In the previous section, we saw that the process health check could not tell whether the application was actually functional, even though the process appeared live. Sometimes, restarting the application can resolve such intermittent issues. In such cases, we can use the Kubernetes liveness probe.
The liveness probe allows us to define an application-specific health check. In other words, this mechanism provides a reliable way to monitor the health of any given application. Let's understand its usage with an example.
Defining a liveness command
Let's define the liveness probe to check the existence of the /usr/share/nginx/html/index.html file. We can use the ls command to achieve this. After adding the liveness probe, our deployment definition from earlier looks like this:
command-liveness.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: probe-demo
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        livenessProbe:
          exec:
            command:
            - ls
            - /usr/share/nginx/html/index.html
Now, let's deploy this updated configuration and verify that the pods are in a healthy state:
$ kubectl apply -f command-liveness.yaml
deployment.apps/nginx configured
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-847c64cc7c-664lf 1/1 Running 0 42s
nginx-847c64cc7c-jb2bq 1/1 Running 0 46s
Next, let's delete the /usr/share/nginx/html/index.html file from the pod and observe its events. Once again, make sure to substitute the pod name from kubectl get pods on your cluster:
$ kubectl exec -it nginx-847c64cc7c-664lf -- rm /usr/share/nginx/html/index.html
$ kubectl get event --namespace probe-demo --field-selector involvedObject.name=nginx-847c64cc7c-664lf
LAST SEEN TYPE REASON OBJECT MESSAGE
3m21s Normal Scheduled pod/nginx-847c64cc7c-664lf Successfully assigned probe-demo/nginx-847c64cc7c-664lf to proble-demo-control-plane
0s Normal Pulling pod/nginx-847c64cc7c-664lf Pulling image "nginx"
3m18s Normal Pulled pod/nginx-847c64cc7c-664lf Successfully pulled image "nginx" in 2.19317793s
3m18s Normal Created pod/nginx-847c64cc7c-664lf Created container nginx
3m18s Normal Started pod/nginx-847c64cc7c-664lf Started container nginx
1s Warning Unhealthy pod/nginx-847c64cc7c-664lf Liveness probe failed: ls: cannot access '/usr/share/nginx/html/index.html': No such file or directory
1s Normal Killing pod/nginx-847c64cc7c-664lf Container nginx failed liveness probe, will be restarted
In the above output, we can see that Kubernetes has marked the pod as unhealthy and restarted it. We can see these details in the REASON and MESSAGE columns respectively.
Finally, let's verify that the pod has been restarted:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-847c64cc7c-664lf 1/1 Running 1 (42s ago) 4m2s
nginx-847c64cc7c-jb2bq 1/1 Running 0 4m6s
In the above output, the RESTARTS column indicates that the pod has been restarted once, 42 seconds ago.
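To dig into the details of a restart, kubectl describe shows the configured liveness probe, the container's restart count, and the recent events in one place. Substitute your own pod name:
$ kubectl describe pod nginx-847c64cc7c-664lf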
Defining a TCP liveness probe
Similar to the command probes, we can use the TCP socket probe to check the health of the application. As the name suggests, this probe attempts to establish a TCP connection with the container at a specified port. The probe is considered successful if the connection gets established successfully.
Currently, the NGINX server is running on port 80. To simulate an error, let's try to connect to port number 8080 using the following TCP probe:
livenessProbe:
  tcpSocket:
    port: 8080
After adding this probe configuration, the deployment descriptor looks like this:
tcp-liveness.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: probe-demo
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        livenessProbe:
          tcpSocket:
            port: 8080
Now, let's deploy this updated configuration and check the events of the pod:
$ kubectl apply -f tcp-liveness.yaml
deployment.apps/nginx configured
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-5bb9d87b58-cwpzf 1/1 Running 0 28s
nginx-5bb9d87b58-rfnlh 1/1 Running 0 21s
$ kubectl get event --namespace probe-demo --field-selector involvedObject.name=nginx-5bb9d87b58-cwpzf
LAST SEEN TYPE REASON OBJECT MESSAGE
33s Normal Scheduled pod/nginx-5bb9d87b58-cwpzf Successfully assigned probe-demo/nginx-5bb9d87b58-cwpzf to proble-demo-control-plane
3s Normal Pulling pod/nginx-5bb9d87b58-cwpzf Pulling image "nginx"
27s Normal Pulled pod/nginx-5bb9d87b58-cwpzf Successfully pulled image "nginx" in 5.719997947s
1s Normal Created pod/nginx-5bb9d87b58-cwpzf Created container nginx
0s Normal Started pod/nginx-5bb9d87b58-cwpzf Started container nginx
3s Warning Unhealthy pod/nginx-5bb9d87b58-cwpzf Liveness probe failed: dial tcp 10.244.0.7:8080: connect: connection refused
3s Normal Killing pod/nginx-5bb9d87b58-cwpzf Container nginx failed liveness probe, will be restarted
1s Normal Pulled pod/nginx-5bb9d87b58-cwpzf Successfully pulled image "nginx" in 1.928558648s
In the above output, we can see that the liveness probe failed because the connection was refused on port 8080. To fix this issue, we can correct the liveness probe to use port 80, where the server is listening.
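The corrected probe simply points at the port NGINX is actually listening on; a minimal fix would look like this:
livenessProbe:
  tcpSocket:
    port: 80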
Defining a liveness HTTP request
Many web applications expose an HTTP endpoint to report the health of the application. For example, with Spring Boot's Actuator we can use the actuator/health endpoint to check the status of the application. So, let's see how to configure an HTTP endpoint in a liveness probe next.
By default, the NGINX server renders the welcome page at a base URL. To simulate an error, let's try to hit a non-existing HTTP endpoint using the following probe:
livenessProbe:
  httpGet:
    path: /non-existing-endpoint
    port: 80
After adding the probe configuration, the complete deployment descriptor looks like this:
http-liveness.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: probe-demo
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /non-existing-endpoint
            port: 80
Now, let's deploy this configuration and check the events of the pod. Once again, make sure to use the pod name from your cluster rather than the example name below:
$ kubectl apply -f http-liveness.yaml
deployment.apps/nginx configured
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-7c459d7c6c-tlqp5 1/1 Running 0 16s
nginx-7c459d7c6c-vmlq5 1/1 Running 0 11s
$ kubectl get event --namespace probe-demo --field-selector involvedObject.name=nginx-7c459d7c6c-tlqp5
LAST SEEN TYPE REASON OBJECT MESSAGE
30s Normal Scheduled pod/nginx-7c459d7c6c-tlqp5 Successfully assigned probe-demo/nginx-7c459d7c6c-tlqp5 to proble-demo-control-plane
30s Normal Pulling pod/nginx-7c459d7c6c-tlqp5 Pulling image "nginx"
27s Normal Pulled pod/nginx-7c459d7c6c-tlqp5 Successfully pulled image "nginx" in 3.58879558s
27s Normal Created pod/nginx-7c459d7c6c-tlqp5 Created container nginx
27s Normal Started pod/nginx-7c459d7c6c-tlqp5 Started container nginx
1s Warning Unhealthy pod/nginx-7c459d7c6c-tlqp5 Liveness probe failed: HTTP probe failed with statuscode: 404
1s Normal Killing pod/nginx-7c459d7c6c-tlqp5 Container nginx failed liveness probe, will be restarted
Here, we can see that the liveness probe failed as expected with the HTTP status code 404. To fix this issue, we can use a valid HTTP endpoint (such as /) with the liveness probe.
It is worth noting that the liveness probe is not a solution to all problems. It plays a valuable role only if your application can afford the restart of the affected pod(s), and the restart can solve the application's intermittent issues. It will not fix configuration errors or bugs in your application code.
Readiness probe
In the previous section, we saw how the liveness probe allows us to implement a self-healing system in certain situations. However, from practical experience, we know that in most cases having only a liveness probe is not sufficient.
The liveness probe is able to restart unhealthy containers. However, in some rare cases, the container may not be in a healthy state in the first place, and restarting it will not help. One example of such a scenario is when we try to deploy a new version of the application that is not healthy. Let's understand this with an example.
Rectifying the setup
In the previous section, we deployed an unhealthy pod to illustrate a failure of the HTTP liveness probe. Let's now modify it to use a valid HTTP endpoint. The modified deployment descriptor looks like this:
http-liveness.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: probe-demo
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
Now, let's deploy this configuration and verify that the pods are in a healthy state:
$ kubectl apply -f http-liveness.yaml
deployment.apps/nginx configured
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-5bb954fdcb-k42tg 1/1 Running 0 104m
nginx-5bb954fdcb-wsfjm 1/1 Running 0 104m
Breaking the liveness probe
Previously, we saw that the liveness probe plays an important role when an application that was deployed healthy becomes unhealthy at a later point. However, the liveness probe cannot do much if the application is unhealthy in the first place.
To simulate the unhealthy application scenario, let's configure a postStart hook that deletes the /usr/share/nginx/html/index.html file:
breaking-liveness.yml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: probe-demo
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
        lifecycle:
          postStart:
            exec:
              command: ["/bin/bash", "-c", "rm -f /usr/share/nginx/html/index.html"]
For more information on Kubernetes lifecycle hooks, please see the official documentation. In summary, as soon as the container defined in the deployment starts, the postStart hook executes the defined command as part of standing up the pod.
Now, let's deploy this configuration and observe the behavior of the newly deployed pods:
$ kubectl apply -f breaking-liveness.yml
deployment.apps/nginx configured
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-76fb56d59f-knsbz 1/1 Running 4 (3s ago) 2m4s
nginx-76fb56d59f-kx24d 1/1 Running 4 (6s ago) 2m6s
As we can see, now the pods are getting restarted continuously. Such a scenario can cause production downtime. In the next section, we will discuss how to avoid such undesirable behaviors.
Before moving to the next section, let's revert the setup by deploying the configuration from the http-liveness.yaml file from earlier:
$ kubectl apply -f http-liveness.yaml
deployment.apps/nginx configured
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-5bb954fdcb-2fgm2 1/1 Running 0 4m23s
nginx-5bb954fdcb-xmhlr 1/1 Running 0 4m26s
Defining the HTTP readiness probe
In the previous example, we saw how an unhealthy application can cause production downtime. We can mitigate such failures by configuring a readiness probe. The syntax of the readiness probe is similar to the liveness probes:
readinessProbe:
  httpGet:
    path: /
    port: 80
Now, let's understand the behavior of the readiness probe with an example.
First, add the readiness probe configuration to handle the unhealthy deployment scenarios:
http-readiness.yml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: probe-demo
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
        lifecycle:
          postStart:
            exec:
              command: ["/bin/bash", "-c", "rm -f /usr/share/nginx/html/index.html"]
        readinessProbe:
          httpGet:
            path: /
            port: 80
Next, let's deploy this configuration and observe the status of the newly created pod:
$ kubectl apply -f http-readiness.yml
deployment.apps/nginx configured
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-5bb954fdcb-2fgm2 1/1 Running 0 11m
nginx-5bb954fdcb-xmhlr 1/1 Running 0 11m
nginx-dbbd95c97-mgklw 0/1 Running 6 (86s ago) 5m6s
In the above output, we can see that there are now three pods. The most important thing here is the READY column. For the last pod, the READY column shows 0/1, which indicates that 0 of the pod's 1 container is ready to receive incoming traffic. Because the new pod never becomes ready, Kubernetes does not delete the older pods. In this way, we can use a combination of liveness and readiness probes to ensure that only healthy containers serve incoming requests.
Lastly, we can remove the erroneous postStart section in the deployment to make the deployment healthy.
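Alternatively, if we want to back out of the broken rollout entirely, we can roll the Deployment back to its previous revision:
$ kubectl rollout undo deployment/nginx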
In this section, we illustrated the use of the readiness probe using the HTTP probe alone. However, we can also use the command and TCP probe methods to configure the readiness probe. Their syntax is similar to the corresponding liveness probes.
Startup probe
Kubernetes also provides the startup probe. This probe is not as well-known as the other two; it is mainly used with applications that take a long time to start up. When a startup probe is configured, the liveness and readiness probes are disabled until the startup probe succeeds, which prevents them from triggering restarts or alerts unnecessarily.
The syntax of the startup probe is similar to the other probes:
startupProbe:
  httpGet:
    path: /
    port: 80
To understand its usage, let's create an unhealthy deployment that includes a startup probe. The following deployment once again removes the default NGINX index.html page via the postStart hook and adds a startup probe (alongside the liveness and readiness probes from earlier) that will fail because the page it tries to access is not available:
http-startup.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: probe-demo
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
        lifecycle:
          postStart:
            exec:
              command: ["/bin/bash", "-c", "rm -f /usr/share/nginx/html/index.html"]
        readinessProbe:
          httpGet:
            path: /
            port: 80
        startupProbe:
          httpGet:
            path: /
            port: 80
Now, let's deploy this configuration and verify that the startup probe disables the other two probes:
$ kubectl apply -f http-startup.yaml
deployment.apps/nginx configured
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-5bb954fdcb-2fgm2 1/1 Running 0 35m
nginx-5bb954fdcb-xmhlr 1/1 Running 0 35m
nginx-5f4576574c-dqhfp 0/1 Running 0 24s
$ kubectl get event --namespace probe-demo --field-selector involvedObject.name=nginx-5f4576574c-dqhfp
LAST SEEN TYPE REASON OBJECT MESSAGE
95s Normal Scheduled pod/nginx-5f4576574c-dqhfp Successfully assigned probe-demo/nginx-5f4576574c-dqhfp to proble-demo-control-plane
5s Normal Pulling pod/nginx-5f4576574c-dqhfp Pulling image "nginx"
93s Normal Pulled pod/nginx-5f4576574c-dqhfp Successfully pulled image "nginx" in 2.046922256s
33s Normal Created pod/nginx-5f4576574c-dqhfp Created container nginx
33s Normal Started pod/nginx-5f4576574c-dqhfp Started container nginx
5s Warning Unhealthy pod/nginx-5f4576574c-dqhfp Startup probe failed: HTTP probe failed with statuscode: 403
5s Normal Killing pod/nginx-5f4576574c-dqhfp Container nginx failed startup probe, will be restarted
In the above output, we can see that the pod was marked as unhealthy since the startup probe failed. To make the setup functional again, we can remove the postStart section.
Just like the liveness probes, we can also use the command and TCP probe methods to configure the startup probes.
Advanced probe configuration
So far, we have used the probes with their default settings, under which a few consecutive failures are enough to restart a pod or mark it as not ready. However, we can tune this behavior based on the needs of the application. Each configuration parameter is described in the table below:
| Parameter | Description | Default Value | Minimum Value |
| --- | --- | --- | --- |
| initialDelaySeconds | The time duration after the container has started but before any probes are initiated | 0 | 0 |
| periodSeconds | The frequency of the probe | 10 | 1 |
| timeoutSeconds | The timeout value for the probe responses | 1 | 1 |
| successThreshold | The minimum number of consecutive success responses required to mark the probe status as a success | 1 | 1 |
| failureThreshold | The minimum number of consecutive failed responses required to mark the probe status as failed | 3 | 1 |
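As an example of how these parameters work together, a common pattern for slow-starting applications is to give the startup probe a generous failure budget while keeping the liveness probe tight. The numbers below are illustrative, not tuned for NGINX:
startupProbe:
  httpGet:
    path: /
    port: 80
  failureThreshold: 30
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /
    port: 80
  periodSeconds: 10
  failureThreshold: 3
With this configuration, the container gets up to 30 × 10 = 300 seconds to finish starting before it is restarted; once the startup probe succeeds, the liveness probe takes over with its tighter threshold.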
Summary
Throughout this tutorial, we went over how to configure probes in Kubernetes. First, we discussed the limitations of the default process health check, and then the different types of probes. After working through practical examples of liveness, readiness, and startup probes, we looked at advanced probe configuration.
For more information on Kubernetes probes, check out these resources: