Distributed tracing is crucial for tracking requests as they traverse the many services that make up your application. For developers and operators managing complex systems, especially those orchestrated with Kubernetes, it is an indispensable tool.
Configuring distributed tracing allows you to understand and diagnose every functional part of your application through external outputs (traces). This enhances your workflow by improving debugging and troubleshooting capabilities, enabling a faster understanding of component interactions and error pinpointing. Decision-making becomes more informed with clear insights into your application's performance, guiding optimizations and proactive management to prevent issues before they impact end-users, ultimately reducing application downtime.
This tutorial guides you through setting up end-to-end distributed tracing in Kubernetes using Grafana Tempo and Civo Object Store, demonstrated with a Django application instrumented with OpenTelemetry.
An Introduction to Distributed Tracing
What is Distributed Tracing?
Distributed tracing is an essential methodology in modern application development, particularly within Kubernetes-based microservice architectures. It helps to address several critical challenges, such as:
| Aspect | Description |
|---|---|
| Handling Complexity in Microservices | Applications deployed in Kubernetes environments often adopt a microservices architecture. Distributed tracing, integrated with visualization tools, demystifies request flows, aiding in managing application complexity. |
| Optimizing Performance | Distributed tracing visualizes request pathways, aiding in identifying bottlenecks and understanding service call latencies. Developers gain insights to pinpoint delays, decipher causes, and implement targeted performance enhancements. |
| Service Interaction Analysis | Distributed tracing is crucial for deciphering service interactions within a system, allowing precise updates and debugging of underperforming services by understanding how they interact with each other. |
| Facilitating Error Diagnosis and Troubleshooting | Distributed tracing offers detailed insights into internal application mechanisms, streamlining error diagnosis and troubleshooting. Tracing issues to specific services or requests significantly reduces the time and effort spent on debugging. |
Why Grafana Tempo and OpenTelemetry?
Employing Grafana Tempo and OpenTelemetry in Kubernetes environments isn’t a matter of convenience but a strategic choice grounded in technical superiority and adaptability. Here’s why:
| Feature | Description |
|---|---|
| High-Volume Trace Data Management | Grafana Tempo efficiently handles large trace data volumes with an architecture optimized for high throughput and low-latency processing, making it ideal for data-intensive applications. |
| Unified Telemetry Framework | OpenTelemetry is a unified solution for telemetry data, integrating traces, metrics, and logs into a single framework. This simplifies observability infrastructure by eliminating the need for multiple tools. |
| Seamless Data Processing Pipeline | The Grafana Tempo and OpenTelemetry combination forms a seamless data pipeline. OpenTelemetry, with its optional Collector, not only gathers but also preprocesses telemetry data, efficiently feeding it into Grafana Tempo for advanced tracing and storage. |
| Ecosystem Compatibility and Integration | Grafana Tempo and OpenTelemetry integrate smoothly into existing systems. Tempo complements Grafana, and OpenTelemetry is highly compatible with various languages and frameworks, making them versatile choices for diverse application stacks. Both are open source. |
Prerequisites
To follow along in this tutorial, you should meet the following requirements:
- A Civo account
- Kubectl installed
- A Civo Kubernetes cluster provisioned with a PostgreSQL database installed
- A Civo Object Store created
- Docker installed on your machine and a DockerHub account with a repository already set up - this tutorial uses a repository called `django-optl`
- Helm installed locally
Please note that this tutorial uses Ubuntu 22.04 (Jammy Jellyfish) on a Linux machine with amd64 architecture.
Installing and configuring Grafana Tempo
Once you have successfully set up your Kubernetes environment, we will proceed to install and configure Grafana Tempo in our cluster. This way, we'll have a specific endpoint ready for the OpenTelemetry collector to send traces to.
Grafana Tempo is the tracing backend we will be using in this tutorial. It is built for handling large-scale distributed tracing with few external dependencies and supports multiple storage options. For the purpose of this tutorial, we will configure Grafana Tempo to use the Civo Object Store, which is S3-compatible, as its storage backend.
Step 1: Add the Grafana Helm Repository
After creating the Civo Object Store, we must add the Grafana Helm repository to our Helm setup. This repository contains the necessary charts to install Tempo and other Grafana tools.
Execute the commands below to add the Grafana Helm chart repository and then update your local Helm chart repository list to ensure you have the latest chart information:
```shell
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```
You should see the following output:
```
"grafana" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "grafana" chart repository
Update Complete. ⎈Happy Helming!⎈
```
Step 2: Configuring Tempo
With the repository added, we can now configure Grafana Tempo.
On your machine, create a file called `tempo.yaml` and add the following configuration settings:
```yaml
# tempo.yaml
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
ingester:
  trace_idle_period: 10s
  max_block_bytes: 1_000_000
  max_block_duration: 1m
compactor:
  compaction:
    compaction_window: 1h
    max_compaction_objects: 1000000
    block_retention: 1h
    compacted_block_retention: 10m
    flush_size_bytes: 5242880
storage:
  trace:
    backend: s3
    s3:
      access_key: your-civo-objectstore-access-key
      secret_key: your-civo-objectstore-secret-key
      endpoint: your-civo-objectstore-endpoint
      bucket: tempo # Replace this with the actual name of your civo object store
      insecure: true
```
Here’s what the configuration settings above are doing:

- `distributor`: Manages the distribution of trace data across Tempo's services. It's essential for handling incoming data efficiently.
- `otlp`: Configures the OpenTelemetry Protocol receiver, crucial for Tempo to receive trace data from instrumented applications.
- `ingester`: Processes incoming trace data and compiles it into blocks. Key settings like `trace_idle_period` and `max_block_bytes` control how data is aggregated and stored.
- `compactor`: Improves storage efficiency by consolidating trace data blocks. Settings such as `compaction_window` and `max_compaction_objects` are important for optimizing data storage and retrieval.
- `storage`: Defines where and how trace data is stored. The configuration specifies S3-compatible storage, with key details like `bucket` and `endpoint` indicating where the data is stored.
In production environments, it's recommended to handle access credentials securely. So, when setting up S3-compatible storage backends, avoid hardcoding credentials such as access keys and secret access keys. Instead, use environment variables to inject these credentials at runtime or store them using Kubernetes secret objects for a more secure approach.
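As a sketch of that approach, the credentials could live in a Kubernetes Secret instead of the configuration file. The Secret name `tempo-s3-credentials` and the variable names below are illustrative assumptions, not part of this tutorial's setup:

```yaml
# Hypothetical example: store object-store credentials in a Secret
# rather than hardcoding them in tempo.yaml.
apiVersion: v1
kind: Secret
metadata:
  name: tempo-s3-credentials
type: Opaque
stringData:
  S3_ACCESS_KEY: your-civo-objectstore-access-key
  S3_SECRET_KEY: your-civo-objectstore-secret-key
```

The Tempo pod can then consume the Secret as environment variables (for example via `envFrom`), keeping the keys out of version control.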
Step 3: Installing Grafana Tempo
After configuring Tempo, we can now go ahead and install it using the configuration file we created in the previous step:

```shell
helm install grafana-tempo grafana/tempo -f tempo.yaml
```
Once Tempo is installed, you should see the following output:

```
NAME: grafana-tempo
LAST DEPLOYED: Sun Nov 5 13:29:53 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
```
To confirm that Grafana Tempo is up and running in a ready state as both a Kubernetes pod and a service, use the following `kubectl` commands:

```shell
kubectl get pods
kubectl get service
```
You should have the following output:

```
kubectl get pods
NAME              READY   STATUS    RESTARTS   AGE
grafana-tempo-0   1/1     Running   0          5m6s

kubectl get service
NAME            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                                                                                                 AGE
kubernetes      ClusterIP   10.43.0.1     <none>        443/TCP                                                                                                 18d
grafana-tempo   ClusterIP   10.43.68.96   <none>        3100/TCP,6831/UDP,6832/UDP,14268/TCP,14250/TCP,9411/TCP,55680/TCP,55681/TCP,4317/TCP,4318/TCP,55678/TCP   5m14s
```
From the output above, the Grafana Tempo service has several ports exposed. In this tutorial, we will be focusing on the following ports:

- `3100/TCP`: The default port used by Grafana Tempo for its HTTP server. Grafana queries this endpoint when reading traces from Tempo.
- `4317/TCP`: Designated for OTLP over gRPC. It is the standard port for receiving trace data sent over gRPC following the OpenTelemetry Protocol (OTLP).
- `4318/TCP`: Similar to port `4317`, but for OTLP over HTTP. It accepts trace data sent over HTTP using the OTLP format, providing an alternative to gRPC for environments where HTTP is preferred or required.
Installing and Configuring OpenTelemetry Collector
To deploy the OpenTelemetry collector in our Kubernetes cluster, we will use a pre-made Helm chart provided by OpenTelemetry.
Step 1: Configuring the OpenTelemetry Collector
The OpenTelemetry Collector needs to be configured to forward traces to Grafana Tempo. This involves pointing the `otlp` exporter endpoint in our configuration file at our Grafana Tempo instance.

Create a file called `collector.yaml` and paste in the following configuration settings:
```yaml
mode: "deployment"
config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
  processors:
    batch:
      timeout: 10s
      send_batch_size: 1024
  exporters:
    debug:
      verbosity: detailed
    otlp:
      endpoint: grafana-tempo:4317
      tls:
        insecure: true
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [debug, otlp]
resources:
  limits:
    cpu: 250m
    memory: 512Mi
```

Note that the pipeline references the `debug` exporter, so the exporter must be declared under that name (in recent Collector releases, `debug` replaces the deprecated `logging` exporter).
Here's what the configuration above does:

- `mode`: Sets the Collector's mode to "deployment" for scalability and centralized data collection. This mode is one option alongside `daemonset` and `statefulset`, depending on the use case.
- `receivers`: Configures the OTLP receiver to listen for telemetry data on ports `4317` (gRPC) and `4318` (HTTP), enabling the collection of trace data over different protocols.
- `processors`: Includes a batch processor to aggregate traces into batches, optimizing data processing with settings for timeout and batch size.
- `exporters`: Defines the exporters used, including a debug exporter for logging and an OTLP exporter to forward data to Grafana Tempo. The OTLP exporter uses insecure TLS for simplicity in a development setup.
- `service`: Establishes a pipeline for trace data, specifying how data is received, processed, and exported. It uses the configured OTLP receiver, batch processor, and both debug and OTLP exporters.
- `resources`: Sets resource limits for the Collector's CPU and memory usage in a Kubernetes environment, ensuring efficient resource utilization and preventing excessive consumption.
Step 2: Installing the OpenTelemetry Collector
Once the OpenTelemetry Collector is configured, follow these steps to install it on your Kubernetes cluster.

Add the OpenTelemetry Helm repository using the following commands:

```shell
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
```
Install the OpenTelemetry Collector using the following command:

```shell
helm install opentelemetry-collector open-telemetry/opentelemetry-collector -f collector.yaml
```

You should see output similar to the following once it is installed:

```
NAME: opentelemetry-collector
LAST DEPLOYED: Sun Nov 5 13:33:30 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
```
Confirm that the OpenTelemetry Collector is running as a pod and as a service using the following commands:

```shell
kubectl get pods
kubectl get services
```

```
kubectl get pods
NAME                                       READY   STATUS    RESTARTS   AGE
...
opentelemetry-collector-65676955c7-pxx57   1/1     Running   0          25s

kubectl get services
NAME                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                                                   AGE
...
opentelemetry-collector   ClusterIP   10.43.185.170   <none>        6831/UDP,14250/TCP,14268/TCP,4317/TCP,4318/TCP,9411/TCP   32s
```
Setting Up the Database
Now that we have Grafana Tempo and the OpenTelemetry Collector set up, it's time to set up our Django application. Before we do, however, we must first set up the Postgres database.
Step 1: Connecting to the Database
First, we need to connect to the Postgres database to create a role and a database.
Head over to your Kubernetes cluster from your Civo dashboard. Click on Installed Apps, then click on PostgreSQL.

Copy your Admin username and run the following commands sequentially to connect to the PostgreSQL database:

```shell
kubectl exec -it <postgres-pod-name> -- bash
psql -U <username> -d postgres
```
The first command opens an interactive bash shell within the PostgreSQL pod specified by `<postgres-pod-name>`. Once inside the pod, the second command, `psql -U <username> -d postgres`, connects to the PostgreSQL server as the specified user (`<username>`) and to the default "postgres" database (`-d postgres`).

```
kubectl exec -it postgresql-5546959f6d-b67fp -- bash
I have no name!@postgresql-5546959f6d-b67fp:/$
```
Step 2: Creating a User, Role and Database
After connecting to the Postgres database, we need to create a role and then create a database with the role as the owner. We will use these credentials in our Django application to connect to and interact with the Postgres database.
Create a new role using the following command:

```sql
CREATE USER django WITH PASSWORD '1234';
```
Next, create a new database with the role as owner with the following command:

```sql
CREATE DATABASE notes OWNER django;
```
At this point, we should have the following output:

```
kubectl exec -it postgresql-5546959f6d-b67fp -- bash
I have no name!@postgresql-5546959f6d-b67fp:/$ psql -U 4gwvo6gFD3 -d postgres
psql (11.5 (Debian 11.5-3.pgdg90+1))
Type "help" for help.
postgres=# CREATE USER django WITH PASSWORD '1234';
CREATE ROLE
postgres=# CREATE DATABASE notes OWNER django;
CREATE DATABASE
```
Now grant all privileges on the database to the new role so it has the necessary permissions to operate:

```sql
GRANT ALL PRIVILEGES ON DATABASE notes TO django;
```
Once you have executed the above commands and set up the database and role, you can exit the PostgreSQL shell by typing:

```
# Exit the PostgreSQL command-line interface.
\q
# Exit the current shell session of the PostgreSQL container.
exit
```

```
postgres=# GRANT ALL PRIVILEGES ON DATABASE notes TO django;
GRANT
postgres=# \q
I have no name!@postgresql-5546959f6d-b67fp:/$ exit
exit
```
Setting up the Django Project
In this section, we will set up our Django project, which consists of a notes application named `notes_app`. This Django app is already instrumented with OpenTelemetry, and the entire project is configured to send trace data to the OpenTelemetry Collector instance in our Kubernetes cluster over gRPC using OTLP (OpenTelemetry Protocol).
Step 1: Setting up the Django Project for Kubernetes
To begin, fork and clone this GitHub repository, which houses the Django project.
The project achieves trace data capturing through OpenTelemetry middleware wrapped around the WSGI application and records data for incoming HTTP requests.
Additionally, a custom `LoggingSpanExporter` is used to log the success or failure of span exports, providing visibility into the trace export process. The trace data includes information such as the service name, which is set to `django-notes-app`, so that traces are correlated with the correct service in observability tools.
Once you have forked and cloned the GitHub repository, open it up with your default code editor, create a `.env` file at the root of the project, and populate it with the following:

```
DB_NAME=notes
DB_USER=django
DB_PASSWORD=1234
DB_HOST=postgresql # The name of the Postgres service in the cluster
DB_PORT=5432
```
This will set up environment variables with credentials to access the Postgres database in the Kubernetes cluster.
Next, build the Docker image for the application and push it to your DockerHub repository using the following commands:

```shell
docker build -t <your-dockerhub-username>/django-optl:latest .
docker push <your-dockerhub-username>/django-optl:latest
```
Step 2: Deploying the Django Project to Kubernetes
Exit out of the project directory completely, open up your command prompt, create a file called `django.yaml`, and paste in the following configuration settings:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: django-deployment
  labels:
    app: django-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: django-app
  template:
    metadata:
      labels:
        app: django-app
    spec:
      containers:
        - name: django-app
          image: <your-dockerhub-username>/django-optl:latest
          ports:
            - containerPort: 8000
          env:
            - name: DB_NAME
              value: "notes"
            - name: DB_USER
              value: "django"
            - name: DB_PASSWORD
              value: "1234"
            - name: DB_HOST
              value: "postgresql" # The name of the PostgreSQL service
            - name: DB_PORT
              value: "5432"
```
The configuration settings above do the following:

- Create a deployment called `django-deployment` with one replica (pod).
- Set up the container within the pod, named `django-app`, which will run the Docker image `<your-dockerhub-username>/django-optl:latest`.
- Expose port `8000` on the container, which is the port the Django application will use to serve HTTP traffic.
- Configure environment variables for the container to connect to a PostgreSQL database, including the database name (`DB_NAME`), user (`DB_USER`), password (`DB_PASSWORD`), host (`DB_HOST`), and port (`DB_PORT`). `DB_HOST` is set to `postgresql`, which is the service name for our PostgreSQL deployment within our Kubernetes cluster.
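Hardcoding the database password in the Deployment manifest is acceptable for a tutorial, but in production you would normally move it into a Kubernetes Secret. As a hypothetical sketch (the Secret name `django-db-credentials` is an assumption, not part of this tutorial's manifests):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: django-db-credentials
type: Opaque
stringData:
  DB_PASSWORD: "1234"
```

The container's `DB_PASSWORD` entry would then use `valueFrom.secretKeyRef` (with `name: django-db-credentials` and `key: DB_PASSWORD`) instead of a literal `value`.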
Apply these configuration settings in the cluster using the command:

```shell
kubectl apply -f django.yaml
```
Expose the deployment over a service using the following command:

```shell
kubectl expose deploy django-deployment --port 8000
```
Now run the following commands to view the pod and service associated with the Django project:

```shell
kubectl get pods
kubectl get services
```
At this point, you should have the following output:

```
kubectl get pods
NAME                                 READY   STATUS    RESTARTS   AGE
...
django-deployment-6c4c7d4bcf-t4ccx   1/1     Running   0          45s

kubectl get services
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
...
django-deployment   ClusterIP   10.43.209.170   <none>        8000/TCP   45s
```
Step 3: Applying Migrations
Now that our Django project is deployed into our cluster, the next thing to do is apply the database migrations to set up the necessary tables and relationships in the Postgres database. This can be done by executing the Django management commands within the context of the Kubernetes deployment.
First, we need to identify the pod where our Django application is running. Use the following command to get the list of running pods:

```shell
kubectl get pods
```
Look for the pod whose name is `django-deployment` followed by a unique identifier. Once you have identified the correct pod, execute the following command to create new migrations based on the models present in the Django `notes_app` application:

```shell
kubectl exec <django-deployment-unique-identifier> -- python manage.py makemigrations
```

You should have the following output:

```
Migrations for 'notes_app':
  notes_app/migrations/0001_initial.py
    - Create model Note
```
Now apply the migrations using the following command:

```shell
kubectl exec <django-deployment-unique-identifier> -- python manage.py migrate
```

You should have the following output:

```
Operations to perform:
  Apply all migrations: admin, auth, contenttypes, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  ...
```
Step 4: Viewing the Django Project over the web
At this point, we have successfully deployed our Django project and configured it to interact with our Postgres database. Now we need to access the project over the web so we can interact with it and generate traces, which will be sent to the OpenTelemetry Collector and then on to Grafana Tempo.
Execute the following command to forward the Django application service to your local environment for access via `localhost:8000`:

```shell
kubectl port-forward svc/django-deployment 8000
```

Now you can access the Django project by entering the following URL into your web browser: `localhost:8000`
Go ahead and interact with the application by creating one or more notes. Once you have successfully added a note, you should see the following output:
Installing and setting up Grafana
Up until now, we have successfully deployed our Django project and have been able to interact with it. Now, it's time to visualize these traces with Grafana UI.
To install Grafana on our Kubernetes cluster, we will use Helm. Since we already added the Grafana Helm repository earlier, we can install the Grafana chart directly:

```shell
helm install grafana grafana/grafana
```
Confirm that Grafana is up and running and is exposed as a service using the following commands:

```shell
kubectl get pods
kubectl get services
```
You should have the following output:

```
kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
...
grafana-6c9ff96d9d-jm6sc   1/1     Running   0          71s

kubectl get services
NAME      TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
...
grafana   ClusterIP   10.43.80.98   <none>        80/TCP    81s
```
Now execute the following command to retrieve your Grafana admin password:

```shell
kubectl get secret --namespace default grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
```
Once retrieved, execute the command below to forward the Grafana service to your local environment for access via `localhost:3000`:

```shell
kubectl port-forward svc/grafana 3000:80
```

Head over to `localhost:3000` and log in to the Grafana UI as `admin` with the password you retrieved earlier. Once logged in successfully, you should see the following:
Viewing Traces in Grafana
Now that Grafana is up and running, it's time to explore the traces collected from our Django application. Follow these steps to visualize the telemetry data:
Step 1: Add Tempo as a Data Source
Before viewing traces, ensure that Grafana Tempo is configured as a data source:
Step 1: Select the “Data Sources” box from the Grafana dashboard.
Step 2: Search for Tempo and choose Tempo from the list of available data sources:
Step 3: Enter the details for your Tempo instance, using `http://grafana-tempo:3100` as the URL
Step 4: Click 'Save & Test' to ensure Grafana can connect to Tempo. You should have this pop-up if the connection is successful:
Step 2: Explore Traces
To explore traces:
Step 1: Click on the toggle menu on the left panel to open the “Explore” section.
Step 2: Click on the “Explore” option, and select the Tempo data source you just added. You should see the traces:
Step 3: You can search for traces by Trace ID, or you can use the built-in query features to filter and find traces. Like this:
Step 4: Select a trace to view detailed information, including spans and operations.
Once you have a trace open, you can now examine the spans to understand the request flow and latency, use the metadata provided to identify any issues or bottlenecks, and also view logs related to traces and application metrics if you have configured Loki or Prometheus as additional data sources in Grafana.
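As an example of the built-in query features, recent Tempo versions accept TraceQL expressions in the Explore search box. Assuming the service name configured earlier in this tutorial, a query like the following (an illustrative sketch, availability depends on your Tempo version) returns only traces from the Django app:

```
{ resource.service.name = "django-notes-app" }
```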
Troubleshooting
While following the steps of this tutorial, you may encounter various challenges. Below are some common issues along with their potential solutions to help you navigate and resolve these hurdles effectively:
- Grafana Tempo not receiving trace data: Check that the OpenTelemetry Collector's configuration points to the correct Tempo endpoint (`grafana-tempo:4317` for gRPC or `grafana-tempo:4318` for HTTP).
- OpenTelemetry Collector not receiving trace data: Verify that your Django application is correctly sending traces to your collector's endpoint in Kubernetes by examining the application and collector logs (`kubectl logs <pod-name>`) for any errors or misconfigurations in the trace export process.
- Grafana Tempo query errors: Ensure Grafana is operational and Tempo is correctly configured as a data source with the proper endpoint.
- Django not connecting to the Postgres database: Confirm that `DB_NAME`, `DB_USER`, `DB_PASSWORD`, `DB_HOST`, and `DB_PORT` are correctly set in your Django Kubernetes deployment and align with your Postgres database settings.
Summary
In this tutorial, you have learned how to configure end-to-end distributed tracing with OpenTelemetry, Grafana Tempo, and Grafana for visualization. Using a pre-instrumented Django application, we have been able to configure an OpenTelemetry collector in our Civo Kubernetes cluster and have configured Grafana Tempo to receive traces from the OpenTelemetry collector using the Civo object store as our storage backend. Additionally, we went further to set up our Django project to generate some traces and have been able to view them via Grafana UI.
With the steps outlined in this tutorial, you now possess the capability to monitor and troubleshoot your applications in Kubernetes more effectively by leveraging the power of distributed tracing.
Further resources
If you want to continue learning about this topic, check out some of these resources:
- OpenTelemetry Official Docs
- Grafana Tempo Official Docs
- Django Official Docs
- Henrik Rexed Navigate Europe 2023 talk on The Sound of Code: Instrument with OpenTelemetry