Monitoring is the process of collecting, analyzing, and visualizing system metrics to ensure your infrastructure is running optimally. It helps detect performance issues, track resource usage, and identify potential failures before they impact your applications.
Whatever order your checklist takes, monitoring is probably a high priority when you create a new virtual machine: the ability to quickly see what’s happening on your servers is crucial.
In this tutorial, we will be using Prometheus, an open-source systems monitoring and alerting tool, to monitor a virtual machine. We will visualize the metrics created using Grafana, an open-source dashboarding solution, and use Caddy for TLS termination.
Prerequisites
This tutorial assumes some familiarity with Linux. Additionally, you will need the following installed on your machine:
- The Civo CLI, configured with your API key
- jq, for parsing JSON output on the command line
- An SSH client
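If you want to confirm the tooling is in place before you start, a quick check of each binary (the exact version output will vary) looks like this:
civo version
jq --version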
Creating a Virtual Machine
We’ll begin by creating a virtual machine using the Civo CLI:
civo instance create --hostname=deathstar --size=g3.medium --diskimage=ubuntu-jammy --initialuser=demo
This will spin up a new Ubuntu instance called deathstar in your default region and create a new user called demo.
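Provisioning usually takes a minute or two. Before moving on, you can check that the instance has finished building (the exact output format may differ between CLI versions):
civo instance show deathstar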
Retrieve machine IP
With a machine created, you can now retrieve the public IP using the Civo CLI:
civo instance show deathstar -o json | jq .public_ip
Retrieve user password
To connect to the virtual machine, we will also need to retrieve the initial password for the demo user:
civo instance show deathstar -o json | jq .initial_password
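As a small convenience, jq's -r flag prints raw strings without the surrounding quotes, which is handy if you want to paste the values straight into other commands:
civo instance show deathstar -o json | jq -r .public_ip
civo instance show deathstar -o json | jq -r .initial_password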
At this point, you should be able to log in to your machine:
ssh demo@<your instance ip>
Setting up your server
Before we install Prometheus, we will need to create a couple of directories and a non-root user for the installation.
Creating a non-root user
sudo useradd --no-create-home --shell /bin/false prometheus
Creating configuration directories
sudo mkdir /var/lib/prometheus
sudo mkdir /etc/prometheus
Granting user permissions
sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
Setting up firewall rules
To ensure Prometheus isn’t open to the entire web, we must set up a couple of firewall rules. Ubuntu ships with a utility called ufw, which allows us to manage firewall rules easily.
Allow SSH connections:
To prevent being locked out by the firewall, add a rule to allow SSH connections:
sudo ufw allow ssh
The output should be similar to:
Rule added
Rule added (v6)
Allow HTTP and HTTPS connections:
sudo ufw allow http
sudo ufw allow https
Enable the firewall:
sudo ufw enable
The output should be similar to:
Command may disrupt existing ssh connections. Proceed with operation (y|n)? y
Firewall is active and enabled on system startup
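Before moving on, you can list the active rules to confirm SSH, HTTP, and HTTPS are allowed:
sudo ufw status
The output should be similar to (your exact rule list may differ):
Status: active

To                         Action      From
--                         ------      ----
22/tcp                     ALLOW       Anywhere
80/tcp                     ALLOW       Anywhere
443/tcp                    ALLOW       Anywhere
22/tcp (v6)                ALLOW       Anywhere (v6)
80/tcp (v6)                ALLOW       Anywhere (v6)
443/tcp (v6)               ALLOW       Anywhere (v6)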
Install Prometheus
Download the Prometheus archive:
cd /tmp && wget https://github.com/prometheus/prometheus/releases/download/v2.54.0/prometheus-2.54.0.linux-amd64.tar.gz
Extract the archive:
tar -xzf prometheus-2.54.0.linux-amd64.tar.gz && cd prometheus-2.54.0.linux-amd64/
Copy the Prometheus binary to /usr/local/bin:
sudo cp prometheus /usr/local/bin/
Copy the consoles and console_libraries directories, along with the default prometheus.yml configuration, to /etc/prometheus:
sudo cp -r console_libraries consoles prometheus.yml /etc/prometheus
The consoles and console_libraries directories contain prebuilt HTML templates for Prometheus' web-based expression browser and dashboards. Copying them ensures that Prometheus can render its built-in UI correctly.
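As a quick sanity check that the binary is on your PATH and executable, print its version (the output will reflect whichever release you downloaded):
prometheus --version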
Installing Node Exporter
Prometheus provides an official Node Exporter that exposes relevant host metrics such as CPU usage, disk utilization, and even network file system statistics if you have NFS set up.
Grab the node exporter binary
cd /tmp && wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
Extract the tarball
tar xvfz node_exporter-*.*-amd64.tar.gz && cd node_exporter-*.*-amd64
Move the binary
sudo mv node_exporter /usr/local/bin/
Grant user permissions
sudo chown prometheus:prometheus /usr/local/bin/node_exporter
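Again, a quick version check confirms the binary is in place:
node_exporter --version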
Creating a systemd Service
Since we are using Ubuntu, we will create a systemd unit file to simplify the management of the Node Exporter service.
Create the unit file:
sudo nano /etc/systemd/system/node-exporter.service
Paste in the following content:
[Unit]
Description=Node Exporter
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter
[Install]
WantedBy=multi-user.target
Repeat the process for Prometheus Server:
sudo nano /etc/systemd/system/prometheus.service
Paste in the following content:
[Unit]
Description=Prometheus Service
After=network.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
[Install]
WantedBy=multi-user.target
Reload systemd and start the service:
sudo systemctl daemon-reload && sudo systemctl start node-exporter && sudo systemctl start prometheus
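Starting the services only keeps them running until the next reboot. If you also want them to come back up automatically after a restart, enable them as well:
sudo systemctl enable node-exporter prometheus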
Confirm node exporter is running:
sudo systemctl status node-exporter
Output is similar to:
● node-exporter.service - Node Exporter
Loaded: loaded (/etc/systemd/system/node-exporter.service; disabled; vendor preset: enabled)
Active: active (running) since Fri 2024-08-09 20:48:11 UTC; 52s ago
Main PID: 32564 (node_exporter)
Tasks: 5 (limit: 4423)
Memory: 2.5M
CPU: 12ms
CGroup: /system.slice/node-exporter.service
└─32564 /usr/local/bin/node_exporter
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=time
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=timex
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=udp_queues
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=uname
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=vmstat
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=watchdog
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=xfs
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=zfs
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.844Z caller=tls_config.go:313 level=info msg="Listening on" address=[::>
Test the metrics endpoint:
curl localhost:9100/metrics
Output is similar to:
node_cpu_seconds_total{cpu="0",mode="idle"} 205689.04
node_cpu_seconds_total{cpu="0",mode="iowait"} 53.8
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 11.91
node_cpu_seconds_total{cpu="1",mode="irq"} 0
node_cpu_seconds_total{cpu="1",mode="nice"} 11.35
node_cpu_seconds_total{cpu="1",mode="softirq"} 4.4
node_cpu_seconds_total{cpu="1",mode="steal"} 6.98
node_cpu_seconds_total{cpu="1",mode="system"} 144.62
node_cpu_seconds_total{cpu="1",mode="user"} 295.75
# HELP node_disk_discard_time_seconds_total This is the total number of seconds spent by all discards.
# TYPE node_disk_discard_time_seconds_total counter
node_disk_discard_time_seconds_total{device="vda"} 0.099
node_disk_discard_time_seconds_total{device="vdb"} 0
# HELP node_disk_discarded_sectors_total The total number of sectors discarded successfully.
Great! We have metrics! But to gain value from them we need some sort of visualization.
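Before we point Prometheus at the node exporter, it is worth confirming the Prometheus service itself came up cleanly. Check the unit status and, optionally, the server's built-in health endpoint, which should report that Prometheus is healthy:
sudo systemctl status prometheus
curl localhost:9090/-/healthy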
Configure Prometheus targets
Targets in Prometheus define which servers or sources Prometheus will scrape metrics from; in our case, this will be the virtual machine we have created.
Open the Prometheus configuration file we copied earlier:
sudo nano /etc/prometheus/prometheus.yml
We’ll add a single scrape target for the node exporter. Update the file so it looks something like this:
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
The updated config adds a single job named node to the configuration file, targeting the node exporter we just deployed.
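Prometheus only reads its configuration at startup or on reload, so after saving the file, tell the service to pick up the change and confirm both targets are being scraped (the reload works because our unit file defines ExecReload; the targets endpoint is part of Prometheus' standard HTTP API):
sudo systemctl reload prometheus
curl -s localhost:9090/api/v1/targets | jq '.data.activeTargets[].labels.job'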
Installing Grafana
Install the prerequisite packages:
sudo apt-get install -y apt-transport-https software-properties-common wget
Import the GPG key:
sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
Add a repository for stable releases:
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
Update the list of available packages:
sudo apt-get update
Install Grafana:
sudo apt-get install grafana -y
Start Grafana:
sudo systemctl start grafana-server
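As with the other services, you can enable Grafana so it survives reboots and confirm it is listening locally; a request to port 3000 on the server should answer (typically with a redirect to the login page):
sudo systemctl enable grafana-server
curl -I http://localhost:3000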
Exposing the Dashboard
To expose Grafana, we will use Caddy, a cross-platform web server written in Go, as a reverse proxy in front of the Grafana server we have running.
Add the latest Caddy GPG Key:
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg
Add the Caddy repository to your APT sources:
curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list
Update the server package index:
sudo apt update
Install Caddy:
sudo apt install caddy
At this point, if you visit http://<yourserverip> you should see the default Caddy welcome page:
This shows Caddy is up and running; let's change it to point to our Grafana server.
Update the Caddy Config
sudo vim /etc/caddy/Caddyfile
Replace the entire contents of the file with this:
:80 {
reverse_proxy localhost:3000
}
This will create a reverse proxy to the Grafana server.
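This Caddyfile listens on plain HTTP only. If you want the TLS termination mentioned at the start of the tutorial, Caddy will provision certificates automatically when you give it a domain name instead of a bare port. Assuming you have a DNS record (for example, grafana.example.com, used here purely as a placeholder) pointing at the instance's public IP, the Caddyfile would instead look like:
grafana.example.com {
    reverse_proxy localhost:3000
}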
Reload the caddy service:
sudo systemctl reload caddy
At this point, you should be able to go to http://<yourserverip> and you will be greeted with the Grafana login page:
If you can't access the Grafana login page, here are a few things you can try:
- Inspect firewall rules: Ensure ports (e.g., 80, 443) are open (ufw status or iptables -L).
- Check logs: View logs for potential errors (journalctl -u grafana-server).
- Test direct access: Try accessing Grafana via http://localhost:3000 on the server itself to rule out proxy issues.
- Verify ports: Run sudo netstat -tulnp | grep LISTEN and ensure Grafana (default: 3000) and Caddy (80, 443) are listening; if netstat isn't available, see the alternative below.
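Note that netstat ships in the net-tools package, which isn't always installed on newer Ubuntu images; ss provides the same information out of the box:
sudo ss -tlnp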
By default, your login credentials will be admin for both username and password. Once you authenticate, you will be prompted to reset the admin password.
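If you ever lose track of the admin password later, Grafana's bundled CLI on the server can reset it (run this on the VM itself, replacing the placeholder with a password of your choosing):
sudo grafana-cli admin reset-admin-password <new-password>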
You will then be greeted with the following screen:
Visualizing Metrics
We’ll begin by connecting the Prometheus instance we set up earlier to Grafana. To do this, open the sidebar menu and select Connections > Add new connection.
Within the search bar in the Add new connection page, search for Prometheus and click on it:
On the Prometheus page, click on Add new data source:
Enter http://localhost:9090 as the URL:
Scroll to the bottom of the page and hit Save & test. Next, we will import a prebuilt dashboard to visualize all our metrics.
On the sidebar menu, click on dashboards:
Within the dashboards page, select New > Import:
On the import page enter the ID 1806 and hit load:
Select your data source and click import:
You should be greeted with the following dashboard:
Clean Up (Optional)
Upon completing this tutorial, you may want to clean up the resources you created. To delete the virtual machine, run the following command:
civo instance rm deathstar
Summary
The Prometheus Node Exporter offers a powerful way to quickly collect metrics from your virtual machines. In this tutorial, we covered how to monitor a virtual machine using a combination of Prometheus and Grafana. If you are looking to take your monitoring quest further, here are some ideas: