Monitoring is the process of collecting, analyzing, and visualizing system metrics to ensure your infrastructure is running optimally. It helps detect performance issues, track resource usage, and identify potential failures before they impact your applications.

Whatever order your setup checklist follows, monitoring deserves a high priority whenever you create a new virtual machine: the ability to quickly visualize what’s happening on your servers is crucial.

In this tutorial, we will be using Prometheus, an open-source systems monitoring and alerting tool, to monitor a virtual machine. We will visualize the collected metrics using Grafana, an open-source dashboarding solution, and use Caddy for TLS termination.

Prerequisites

This tutorial assumes some familiarity with Linux. Additionally, you will need the following installed on your machine:

The Civo CLI, configured with your API key

jq, for parsing JSON output from the CLI

An SSH client

Creating a Virtual Machine

We’ll begin by creating a virtual machine using the Civo CLI:

civo instance create --hostname=deathstar --size=g3.medium --diskimage=ubuntu-jammy --initialuser=demo

This will spin up a new Ubuntu 22.04 instance named deathstar in your default region, with an initial user named demo.

Retrieve machine IP

With a machine created, you can now retrieve the public IP using the Civo CLI:

civo instance show deathstar -o json | jq -r .public_ip

Retrieve user password

To connect to the virtual machine, we will also need to retrieve the initial password for the demo user:

civo instance show deathstar -o json | jq -r .initial_password

At this point, you should be able to log in to your machine:

ssh demo@<your instance ip>
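The two jq lookups can also be combined into one step without a jq dependency; a rough sketch, parsing a captured sample of the CLI’s JSON output (the sample values below are hypothetical, not real credentials):

```shell
# A trimmed, hypothetical sample of the JSON returned by
# `civo instance show deathstar -o json`.
json='{"hostname":"deathstar","public_ip":"192.0.2.10","initial_password":"s3cret"}'

# Pull a string field out by name with sed (no jq required).
field() { printf '%s' "$json" | sed -n "s/.*\"$1\":\"\([^\"]*\)\".*/\1/p"; }

echo "ssh demo@$(field public_ip)  # password: $(field initial_password)"
```

On a real run you would capture the CLI output into the json variable instead of hard-coding it.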

Setting up your server

Before we install Prometheus, we will need to create a couple of directories and a non-root user for the installation.

Creating a non-root user

sudo useradd --no-create-home --shell /bin/false prometheus

Creating configuration directories

sudo mkdir /var/lib/prometheus
sudo mkdir /etc/prometheus

Granting user permissions

sudo chown -R prometheus:prometheus /etc/prometheus /var/lib/prometheus
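The mkdir and chown steps can also be collapsed into a single command with install -d, which creates directories with the desired ownership and mode in one go. A sketch, demonstrated against a throwaway prefix (on the server you would use the real paths with sudo and -o prometheus -g prometheus):

```shell
# Demonstrated on a temporary prefix; on the server the equivalent is:
#   sudo install -d -o prometheus -g prometheus /etc/prometheus /var/lib/prometheus
prefix=$(mktemp -d)
install -d -m 0755 "$prefix/etc/prometheus" "$prefix/var/lib/prometheus"
ls -ld "$prefix/etc/prometheus" "$prefix/var/lib/prometheus"
```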

Setting up firewall rules

To ensure Prometheus isn’t open to the entire internet, we must set up a couple of firewall rules. Ubuntu ships with ufw (Uncomplicated Firewall), which makes managing firewall rules easy.

Allow SSH connections:

To prevent being locked out by the firewall, add a rule to allow SSH connections:

sudo ufw allow ssh

The output should be similar to:

Rule added
Rule added (v6)

Allow HTTP and HTTPS connections:

sudo ufw allow http
sudo ufw allow https

Enable the firewall:

sudo ufw enable

The output should be similar to:

Command may disrupt existing ssh connections. Proceed with operation (y|n)? y
Firewall is active and enabled on system startup

Install Prometheus

Download the Prometheus archive:

cd /tmp && wget https://github.com/prometheus/prometheus/releases/download/v2.54.0/prometheus-2.54.0.linux-amd64.tar.gz
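Each Prometheus release also publishes a sha256sums.txt alongside the tarballs, and verifying the download before extracting is cheap insurance. The verification pattern, sketched against a locally created stand-in file (on the server, download sha256sums.txt from the same release page and run sha256sum -c --ignore-missing sha256sums.txt in /tmp):

```shell
# Stand-in for the downloaded tarball and its published checksum list.
cd "$(mktemp -d)"
printf 'demo payload\n' > prometheus-demo.tar.gz
sha256sum prometheus-demo.tar.gz > sha256sums.txt

# Verification fails loudly if the file was corrupted in transit.
sha256sum -c sha256sums.txt
```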

Extract the archive:

tar -xzf prometheus-2.54.0.linux-amd64.tar.gz && cd prometheus-2.54.0.linux-amd64/

Copy the prometheus and promtool binaries to /usr/local/bin:

sudo cp prometheus promtool /usr/local/bin/

Copy the consoles and console_libraries:

sudo cp -r console_libraries consoles prometheus.yml /etc/prometheus

The consoles and console_libraries directories contain prebuilt HTML templates for Prometheus' web-based expression browser and dashboards. Copying them ensures that Prometheus can render its built-in UI correctly.

Installing Node Exporter

Prometheus provides an official exporter, Node Exporter, that exposes host metrics such as CPU usage, disk utilization, and even NFS statistics if you have a network file server set up.

Grab the node exporter binary

cd /tmp && wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz

Extract the tarball

tar xvfz node_exporter-1.8.2.linux-amd64.tar.gz && cd node_exporter-1.8.2.linux-amd64

Move the binary

sudo mv node_exporter /usr/local/bin/

Grant user permissions

sudo chown prometheus:prometheus /usr/local/bin/node_exporter

Creating a systemd Service

Since we are using Ubuntu, we will create a systemd unit file to simplify the management of the Node Exporter service.

Create the unit file:

sudo nano /etc/systemd/system/node-exporter.service

Paste in the following content:

[Unit]
Description=Node Exporter
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target

Repeat the process for Prometheus Server:

sudo nano /etc/systemd/system/prometheus.service

Paste in the following content:

[Unit]
Description=Prometheus Service
After=network.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure

[Install]
WantedBy=multi-user.target

Reload systemd and start both services (add sudo systemctl enable node-exporter prometheus if you also want them to start automatically on boot):

sudo systemctl daemon-reload && sudo systemctl start node-exporter && sudo systemctl start prometheus

Confirm node exporter is running:

sudo systemctl status node-exporter

Output is similar to:

● node-exporter.service - Node Exporter
     Loaded: loaded (/etc/systemd/system/node-exporter.service; disabled; vendor preset: enabled)
     Active: active (running) since Fri 2024-08-09 20:48:11 UTC; 52s ago
   Main PID: 32564 (node_exporter)
      Tasks: 5 (limit: 4423)
     Memory: 2.5M
        CPU: 12ms
     CGroup: /system.slice/node-exporter.service
             └─32564 /usr/local/bin/node_exporter

Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=time
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=timex
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=udp_queues
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=uname
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=vmstat
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=watchdog
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=xfs
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.843Z caller=node_exporter.go:118 level=info collector=zfs
Aug 09 20:48:11 deathstar-62d8-3d389f node_exporter[32564]: ts=2024-08-09T20:48:11.844Z caller=tls_config.go:313 level=info msg="Listening on" address=[::>

Test the metrics endpoint:

curl localhost:9100/metrics

Output is similar to:

node_cpu_seconds_total{cpu="0",mode="idle"} 205689.04
node_cpu_seconds_total{cpu="0",mode="iowait"} 53.8
node_cpu_seconds_total{cpu="0",mode="irq"} 0
node_cpu_seconds_total{cpu="0",mode="nice"} 11.91
node_cpu_seconds_total{cpu="1",mode="irq"} 0
node_cpu_seconds_total{cpu="1",mode="nice"} 11.35
node_cpu_seconds_total{cpu="1",mode="softirq"} 4.4
node_cpu_seconds_total{cpu="1",mode="steal"} 6.98
node_cpu_seconds_total{cpu="1",mode="system"} 144.62
node_cpu_seconds_total{cpu="1",mode="user"} 295.75
# HELP node_disk_discard_time_seconds_total This is the total number of seconds spent by all discards.
# TYPE node_disk_discard_time_seconds_total counter
node_disk_discard_time_seconds_total{device="vda"} 0.099
node_disk_discard_time_seconds_total{device="vdb"} 0
# HELP node_disk_discarded_sectors_total The total number of sectors discarded successfully.

Great! We have metrics! But to gain value from them we need some sort of visualization.
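Because the exposition format is plain text, you can already answer quick questions with standard Unix tools, for instance summing busy (user + system) CPU seconds. A sketch over a captured sample like the lines above (on a live server, pipe curl -s localhost:9100/metrics instead of the variable):

```shell
# A few captured sample lines; substitute `curl -s localhost:9100/metrics`
# for the echo on a live server.
sample='node_cpu_seconds_total{cpu="1",mode="system"} 144.62
node_cpu_seconds_total{cpu="1",mode="user"} 295.75
node_cpu_seconds_total{cpu="0",mode="idle"} 205689.04'

# Sum user+system seconds across all CPUs.
echo "$sample" | awk '/mode="(user|system)"/ { s += $2 } END { print s }'
```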

Configure Prometheus targets

Targets in Prometheus define which servers or sources Prometheus scrapes metrics from. In our case, this is the virtual machine we created.

Open the Prometheus configuration file we copied earlier:

sudo nano /etc/prometheus/prometheus.yml

Add a scrape target for the node exporter by updating the file so it looks like this:

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]

The updated config adds a single job named node that targets the node exporter we just deployed. You can validate the edited file with the promtool binary shipped in the Prometheus archive (promtool check config /etc/prometheus/prometheus.yml), then apply the change with sudo systemctl restart prometheus.

Installing Grafana

Install the prerequisite packages:

sudo apt-get install -y apt-transport-https software-properties-common wget

Import the GPG key:

sudo mkdir -p /etc/apt/keyrings/
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null

Add a repository for stable releases:

echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list

Update the list of available packages:

sudo apt-get update

Install Grafana:

sudo apt-get install grafana -y

Start Grafana:

sudo systemctl start grafana-server

Exposing the Dashboard

To expose Grafana, we will use Caddy, a cross-platform web server written in Go, as a reverse proxy in front of the Grafana server we have running.

Add the latest Caddy GPG Key:

curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/gpg.key' | sudo gpg --dearmor -o /usr/share/keyrings/caddy-stable-archive-keyring.gpg

Add the Caddy repository to your APT sources:

curl -1sLf 'https://dl.cloudsmith.io/public/caddy/stable/debian.deb.txt' | sudo tee /etc/apt/sources.list.d/caddy-stable.list

Update the server package index:

sudo apt update

Install Caddy:

sudo apt install caddy

At this point, if you visit http://<yourserverip> you should see a page like this:

Installing Caddy

This shows Caddy is up and running. Next, let's point it at our Grafana server.

Update the Caddy Config

sudo vim /etc/caddy/Caddyfile

Replace the entire contents of the file with this:

:80 {
    reverse_proxy localhost:3000
}

This will create a reverse proxy to the Grafana server.
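Note that :80 serves plain HTTP only. If you have a DNS record pointing at the server (say grafana.example.com, a placeholder here), Caddy will obtain and renew a TLS certificate automatically, giving you the TLS termination mentioned at the start; the Caddyfile would instead read:

```
grafana.example.com {
    reverse_proxy localhost:3000
}
```

With a site address like this, Caddy also redirects HTTP to HTTPS on its own; no extra directives are needed.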

Reload the caddy service:

sudo systemctl reload caddy

At this point, you should be able to go to http://<yourserverip> and you will be greeted with the Grafana login page:

Grafana Login Page

If you can't access the Grafana login page, here are a few things you can try:

Inspect firewall rules: ensure ports 80 and 443 are open (ufw status or iptables -L).

Check logs: view the Grafana and Caddy logs for errors (journalctl -u grafana-server and journalctl -u caddy).

Test direct access: try accessing Grafana via http://localhost:3000 on the server itself to rule out proxy issues.

Verify listening ports: run sudo netstat -tulnp | grep LISTEN and confirm that Grafana (default: 3000) and Caddy (80, 443) are listening.

By default, your login credentials will be admin for both username and password. Once you authenticate you will be prompted to reset the admin password.

You will then be greeted with the following screen:

Grafana Welcome Screen

Visualizing Metrics

We’ll begin by connecting the Prometheus instance we set up earlier to the dashboard. To do this, open the sidebar menu and select Connections > Add new connection.

Visualizing Metrics on Grafana Welcome Page

Within the search bar in the Add new connection page, search for Prometheus and click on it:

Grafana Connection Page

On the Prometheus page, click on Add new data source:

Prometheus Page for Adding New Data Source

Enter http://localhost:9090 as the URL:

Prometheus Local Host

Scroll to the bottom of the page and click Save & test. Next, we are going to import a prebuilt dashboard to visualize all our metrics.

On the sidebar menu, click on dashboards:

Sidebar menu options

Within the dashboards page, select New > Import:

Creating a new dashboard

On the import page, enter the dashboard ID 1860 (Node Exporter Full) and hit Load:

Importing pages in the dashboard

Select your data source and click import:

Importing the dashboard from Grafana

You should be greeted with the following dashboard:

Final Dashboard display

Clean Up (Optional)

Upon completing this tutorial, you may want to clean up the resources you created. To delete the virtual machine, run the following command:

civo instance rm deathstar

Summary

The Prometheus Node Exporter offers a powerful way to quickly collect metrics from your virtual machines. In this tutorial, we covered how to monitor a virtual machine using a combination of Prometheus, Grafana, and Caddy. If you are looking to take your monitoring quest further, here are some ideas: