Large Language Models (LLMs) are revolutionizing the way we interact with artificial intelligence, enabling applications such as language translation, text summarization, and chatbots that are more natural and human-like. However, deploying LLMs requires significant computational resources and specialized infrastructure, which can be a barrier to adoption.
Simplifying LLM Deployment with Civo’s LLM Boilerplate
Setting up a GPU-enabled Kubernetes cluster to run LLMs can be complex and time-consuming, especially for those who require seamless integration, data security, and regulatory compliance. To address this challenge, we've created a step-by-step guide to deploying a Kubernetes GPU cluster on Civo using the Civo LLM Boilerplate.
Accelerate Your LLM Deployment with Civo GPUs
Experience high-performance, scalable, cost-effective GPU solutions for your machine learning and AI projects. Our NVIDIA-powered cloud GPUs help you streamline LLM deployments, whether for development or production.
What You'll Learn
In this tutorial, you'll learn how to automate the setup of a Kubernetes GPU cluster on Civo Cloud using Terraform and GitHub Actions, and deploy essential tools such as:
- Ollama LLM Inference Server
- Ollama Web UI
- NVIDIA Device Plugin
- An example Python LLM application
Project Goal
The goal of this project is to make it easy for customers to use open-source LLMs, with 1:1 compatibility with OpenAI's ChatGPT (illustrated with a short example after this list). In addition, the boilerplate aims to:
- Provide access to the latest open-source LLMs made available through Ollama.
- Provide a user interface so non-technical users can access the models.
- Provide a path to producing insights with LLMs while maintaining sovereignty over the data.
- Enable LLMs in regulated use cases where ChatGPT can't be used.
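Because Ollama serves an OpenAI-compatible API under /v1 by default, existing OpenAI-style clients can point at the deployed endpoint once the stack is up. A minimal sketch with curl, where the host address is a placeholder for wherever your Ollama service is exposed:

```bash
# Minimal sketch: a chat completion against Ollama's OpenAI-compatible API.
# OLLAMA_HOST is a placeholder; point it at your deployed Ollama service.
OLLAMA_HOST="http://localhost:11434"

curl "$OLLAMA_HOST/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "In one sentence, what is a Kubernetes GPU node?"}]
      }'
```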
Prerequisites
Before beginning, ensure you have the following:
- A Civo Cloud account
- A Civo Cloud API Key
- Terraform installed on your local machine
Deploying Llama 3.2 on Civo using Terraform
Project Setup
- Obtain your Civo API key from the Civo Dashboard.
- Create a file named `terraform.tfvars` in the project's root directory.
- Insert your Civo API key into this file as follows:

```hcl
civo_token = "YOUR_API_KEY"
```
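If you prefer to create the file from a shell, a minimal sketch (the key value is a placeholder):

```bash
# Create terraform.tfvars in the project root with your Civo API key (placeholder value).
cat > terraform.tfvars <<'EOF'
civo_token = "YOUR_API_KEY"
EOF
```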
Project Configuration
Project configuration is managed in the `tf/variables.tf` file. This file contains definitions and default values for the Terraform variables used in the project.
| Variable | Description | Type | Default Value |
|---|---|---|---|
| `cluster_name` | The name of the cluster. | string | `"llm_cluster3"` |
| `cluster_node_size` | The GPU node instance to use for the cluster. | string | `"g4g.40.kube.small"` |
| `cluster_node_count` | The number of nodes to provision in the cluster. | number | `1` |
| `civo_token` | The Civo API token, set in `terraform.tfvars`. | string | N/A |
| `region` | The Civo region to deploy the cluster in. | string | `"LON1"` |
Deployment Configuration
Deployment of components is controlled through boolean variables within the `tf/variables.tf` file. Set these variables to `true` to enable the deployment of the corresponding component.
| Variable | Description | Type | Default Value |
|---|---|---|---|
| `deploy_ollama` | Deploy the Ollama inference server. | bool | `true` |
| `deploy_ollama_ui` | Deploy the Ollama Web UI. | bool | `true` |
| `deploy_app` | Deploy the example application. | bool | `false` |
| `deploy_nv_device_plugin_ds` | Deploy the NVIDIA GPU device plugin for enabling GPU support. | bool | `true` |
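For example, to enable the example application you can either add deploy_app = true to your terraform.tfvars or pass the toggle on the command line when running the plan and apply steps described below. A sketch:

```bash
# Override a deployment toggle at plan/apply time instead of editing variables.tf.
terraform plan -var='deploy_app=true'
terraform apply -var='deploy_app=true'
```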
Deploy the LLM Boilerplate
To deploy, simply run the following commands:
Step 1: Initialize Terraform:

```bash
terraform init
```

This command initializes Terraform, installs the required providers, and prepares the environment for deployment.

Step 2: Plan Deployment:

```bash
terraform plan
```

This command displays the deployment plan, showing what resources will be created or modified.

Step 3: Apply Deployment:

```bash
terraform apply
```
This command applies the deployment plan. Terraform will prompt for confirmation before proceeding with the creation of resources.
Building and Deploying the Example Application
Step 1: Build the custom application container:
Enter the application folder:

```bash
cd app
```

Build the Docker image:

```bash
docker build -t {repo}/{image} .
```

Push the Docker image to a registry:

```bash
docker push {repo}/{image}
```
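As a concrete example with hypothetical names (substitute your own registry, image name, and tag):

```bash
# Hypothetical registry and image names; replace with your own.
docker build -t docker.io/your-user/llm-app:latest .
docker push docker.io/your-user/llm-app:latest
```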
Navigate to the Helm chart:

```bash
cd ../infra/helm/app
```
Modify the Helm values to point to your Docker registry, for example:

```yaml
replicaCount: 1

image:
  repository: {repo}/{image}
  pullPolicy: Always
  tag: "latest"

service:
  type: ClusterIP
  port: 80
```
Step 2: Initialize Terraform:
Navigate to the terraform directory:
```bash
cd ../tf
terraform init
```
This command initializes Terraform, installs the required providers, and prepares the environment for deployment.
Step 3: Plan Deployment:

```bash
terraform plan
```

This command displays the deployment plan, showing what resources will be created or modified.

Step 4: Apply Deployment:

```bash
terraform apply
```
This command applies the deployment plan. Terraform will prompt for confirmation before proceeding with the creation of resources.
Deployment takes around 10 minutes: Terraform stands up the Civo Kubernetes cluster, assigns a GPU node, and deploys the Helm charts and GPU configuration before downloading the models and running them on your NVIDIA GPU.
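Once the apply completes, you can verify the cluster from your machine with kubectl. A sketch, assuming you have downloaded the cluster's kubeconfig from the Civo dashboard (the file name below is a placeholder) and that the Ollama components include "ollama" or "webui" in their pod names:

```bash
# Point kubectl at the new cluster (kubeconfig file name is a placeholder).
export KUBECONFIG=./civo-llm-cluster-kubeconfig.yaml

# Confirm the NVIDIA device plugin has exposed the GPU resource on the node.
kubectl describe nodes | grep -i "nvidia.com/gpu"

# Check that the Ollama server and Web UI pods are running.
kubectl get pods --all-namespaces | grep -iE "ollama|webui"
```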
Related Tutorial: Deploy Private ChatGPT with Llama 3.1
Want to build your own private ChatGPT? Learn how to deploy Llama 3.1 on Civo's GPU clusters with Terraform or GitHub Actions. Boost performance and streamline your setup with ease!
Troubleshooting
If you experience any issues during the deployment (for example, if you experience a timeout), you can reattempt the deployment by rerunning:
```bash
terraform apply
```
Deploy Llama 3.2 through GitHub Actions
For those who prefer a fully automated, cloud-based approach, GitHub Actions offers a powerful solution. As part of GitHub's CI/CD platform, Actions lets you automate your software workflows, including deployments. This method simplifies the deployment process and makes it repeatable and less error-prone, which is particularly beneficial for managing and updating large-scale machine learning models like Llama 3.2 without manual intervention.
First, navigate to the repository: https://github.com/civo-learn/civo-llm-boilerplate, and then use the template to create a new repository.
After doing so, go to the settings of your newly created repository and make sure GitHub Actions are allowed to run.
Next, create a repository secret named CIVO_TOKEN in the repository settings and set it to your Civo API key.
Now, you can head to the Actions tab and run the deployment workflow.
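If you prefer the command line to the GitHub web UI, the same steps can be scripted with the GitHub CLI. A sketch; the repository name and workflow file name below are assumptions, so check the Actions tab of your repository for the actual workflow name:

```bash
# Create your own repository from the boilerplate template (repo name is a placeholder).
gh repo create your-user/civo-llm-deployment \
  --template civo-learn/civo-llm-boilerplate --private --clone
cd civo-llm-deployment

# Store your Civo API key as an Actions secret.
gh secret set CIVO_TOKEN --body "YOUR_API_KEY"

# Trigger the deployment workflow (the workflow file name is an assumption).
gh workflow run deploy.yml
```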
Accessing and Managing Your Deployment
Once you have successfully deployed Llama 3.2 using either Terraform or GitHub Actions, the next step is to verify and utilize the deployment:
Checking the Load Balancers
After deployment, you can check the load balancers attached to your Kubernetes cluster to locate the Open Web UI endpoint. Navigate to the load balancer section in your Civo Dashboard and find the DNS name labeled “ollama-ui-open-webui.”
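If you prefer the command line, you can also locate the endpoint with kubectl (a sketch; the exact service name and namespace may differ in your deployment):

```bash
# List services across all namespaces and look for the Open WebUI load balancer.
kubectl get svc --all-namespaces | grep -i "open-webui"
```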
Completing the initial Open WebUI setup, which involves registering an initial administrator account and configuring the deployment options, will grant you access to a “ChatGPT-like” interface where you can interact with the deployed LLM directly.
From this window, you can further configure your environment, such as setting security and access preferences and deciding what newly registered users can access. You can also promote additional users to administrators alongside the first registered account.
Deploying Additional Models
If you wish to expand your LLM capabilities, simply navigate to the settings menu found in the top right-hand corner of the Open Web UI screen. Select “models” from the left-hand menu to add or manage additional models. This feature allows for versatile deployment configurations and model management, ensuring that your setup can adapt to various requirements and tasks.
If you would like to change the default models deployed or disable GPU support, simply modify the `ollama-values.yaml` file in the `infra/tf/values` folder:
```yaml
ollama:
  gpu:
    # -- Enable GPU integration
    enabled: true
    # -- Number of GPUs to use
    number: 1
  # -- List of models to pull at container startup
  models:
    - llama3.2-vision
    - gemma
    # - llava
    # - mixtral
    # Get more models from: https://ollama.com/library

persistentVolume:
  enabled: true
  size: 250Gi # size of the model repository volume
```
Note: To use Llama 3.2 for non-vision features, change `llama3.2-vision` to `llama3.2` in the code above.
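After editing the values file, re-running the apply from the project's Terraform directory should roll the change out, since the Helm releases are managed by Terraform in this boilerplate:

```bash
# Re-apply from the Terraform directory to roll out the updated Ollama values.
terraform apply
```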
Summary
Congratulations! You have successfully deployed a Kubernetes GPU cluster on Civo Cloud using Terraform and set up various components for running LLMs, including the Ollama inference server and web interface.
With this boilerplate, you now have a scalable and flexible infrastructure for leveraging Open Source LLMs, allowing you to customize deployments, integrate additional tools, or expand your cluster as needed.
If you want to learn more about LLMs, check out some of these resources: