Retrieval-augmented generation (RAG) systems are among the most exciting topics in machine learning today, representing a significant advancement over standalone Large Language Models (LLMs). However, to build and fully utilize a RAG system, two key components are essential: a Large Language Model and a scalable deployment environment.

In this tutorial, we will be using Gemini, a highly sophisticated and powerful LLM created by Google. At the time of writing, Gemini is considered one of the most powerful LLMs available.

For our deployment environment, we will be using Civo Kubernetes, which offers a managed Kubernetes service that focuses on speed, simplicity, and developer-friendly features.

This tutorial will answer the questions of what RAG systems are, how they work, how to build one, how to deploy it on a Civo Kubernetes Cluster, and finally, how to test it. Without further ado, let's dive in.

Understanding the Gemini LLM and RAG Architecture

Let's start by answering the question, "How does such a complex system work?" Modern LLMs are not trained on every piece of data available on the internet, and their knowledge stops at their training cutoff, which means they cannot analyze data they have never seen. This is where RAG systems come in handy.

A RAG system operates by first searching for and retrieving data that the LLM hasn't been trained on. This retrieved data is then passed to the LLM, which analyzes it to answer a given prompt. This approach enables the LLM to provide more accurate and up-to-date responses by incorporating information beyond its training data.

[Figure: The Gemini LLM and RAG architecture. Source: Image created by author]

Step 1: The user asks the model a question

Since the LLM does not inherently possess up-to-date information required to answer certain queries, it is essential to provide it with the latest relevant data.

Step 2: Perform similarity search in the Vector Database

The query is transformed into an embedding, a numerical vector representation that the system can compare against stored data. A similarity search is then performed to find the entries in the vector database that are most similar and relevant to the embedded query. This process ensures that the LLM receives the most pertinent information to generate an accurate response.
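
To make the idea of a similarity search concrete, here is a minimal, self-contained sketch using cosine similarity over toy vectors. The vectors and document labels are invented for illustration; a real system uses a proper embedding model and a vector database, as we do later with OpenAI embeddings and FAISS.

import numpy as np

# Hypothetical document embeddings (in practice, produced by an embedding model)
doc_vectors = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])
documents = ["AAL opening prices", "weather reports", "AAL closing prices"]

# Hypothetical embedding of the user's query
query_vector = np.array([0.8, 0.2])

# Cosine similarity between the query and every document
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector))

# The highest-scoring document is the most relevant to the query
print(documents[int(np.argmax(scores))])  # -> "AAL opening prices"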

Step 3: Data is retrieved and sent to the LLM

The retrieved data is then incorporated into the model’s prompt.

Step 4: Data is analyzed and response given

The LLM analyzes the retrieved data and generates an accurate, up-to-date response based on the most relevant information.

Building our RAG finance system

Step 1: Import the necessary libraries

To begin with, we will import the required libraries for our model to function. These imports include tools for data manipulation, machine learning, and integration with our chosen models and APIs.

Flask, a lightweight web framework, will help us create the web application and API, while LangChain provides components such as OpenAIEmbeddings and a FAISS vector-store wrapper for handling embeddings and efficient similarity search.

from flask import Flask, request, jsonify
import os
import pandas as pd
from lucknowllm import GeminiModel
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate

Step 2: Initialize Gemini and OpenAI API

Here, we initialize Gemini and OpenAI API by providing the necessary API keys. This step sets up our connection to the models we will use for generating and retrieving data. For detailed instructions on obtaining the API keys, you can follow the tutorials for Gemini API Key and OpenAI API Key. These guides will walk you through the process of generating and securing your API keys.

Initialize the Gemini API key:

gemini_api_key = "Insert Your Gemini Key Here"
Gemini = GeminiModel(api_key=gemini_api_key, model_name="gemini-1.0-pro")

Initialize the OpenAI API key:

openai_api_key = "Insert Your OpenAI Key Here"
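
Hardcoding keys is fine for a quick demo, but it is safer to read them from environment variables so they never end up in source control or in your Docker image. A minimal sketch, assuming you have set GEMINI_API_KEY and OPENAI_API_KEY in your environment:

import os

# Read the keys from the environment instead of hardcoding them
gemini_api_key = os.environ["GEMINI_API_KEY"]
openai_api_key = os.environ["OPENAI_API_KEY"]
Gemini = GeminiModel(api_key=gemini_api_key, model_name="gemini-1.0-pro")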

Step 3: Define Our Key Functions

In this step, we define several essential functions to handle data processing, create embeddings, and perform retrieval tasks. These functions are integral to our Retrieval-Augmented Generation (RAG) system.

Load and process the CSV file:

For our demonstration, we will utilize the S&P 500 stock dataset, a comprehensive dataset containing historical stock market information for the companies in the S&P 500 index. This dataset will enable us to implement and test the effectiveness of our RAG system in a real-world data scenario.

By incorporating this dataset into our RAG system, we will be able to analyze stock market trends, evaluate historical performance, and generate insights into financial health and future stock movements.

def load_and_process_csv(file_path):
    # Read the stock data and turn each row into a descriptive sentence
    data = pd.read_csv(file_path)
    data['text'] = data.apply(
        lambda row: f"Stock {row['Name']} on date {row['date']} opening price {row['open']} closing price {row['close']}.",
        axis=1)
    texts = data['text'].tolist()
    return texts

Create and store embeddings:

def create_and_store_embeddings(texts, openai_api_key):
    # Embed the texts with OpenAI and index them in a FAISS vector store
    embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
    vector_store = FAISS.from_texts(texts, embeddings)
    return vector_store
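
Since embedding the whole dataset on every request is slow and incurs OpenAI API costs, you could optionally build the index once and persist it. LangChain's FAISS wrapper supports saving and loading; a sketch, assuming a writable directory named faiss_index (depending on your LangChain version, load_local may also require allow_dangerous_deserialization=True):

# Build the index once and save it to disk
vector_store = create_and_store_embeddings(texts, openai_api_key)
vector_store.save_local("faiss_index")

# Later, or on startup, reload it instead of re-embedding everything
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
vector_store = FAISS.load_local("faiss_index", embeddings)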

Retrieve similar documents:

def retrieve(query, vector_store, k=20):
    # Return the k documents most similar to the query
    return vector_store.similarity_search(query, k=k)

Ask a question and get an answer:

def ask_question(question, vector_store):
    # Retrieve the most relevant documents and join them into a single context
    top_docs = retrieve(question, vector_store)
    top_contexts = [doc.page_content for doc in top_docs]
    top_context = " ".join(top_contexts)

    # Build the augmented prompt from the retrieved context and the question
    prompt_template = PromptTemplate(
        input_variables=["context", "question"],
        template="""You are an expert question answering system. I'll give you a question and context, and you'll return the answer. Context: {context} Query: {question}""")
    augmented_prompt = prompt_template.format(context=top_context, question=question)

    # Pass the augmented prompt to Gemini to generate the final answer
    model_output = Gemini.generate_content(augmented_prompt)

    return model_output

Step 4: Defining Our Endpoint

We then ask a specific question about our data and use our RAG system to retrieve the most relevant information and generate a response.

app = Flask(__name__)

# Endpoint for handling question answering requests
@app.route('/api/question-answering', methods=['POST'])
def question_answering():
    # Extract the question from the JSON request body
    data = request.json
    question = data.get('question')
    file_path = '/app/AAL_data.csv'  # Path in the Docker container

    # Load and process data from the CSV file
    texts = load_and_process_csv(file_path)

    # Create embeddings and the vector store (rebuilt on every request here
    # for simplicity; a production system would build the index once at startup)
    vector_store = create_and_store_embeddings(texts, openai_api_key)

    # Perform question answering
    response = ask_question(question, vector_store)

    return jsonify({'response': response})

Step 5: Defining Our Main Function

The code section below starts the Flask development server, making the application accessible on all network interfaces at port 5000.

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
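
Before containerizing anything, you can run the script locally with python FinanceSystem.py and send a test request to the endpoint. The question below is just an example; adjust it to your dataset:

curl -X POST http://localhost:5000/api/question-answering \
   -H "Content-Type: application/json" \
   -d '{"question": "What was the opening price of AAL on 2014-01-02?"}'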

Creating our Civo Kubernetes cluster

After completing the development of our RAG system, we need an environment to deploy both our Python script and CSV dataset. This involves setting up a suitable platform where these components can run efficiently and interact seamlessly.

To create a Civo Kubernetes cluster, follow these steps:

  1. Log in to Civo Dashboard: Navigate to the Civo Dashboard and log in with your credentials.
  2. Create a New Cluster: Once logged in, go to the "Kubernetes" section on the dashboard. Click on the "Create New Cluster" button.
  3. Configure Your Cluster:
    • Name Your Cluster: Give your cluster a unique and descriptive name.
    • Node Configuration: Choose the number and size of nodes for your cluster. Consider the workload and scale of your applications when making this choice.
    • Additional Options: Configure additional settings such as network configurations, firewall rules, and cluster tags.

Within a few minutes, your Kubernetes cluster will be fully set up, and all the relevant details will be displayed.

[Screenshot: The newly created cluster's details in the Civo Dashboard]

Deploying our Flask Application on our Civo cluster

Step 1: Creating Our Dockerfile

The Dockerfile is a text document that contains all the commands to assemble an image. Using the docker build command, you can create an automated build that executes several command-line instructions in succession.

Base Image: The first line specifies the base image, python:3.9-slim, which provides a lightweight Python environment.

Working Directory: The WORKDIR /app line sets the working directory within the container where our application will reside.

Copy Files: The COPY commands copy our Python script FinanceSystem.py and the dataset AAL_data.csv into the container.

Install Dependencies: The RUN command installs necessary system dependencies and Python packages. This includes Flask for the web server, pandas for data manipulation, and various other libraries needed for our RAG system.

Expose Port: The EXPOSE 5000 line specifies the port on which the Flask app will run.

Environment Variable: The ENV FLASK_APP=FinanceSystem.py line sets an environment variable to tell Flask which file to run.

Run Command: Finally, the CMD line runs the Flask application, listening on all network interfaces at port 5000.

FROM python:3.9-slim

WORKDIR /app

COPY FinanceSystem.py .
COPY AAL_data.csv .

RUN apt-get update && apt-get install -y wget curl git \
   && apt-get install -y python3-dev python3-pip build-essential \
   && pip3 install --upgrade pip \
   && pip3 install -v flask pandas wandb langchain google-generativeai langchain-community openai tiktoken faiss-cpu \
   && pip3 install git+https://github.com/LucknowAI/Lucknow-LLM

EXPOSE 5000

# Define environment variable
ENV FLASK_APP=FinanceSystem.py

# Run the Flask application
CMD ["flask", "run", "--host=0.0.0.0", "--port=5000"]

Step 2: Building the Docker image

The build command initiates the creation of a Docker image based on the instructions specified in a Dockerfile.

docker build -t finance-system .
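
Before pushing the image, you can optionally verify it locally by running the container and sending the same test request as before to port 5000:

docker run --rm -p 5000:5000 finance-system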

Step 3: Tagging the Docker image

The tag command assigns a tag to an existing Docker image, allowing for easier identification and versioning.

docker tag finance-system:latest your_dockerhub_username/finance-system:latest

Step 4: Pushing the Docker image to Docker Hub

The push command uploads a Docker image to a registry, making it accessible for deployment and sharing.

docker push your_dockerhub_username/finance-system:latest

Step 5: Creating and Configuring the Deployment YAML File

The deployment YAML file defines a Kubernetes deployment for our finance system application. It specifies the number of replicas, the Docker image to use, and the ports to expose. This configuration ensures the application is consistently running and accessible, while Kubernetes manages the deployment's lifecycle.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: finance-system-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: finance-system
  template:
    metadata:
      labels:
        app: finance-system
    spec:
      containers:
      - name: finance-system
        image: your_dockerhub_username/finance-system:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 5000
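
Note that our script currently hardcodes the API keys, which also bakes them into the image. If you adapt the code to read them from environment variables (as sketched in Step 2), you could inject them from a Kubernetes Secret instead. A minimal sketch of the extra container fields, assuming a Secret named finance-system-secrets exists with gemini-api-key and openai-api-key entries:

        env:
        - name: GEMINI_API_KEY
          valueFrom:
            secretKeyRef:
              name: finance-system-secrets
              key: gemini-api-key
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: finance-system-secrets
              key: openai-api-key

Such a Secret can be created with kubectl create secret generic finance-system-secrets --from-literal=gemini-api-key=<key> --from-literal=openai-api-key=<key>.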

Step 6: Creating and Configuring the Service YAML File

The service YAML file defines a Kubernetes service for our finance system application. It specifies how the deployment is exposed to other services or external traffic.

apiVersion: v1
kind: Service
metadata:
  name: finance-system-service
spec:
  selector:
    app: finance-system
  ports:
    - protocol: TCP
      port: 5000
      targetPort: 5000
      nodePort: 30007  # Specify the NodePort here
  type: NodePort  # Change the service type to NodePort

Step 7: Creating and Configuring the Ingress YAML File

The ingress YAML file configures an Ingress resource to manage external access to the services in a Kubernetes cluster. It defines rules for routing HTTP traffic to the finance system service, specifying a host and path to direct requests. The ingress controller processes these rules and manages the routing, making the application accessible via a specific URL.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: traefik
  labels:
    app: finance-system
  name: finance-system-ingress
spec:
  rules:
    - host: finance-system.<your-cluster-id>.k8s.civo.com
      http:
        paths:
          - backend:
              service:
                name: finance-system-service
                port:
                  number: 5000
            path: /
            pathType: Prefix

Step 8: Deploying the YAML Files to the Civo Kubernetes Cluster

Apply the deployment configuration:

kubectl apply -f deployment.yaml

Apply the Service configuration:

kubectl apply -f service.yaml

Apply the Ingress configuration to manage external access to your service:

kubectl apply -f ingress.yaml
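
To confirm that the deployment, service, and ingress are all up before testing, you can check their status:

kubectl get pods
kubectl get svc finance-system-service
kubectl get ingress finance-system-ingress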

Your external endpoint URL should look something like this:

http://finance-system.<your-cluster-id>.k8s.civo.com/

Making sure everything works

Let's sum up the steps followed until now:

  1. We created our RAG system using our machine-learning expertise.
  2. We inserted this RAG system into a Flask application, defining its endpoints, i.e., the API to be called.
  3. We created our Civo Kubernetes cluster and deployed our Flask service on it.
  4. Now we need to call the API. We can do this either locally (from the device the Flask application is deployed on) or, more commonly, externally (from devices outside the machine hosting our application). The external endpoint is the one defined in ingress.yaml.

In order to call the API, we will utilize Postman, an API tool that allows us to call different endpoints smoothly.
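
If you prefer the command line, an equivalent request can be sent with curl (again, the question is just an example):

curl -X POST http://finance-system.<your-cluster-id>.k8s.civo.com/api/question-answering \
   -H "Content-Type: application/json" \
   -d '{"question": "What was the closing price of AAL on 2014-01-02?"}'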

[Screenshot: Postman POST request to the question-answering endpoint and the model's response]

If you receive a similar reply to the one above, congratulations! Your system is working perfectly!

The takeaway

By now, you should have successfully deployed a fully functional Retrieval-Augmented Generation (RAG) system on your Civo Kubernetes cluster. In this article, we've covered the process of building the RAG system, creating a Kubernetes cluster, deploying the app on that cluster, and conducting thorough testing to ensure it operates as expected.

By following these steps, you have not only set up and deployed a sophisticated RAG system but also gained hands-on experience with Docker, Flask, Kubernetes, Civo, and related tools. This deployment is scalable and can be adapted for various use cases, making it a robust solution for integrating advanced AI models into your applications.

If you want to learn more about this topic, check out this video from Joey DeVilla.