Retrieval-augmented generation (RAG) is one of the most talked-about techniques in machine learning today, and it represents a significant step beyond what a standard Large Language Model (LLM) can do on its own. To take full advantage of RAG, two key components are essential: a capable LLM and a scalable deployment environment.
In this tutorial, we will be using Gemini, a highly sophisticated and powerful LLM created by Google. As of the writing of this article, Gemini is considered one of the most powerful LLMs available.
For our deployment environment, we will be using Civo Kubernetes, which offers a managed Kubernetes service that focuses on speed, simplicity, and developer-friendly features.
This tutorial will answer the questions of what RAG systems are, how they work, how to build one, how to deploy it on a Civo Kubernetes Cluster, and finally, how to test it. Without further ado, let's dive in.
Understanding the Gemini LLM and RAG Architecture
Let's start with the obvious question: how does such a system work? Modern LLMs are not trained on every piece of data in existence, and their knowledge stops at a training cutoff, so they cannot reason about information they have never seen. This is where RAG systems come in.
A RAG system operates by first searching for and retrieving data that the LLM hasn't been trained on. This retrieved data is then passed to the LLM, which analyzes it to answer a given prompt. This approach enables the LLM to provide more accurate and up-to-date responses by incorporating information beyond its training data.
Source: Image created by author
Step 1: The user asks the model a question
Since the LLM does not inherently possess up-to-date information required to answer certain queries, it is essential to provide it with the latest relevant data.
Step 2: Perform similarity search in the Vector Database
The query is transformed into an embedding, a numerical vector representation that can be compared against the vectors stored in the database. A similarity search is then performed to find the data in the vector database that is most similar and relevant to the embedded query. This process ensures that the LLM receives the most pertinent information to generate an accurate response.
Step 3: Data is retrieved and sent to the LLM
The retrieved data is then incorporated into the model’s prompt.
Step 4: Data is analyzed and response given
The LLM analyzes the retrieved data and generates an accurate, up-to-date response based on the most relevant information.
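To make these steps concrete, here is a minimal, illustrative sketch of the retrieval idea using toy vectors and plain cosine similarity. The real system below uses OpenAI embeddings and a FAISS vector store instead, so treat this purely as an illustration; the document texts and vector values are made up.

import numpy as np

def cosine_similarity(a, b):
    # Measures how similar two vectors are, independent of their length
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy "embeddings": in a real RAG system these come from an embedding model
documents = {
    "Stock XYZ on date 2020-01-02 opening price 10.0 closing price 10.5.": np.array([0.9, 0.1, 0.3]),  # made-up row
    "An unrelated sentence about the weather.": np.array([0.1, 0.8, 0.2]),
}
query_embedding = np.array([0.85, 0.15, 0.25])  # embedding of the user's question

# Steps 2 and 3: retrieve the most similar document and hand it to the LLM as context
best_doc = max(documents, key=lambda d: cosine_similarity(query_embedding, documents[d]))
print(best_doc)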
Building our RAG finance system
Step 1: Import the necessary libraries
To begin with, we will import the required libraries for our model to function. These imports include tools for data manipulation, machine learning, and integration with our chosen models and APIs.
Flask, a lightweight web framework, will help us create web applications and APIs, while Langchain provides components such as OpenAIEmbeddings and FAISS for handling embeddings and efficient similarity search.
from flask import Flask, request, jsonify
import os
import pandas as pd
from lucknowllm import GeminiModel
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.prompts import PromptTemplate
Step 2: Initialize Gemini and OpenAI API
Here, we initialize the Gemini and OpenAI APIs by providing the necessary API keys. This step sets up our connection to the models we will use for generating and retrieving data. For detailed instructions on obtaining the keys, you can follow the tutorials for the Gemini API Key and the OpenAI API Key; these guides walk you through generating and securing your API keys.
Initialize the Gemini API key:
gemini_api_key = "Insert Your Gemini Key Here"
Gemini = GeminiModel(api_key=gemini_api_key, model_name="gemini-1.0-pro")
Initialize the OpenAI API key:
openai_api_key = "Insert Your OpenAI Key Here"
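Hardcoding keys is fine for a quick demo, but a safer pattern (an optional tweak, not part of the original script) is to read them from environment variables, since os is already imported:

# Optional: read the API keys from environment variables instead of hardcoding them
gemini_api_key = os.environ.get("GEMINI_API_KEY", "Insert Your Gemini Key Here")
openai_api_key = os.environ.get("OPENAI_API_KEY", "Insert Your OpenAI Key Here")
Gemini = GeminiModel(api_key=gemini_api_key, model_name="gemini-1.0-pro")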
Step 3: Define Our Key Functions
In this step, we define several essential functions to handle data processing, create embeddings, and perform retrieval tasks. These functions are integral to our Retrieval-Augmented Generation (RAG) system.
Load and process the CSV file:
For our demonstration, we will use the S&P 500 stock dataset, which contains historical stock market information for the companies in the S&P 500 index; in this tutorial we work with the file for a single ticker, AAL_data.csv. Incorporating this dataset into our RAG system lets us test it on real-world data: analyzing stock market trends, evaluating historical performance, and generating insights into a company's financial health and stock movements.
def load_and_process_csv(file_path):
    data = pd.read_csv(file_path)
    data['text'] = data.apply(
        lambda row: f"Stock {row['Name']} on date {row['date']} opening price {row['open']} closing price {row['close']}.",
        axis=1
    )
    texts = data['text'].tolist()
    return texts
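As a quick sanity check, you can run the function locally; assuming the CSV has the Name, date, open, and close columns used above, each row becomes one descriptive sentence:

# Quick local check (adjust the path to wherever you saved the dataset)
texts = load_and_process_csv("AAL_data.csv")
print(len(texts))  # number of rows converted to text
print(texts[0])    # "Stock <Name> on date <date> opening price <open> closing price <close>."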
Create and store embeddings:
def create_and_store_embeddings(texts, openai_api_key):
    # Build the OpenAI embedding model, then index all texts in a FAISS vector store
    embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
    vector_store = FAISS.from_texts(texts, embeddings)
    return vector_store
Retrieve similar documents:
def retrieve(query, vector_store, k=20):
    return vector_store.similarity_search(query, k=k)
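Putting these two functions together, the retrieval flow looks roughly like this (a sketch; building the store calls the OpenAI embeddings API, so a valid key is required):

# Build the vector store once, then query it for the most relevant rows
texts = load_and_process_csv("AAL_data.csv")
vector_store = create_and_store_embeddings(texts, openai_api_key)

docs = retrieve("How did AAL perform in February 2013?", vector_store)  # example question
print(len(docs))             # up to k=20 matching documents
print(docs[0].page_content)  # the single most similar row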
Ask a question and get an answer:
def ask_question(question, vector_store):
    # Retrieve the most relevant documents and join them into a single context string
    top_docs = retrieve(question, vector_store)
    top_contexts = [doc.page_content for doc in top_docs]
    top_context = " ".join(top_contexts)

    prompt_template = PromptTemplate(
        input_variables=["context", "question"],
        template="""You are an expert question answering system. I'll give you a question and context, and you'll return the answer.
Context: {context}
Query: {question}"""
    )

    # Augment the prompt with the retrieved context and send it to Gemini
    augmented_prompt = prompt_template.format(context=top_context, question=question)
    model_output = Gemini.generate_content(augmented_prompt)
    return model_output
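With the helper functions defined, an end-to-end answer is a single call (again a sketch, reusing the vector_store built above):

# Example question; use dates and tickers that actually exist in your dataset
question = "What was the closing price of AAL on its first trading day in the dataset?"
answer = ask_question(question, vector_store)
print(answer)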
Step 4: Defining Our Endpoint
Next, we wrap the RAG system in a Flask endpoint: the client sends a question, our system retrieves the most relevant information, and the model generates a response.
# Initialize the Flask application
app = Flask(__name__)

# Endpoint for handling question answering requests
@app.route('/api/question-answering', methods=['POST'])
def question_answering():
    # Extract the question from the JSON request body
    data = request.json
    question = data.get('question')

    file_path = '/app/AAL_data.csv'  # Path inside the Docker container

    # Load and process data from the CSV
    texts = load_and_process_csv(file_path)

    # Create embeddings and the vector store
    vector_store = create_and_store_embeddings(texts, openai_api_key)

    # Perform question answering
    response = ask_question(question, vector_store)
    return jsonify({'response': response})
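Note that, as written, the endpoint reloads the CSV and rebuilds the FAISS index (calling the OpenAI embeddings API) on every request. For a dataset that does not change between requests, an optional refactor, shown here only as a sketch and not part of the original tutorial, is to build the vector store once at startup and reuse it:

# Optional alternative: build the vector store once when the app starts
# (this version replaces the endpoint defined above)
texts = load_and_process_csv('/app/AAL_data.csv')
vector_store = create_and_store_embeddings(texts, openai_api_key)

@app.route('/api/question-answering', methods=['POST'])
def question_answering():
    question = request.json.get('question')
    response = ask_question(question, vector_store)
    return jsonify({'response': response})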
Step 5: Defining Our Main Function
The code below starts the Flask development server, making the application accessible on all network interfaces at port 5000.
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Creating our Civo Kubernetes cluster
After completing the development of our RAG system, we need an environment to deploy both our Python script and CSV dataset. This involves setting up a suitable platform where these components can run efficiently and interact seamlessly.
To create a Civo Kubernetes cluster, follow these steps:
- Log in to Civo Dashboard: Navigate to the Civo Dashboard and log in with your credentials.
- Create a New Cluster: Once logged in, go to the "Kubernetes" section on the dashboard. Click on the "Create New Cluster" button.
- Configure Your Cluster:
- Name Your Cluster: Give your cluster a unique and descriptive name.
- Node Configuration: Choose the number and size of nodes for your cluster. Consider the workload and scale of your applications when making this choice.
- Additional Options: Configure additional settings such as network configurations, firewall rules, and cluster tags.
Within a few minutes, your Kubernetes cluster will be fully set up, and all the relevant details will be displayed.
Deploying our Flask Application on our Civo cluster
Step 1: Creating Our Docker File
The Dockerfile is a text document that contains all the commands to assemble an image. Using the docker build command, you can create an automated build that executes several command-line instructions in succession.
- Base Image: The first line specifies the base image, python:3.9-slim, which provides a lightweight Python environment.
- Working Directory: The WORKDIR /app line sets the working directory within the container where our application will reside.
- Copy Files: The COPY commands copy our Python script FinanceSystem.py and the dataset AAL_data.csv into the container.
- Install Dependencies: The RUN command installs the necessary system dependencies and Python packages, including Flask for the web server, pandas for data manipulation, and the other libraries our RAG system needs.
- Expose Port: The EXPOSE 5000 line specifies the port on which the Flask app will run.
- Environment Variable: The ENV FLASK_APP=FinanceSystem.py line sets an environment variable that tells Flask which file to run.
- Run Command: Finally, the CMD line runs the Flask application, listening on all network interfaces at port 5000.
FROM python:3.9-slim
WORKDIR /app
COPY FinanceSystem.py .
COPY AAL_data.csv .
RUN apt-get update && apt-get install -y wget curl git \
&& apt-get install -y python3-dev python3-pip build-essential \
&& pip3 install --upgrade pip \
&& pip3 install -v flask pandas wandb langchain google-generativeai langchain-community openai tiktoken faiss-cpu \
&& pip3 install git+https://github.com/LucknowAI/Lucknow-LLM
EXPOSE 5000
# Define environment variable
ENV FLASK_APP=FinanceSystem.py
# Run the Flask application
CMD ["flask", "run", "--host=0.0.0.0", "--port=5000"]
Because all Python dependencies are installed directly in the Dockerfile's RUN instruction, no separate requirements.txt file is needed for this setup.
Step 2: Building the Docker image
The build command initiates the creation of a Docker image based on the instructions specified in a Dockerfile.
docker build -t finance-system .
Step 3: Tagging the Docker image
The tag command assigns a tag to an existing Docker image, allowing for easier identification and versioning.
docker tag finance-system:latest your_dockerhub_username/finance-system:latest
Step 4: Pushing the Docker image to Docker Hub
The push command uploads a Docker image to a registry, making it accessible for deployment and sharing.
docker push your_dockerhub_username/finance-system:latest
Step 5: Creating and Configuring the Deployment YAML File
The deployment YAML file defines a Kubernetes deployment for our finance system application. It specifies the number of replicas, the Docker image to use, and the ports to expose. This configuration ensures the application is consistently running and accessible, while Kubernetes manages the deployment's lifecycle.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: finance-system-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: finance-system
  template:
    metadata:
      labels:
        app: finance-system
    spec:
      containers:
      - name: finance-system
        image: your_dockerhub_username/finance-system:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 5000
Step 6: Creating and Configuring the Service YAML File
The service YAML file defines a Kubernetes service for our finance system application. It specifies how the deployment is exposed to other services or external traffic.
apiVersion: v1
kind: Service
metadata:
  name: finance-system-service
spec:
  selector:
    app: finance-system
  ports:
  - protocol: TCP
    port: 5000
    targetPort: 5000
    nodePort: 30007  # Specify the NodePort here
  type: NodePort     # Change the service type to NodePort
Step 7: Creating and Configuring the Ingress YAML File
The ingress YAML file configures an Ingress resource to manage external access to the services in a Kubernetes cluster. It defines rules for routing HTTP traffic to the finance system service, specifying a host and path to direct requests. The ingress controller processes these rules and manages the routing, making the application accessible via a specific URL.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: traefik
  labels:
    app: finance-system
  name: finance-system-ingress
spec:
  rules:
  - host: finance-system.<your_cluster_id>.k8s.civo.com
    http:
      paths:
      - backend:
          service:
            name: finance-system-service
            port:
              number: 5000
        path: /
        pathType: "Prefix"
Step 8: Deploying the YAML Files to the Civo Kubernetes Cluster
Apply the deployment configuration:
kubectl apply -f deployment.yaml
Apply the Service configuration:
kubectl apply -f service.yaml
Apply the Ingress configuration to manage external access to your service:
kubectl apply -f ingress.yaml
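Before moving on, it can help to confirm that all three resources were created and that the pod is running:

kubectl get deployments,services,ingress
kubectl get pods -l app=finance-system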
Your external endpoint should look something like this:
http://finance-system.<your-cluster-id>.k8s.civo.com/
Making sure everything works
Let's sum up the steps followed until now:
| Step | Description |
|---|---|
| Step 1 | We created our RAG system using our machine learning expertise. |
| Step 2 | We inserted this RAG system into a Flask application, defining its endpoints (the API to be called). |
| Step 3 | We created our Civo Kubernetes cluster and deployed our Flask service on it. |
| Step 4 | Now we need to call the API. We can do this locally (from the machine hosting the Flask application) or, more commonly, externally (from devices outside that machine). The external endpoint is the one defined in ingress.yaml. |
To call the API, we will use Postman, an API client that makes it easy to send requests to different endpoints.
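If you prefer a quick script to Postman, the same request can be sent with a few lines of Python (replace the hostname with your own cluster ID, and pick a question that matches your dataset):

import requests

# Hostname comes from the Ingress rule defined earlier
url = "http://finance-system.<your-cluster-id>.k8s.civo.com/api/question-answering"
payload = {"question": "What was the closing price of AAL on 2013-02-08?"}  # example question

resp = requests.post(url, json=payload)
print(resp.status_code)
print(resp.json())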
If you receive a response with a sensible answer to your question, congratulations! Your system is working.
The takeaway
By now, you should have a fully functional Retrieval-Augmented Generation (RAG) system deployed on your Civo Kubernetes cluster. In this article, we've covered building the RAG system, creating a Kubernetes cluster, deploying the app on that cluster, and testing it to make sure it operates as expected.
By following these steps, you have not only set up and deployed a sophisticated RAG system but also gained hands-on experience with Docker, Flask, Kubernetes, Civo, and related tools. This deployment is scalable and can be adapted for various use cases, making it a robust solution for integrating advanced AI models into your applications.
If you want to learn more about this topic, check out this video from Joey DeVilla.