The new Kubeflow Distribution in Town: Civo
Speaker: Rishit Dagli
Summary
In this talk, Rishit introduces Civo's new Kubeflow distribution, which he contributed to. He delves into the challenges of deep learning, particularly model training and deployment, and how Kubeflow can simplify these processes by leveraging the strengths of Kubernetes. He presents a demo showcasing how to deploy Kubeflow on Civo using K3s, emphasizing the speed and ease of the process. Rishit also underlines the full functionality of this Kubeflow distribution, in contrast to MiniKF, noting it includes all components, as well as the ability to add more as needed.
Transcription
Ok, I will start. Hello, I'm Rishit, and I'll be talking about the new Kubeflow distribution in town, Civo. If you remember, in yesterday's talk about the new product offerings by Civo, Josh mentioned there's already a marketplace app for Kubeflow. That was contributed by me, and I'll be talking about how I built that Kubeflow distribution in this talk.
Earlier, this was supposed to be a very technical talk with a lot of code, and I had also built a lot of the content, how I optimized it for K3s and everything. But then I realized that there are too many distractions, mainly like lunch after the talk. So, because of those distractions, I thought to make it a bit less technical and leave room for some more questions, but we'll still have a demo.
So, what's the plan for the day? The plan is to start by talking a bit about Kubeflow, just give you a brief introduction about it, then talk about the Kubeflow Civo distribution, which I built. I'll also talk a bit about how it works under the hood. Then, I'll talk about how you can contribute to the Civo marketplace, if you would want to.
Good. So, hello, I'm Rishit. I'm a student at the University of Toronto and I work on Finch. The emoji is interesting because it's an actual rocket we are building. I research machine learning and computer vision. I also contribute to quite a lot of open-source projects and needless to say, I love open source.
I'll start by talking a bit about Kubeflow. One of the main problems when you do deep learning is training. Have you tried training a Transformer from scratch? If so, all of this is really hard, probably a nightmare. It's not so easy, is it? And this is just one part of it. This is the machine learning code part of it, which is already really hard. You try training a Transformer or a GAN from scratch without the right warm-up and you probably spend thousands of dollars with nothing.
So, it's hard to train machine learning models, but that's only one part of it, the machine learning code part. Then there is data collection. Once you collect the data, you have to do data verification and feature extraction. Then your friend comes and tells you that you have to do all of this infrastructure-related stuff. Then you have to deploy the model, which is even harder than the machine learning code. So, it is quite hard to deploy models, isn't it? And it's even harder to deploy end-to-end machine learning applications, not just models.
A production solution requires much, much more than any of this machine learning code. So, this is quite hard to do, and Kubeflow helps you do exactly that. It makes deployments of end-to-end machine learning workflows on Kubernetes simple, portable, and scalable.
So, what this means is essentially, you want to allow Kubernetes to do what it's best at, things like auto-scaling, serving, and infrastructure management. You don't want to do all of this by yourself because Kubernetes is already really good at it, and I'm assuming a lot of you might have used Kubernetes earlier and can concur that Kubernetes is great at doing all of these things.
You just want to let Kubernetes do what it's best at, and Kubeflow allows you to do exactly that for machine learning. And it's not tied to any particular tools. You can use TensorFlow, PyTorch, or you can use any other framework, or you can also just have your own framework up there. You can essentially use it however you want.
One of the main reasons I particularly use it for my own work, for my own research as well, is it runs on top of Kubernetes and makes use of all the benefits of Kubernetes, which is pretty nice to have.
Kubeflow has a lot of components. You have a dashboard that gives you a UI to manage everything. There's notebooks, if you want to do machine learning experimentation. There are trainers, if you want to train machine learning models. You can also use Volcano with it, which if you have been working in HPC, you might know, allows you to do distributed training very well. Then you have pipelines, you have hyperparameter engineering. So, there are a lot of these components, and all of these come together to make Kubeflow, and it covers everything in machine learning.
But here's the Kubeflow 2022 user survey, and 30% or more people mentioned that installation was a big problem for them. Probably even more than 30% because of the survivorship bias. So, a lot of people still have problems with installing Kubeflow and it's hard. It takes quite some time. Documentation tutorials are also another big problem as identified in the 2022 Kubeflow user survey. And this survey was filled by people who are already using Kubeflow.
So, ideally even more people face the installation issue, and all these other issues.
So, why is this a big thing? It's a big thing because Kubeflow is 627 Kubernetes resources, 73 CRDs, 49 deployments, six namespaces, and a lot more. So, when you have 627 Kubernetes resources, there are 627 ways to fail. You have to manage all those versions. You have a different Kubernetes version, you have to manage that. You run anything on top of Kubeflow which conflicts with any of those 627 Kubernetes resources, you again face a problem.
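To get a feel for numbers like these on your own setup, one option is to tally resource kinds in the rendered manifests. The following is only a minimal sketch, not how the distribution itself does it: it assumes you already have the manifests as one multi-document YAML string, the function name `count_kinds` and the sample data are hypothetical, and it uses a simple regular expression instead of a real YAML parser, so resources wrapped in a `List` object would be undercounted.

```python
import re
from collections import Counter

# Matches a top-level "kind:" key (no indentation), so nested "kind:" fields
# such as the ones inside a CRD's spec.names are ignored.
KIND_RE = re.compile(r"^kind:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
# Documents in a YAML stream are separated by lines containing only '---'.
DOC_SEPARATOR_RE = re.compile(r"^---[ \t]*$", re.MULTILINE)


def count_kinds(manifest_text: str) -> Counter:
    """Count resources by kind in a multi-document YAML stream."""
    counts = Counter()
    for doc in DOC_SEPARATOR_RE.split(manifest_text):
        match = KIND_RE.search(doc)
        if match:
            counts[match.group(1).strip("\"'")] += 1
    return counts


# Hypothetical sample data, far smaller than a real Kubeflow install.
SAMPLE = """\
apiVersion: v1
kind: Namespace
metadata:
  name: kubeflow
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-a
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-b
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: examples.example.org
spec:
  names:
    kind: Example
"""

if __name__ == "__main__":
    counts = count_kinds(SAMPLE)
    print("total resources:", sum(counts.values()))
    for kind, number in counts.most_common():
        print(f"{kind}: {number}")
```

Each resource kind that shows up here is something that has to stay compatible with your Kubernetes version, which is the point being made about there being many ways to fail.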
So, it becomes pretty hard to deploy Kubeflow just because it's a complete tool, but in making a complete tool, you have all of this, which is hard to deploy and takes quite some time as well. What you want to do is make 627 resources work together, and not only work together but work with your Kubernetes version. You want to make it work with any other setup. Probably you have K3s, or something on top of Kubernetes for observability, or monitoring. So, you want to make it work with all of this, with your existing setup, which becomes pretty hard.
So what this distribution helps in doing is exactly that. I'll be showing the demo with Civo, but you are not tied to Civo. You can essentially deploy it anywhere K3s runs, and there are a lot of benefits to this, which I'll talk about, but K3s makes it particularly easy and fast to deploy Kubeflow with all of its resources. And okay, before we talk more, let's see a demo of this distribution of Kubeflow on Civo, which makes it particularly easy to do just the things I talked about.
I'm here on my Civo dashboard. What I'll try to do right now is just try to create a Kubeflow cluster from the UI but you can most certainly use the CLI as well. But let's start by doing this. We'll have five nodes, make those large. And so now, how easy it is made is, you just go to management and by the way, all of this is the Civo Marketplace for any Kubernetes apps and you can see Kubeflow is here. And interestingly all of this is open source.
And add it to the marketplace. So you can also see how I handle some of the intricacies for optimizing K3s and making it work together. But right now, we'll just see how to use it which is as simple as adding the Kubeflow application. So now that you have added the Kubeflow application, let's just create a cluster and it should take about three minutes on K3s, two minutes for creating a cluster, and another one minute for deploying those 627 resources, which is pretty fast, and that's another thing you get when a distribution is optimized for K3s.
It's creating a cluster, there it is. So it has a countdown. Civo allows you to deploy clusters in 120 seconds, which is pretty nice, also useful for this demo because I can show a live demo here. So that's pretty useful. But okay while this happens, let's talk about the extension. But you saw how to install it and it was pretty simple. But we'll talk about some more things you can do with this.
Great, so we already saw the Civo application to automate it all. And it's essentially a distribution for K3s. So just like the previous talk was mentioning, there are multiple ways to run K3s, multiple avenues to run K3s. You can run the distribution wherever K3s runs. So yeah, it's not just Civo but however K3s runs.
Another interesting thing about this is it's not MiniKF; it's actually deploying those 700 Kubernetes resources. Because installation was such a big issue, the community developed smaller versions of Kubeflow, called MiniKF. It does not contain all the components, as you might have expected, which makes the installation a lot easier than maintaining those 627 resources.
But what we are deploying is not MiniKF. What we are deploying is Kubeflow with every single of its components. And MiniKF gets you limited somewhere cutting down on some of the Kubeflow components, which I did not like to do because I wanted to make use of the full power of Kubeflow.
So what this does is, it essentially uses the Kubeflow manifests under the hood. The Kubeflow manifests are an interesting way to install Kubeflow, and they're officially supported by the Kubeflow community. What they give you is a Kustomize-based installation. That's how it installs Kubeflow: all of its components are Kustomize manifests.
What I tried doing is make modifications to these components and these manifests so they are best optimized for K3s. As I was saying, this is a Kustomize-based installation, so adding newer components is also very easy. What we just did is deploy Kubeflow with all of its official components. But there are a lot of unofficial components of Kubeflow as well, probably something you are excited about.
You can still add it with Kubeflow. At the end, it's just a bunch of Kubernetes resources put together. So you can have your own components in Kubeflow as well and you can integrate them with Kubeflow very easily. So adding newer components in this distribution is also very easy, and again it's open source so you can do it pretty well.
This is also the latest release of Kubeflow we are running now and not any of the older releases of Kubeflow. So yeah, this is actually the RC release we are not even running the stable release, the RC release. I've tried deploying that.
Let's go to the cluster, see where it is at. So right now, the cluster is created, the pods are not yet running. So what we'll do now is try to take a look at the kubeconfig for this and see if our pods are running yet.
Great, so I have a kubeconfig up now. I'll just download the kubeconfig. And I've also tried to write a bit of documentation, not a lot. So you can actually take a look at the documentation here. Okay, probably... Oh yeah, and in job applications. So we have some documentation as well, but I won't read documentation, that's usually how I work.
So I'll just get the kubeconfig up here, and let's start seeing if it has actually deployed everything.
So I'll start by configuring kubectl to use the kubeconfig, which is named Civo Navigate.
Yep, there we have it, and now I can just do kubectl get nodes, and this should show me that I have the five nodes which I created. Yes, I do have the five nodes I created, and what I need to check in right now is if all of my pods are already running. It also takes some time for the pods to become ready. So what I want to do is check if all my pods are ready.
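The readiness check described here can also be scripted. Below is a rough sketch, assuming `kubectl` is on your PATH and that the pod list comes from `kubectl get pods --all-namespaces -o json`. The function names, the sample data, and the kubeconfig path in the usage note are hypothetical and not part of the Civo distribution.

```python
import json
import subprocess


def not_ready_pods(pod_list: dict) -> list:
    """Return 'namespace/name' for every pod that is not yet Ready."""
    pending = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Pods from completed Jobs have finished; they are not stuck.
        if status.get("phase") == "Succeeded":
            continue
        ready = any(
            cond.get("type") == "Ready" and cond.get("status") == "True"
            for cond in status.get("conditions", [])
        )
        if not ready:
            meta = pod["metadata"]
            pending.append(f"{meta['namespace']}/{meta['name']}")
    return pending


def fetch_pods(kubeconfig: str) -> dict:
    """Ask kubectl for all pods in all namespaces, parsed from JSON."""
    result = subprocess.run(
        ["kubectl", "--kubeconfig", kubeconfig,
         "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Hypothetical sample in the shape kubectl returns, trimmed to the fields used.
SAMPLE = {"items": [
    {"metadata": {"namespace": "kubeflow", "name": "example-a"},
     "status": {"phase": "Running",
                "conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"namespace": "kubeflow", "name": "example-b"},
     "status": {"phase": "Pending",
                "conditions": [{"type": "PodScheduled", "status": "True"}]}},
    {"metadata": {"namespace": "kubeflow", "name": "example-job"},
     "status": {"phase": "Succeeded",
                "conditions": [{"type": "Ready", "status": "False"}]}},
]}

if __name__ == "__main__":
    print(not_ready_pods(SAMPLE))
```

Against a real cluster you would call something like `not_ready_pods(fetch_pods("path/to/kubeconfig"))`, where the path is a placeholder. An empty list only means every pod that exists is ready; while the application is still installing, the pods may not have been created yet.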
Apparently they are not. Or did we do something else?
Interesting, why could it not access the namespace? Well, let's, I guess I might have made some typo in adding the namespace because it should show some resources.
Ah, I see. So it might take a while for the application to get installed.
I see, it makes sense. So what we'll try to do is give it a moment, see if we have the application installed. It ideally should happen pretty quickly, and back when I was trying, it was pretty quick.
So though it might take some while to install the application, this is essentially how you can very easily install the application. And yeah, you can also do this with the CLI. We'll give it a moment to run and hope the demo works well. Sorry, come again?
So Kubeflow does have a partial Helm chart, but it does not have a Helm chart with all of its components, which is also pretty hard to manage. If you have a Helm chart with all of Kubeflow's components, it becomes pretty hard to manage and there does not exist a community Helm chart for Kubeflow. There is one for MiniKubeflow though, so you can take a look at MiniKubeflow and there's a Helm chart for MiniKubeflow that you can install, but for Kubeflow, there is no Helm chart.
And put your kubeconfig flag into it.
Sorry, I didn't get the question.
Was it like, does it have something like the k3sup project that we saw just now, that can help you install all this stuff using that program, like 'kubeflow install' and pass all your details like the path to your kubeconfig and it will handle the rest?
As far as I know, there is no such project, then again, I may be wrong. Back when I was creating the pull request for adding this application, back then there was no project to do this which is why I tried to do it myself. But I might be wrong, there might be some application right now, but not that I know of.
Sorry, it takes some more time than I expected to install the application. Interestingly, I tried it this morning itself and it seemed to work well, so I don't know what happened. Wait, so ideally if you wait some time for all the resources to be applied, what would happen is you would see a load balancer up here. And with the load balancer, you could get to the Kubeflow dashboard. And yeah, you could just start using Kubeflow.
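Once the resources are applied, the load balancer mentioned here can be looked up from the service list rather than the UI. This is a small sketch, assuming the input is the parsed JSON from `kubectl get services --all-namespaces -o json`. The function name, service names, and addresses are hypothetical, and which service actually fronts the Kubeflow dashboard depends on the installation.

```python
def load_balancer_addresses(service_list: dict) -> dict:
    """Map 'namespace/name' to external addresses of LoadBalancer services."""
    found = {}
    for svc in service_list.get("items", []):
        if svc.get("spec", {}).get("type") != "LoadBalancer":
            continue
        # The ingress list stays empty until an address has been assigned.
        ingress = svc.get("status", {}).get("loadBalancer", {}).get("ingress") or []
        addresses = [entry.get("ip") or entry.get("hostname") for entry in ingress]
        meta = svc["metadata"]
        found[f"{meta['namespace']}/{meta['name']}"] = [a for a in addresses if a]
    return found


# Hypothetical sample, trimmed to the fields used above.
SAMPLE = {"items": [
    {"metadata": {"namespace": "default", "name": "internal-api"},
     "spec": {"type": "ClusterIP"},
     "status": {"loadBalancer": {}}},
    {"metadata": {"namespace": "example-ns", "name": "gateway"},
     "spec": {"type": "LoadBalancer"},
     "status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}},
    {"metadata": {"namespace": "example-ns", "name": "still-pending"},
     "spec": {"type": "LoadBalancer"},
     "status": {"loadBalancer": {}}},
]}

if __name__ == "__main__":
    for name, addresses in load_balancer_addresses(SAMPLE).items():
        print(name, addresses or "pending")
```

A service that maps to an empty list is still waiting for its external address, which is roughly the state the demo cluster was in at this point.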
Here, I could do that. Let's... So, when you, after you install the extension, it also appears over here on the UI. I load Civo, and you also see like a load balancer appear on the UI as well. Let's take a look if any of the namespaces have been created yet.
So, it seems the application might still be installing. And I, I'll not wait any further, but this is how it happens and you can try it for yourself on Civo using the UI itself. But yeah, that's about the talk and open source as such. There are a lot of opportunities for anyone to contribute and you can also contribute to the Civo Kubernetes marketplace or create new applications. So that's about the talk. Thank you.
Sir, thank you. Just a quick question about the size. Is that basically the kind of the recommended size for running Kubeflow? Like, was it, you had like five large nodes, is what you had put in?
Yeah, so it depends a lot on what part of the machine learning workflow you are planning to run. So if you might want to actually train a model with Kubeflow, fine, otherwise you would essentially do nothing and you would want to probably have 16 CPUs or more number of GPUs. But if you just want to explore Kubeflow, explore the application, you could run it on three nodes as well. Yeah, you could do it on three nodes as well. Just get Kubeflow up and running, the Kubeflow dashboard up and running. So yeah, it all depends on your workload. You want to do with Kubeflow. Yeah, if you want to just try it out, I suggest just starting out with three nodes and that might help a lot as well.
I had one question. So how will the app look out for new versions of Kubeflow? Does it get automatically updated, or do we need to make those changes in the actual Civo Navigate, sorry, Civo Marketplace GitHub repository to update the actual version of Kubeflow itself?
Yes, so that's interesting. Right now, it does not auto upgrade. That's the simple answer. Ideally, you would have to create the manifests and update the Kubernetes marketplace to support the new version. Right now there are no auto upgrades configured, but that would probably be nice to have, something that we could work on.
Yeah, good. Yeah, sorry. Yeah, I, from what I've seen during your presentation, you can deploy machine learning models onto Kubernetes, right? Kubeflow, yes. Kubeflow, and then you use Civo to manage Kubeflow, right? Yes. Alright.
Yeah, I'll be honest, I hear this word, Kubernetes, like thrown around. I didn't know what it was or how it was relevant to me, but now I do and I must thank you for that. Oh, thanks. [Applause] So, thank you.