Data Driven Observability: From Secrets to Log
Speaker: Rotem Refael
Summary
Rotem Refael shares insights on data-driven observability for Kubernetes clusters. She explores three observability platforms: Lens, Prometheus with Grafana, and the ARMO platform for enhancing cluster security. Rotem demonstrates how to effectively observe workloads, resource consumption, and security aspects of Kubernetes deployments, providing valuable tips and visualizations to optimize cluster management.
Transcription
So, once again, here I am. My talk today is about how to observe our Kubernetes cluster in several ways. Let me introduce myself. Some of you already know me from the last presentation, but my name is Rotem and I have been a developer for around 15 years, mainly in the DevOps industry for the past five years. I practice yoga, I love basketball, and I'm presenting here for my second talk, which is really cool. Currently, I'm the Director of Engineering at ARMO, which is responsible for, and basically the maintainer of, Kubescape, an open source security solution for Kubernetes. Check us out on GitHub. We have many followers. We recently got accepted to the CNCF as a sandbox project, so it's very exciting.
Let me tell you a story. Once upon a time, there was a DevOps panda that sat up all night and tried to figure out how to create and maintain a Kubernetes cluster in production. So, yeah, that's easy, right? You know, creating it is easy, maintaining it is even easier, and observing your cluster is easier still. All in all, yeah, it's really not easy. If you're doing it by yourself, kudos to you. What I will do is present something that I just made. I created my own Minikube cluster. I installed the Helm charts of Grafana, Prometheus, and MongoDB. That's it, a simple cluster, very small, very thin.
You can believe me: I just took a screenshot of my running pods. I installed those to observe a small cluster. Okay, I know that we all have Kubernetes clusters with hundreds of nodes, if not many clusters with hundreds of nodes, and we need to manage those and it's not that easy.
So what I do is observe. What do I need to observe, and what does that mean? Remember my story: I want to run my own cluster in production, right? And to have a clear view of that, I wanted to understand a few simple things, very simple things that I'm sure we all want to understand about our clusters. So, what are my workloads and the status of my cluster, the logs, the CPU and memory consumption, and how many nodes do I really need? I need to optimize that. Maybe I'm paying a lot of money to AWS or any other provider. How many replicas do I need for my deployments? And finally, is my cluster secure? I know it's the last thing on the list, but I think it's very important, and we'll get to it in the talk.
So, how do we do that? If you're doing it with kubectl, again, great job. It's very hard to do it that way, so I will focus on three observability platforms that I'm using. There are a lot more, but I have just 20 minutes, I think, so I'll focus on those. I'm using Lens, I'm using Prometheus and Grafana, and I'm also using the ARMO platform. We'll get to each one of those and drill down.
So, let's start with Lens. I know there was a fuss in the last month about Lens and its open source status, but I still really love this platform, so I will talk about it a bit. What we're seeing here is how I see the info and status of my cluster. I have here my demo cluster, okay, I just created a demo cluster, again this Minikube that I showed you before with MongoDB. I have a StatefulSet there, and I have replicas and everything there. What do I see here? I see my workloads, I see my pods. I just click on 'Pods' and I see all my pods: what's running, what's not running, what's pending. So, you see it visually, and it's really easy to manage because I can click it, I can stop the pod, I can delete the pod. I can do whatever I want.
So that's one thing that we're doing in Lens. You can see the info: I just click on the replica set of Grafana, and you can see when it was created, what the CPU usage is, what the labels and annotations are, everything you can see with 'kubectl', but it's just one click and you see it all. So, it's very easy, and I really like it. Regarding security, I had that long list of what I want to see when I observe my clusters. What we made is a Kubescape Lens extension. It's very easy to install an extension in Lens, and it's very easy to develop one, so we developed one for security. What do we see in that extension? On one hand, we can see the Kubescape extension down here, with all the resources that failed. On the other hand, you can just click on the workloads and see the info, and also the security: which controls failed security-wise. Did we define the CPU and memory limits, the requests, or any other thing that you can see right here in the list? So that's it regarding Lens. I could speak a lot about Lens. I'm using it often, also with my production environment. But again, we have just 20 minutes, so we'll move forward to Prometheus.
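To make the "did we define the CPU and memory limits" check concrete, here is a minimal sketch in Python. It is not the Kubescape control itself, only an illustration of the idea. It assumes a pod manifest already parsed into a dictionary (for example from `kubectl get pod <name> -o json`); the function name `containers_missing_limits` and the sample pod are hypothetical.

```python
from typing import Any


def containers_missing_limits(pod: dict[str, Any]) -> list[tuple[str, list[str]]]:
    """Return (container name, missing settings) for each container that
    lacks a CPU or memory request or limit."""
    findings = []
    for container in pod.get("spec", {}).get("containers", []):
        resources = container.get("resources") or {}
        missing = []
        for kind in ("requests", "limits"):
            values = resources.get(kind) or {}
            for resource in ("cpu", "memory"):
                if resource not in values:
                    missing.append(f"{kind}.{resource}")
        if missing:
            findings.append((container["name"], missing))
    return findings


# Hypothetical pod manifest, made up for illustration only.
sample_pod = {
    "metadata": {"name": "grafana-demo"},
    "spec": {
        "containers": [
            {
                "name": "grafana",
                "resources": {
                    "requests": {"cpu": "100m", "memory": "128Mi"},
                    "limits": {"cpu": "500m", "memory": "256Mi"},
                },
            },
            # This container only sets a CPU request.
            {"name": "sidecar", "resources": {"requests": {"cpu": "50m"}}},
        ]
    },
}

for name, missing in containers_missing_limits(sample_pod):
    print(f"{name}: missing {', '.join(missing)}")
```

A real scanner evaluates many more controls than this; the sketch only shows how one such check can be reduced to a walk over the pod spec.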
Personally, I adore Prometheus. I think it's a wonderful tool. Maybe not for really big organizations, where Datadog is very much in the picture now, but I really like it and it's open source, so kudos to them. So, what do I see in Prometheus? I installed my Kubernetes exporter and I see the info of my cluster. How many CPUs do I need? How much memory consumption do I see? How many nodes are there? All those things that we saw before in Lens, but here in a graphical way. If I have a peak in CPU on one of my nodes, then I see it right here in this exporter. Moving forward, this is the node memory usage. Of course, you can make your own dashboards, taking this metric and that metric and combining them according to what you want. So, it's very easy to visualize that in Grafana.
Also, kubelet info. I'm just presenting what I'm doing in my Minikube. You can see here the request duration, the RPC rate, the CPU usage, and many more. Moving on to security: as with Lens, Prometheus also has its exporters. You can create your own exporter, and we created a Kubescape exporter to understand the status of your cluster security-wise. How many failed resources do you have, what's the rate, and so on. How many controls failed? How many passed? How many were excluded, meaning we chose to ignore them, and so on.
And of course, finally, what Prometheus and Grafana have that I think Lens does not have is the option to alert. We can connect Alertmanager and alert on certain metrics that we're responsible for or that we need to understand better. You can define a threshold or something like that and alert only when we reach it.
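The threshold idea can be sketched in a few lines of Python. This is not how Prometheus or Alertmanager are implemented; it only illustrates the logic of firing once a metric has stayed above a threshold for several consecutive samples, similar in spirit to the `for` clause of a Prometheus alerting rule. The function name `should_alert` and the sample values are assumptions made for illustration.

```python
def should_alert(samples: list[float], threshold: float, min_consecutive: int) -> bool:
    """Return True when the most recent `min_consecutive` samples are all
    strictly above `threshold`."""
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    if len(samples) < min_consecutive:
        # Not enough data yet to decide, so stay quiet.
        return False
    return all(value > threshold for value in samples[-min_consecutive:])


# Hypothetical CPU utilisation samples for one node (fraction of capacity).
cpu_samples = [0.50, 0.95, 0.97, 0.99]
print(should_alert(cpu_samples, threshold=0.90, min_consecutive=3))
```

Requiring several consecutive samples is a design choice that avoids alerting on a single short spike.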
Okay, so the third platform that I'm using is the ARMO platform. I'm using it mostly for security purposes. What I've noticed, and I went to this link from CNCF, is the top five Kubernetes RBAC mistakes that we want to avoid, and the question is how I observe them. You have the improper use of role aggregation, the unused roles, the duplicated role grant, the grant of missing roles, and the cluster administrator role granted unnecessarily, and all of that you can see right in this link. It's from CNCF. What ARMO provides is taking those mistakes and making them graphical. So, if I'm talking about 'cluster administrator role granted unnecessarily', what I see here is that 'Ben' has the 'Ben Almighty' binding to the cluster admin. Yeah, that's something I made up, but you get the idea, right? I can see very visually that all these users or subjects have the cluster admin role. So, that's one thing.
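The cluster admin check can be illustrated with a short Python sketch. It is not the ARMO platform's implementation; it assumes a list of ClusterRoleBinding objects already parsed into dictionaries (for example the `items` of `kubectl get clusterrolebindings -o json`). The function name `cluster_admin_subjects` and the sample bindings, including 'ben-almighty', are hypothetical.

```python
from typing import Any


def cluster_admin_subjects(bindings: list[dict[str, Any]]) -> list[tuple[str, str, str]]:
    """Return (binding name, subject kind, subject name) for every subject
    bound to the built-in cluster-admin ClusterRole."""
    result = []
    for binding in bindings:
        role_ref = binding.get("roleRef", {})
        if role_ref.get("kind") == "ClusterRole" and role_ref.get("name") == "cluster-admin":
            # A binding may legally have no subjects at all.
            for subject in binding.get("subjects") or []:
                result.append(
                    (binding["metadata"]["name"], subject.get("kind", ""), subject.get("name", ""))
                )
    return result


# Hypothetical ClusterRoleBindings, made up for illustration only.
sample_bindings = [
    {
        "metadata": {"name": "ben-almighty"},
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
        "subjects": [{"kind": "User", "name": "ben"}],
    },
    {
        "metadata": {"name": "viewers"},
        "roleRef": {"kind": "ClusterRole", "name": "view"},
        "subjects": [{"kind": "Group", "name": "developers"}],
    },
]

for binding_name, kind, name in cluster_admin_subjects(sample_bindings):
    print(f"{kind} {name} has cluster-admin via {binding_name}")
```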
The other thing is the unused role. Here, I can see all those roles that are not in use. So, I can just avoid them or delete them. Why do I need them in my system? They can be just a bridge for someone to enter.
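Finding unused roles can be sketched in Python as well. This is a simplified illustration, not the ARMO platform's logic: it treats a role as unused when no binding references it, and it ignores ClusterRole aggregation. It assumes roles and bindings already parsed into dictionaries (for example from `kubectl get roles,clusterroles -A -o json` and the equivalent for bindings); the function name `unused_roles` and the sample objects are hypothetical.

```python
from typing import Any, Optional

RoleKey = tuple[str, Optional[str], str]  # (kind, namespace or None, name)


def unused_roles(roles: list[dict[str, Any]], bindings: list[dict[str, Any]]) -> list[RoleKey]:
    """Return the roles that no RoleBinding or ClusterRoleBinding refers to."""
    referenced: set[RoleKey] = set()
    for binding in bindings:
        ref = binding["roleRef"]
        if ref["kind"] == "Role":
            # A RoleBinding can only reference a Role in its own namespace.
            referenced.add(("Role", binding["metadata"].get("namespace"), ref["name"]))
        else:
            # ClusterRoles are cluster-scoped, whichever binding kind refers to them.
            referenced.add(("ClusterRole", None, ref["name"]))

    unused = []
    for role in roles:
        kind = role["kind"]
        namespace = role["metadata"].get("namespace") if kind == "Role" else None
        key = (kind, namespace, role["metadata"]["name"])
        if key not in referenced:
            unused.append(key)
    return unused


# Hypothetical roles and bindings, made up for illustration only.
sample_roles = [
    {"kind": "Role", "metadata": {"name": "pod-reader", "namespace": "default"}},
    {"kind": "Role", "metadata": {"name": "legacy-deployer", "namespace": "default"}},
    {"kind": "ClusterRole", "metadata": {"name": "node-viewer"}},
]
sample_role_bindings = [
    {
        "kind": "RoleBinding",
        "metadata": {"name": "read-pods", "namespace": "default"},
        "roleRef": {"kind": "Role", "name": "pod-reader"},
    },
    {
        "kind": "RoleBinding",
        "metadata": {"name": "view-nodes", "namespace": "monitoring"},
        "roleRef": {"kind": "ClusterRole", "name": "node-viewer"},
    },
]

print(unused_roles(sample_roles, sample_role_bindings))
```

Before deleting anything this reports, it is worth checking whether a ClusterRole is only used through aggregation, since this sketch would still list it as unused.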
Again, those are a few of the RBAC visualizer features and the tasks that we can do with it. You can make your own query on the RBAC, which is really nice. We can check whether there are some workloads that are unused, or anything else, such as what bindings we have there.
To close this session, what I want you to take from it is that there are a lot of ways to observe Kubernetes. But I think everyone needs to figure out what's important for them. I mean, of course everyone should be worried about security, okay, but you need to take the tool that fits you. You can take Grafana and Prometheus if you're into graphical representation, Lens if you're into the specific resources, and an RBAC visualizer if you're interested in seeing the whole map of RBAC permissions. You can customize your dashboards. Everyone can have their own dashboard. I wrote here, 'developer is not a DevOps', because they might have the same interests sometimes, but most of the time a developer can be very into CPU and memory usage, and DevOps could be very much into the security perspective. So, you can create your own dashboard and just mix and match.
So, that was me. Thank you so much for being here. You can come over to our booth if you still have any questions, or you can just ask them right now.