Real-world container & image security: Present and future
Speaker: Bret Fisher
Summary
Bret Fisher shares his extensive knowledge and experience in Docker and Kubernetes during this session from Civo Navigate NA 2023. He emphasizes the importance of addressing security challenges in container and image deployments. Bret discusses various aspects of container and image security, including CI/CD practices, vulnerability scanning, standardizing pods, and automation. As potential solutions, he also introduces concepts such as SSH signing of Git commits and GitOps signature verification. Overall, the presentation aims to provide practical insights and guidance for individuals seeking to ensure the safe and reliable deployment of their software in containerized environments.
Transcription
Hi, I'm Bret. I talk for a living on the internet, and I have over 700 videos, including 4 courses, and 300,000 students. This is full-time for me. So I've been doing and using and talking about Docker for almost nine years now, and next month is technically the 10-year anniversary of Docker. Kubernetes came out a year later, and the rest is history.
So this talk, by the way, is a whole bunch of things. If you want to talk, or if you want the slides, this is actually something that is starting this month: I'm going to be evolving this into something bigger. So if you're interested, I'll have the same slide at the end so you can get it later. It's easier to get it on your phone than to take pictures of my slides. For years now, I've done live streams every Wednesday or every Thursday on the internet, and I have done that for five years. I have over 200 episodes, 200 weeks of having guests. A lot of the vendors here have been on my show. You can find all that on my website.
But the thing that I've learned over those 300,000 students, working with dozens of companies on adopting Docker and containers and Kubernetes, is we're all mortal. We're all busy people. Most of us don't get to do one of these things alone. We're usually saddled with way too much work, and we need someone to break down a lot of the great security tools, like Slim.AI just talked about, and give us a sense of the landscape and what we're dealing with end-to-end. And there aren't really a lot of vendors doing that. Most people are niched in spaces, so they're not dealing with the span from writing code through building, testing, smoke testing, scanning, auditing, all that other stuff, shipping to production, and then doing it all again over and over a dozen times a day.
What am I dealing with when it comes to container security in general, what tools do I need to make sure I'm implementing, and what order should I do these things in, right? Should I be inspecting my code for security vulnerabilities in my SQL statements before I actually audit my SBOMs in an image container registry? Like, these are all questions I get every week. So a lot of my talks come from what I call us normal people who are just trying to get our jobs done, and that's where this talk starts.
So it's essentially a hybrid of a lot of the talks you're seeing today and yesterday about security, about building and deploying all your containers. Everything in my world for the last decade has centered around containers as the object that we're dealing with.
Containers, to me, are the next big evolution. They have been for the last decade. We had VMs before that. We had the cloud before that. If you're gray like me, you know about the Mainframe to PC evolution that we had. We've had lots of technical shifts. Containers are the next one. That's why you're here, and I want to give you sort of the reminder of the problem and let's start working on solutions. I am not here to create open source or solutions. I am here to use them. So what's a prescription for the use and when do I use them?
So this is essentially what a lot of us deal with. If you're a developer or operator or someone in the middle that's dealing with containers or Kubernetes or the cloud, this is your job. And these are two opposing goals. I need it to be as safe as possible, but you need to go way faster. And that creates tension and friction at every level. So I consult for that. I talk with people about that, and I like to think that I'm pretty good at explaining it. But I'm relying on a lot of this industry best practice stuff that's coming out, right? And so these are actually very new. Like OpenSSF has been around a while, but the SLSA is fresh, it's new hotness. And there's more than this. There's actually way more than this. I just didn't want to put a huge list of all the backend resources we have. GitOps, Open GitOps, a lot of other things happening to standardize this so that you don't feel like there's this nebulous ecosystem between, "I just wrote code. I need it in production in a safe and reliable manner that can be audited and protected, and I won't be on the front page and fired because I did something wrong in the middle." But there are 500 pieces in the middle. So that's where I operate.
Today, I'm going to try to break it down and go as fast as possible. This is not to give you a to-do list. And we're gonna go a little bit fast. But again, this is stuff you're going to absorb later. You can ask me questions. I live on the internet, so you can find me virtually everywhere. And this is basically what I'm calling continuous delivery for DevSecOps because it's focused on security, but we're doing DevOps things. We're practicing DevOps mindset. And to me, these are the five main stages. And when I say "monitor," I don't mean like monitor your servers. I mean I'm monitoring for change. I'm doing GitOps things. I'm monitoring the security as it evolves in real-time. So if you think of these as security places to focus on, every one of these slides is going to tell you what stage it's in so that you can understand that starting out, you don't need to focus all of your security efforts on making the container. There are some things you're going to need to do on servers and some things in code and some things in the middle. Every part of this is a puzzle, and it's changing. So I'm going to try to help you along.
Containers, the reason we were able to have that last talk, if you were here for Slim.AI, it's a fantastic awareness of the problem we're dealing with. The problem always existed. We always knew it was there. We just couldn't see it. Docker shined a light on the ability for us to see all of our software and all of its dependencies in one read-only format known as the container image. And that, born in 2013, was the dawn of this evolution of us now going, "Holy cow, I have to be responsible for all my software that I mostly didn't make, and I'm going to be in trouble if there's a problem." Like, we are shining a light on this problem. So don't fret. We're just now becoming aware that these problems always existed. And of course, the bad guys are working just as fast as we are. So there are more security vulnerabilities every year than the previous year. The pace is rapidly increasing, but the tools are also rapidly increasing their ease of use. And that's what we're talking about.
Before this, if you've been around long enough, over a decade ago, before Docker, the production server was the artifact we were shipping. Oh, sorry, we shipped VM images. Who here has done this, shipped a VM image to a server, right? AMIs. That was the artifact we had. But it wasn't public. Open-source vendors usually weren't shipping them. Maybe Vagrant, if you're a Vagrant person. There was sometimes a public shipping of VM images for people to consume, but we weren't doing that. It was all private. And the only way you knew that there was a problem was someone would scan your server. You had a security team that would wait for you to put your software on the server, and then when you were perfectly ready for production, they would come in and wreck the shop with their security scanners, doing the right thing they should be doing. But that was the moment of the artifact, and that was the physical server. So we now have these artifacts way earlier in the process. So it's a good thing. Let's all be happy for a moment before we go into this field of despair that I'm about to give you.
This is essentially what Docker gave us. And then, you know, obviously, Kubernetes magnified that by allowing us to run many containers on many servers. But we essentially said anything you want to run. This is sort of the secret of the success of containers: it didn't require new code. Anything you want to run, remember COBOL? You want to run that 30-year-old thing, as long as it runs on Linux or can be built on Linux or similar operating systems, we get all these features. We get immutability. Let me see if I can use my fancy list. It's not going to work. That's not going to work in here. I have a magnifying glass on here, but it doesn't work. We basically get all these features out of the box that we didn't get before. We get immutability. Container images are read-only by default. There is no way to edit an image. You have to make a new one. We can verify it with a SHA hash. We can sign it now. This is all getting way easier with the signing stuff recently. It's reproducible. If you design your Dockerfiles right, you can reproduce it over and over and over again. Metadata comes with it, so we don't have to write a readme about what's in the image. We can look at the metadata and kind of figure it out. We can look at history. And then when you're running the container, you get traceability. You can confirm that that container came from the image, and you know where that image came from. So these are huge benefits. I want to celebrate the win for a second because literally next month is the 10-year birthday of Docker, and we should celebrate and say, it seems bad, but we're actually doing pretty well compared to 10 years ago.
So the problem, though, is that we shed a light on this. So now the bad guys and everyone else can see the problems, the imperfections that we're all making in our choices around software. And so then we get these headlines everywhere, right? Oh my goodness, images are bad. Docker is insecure. CVEs are everywhere. This is all hype. They were always there. They were on your servers in production. You just weren't scanning before Docker. You didn't know this. Now, granted, we had fewer vulnerabilities back then, but we had fewer because we weren't using as much open-source software. So the vulnerabilities were kind of secret, and only elite hackers knew them because they were finding them in closed-source software. So I look at this and I think it's just headlines for the sake of headlines. Like, this doesn't necessarily mean that Docker is any more insecure than GitHub has insecure code. It's the same problem. We need to understand our software better and then mitigate.
So I look at that and think, okay, great. Well, they're just giving us more problems to solve. We're going to solve them because that's what we do as software people. So we have this common Docker artifact. Let's lean in like Slim.AI and the rest. I love Slim, by the way. So, you know, I'm a big fan. We're going to talk about them several times this talk. But let's lean in and start securing it and finding all the ways we can create auditing, we can add security, we can enhance things. And I like to focus on easy.
So, I tend to give the recommendations first, the biggest bang for the buck, the things that are going to take you an hour or two to implement but are going to reap benefits for the rest of the lifetime of that app, right? So, I also mentioned this earlier, but I'm an implementer. This is why I made this talk. I'm not inventing anything new, I'm not giving you a new tool to solve the magical problem of the day, but I'm an implementer and I do things for a living. I'm too busy to think in theory, so I let the theory happen with all the great companies, all the innovation, all the open source, and I try to figure out how we're going to all do it together. And then I teach it on YouTube and in courses and to clients. So, I come from that part.
So, this is going to be three parts of the talk. Because I spent the last month having COVID and the flu back to back, a lot of this didn't get finished. But I don't have the time for that in this talk anyway. So, there are three parts: basically, it's easier, harder, hardest, or do this first, do this next, and do these last. This is my opinion. I work with a lot of different clients. I work with big companies, small companies, government, startups, you name it. I work with all of them and I try to generalize. So, it's not like this is the only format. I'm just trying to get you productive as fast as possible. My goal is that every one of you comes away with at least one or two things you didn't know existed that you could probably implement in less than a week, if not less than a day. So you're going to see, we're going to start there and then we're going to move on.
So, the very first one, we're diving in deep. So, the very first one: pin your dependencies. Not a new thing, but on average, it's not done. Every time I walk into a client and start looking at their stuff, they're not pinning apt-get dependencies. So yeah, so maybe we don't care about curl, but did you know, I'm going to hate on curl for a second, did you know how much stuff curl puts on your servers when you put it on there or in your container images? It's insane: ldap, SSL, all these various protocols that you don't ever use. So, I have a case for wget because it's much smaller, less than half the size, and has way fewer dependencies. So, people don't pin stuff because they're like, it doesn't matter. Well, it does matter, because now the CVEs are going to blow up on curl, because curl has a lot of CVEs. So, I generally say, don't put them in your images, but people don't listen and they do it anyway, and I understand that need. I still like shells in my images, so I do that too. Pin, pin, pin, pin.
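As a rough sketch of what that pinning can look like in a Dockerfile (the package names and version strings below are illustrative placeholders, not values from the talk; check what your base image actually ships):

```dockerfile
# Sketch: pin apt packages to exact versions so every build is repeatable.
# The versions are placeholders; find the real ones with `apt-cache madison <package>`.
FROM ubuntu:22.04

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
       wget=1.21.2-2ubuntu1 \
       ca-certificates=20230311ubuntu0.22.04.1 \
    && rm -rf /var/lib/apt/lists/*
```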
So, you'll notice at the bottom, I'm showing you where in your pipelines this is involved. This is the very beginning, right? These are Dockerfiles, no big deal. If you're an npm person, you know about things like npm ci, which is more production-oriented; it pins every single thing and uses a lock file to make sure that you're getting exactly that version of the code, not an approximate version of the code. So, pin, because you want this thing to be deterministic and repeatable. You want to be able to build this thing five times a day and have it always be the same thing. You can't do that without pinning. By the way, automation, the magic of automation. You have to automate these things. Almost every single one of these things I'm going to tell you, if you don't automate it, it won't happen. Who here is a fan of GitHub Actions? Anyone tried GitHub Actions? Okay, I see you people.
I'm a huge fan. I think it's going to change the world. It's just me, but I think there's a consolidation happening in CI. There are tons of products, tons of niche tools, but GitHub Actions is where every vendor is going. All the demos are in GitHub Actions, and most of our code is there too. So, I'm starting a course. This is a little self-promotion, but I'm starting a course, a live Zoom-based course for small groups of people that want to learn GitHub Actions for DevOps and GitOps. The new thing about GitOps is the Argo CD. We're going to do it in short little moments of Zoom meetings together, and it's starting next month. I'm going to be repeating it if it's successful. But come to my website, take a picture of the link at the end, and you'll get that stuff, and you can sign up or ask me questions. So, we're lazy. We'll talk about SLSA later, but you have to have automation to even get started with this stuff. So why not do it now? I recommend GitHub Actions.
Linting is not a thing that we talk about enough because it seems passé, but I am a huge fan of these two tools, and I use them in every repo and with every client, and have for the last three years. No company has come to me and left without some linting being enforced on them. One of the biggest things is that these tools do more than linting now; most of these linters are actually doing more scanning, more like, hey, you did this wrong, and it's going to be insecure in your Terraform. That's still considered a linter, but it seems like more than linting nowadays. These two projects use over 50 different linters in a mega pack that I run on every commit of every PR of every repo that I work with companies on, and it changes behavior every single time. It changes behavior for the better. People fuss, they don't like the way that the Ruby scanner works, whatever. It changes behavior, and that's what I'm looking for. So, do this, do it early, and it's easy. I have tons of examples that you can literally copy and paste into production-level quality code right there at the bottom. That last one, in fact, I built for a company that was later bought by Google, and that's what those templates came out of that I'm using for all my clients now. So, that very bottom link, real good stuff.
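The specific linters aren't named in the transcript, but as one hedged example of the pattern he describes, here is roughly what running a meta-linter such as Super-Linter on every pull request can look like in GitHub Actions (action version and settings are illustrative):

```yaml
# .github/workflows/lint.yml -- sketch of a meta-linter run on every PR.
name: lint
on:
  pull_request:

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0               # full history so only changed files can be diffed
      - name: Run Super-Linter
        uses: super-linter/super-linter@v5
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          VALIDATE_ALL_CODEBASE: false # lint only the files changed in this PR
```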
Did you know that you can auto-PR the security updates for all of your code in GitHub? And, of course, all the other ones tend to have this too, but GitHub's Dependabot is configurable, controllable, and automatable in a way that I, as one person, for all of my courses and students, can operate dozens of sample repos and keep every security update up to date, because I can now, with the aid of GitHub Actions. I can have all those automatically updating depending on requirements, depending on the libraries and what package manager it is, and I can even keep my GitHub Actions up to date from all of the community code that's doing all the things like building images and deploying Kubernetes clusters. And so, those versions are all getting updated as well, and I get PRs. I can look at the change in the diff. I can decide if I want to adopt it, or I can just automate it and I never see it again. You know, I test in production.
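A minimal sketch of the Dependabot configuration he's describing (ecosystems and schedules here are examples; adjust to your repo):

```yaml
# .github/dependabot.yml -- sketch: auto-PR dependency and action updates.
version: 2
updates:
  - package-ecosystem: "npm"            # application dependencies
    directory: "/"
    schedule:
      interval: "weekly"
  - package-ecosystem: "docker"         # base images referenced in Dockerfiles
    directory: "/"
    schedule:
      interval: "weekly"
  - package-ecosystem: "github-actions" # keep the CI actions themselves current
    directory: "/"
    schedule:
      interval: "weekly"
```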
We're going to talk real quick about CVE scanning. It's not a new thing. Everyone's here talking about it, but I have opinions, and I shall give them. So, Trivy, one of my favorites. It's very popular right now because they're becoming a meta scanner. They're doing way more than just CVE scanning now. They do licenses and a lot of other things, but there are others, and they all have pros and cons. They're all great, mostly pros. I don't have a lot of cons there. But this is in the test and audit phase, so CVE scanning is also very easy. I have examples of ways to do this on every commit with 10 lines of YAML. People, 10 lines, and then at least you'll have it. You won't necessarily have to, like, block on it or do anything about it, but at least you'll have the CVE scans so that you can reference them later as an artifact if you need to. A lot of this stuff I like to turn on and do nothing about for a while and get people used to it. We have it, there's a reference, and then eventually we start to do things like lock things down and say, "Oh, you can't have critical CVEs anymore if you're going to go to production." So, we phase it in, but the first step is always to get these things running in automation and get used to seeing them. Maybe in a month, we'll worry about the actual labor of fixing this stuff. If you're working locally, that's great, you can run all these locally, but I automate in CI. I do all this stuff for every commit on a pull request.
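As a hedged sketch of what those ten-or-so lines of YAML can look like with the Trivy GitHub Action (image name is a placeholder, and exit-code "0" keeps it report-only, matching the phase-in approach):

```yaml
# Sketch: report-only Trivy CVE scan of a freshly built image on every PR.
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t local/myapp:pr .     # placeholder image name
      - name: Trivy CVE scan
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: image
          image-ref: local/myapp:pr
          severity: CRITICAL,HIGH
          exit-code: "0"   # report only for now; flip to "1" later to block the PR
```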
There's other stuff here too, by the way. There is a remediation plan. This is basically the phase-in part, and eventually what I work with companies on is getting to the point of saying, "No, no crits in production." This is a moving target, it's hard, we'll talk about it later, but we start out saying, "Okay, we're not gonna do anything, we're just gonna keep shipping. We don't want to stop production." And then we're going to start demanding that no criticals happen, and then we're going to start demanding that no highs happen, and we're going to block PRs because of it, but we phase that in so we don't upset anyone, right? So, that's a thing.
Standardizing your pods is actually harder than you think. So, one of the challenges of Kubernetes is it turns off Docker security features that were there with Docker by default, like seccomp. Don't get me started. I'm not a security specialist, but I've been doing security for 30 years as part of my job of being a DevOps and sysadmin person, so I feel like that should have just happened, but the maturity wasn't there. There are reasons, it's all a good thing. There's an alpha feature that you can turn on to enforce it, but the point is, it's hard to make a really nice pod spec template. Luckily, I did it for you. So, down at the bottom is a link; that's a pod spec that I've gotten feedback on. You know, Twitter has given me opinions, and they mostly approve, but it's actually three times the size of that screenshot. So, it's three times longer than that. It includes all the things that every single workload in Kubernetes should have turned on or have done to it, like not running as root, having limits on how many resources it can use, having probes for monitoring, making sure it's up. These are all things that every one of your pods needs to have, and Kubernetes does not really help you with that. It does not give you advice, it doesn't give you a good template. That's why I made it for you. So, this is what I also use with clients. This came out of work with multiple companies over the years, several of them financial and security companies, and it's very universal. Almost everyone can implement everything in that template with a little bit of work, and then you'll be way safer for it. And this happens at deployment time.
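His full template (linked on the slide) is much longer, but a condensed sketch of the kinds of settings it standardizes looks roughly like this (image, ports, and numbers are placeholders):

```yaml
# Condensed sketch of a hardened pod spec: non-root, seccomp, limits, probes.
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 10001
    seccompProfile:
      type: RuntimeDefault          # re-enable the seccomp default profile
  containers:
    - name: myapp
      image: ghcr.io/example/myapp:1.2.3   # placeholder, pinned tag
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      resources:
        requests: { cpu: 50m, memory: 64Mi }
        limits: { cpu: 250m, memory: 256Mi }
      livenessProbe:
        httpGet: { path: /healthz, port: 8080 }
      readinessProbe:
        httpGet: { path: /ready, port: 8080 }
```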
So, we're down there, you know, on that bottom row; deployment is more Ops, right? It's a little bit more DevOps-y, less develop-y. So, they have something to do. Provenance is something that is relatively new, and for those mere mortals of us like me that aren't security professionals full-time, they've been talking about it forever, but I'm just now learning about things like attestations and provenance and stuff. So, did you know that you can now, as of like yesterday, brand new Docker stuff, as well as stuff by Chainguard, which is also another great company, Chainguard's awesome, they're doing a lot of great work. But Docker has now made it so easy that you can now have extra metadata in every one of your containers, literally as of yesterday. So start looking at updating your Docker stuff. Hopefully, it doesn't break anything, but you're going to now be able to start putting way more metadata about how the image was created, including things like the Git repo, the CI job ID, and the URL of how to go look at the CI job. The build timestamps that you can actually customize based on your requirements. The full, entire, real, original Dockerfile can be in there, which, weirdly, we've never had before. You've usually had to read hieroglyphs and stuff to figure out how Docker made the image. And then build steps: you can map build steps to layers. You can tell it what tools you used to build. It's all in there by default now. And if you want even more, you can actually turn on a max mode that basically dumps a bunch of extra data. This is all good stuff because now we can start to understand more about what's running, where it came from, who did it, when was it done, how was it done. It's really important stuff that wasn't there for the first 10 years of Docker's life. Super easy to use; it's almost automatic, completely automatic out of the box. You may not even know it's there.
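A hedged sketch of turning this on in a CI build step with BuildKit's provenance (and SBOM) attestations; the image name and action versions are placeholders:

```yaml
# Fragment of a build job: attach provenance and SBOM attestations to the image.
steps:
  - uses: docker/setup-buildx-action@v3
  - uses: docker/build-push-action@v5
    with:
      tags: ghcr.io/example/myapp:1.2.3   # placeholder
      push: true
      provenance: mode=max   # richer metadata: VCS info, build steps, timestamps
      sbom: true             # attach an SBOM attestation as well
```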
So, just as a real refresher, let's pause for a moment. What did we just do? We did six things. They touched almost all of our infrastructure. But if you've got someone that knew most of this stuff, you could probably turn all these things on in your CI in less than a week, right? Less than a week's work. Depending on which security framework you're following, SLSA or whatever, you're already beyond level one, and you just did a few days of work. For most of this stuff, I have templates on my GitHub, so you can find all this stuff and copy it. It's all meant for production, or nearly all of it. I'll tell you when it's not ready for production, if it's iffy.
So, now the next phase of this is going to be a little harder. This is going to be a little more work for you, a little bit of planning, but it's well worth it. I don't put fluff in these. Like, this is stuff where I want to change behavior or I want to know that I'm actually reducing exploitable potential scenarios. I don't do fluff. I don't do hype. I've been doing this too long, I'm too gray, I don't have time.
So, the first one here: configuration scanners are actually really great. They're just not there out of the box. You can scan a Docker host in CI. You can scan a Kubernetes cluster. You can scan the configuration you have for the cluster or for your apps. All these scanners exist. Almost all of them are free. By the way, if something's not free, I mention it. I missed noting something about paid there, but I think all these are actually free. And most of them, I mean, all of them, I've used. I'm not an expert in all of these, but a lot of them have existed for a while. Like, the Docker Bench one has existed almost a decade. kube-bench has been around a long time. Kubescape has been around for years; Armo does great work there. And these are all very easy to do. You can put these in CI, and again, you just create the log and the artifact and leave it for later. You don't have to act on it immediately, because it's going to give you fails, it's going to give you warnings, and then it creates work for you. And everyone has this angst around, like, 'I don't really want to know how bad it is because then I have a bunch of work, and I don't have time.' So, I always advocate, 'Hey, just turn it on, don't tell anyone.' Just have it logging, and then when you have the time or when it becomes a priority, your boss says, 'Hey, we need to be scanning clusters,' you can say, 'Actually, we've been doing that already. We just needed you to reserve the team's time so we can actually deal with the failures.' So, I like to do this stuff. To me, it's a little bit of cluster monitoring and a little bit of testing and auditing in the middle there, so it kind of spans both of those.
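As a sketch of the "just turn it on and keep the report" approach, two of the scanners he mentions can be run in CI roughly like this (flags vary between versions, so treat these as illustrative):

```bash
# Sketch: run configuration scanners and keep the JSON reports as CI artifacts.

# Kubescape: scan cluster/manifest configuration against hardening frameworks
kubescape scan --format json --output kubescape-report.json

# kube-bench: CIS benchmark checks (often run as a Job inside the cluster)
kube-bench --json > kube-bench-report.json
```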
License scanners are a thing now. I recently dealt with a company that... if anyone doesn't know what COTS is, this is actually an old term in the industry: commercial off-the-shelf software. Before we had open source everywhere, COTS, if you bought Office on a CD, it was COTS. Well, licensing matters to some companies. If you're not one of those companies, you're amazing. Like, great, you can ignore this slide. But for a lot of us, we can't use GPL, AGPL, or, you know, we're a government facility, we can only use these licenses. Well, I just took one simple little demo of mine and ran a license scanner from Trivy on it, and I got back that I had 209 different licenses in my simple little demo app. 209 licenses! How can I even understand what I'm dealing with, right? This is a new problem that's probably going to get worse as lawyers get involved with containers, which, if it hasn't happened to you yet, may happen. I've worked with some companies, and I didn't realize how bad this was. I had no idea that one simple little React app could have 209 licenses. Anyway, these license scanners are meant to be automated. You can probably find someone who already built it on GitHub Actions. If you didn't know, there are 15,000 open source actions that are almost drag and drop. You just say, 'Hey, I want to build Docker. Hey, I want to scan with Trivy. I want to do a license finder with Pivotal,' and you just find all these things. They're almost all there. Every company is making them for free, and you just pay GitHub, and all the other work is done for you. That's really the power of GitHub Actions and why I'm so pro GitHub Actions, because every other CI out there doesn't have 15,000 plugins, and they would love to have that, but I go where the energy's at, and GitHub Actions is where the energy's at.
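For reference, a hedged sketch of the kind of Trivy license scan he describes (flag names have shifted between Trivy releases, so check `trivy image --help` for your version; the image is a placeholder):

```bash
# Sketch: list the licenses Trivy finds in an image.
trivy image --scanners license --format table node:18-bullseye
```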
So, this is really cool. I've been wanting to do this for the longest time, but GPG is nerd level, I'm sorry, but it's like geek super nerd level stuff sometimes. If you're someone who was back in the day of PGP (Pretty Good Privacy), it was the first big scandal on the internet, which was “the hackers can encrypt”. Yeah, well, you know, you can use tools for good or bad.
So long ago, we had this thing called PGP. The open source version became GPG, and that's what we've been using for a lot of signing stuff in the old-school ways. It's genuinely good technology, it's just complicated, and most people don't do it.
So recently, in the last couple of years, it's gotten easier to sign Git commits, to the point that there's almost no reason not to do it. Did you know that anyone can be you on GitHub? All they have to do is change their Git config file to your name and your email address, and it will accept the commit. Assuming they have write access to a repo, they can be you, or Bill Gates; it looks just like them. The only way to know it's really you is if you sign it with a key that only you have, the secret key.
It's sort of like, "Why isn't this on by default?" So GitHub, and all the other ones do it too, but GitHub is sort of advancing the way by saying your SSH keys that you probably already have can now sign. And there's S/MIME, but that's old and that's complicated. We're not going to do S/MIME anymore unless you're very, very enterprise-y. But you can now just basically... This is like... I did this the other day, and it took me maybe five minutes to read the documentation and enable it, and I already had the SSH key, and I already protect that myself. I figured that out years ago. So I did nothing new, and now all of my commits are signed, so you see the verification. If you want to go hardcore, and by the way, I recommend it, you can turn on vigilant mode. And vigilant mode means I am someone who will always sign. And if I don't, tell me, or tell the person that's looking at my commits that it's not signed. Because if you see that field with my signature, see how it says verified? That means I signed it. And then the ones that I didn't sign, it just looks fine, like other people who didn't sign. Actually, that's Jerome and someone else there. They didn't sign. It could be someone else that had write access. They still have to have permissions to the repo, but if they do, they can pretend to be another employee or another co-worker. So the signing confirms that it's really me. And here, all I did... I'm a huge 1Password fan. Who here uses 1Password? You like 1Password? It's developer-friendly. It does SSH keys. It does Kubernetes. It has a Kubernetes operator. You can store your secrets and everything in there. 1Password's awesome. It now can do this. So it's basically like turning on a button in 1Password, and then I can sign my commits. My SSH keys are stored in 1Password. They're not even on my machine unencrypted. Or you can do it old school and encrypt it with a password on your laptop. That's fine too. And then if you turn on vigilant mode, I didn't actually take a screenshot of this, but it will then alert everyone when you accidentally forget to sign. And that's known as vigilant mode. So, I think this is going to become a trend. I hope it becomes a trend. You know, like TLS everywhere, right? Have you ever used Let's Encrypt? That basically encrypted the web for our safety and for the benefit of everyone. And now I think this SSH signing of Git commits is going to become the standard way everyone expects you to sign your commits. Why would you ever put code into production, especially open-source code where everyone else has to trust that it was you, without signing? So I think... I'm hoping it becomes a thing.
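The local setup he describes boils down to a few Git settings; a minimal sketch, assuming Git 2.34+ and an existing SSH key at the usual path (then add the same public key to GitHub as a signing key so commits show as Verified):

```bash
# Sketch: sign Git commits with an existing SSH key instead of GPG.
git config --global gpg.format ssh
git config --global user.signingkey ~/.ssh/id_ed25519.pub
git config --global commit.gpgsign true   # sign every commit by default
```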
Git branch protection has really grown up. So what this is: any repo that you make, you have a main branch or a master branch or the default branch or whatever it is that you ship. And GitHub and the others have been adding more and more features over the years to help protect that special branch from your other branches, your development branch, your PR branches, your "Hey, I had to fix Bret's English because he only speaks one language and he can't even do that right." Like, all those things exist. I get those PRs all the time because I don't know how to use spell check. And so this stuff has gotten super simple. It's checkbox-friendly. I took a screenshot of a lot of the options, and there's more than this, but it can really, really, really tighten up your security. And if you're someone who's a GitHub person, you've seen the pop-ups lately on repos that say, "Hey, did you know you don't have branch protection? Turn it on." I'm telling you, it's super easy, and it gives you a warm and fuzzy to know that, "Hey, no one's going to sneak code into main." And because you can do this, you can say they have to sign their commits, it has to pass all CI checks, someone has to approve it with a review, and if all that happens, it automatically merges. And so you can really start to lock this stuff down with checkboxes, which is what we all need because no one wants to write more YAML. So that's like a PSA there. Like, please do that. Check it out. It's super easy. I've worked with multiple teams on this, implementing it. It almost never goes bad. And when it goes bad, it's just unchecking something or adding someone's permissions. It's low-risk security advancement. And it's probably a part of audits if you're someone who's dealing with SOC 2 or these other certifications. There will probably be a question about, you know, what do you do before changes get pushed into production? And branch protection is a big part of that, so I recommend it.
All right, slim your images. I was actually trying to get some screenshots this morning, and I didn't get them. So there are multiple ways to slim your images. There are multiple strategies, and I think all deserve a chance. We've actually got two great ideas that I'm tracking. Slim.AI is one of them, and Wolfi is another one that I think is a really great idea. But they're completely different approaches, and I think they both have a strong place in the community because we need both ways.
But the reality is, most people are still using official images from Docker, which are totally fine, except in production. So one of the biggest mistakes that I think Docker made, and I give them a hard time about this all the time, again, I'm a Docker Captain, so I'm supposed to be an advocate, but Docker was primarily about being easy to use, and they couldn't possibly have imagined billions of downloads a month or whatever it is now. So the default images, when you do like a "docker run mysql" or a "Kubernetes deploy WordPress" or whatever, those images are extremely bloated on purpose for ease of use.
But as we learned recently, Slim.AI has determined that this is a nightmare for security. And I scan production workloads for my clients all the time, and just two years ago, I scanned a Ruby app that had 2,000 CVEs in one container. How in the world do you deal with that? I can't even deal with a hundred. If you give me more than 20 CVEs, I can't process it. It's too much. My mind won't wrap itself around all of the analysis that has to happen. So my rule of thumb is, more than 20, give up. Like, you need a new strategy, or it's just never going to get fixed.
So Slim is one of them. Slim happens at the end, which I like. You can take an existing workload, run DockerSlim or the Slim.AI solution against it, and it will help reduce that count with no effort on your part. It's a little bit magical, which scares people, but Docker scared people too when it was starting, so I'm really excited for them. And we hung out at KubeCon. It was a good time.
The other way is to start at the beginning and replace your base image; basically, you're starting from scratch with a new Dockerfile and a different image. My number one recommendation to everyone here: if you're not using a slim image, use the slim variant of your production images. That's the main idea. And I'm mostly speaking to the developers: Python, Ruby, Node.js, Java, .NET, all these languages. The default image comes with an insane amount of dependencies that you will never use, for convenience and ease of use. That was Docker's main focus at first. Because remember, containers before Docker were hard. They were really hard. So Docker made it really easy. They made it a little too easy to have too many CVEs.
So the slims are there. Python slim: way better. It's not zero vulnerabilities. The Node.js one is insane. I did an entire talk, which you can find on my site. My DockerCon talk last year was a half-hour rant about how horrible the Node.js official image is. And then I went through every option, and they're almost all horrible except for the two that I'm going to talk about here, which are Slim.AI and Chainguard.
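The "use the slim variant" advice is mostly a one-line change in the Dockerfile; a hedged sketch for Node.js (tags and commands are illustrative):

```dockerfile
# Sketch: prefer the slim variant of the official language image.
# FROM node:18          # full Debian image: far more packages (and CVEs) than you need
FROM node:18-slim       # same Debian base, far smaller package set

WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev   # lockfile-exact install, production deps only (npm 8+)
COPY . .
USER node               # official node images ship a non-root "node" user
CMD ["node", "server.js"]
```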
So I'm not a fan of Alpine. If you're an Alpine person, I don't have any problem with Alpine. It's just... it's caused more production problems than any other variant in my life. If there's going to be a production incompatibility problem, it is musl, it is Alpine, it is the slight subtle differences of Alpine. And I wish it more luck, but that's partly why we have Wolfi, because it's sort of a variant that fixes some of the problems of Alpine. So a lot of people go Alpine, and it works great for them, or it can end horribly. You won't know until you try. I've just written off all Alpine use. I don't use it. Most people I know don't use it. It's not that it's bad. It's just that with programming languages, for example, Node.js has no official support for Alpine or musl (musl is the C library behind Alpine). Node.js only supports its official binaries, and those aren't built for Alpine. So if you're running Node.js apps, you can never get support if you choose Alpine. So great, you've gone to zero or near-zero CVEs, but now you're not even under support of the official Node.js project. So, I don't want to do that, and I can't recommend that to my clients. I wouldn't want them to go without support. So, unfortunately, you can't use Alpine, but you can use slim or these other options.
I prefer Ubuntu if I'm going to rebuild today. If you hired me to rebuild your app with the lowest CVE count possible, and I wasn't allowed to cheat with Slim.AI and just automate it, I would say Ubuntu. I think Wolfi is the future, but we need to give it some time to sort of bake and reduce some of the complexity, because it's a little hard to use, a little less user-friendly. So, I'm saying consider Wolfi, but skip Distroless, skip Buildpacks; I don't recommend them. They're great ideas, but they all have fatal flaws that really prevent this. Distroless, for example: you can't pin to patch versions. They fall off over time. It gets complicated, and I've basically had to give up on it. Plus, sometimes they just fail pulling. I don't know why. Buildpacks, um, if you want a lower CVE count, Buildpacks is not a strategy. So, you basically can't do that.
So, I do Ubuntu. Ubuntu has fewer vulnerabilities. In fact, I have an entire chart in my GitHub that you can go through that breaks down comparing images on every one of these strategies and the CVE count, all compared at the same time. And Ubuntu was better. It was only one CVE off from Alpine, and it has a supported C library and is a supported distribution that you can get maintenance for. So, Ubuntu is really, really great. It's small. It's got fewer CVEs than Debian. If you didn't know, Debian is actually what most official images, or I think all of them, actually, are built upon. But Ubuntu has fewer CVEs that get picked up by the scanners. And I actually got this idea from clients who kept doing this as a way to get security approval for production, because they found that Ubuntu was a safer alternative with fewer CVEs. It still uses the same package manager. They didn't have to change everything, and it was still under support of all of their libraries and companies.
So, these are the two equally advanced ways of getting your image CVE count down; these are basically it today. Other than just refactoring it all onto Ubuntu, which is still good and you probably should still start with because it's simple, these are the two things that would work. And they're both ends of the spectrum. Slim.AI I would actually put down at the end of your chain, because you act normal, and then it fixes it retroactively. And then Wolfi is saying, "I'm going to throw away everything I've done. I'm going to start from this new idea of how to make an image." It's got great security fundamentals in it, but it adds complexity because it's brand new. So, it's basically like adopting a new distribution, a new way to package. And they're getting better. It's already gotten some improvements that are making it way easier to use if you're using, like, Node.js or other programming languages. So, they're rapidly innovating. I have hope for them, but I would still probably do Ubuntu first and then do these later, or just turn on Slim.AI, see what it can do for you, and if it can reduce your CVE count, just go with that. The shortest amount of effort is the Slim.AI solution.
So, central policy is actually getting really easy, and this is super important for all of you. You can do this. It's mere mortal stuff. And I'm unfortunately going to run out of time and not be able to talk about it much, but Kyverno is my favorite. There are options, and what this is, is taking YAML and describing the security policies you want in your clusters in YAML. Easy-to-understand templates that are copy-paste. They're very short. There are no new languages or things you have to understand. OPA, unfortunately, means you have to learn a new configuration language, and you have to write your policies in it. But Kyverno is the open-source project, and Nirmata is the company behind it. I had them on my show last year and was very impressed. It's on every one of my clusters. Like, I will not deploy a cluster if one of these isn't installed. But I think Kyverno is my favorite because it doesn't require that. I can say all images must be signed: if you're going to put it on my server or in this namespace, it must be signed. Or all applications that are going to be on my cluster will be denied unless they have CPU limits or memory limits. So, you can make those policies as a DevOps person, without requiring developers to be Kubernetes experts, and put those on the cluster. And then you can run scanners in your CI that will make sure it will work, so that they don't find out on production day that they didn't do the YAML right for Kubernetes. So, I find this very approachable. You can scan for it. You can basically audit to make sure that before you go to production, it's gonna be allowed. And you remove a lot of the work from developers needing to know all the things in Kubernetes they need to watch out for so they don't get in trouble. I love it. Central policy is great.
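As a hedged sketch of the "deny anything without limits" example he gives, a Kyverno policy can look roughly like this (Kyverno's policy library has ready-made versions you can copy instead):

```yaml
# Sketch: Kyverno ClusterPolicy that rejects pods without CPU/memory limits.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-resource-limits
spec:
  validationFailureAction: Enforce   # start with Audit while you phase it in
  rules:
    - name: require-limits
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "CPU and memory limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    cpu: "?*"
                    memory: "?*"
```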
So, what did we just do? Let's pause for a second. We're near the end. We did all those things. That's maybe a few weeks of full-time work if you did all of them. And some of them are really, really easy, like the Git branch protection and the Git commit signing. Like, you can probably do that stuff in an hour or two. You read some docs, you take a screenshot, send it to your co-workers and say, "Look what I did. I'm signing commits." You know, give them a little hard time, tell them they're insecure. But we covered a lot of ground on all the different areas with just a few little projects. And we're just checking off each one at a time, implementing, adding security, adding auditing, and all those things. And then you get to the next frontier, the leading-edge stuff.
This is the part that I wasn't able to finish due to sitting on the couch feeling sorry for myself about having the flu and COVID. So these, I'm just going to basically mention; they're going to be blank slides, because I've got an entire repo, and actually a course based on that entire repo, of all this stuff that's going to be open source. So, at the end, there's the slide you can take a picture of, and you can get on my newsletter and stay up to date on what I'm building here, so that you can find more information on this stuff. But CVE monitoring is a thing now. CVEs are a moving target: tomorrow there will be a new one. You scanned yesterday and then you put it into production, but how do you know tomorrow that the new vulnerability is now a problem, because you have it in production and you didn't know it?
So, CVE monitoring is what I'm calling this. Some of these scanners are now able to keep track even when you're not building, so that as the target moves, you'll know what's happening. But I actually see very, very few people doing this. That's why I saved this for the end, because I work with companies that claim to be financial and security companies, and they're not doing this stuff. So, I feel like maybe these are more advanced things. If you're doing all this stuff, by the way, you need to be up here doing this talk next time. So, why are you listening to me?
Server behavior monitoring. So, these are actually pretty slick. One of my favorite examples is to turn on Falco, deploy it on a cluster, and then try to shell into a container, because I put a rule into Falco that says alert me, or the proper authorities, if someone shells into a container, because that should probably never happen in production. You know, in general, people shouldn't have access to shell or SSH into your clusters. You don't need that. And so these tools start to make it really easy, with their templates and default rules, to know this without advanced solutions that require Datadog and Prometheus and all this stuff. You can actually get all that out of the box with a few easy utilities.
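Falco already ships a default rule along these lines; as a hedged sketch, a custom rule for "alert on any shell in a container" has roughly this shape:

```yaml
# Sketch: Falco rule that fires whenever a shell is spawned inside a container.
- rule: Shell Spawned In Container
  desc: A shell was started inside a container, which should not happen in production
  condition: >
    spawned_process and container and proc.name in (bash, sh, zsh)
  output: >
    Shell in container (user=%user.name container=%container.name
    image=%container.image.repository command=%proc.cmdline)
  priority: WARNING
```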
The last one here, the Tracee GitHub Action, I'm really excited about. I'm going to be implementing this on every single one of my GitHub Actions workflows. Basically, it's an action that you put in as your first step, because when you use GitHub Actions or other public infrastructure, it's basically provisioning a VM, running your builds and your testing and all that stuff, and then destroying the VM. Well, how do you know that that VM can be trusted, right? Whether it's yours or the public internet's, how do you know? So, they've actually taken their Tracee scanner and put it into a GitHub Action, so you install it first, and then it tracks for bad behavior that you shouldn't have happening on your CI infrastructure. And then at the end of the run of your workload, the whole machine gets blown up anyway. So, it helps give you a nice warm blanket of, "I'm building, I'm auditing, I'm signing the code, but how do I know that bad behavior wasn't injecting code?" Right? We've all heard about SolarWinds. So, we don't want that CI stuff to happen. We need to start taking action to lock down our CI. But the problem is, CI is where all the magic happens anyway. So, it has all the keys, it has all the access, all the robots are taking over. How do I help? And Tracee has literally, like, a three-line YAML that you can implement for GitHub Actions and make it safer. At least you'll know if bad things happen. Signing images is still a pretty new frontier, but we just got huge advancements.
There are some great talks out of a brand new Kubernetes conference that just happened, I think, in Seattle: CloudNativeSecurityCon. Does anyone remember the actual name? CloudNativeSecurityCon. So, it's a brand new conference, because the security topic in Kubernetes now deserves its own conference, not just a day of a conference, but its own conference. There are great talks about a lot of this stuff. It's all pretty new tech, but it's going to get pretty easy, and pretty soon. It's going to be so easy that we're going to ask, "Why aren't you signing all your images with keys, or even going keyless, as some of this will do?" And then you have to verify. You have to verify signatures. So, you need to verify that the signed images are actually the ones you're going to run. That's a whole... Again, this is future tech that I haven't even figured out. I have all the notes. I just didn't get them in the slides today.
GitOps signature verification. So, what this means is that, before you push to your servers, if you're using GitOps like Argo CD, you can now simply say to it, "Hey, don't even try to deploy unless my images are signed by my key. Just don't even try." So you're sort of shifting left and saying, "I'm not gonna wait till it gets to the cluster and then the workload fails. I'm gonna audit it and basically not deploy it." Argo CD now does this. So, I'm calling it GitOps signature verification.
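For context, Argo CD's built-in verification today covers signed Git commits (configured on an AppProject); verifying image signatures is usually layered on separately, for example with a Kyverno verifyImages policy and cosign. A minimal sketch of the AppProject side, with a placeholder repo and key ID:

```yaml
# Sketch: Argo CD AppProject that refuses to sync commits not signed by a known GPG key.
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: production
  namespace: argocd
spec:
  sourceRepos:
    - https://github.com/example/myapp-deploy.git   # placeholder repo
  destinations:
    - namespace: "*"
      server: https://kubernetes.default.svc
  signatureKeys:
    - keyID: ABCDEF1234567890   # placeholder GPG key ID that commits must be signed with
```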
We've all heard about SBOMs. I'm not going to get into the SBOM thing, but that is a huge moving target right now, because it's one thing to make them; it's another thing to figure out what you do with them and where you put them. This is all getting worked out in real time right now. Next year, if we have this conference, and I hope we have this conference, this will be a different story. That's what I'm calling SBOM gating, which means you have to validate the SBOM, make sure that there's nothing in there you don't want, and then block any production deployments. That's a little fuzzy right now. Chainguard has got some stuff, but it's not quite clear.
The last thing I want to throw in here is code analysis. This is also getting super easy: analyzing your source code for bad behavior. GitHub has one built in. You have to pay for it, but it's just a part of enterprise plans if you have that. And Snyk has one called Snyk Code that I think they recently open-sourced, right? Didn't you mention that? I think... oh, you're shaking your head no. I thought Snyk Code was now free... okay, maybe not open source, but free. It used to be a paid feature, but you can run this in your CI to look at your code and give you intelligence like, "Hey, you have a buffer overrun here, or you've got a SQL injection," or all the things that we're doing in code. That's very early, right? That's at step one, and I didn't talk about it because it deserves its own talk.
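As one hedged example of the GitHub built-in option, a CodeQL analysis workflow looks roughly like this (language list and versions are placeholders):

```yaml
# Sketch: GitHub CodeQL static analysis on every PR.
name: codeql
on:
  pull_request:

jobs:
  analyze:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: github/codeql-action/init@v3
        with:
          languages: javascript
      - uses: github/codeql-action/analyze@v3
```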
I'm going to skip ahead. You have a roadmap. It's a lot of stuff, I know, but I have an entire community on Discord of 12,000 people all doing this stuff real-time. Come join us. It's completely free.
There's the slide. Improvements to come. You can get on the newsletter. Take a picture of that. It just sends you to that URL. And that's my talk.