Operationalizing Kubernetes within DoD
Speaker: Ryan Gutwein
Summary
Ryan Gutwein, from the Ignite Assurance Platform, discusses the operationalization of Kubernetes within the Department of Defense (DoD) and the challenges of integrating emerging technology into the DoD's regulated environments. Gutwein draws from his experience in the Air Force and the private sector, emphasizing the evolution of compliance from a paperwork-based process to actionable, modernized systems. The talk covers system authorization, the implementation of software factories like Platform One, component aggregation, and compliance-as-code initiatives, followed by demonstrations of platforms such as Big Bang and IronBank and a discussion of the DoD's approach to accreditation, DevSecOps, and software component analytics, highlighting the ongoing innovations and obstacles in the DoD's tech evolution.
Transcription
Thanks for joining. I am Ryan Gutwein. I'm with the Ignite Assurance Platform; I'm a director there. At a high level, we are a governance, risk management, and compliance platform. Can anyone hear me? Is that better? Alright.
So, Ryan Gutwein. I'm with the Ignite Assurance Platform. I'm one of the directors. This talk will be about operationalizing Kubernetes within the DOD and really how we defeat some of these bottlenecks of bringing emerging technology into the DOD and other regulated environments.
Again, Ryan Gutwein. I come from an Air Force background. I was in from 2005 to 2014. Deployed a lot to the Middle East and used a lot of technology that wasn't really modernized yet. A lot of just traditional security: laying down fiber, configuring layer two, layer three switches. So, I consider myself an old-school information assurance guy. I have done a lot of operational security stuff within the DOD. I have also done private sector work for large financial services companies like Equifax and TransUnion. So, we have about 12 years of experience doing this work for large companies and have really started modernizing over the last three to four years, containerizing our own application, deploying it as Helm charts, and really just shifting compliance left, because compliance has traditionally been just a paperwork kind of thing. And so, we're really putting this into action.
This is the agenda, high level. We'll go over system authorization, just giving you some context into the DOD. How they secure and deploy their systems and applications. Is it optimal? Is it the right way? And then we'll get into how the DOD has started to enable some of this technology, like Kubernetes. We'll get into some of the software factories that are being implemented within the DOD, Platform One being the main driver there. Then, we'll get into component aggregation, how we're automating this work by looking at the entire stack and really executing on compliance as code initiatives. Then, I'll get into a basic demonstration of the platform, how we're automating this stuff, and also IronBank, which is the DOD container hardening repo. We have a cooperative R&D agreement with the 16th Air Force, Air Force Cyber, so we're programmatically bringing in this data into our own platform.
You want to see the problem? If you have a month, we'll go through all these boxes. This is the Integrated Defense Acquisition, Technology, and Logistics Life Cycle Management Framework. This is one of many bottlenecks within the DOD for bringing emerging technology in. Obviously, this is not working. The DOD is currently in the middle of moving to more of an agile acquisition framework. Again, just giving you some context, this is NIST 800-37, the Risk Management Framework, and it's a very nuanced process, meaning it's not able to keep up with the pace of technology. A lot of the business logic that you see here comes from NIST 800-37, and so we've baked all this stuff into the platform.
There's a lot of innovation going on within the DOD. There's probably about 20 software factories, all spearheaded by Platform One. But there's still a lot of bottlenecks within accrediting and deploying software within the DOD. Are we accrediting a process or a continuous state? That's what we need to figure out within the DOD. That's what they're still trying to figure out with all these different mission owners. We have a lot of great technology that is trying to streamline the accreditation process by having mission owners onboard Platform One, and they have all these different kinds of services, IronBank being one of them. We'll get into Big Bang Platform and things like that.
But again, I want to give you some context. "Platform" is an overloaded term, and this represents the stack for the DOD. At the bottom, you see the infrastructure layer: SC2S, C2S, and Fences, which are all secret cloud environments sitting on top of AWS and Azure. Moving up to the platform layer, you have your CNCF-conformant Kubernetes stacks. Then there's CI/CD, continuous delivery, and DevSecOps, which leads us to the Big Bang platform. I'll delve deeper into Big Bang shortly. For the service mesh, they utilize Istio and Envoy proxy, and monitor the mesh using tools like Kiali, Grafana, Prometheus, Alertmanager, Jaeger, and other observability tools.
At the application layer, we focus on the mission apps, so developers can consistently deliver capabilities, placing more emphasis on the apps and lessening the burden on the platform.
Here's another visualization of how we layer the authorization process within the DOD. Starting at the bottom, there's the infrastructure layer, where an agency might handle account management, provisioning accounts via Active Directory, for example. The platform layer is where much of the burden lies, with DevSecOps pipelines, logging, monitoring, container registry, dependency management, and secrets management. The platform team assumes these responsibilities, reducing the strain on the container and application teams. The controls and requirements are sourced from NIST 800-53; CNSSI 1253 for National Security Systems, which uses the 800-53 control set; and FIPS 199, the federal information processing standard that aids categorization. Categorizing the impact of systems and applications operating within the DOD information environment is a separate workshop in itself.
There was an SBOM talk earlier. John Osborne with Chainguard and Marc Boorshtein with Tremolo gave a compelling presentation on SBOMs. We don't produce SBOMs (Software Bills of Materials) at a lower level of the software hierarchy, but it's something on our roadmap in terms of how we can consume SBOMs. It remains a work in progress. Different data formats are used to deliver SBOMs, including CycloneDX and SPDX, with the DOD receiving these SBOM artifacts.
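For context, here is a minimal sketch of what an SBOM in one of those formats (SPDX, serialized as YAML) can look like. The document name, namespace, tool name, and package entry are purely illustrative, not artifacts the DOD actually receives.

```yaml
# Illustrative SPDX-style SBOM fragment; names, namespace, and packages are placeholders.
spdxVersion: SPDX-2.3
dataLicense: CC0-1.0
SPDXID: SPDXRef-DOCUMENT
name: example-container-sbom                                   # hypothetical document name
documentNamespace: https://example.org/sboms/example-container # hypothetical namespace
creationInfo:
  created: "2023-01-01T00:00:00Z"
  creators:
    - "Tool: example-sbom-generator"                            # hypothetical tool
packages:
  - name: istio-proxy                                           # illustrative package
    SPDXID: SPDXRef-Package-istio-proxy
    versionInfo: "1.17.2"
    downloadLocation: NOASSERTION
```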
I mentioned the DOD DevSecOps reference design, which you can Google. It details the data flow. CNAP is the Cloud Native Access Point, which employs various tools like Istio, Envoy, and monitoring tools, along with HAProxy for high availability. Then there's IronBank, where you source your hardened container images, base images, and platform tooling images. Big Bang is the DevSecOps platform I referenced earlier. It represents infrastructure as code and configuration as code, allowing you to declaratively deploy your platform via a Helm chart. By the way, the Big Bang platform is open source.
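To make "declaratively deploy your platform via a Helm chart" concrete, here is a rough sketch of the kind of values file Big Bang consumes, where individual packages are toggled on and off. The key names are approximate and from memory; the Big Bang GitLab repo holds the authoritative schema.

```yaml
# Illustrative Big Bang values.yaml fragment; key names are approximate.
domain: bigbang.dev        # hypothetical cluster domain

istio:
  enabled: true            # service mesh (Istio/Envoy)
monitoring:
  enabled: true            # Prometheus, Grafana, Alertmanager
kiali:
  enabled: true            # mesh observability
jaeger:
  enabled: true            # distributed tracing
gatekeeper:
  enabled: true            # OPA Gatekeeper policy enforcement
twistlock:
  enabled: false           # commercial container scanner; needs a license
```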
Navigating through the pipeline, it goes into its ATO staging environment and then on to the Certificate to Field. It traverses different classification levels for DOD cloud, such as IL2, IL4, and IL5, which are different impact levels. IL2 is analogous to NIPR, or NIPRNet. IL4 is similar to a lower-level SIPR, which differentiates based on the physical separation of networking, storage, and compute. IL5 is more like SIPR, and there's also an Impact Level 6, which is akin to the secret cloud or JWICS, if you're familiar with the DOD classified networking scheme.
So, this is a data flow for the Big Bang platform. As system security managers, because that's who we are, we typically want to capture everything that's contained in the platform in order to effectively provide assurance for the stack. You can see how the user logs into the Big Bang platform. You have the ingress load balancer; it goes through Flux and Argo CD, which declaratively deploy your Big Bang application into whatever cloud environment, whether AWS or Azure. Then you see the example YAML files, how these are deployed, and how the monitoring tools are monitoring the Kubernetes clusters. OPA provides the validating and mutating admission webhooks for any policies that are enforced on roles for production or any other environments. Twistlock is heavily used within the platform as their main container scanning mechanism. This is a high level of the Big Bang platform. Their core stack includes Argo, Flux, Istio, Envoy, OPA, Twistlock, and all the monitoring tools. All of this is driven by Platform One, which was created by the Air Force. They've created various software factories and services. Is there a specific Wi-Fi in this room? Yeah. Okay.
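As a sketch of the GitOps piece described here, Flux can pull the Big Bang chart from its Git repository and deploy it declaratively. The repository URL, tag, chart path, and Secret name below are illustrative assumptions, not an exact reproduction of a Platform One deployment.

```yaml
# Illustrative Flux resources deploying Big Bang; paths and names are approximate.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: GitRepository
metadata:
  name: bigbang
  namespace: bigbang
spec:
  interval: 10m
  url: https://repo1.dso.mil/big-bang/bigbang.git   # public Big Bang repo (path may differ)
  ref:
    tag: "2.0.0"                                     # hypothetical release tag
---
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: bigbang
  namespace: bigbang
spec:
  interval: 10m
  chart:
    spec:
      chart: ./chart               # Big Bang's Helm chart lives inside the repo
      sourceRef:
        kind: GitRepository
        name: bigbang
  valuesFrom:
    - kind: Secret
      name: bigbang-values         # hypothetical Secret holding the values.yaml overrides
```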
So, OSCAL (Open Security Controls Assessment Language) was developed by NIST and the General Services Administration. We have also been a contributor to this. It's not OPA; this is compliance as code. You can express it in XML, YAML, or JSON format. We have worked with the Air Force under the cooperative R&D agreement that we have with the 16th Air Force, which is Air Force Cyber. We helped build their Big Bang OSCAL component definition models. There are different OSCAL models: the system security plan model, the plan of action and milestones model, and the profile model. But we especially appreciate the component definition model because it's purposely built so that we can build component definitions for the main components you just saw. It's not just for the federal space. We can use this in the private sector as well, for PCI and HIPAA. We've worked with clients in those areas as well.
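To make the component definition model concrete, here is a minimal, hedged sketch of what an OSCAL component definition for an Istio component might look like in YAML. The UUIDs, titles, catalog reference, and control description are placeholders, not the actual Big Bang component definitions.

```yaml
# Illustrative OSCAL component definition; UUIDs and descriptions are placeholders.
component-definition:
  uuid: 11111111-2222-3333-4444-555555555555
  metadata:
    title: Big Bang Istio Component Definition       # illustrative title
    last-modified: "2023-01-01T00:00:00Z"
    version: "0.1"
    oscal-version: 1.0.4
  components:
    - uuid: 66666666-7777-8888-9999-000000000000
      type: software
      title: Istio Service Mesh
      description: Ingress, mTLS, and traffic policy for platform workloads.
      control-implementations:
        - uuid: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
          source: "#nist-800-53-rev5-catalog"         # hypothetical catalog reference
          description: Controls satisfied or supported by Istio.
          implemented-requirements:
            - uuid: ffffffff-1111-2222-3333-444444444444
              control-id: ac-4                        # information flow enforcement
              description: >-
                Istio authorization policies and mutual TLS enforce approved
                information flows between mesh workloads.
```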
I'll show this during the demonstration. We maintain the DISA Kubernetes STIG checklist within the platform. You can Google all this. The NSA has their own hardening guide. I provided the link to IronBank; their GitLab repo is open source. And then there's Big Bang, their GitLab repo, and the Container Security Requirements Guide, which was developed by the Defense Information Systems Agency. We have this baked into the platform, and I'll demonstrate how we automate the container security requirements guides. I know Armo's here. They have automated scanning tools, so we've worked with them on potentially automating the entire process of scanning, remediation, and applying control statements for any distribution of Kubernetes. The DOD uses Konvoy and Rancher for their Kubernetes distributions, but this can apply to any distribution.
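To give a flavor of what the Kubernetes STIG and Container SRG actually ask of a workload, here is a generic hardened Deployment fragment. It is not an official requirement-by-requirement mapping; it just reflects the recurring themes (non-root execution, no privilege escalation, read-only filesystem, dropped capabilities, resource limits), and the image path is a hypothetical IronBank-style reference.

```yaml
# Generic hardened Deployment fragment illustrating common hardening-guide themes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mission-app                  # hypothetical mission application
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mission-app
  template:
    metadata:
      labels:
        app: mission-app
    spec:
      containers:
        - name: mission-app
          image: registry1.dso.mil/ironbank/example/mission-app:1.0.0  # hypothetical image path
          securityContext:
            runAsNonRoot: true                 # do not run as root
            allowPrivilegeEscalation: false    # block privilege escalation
            readOnlyRootFilesystem: true       # immutable container filesystem
            capabilities:
              drop: ["ALL"]                    # drop all Linux capabilities
          resources:
            limits:
              cpu: 500m
              memory: 256Mi
```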
Now, let's delve into the aggregation techniques. Typically, when you have something like Big Bang or any modernized tech stack, you have numerous components. To save time and money, we want to break down the application. For example, you see Big Bang, Ignite, and a mission app module. While there are SBOMs and other processes, from a true compliance standpoint, many are still managing through spreadsheets. With OSCAL, we're aiming to create a runtime-based SSP and also make it reusable. This means you can declaratively deploy your compliance as code within your tech stack. It should be part of your GitLab CI YAML file or your GitHub Actions templates. The goal is to automate control statements, making them reusable, so platform teams and modernized businesses move away from using spreadsheets.
Moreover, making it reusable is crucial. It doesn't matter what environment it goes into. Whether it's AWS, GCP, Azure, or a secret cloud environment, the definition for a component remains consistent. This is vital as we try to bridge the gap between compliance individuals and DevOps personnel. It helps developers provide precise data about the application's contents, serving as a foundation for building a System Security Plan, a key artifact for an Authority to Operate (ATO) package within the DOD or any compliance initiative.
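As a sketch of what "part of your GitLab CI YAML file" could mean, a pipeline stage might sanity-check the OSCAL component definition and hand it to a compliance service alongside the build. The image, the validation step, and the COMPLIANCE_API endpoint are hypothetical; a real pipeline would substitute its own OSCAL validator and target service.

```yaml
# Hypothetical GitLab CI job shipping an OSCAL component definition with the build.
# The validation step and COMPLIANCE_API endpoint are placeholders.
stages:
  - build
  - compliance

publish-oscal-component:
  stage: compliance
  image: alpine:3.18
  script:
    - apk add --no-cache yq curl
    # Sanity-check that the component definition parses as YAML
    # (a real pipeline might run a full OSCAL schema validator here).
    - yq eval '.' oscal/component-definition.yaml > /dev/null
    # Push the component definition to a compliance platform (hypothetical endpoint).
    - >
      curl --fail -X POST "$COMPLIANCE_API/component-definitions"
      -H "Authorization: Bearer $COMPLIANCE_TOKEN"
      -H "Content-Type: application/yaml"
      --data-binary @oscal/component-definition.yaml
  artifacts:
    paths:
      - oscal/component-definition.yaml
```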
Our initial process for component aggregation within the platform starts with understanding the workload, scope, controls, and the type of component. Conversations with engineers and developers are crucial to understand the target build. In the collect phase, we identify the components, whether it's a Keycloak container, Istio operator, Ignite app, or Flux container. We also consider system interconnections, especially those connecting externally to production. This recalls the data flow diagram, showing how the Big Bang platform interacts with external entities. Once we build and collect, we start aggregating language, control ID, and other elements. With Ignite, we utilize a lot of NLP in the background. Our aim is to create reusable control statements for each modernized tech stack. The focus is on control harmonization, language, and scope, so when it's time to create the initial SSP, all data is already prepared and can be pushed from the aggregated form.
Now, let's look at our platform. We handle compliance, pipeline issues, SCAP, and more. Many manage these processes with spreadsheets. If a component aligns with NIST 800-171, for instance, we review that workload. For a component under HIPAA, we'll examine that workload specifically. This approach applies to CMMC, an emerging compliance framework within the DOD, and the container SRGs I mentioned earlier.
Our process visualizes component aggregation. We have multiple container analytics, which, combined with our aggregation in the background, streamlines our efforts with the help of NLP. We can adjust stages, delve into requirements, and drill into each statement. The key is truly understanding the workload.
I mentioned the automated statements; we have these baked in already. If the client wants anything customized, like testing procedures, framework mapping, or tying any assets, we can add that in a separate tab. As we navigate through these aggregated states, I'll delve into one of the container analytics. This specific container aligns with the container security requirements guides developed by the Defense Information Systems Agency (DISA). These come straight from the regulation. You can also incorporate tags if there's a missing Plan of Action and Milestones (POA&M), which we refer to as a deficiency or a missing URI. Our platform is fully customizable.
You'll notice we have automated statements where the Keycloak EKS cluster is configured to use TLS. This container security requirements guide has about 173 requirements. As we progress, you can view the completion percentage for each stage. When we transition from the collecting phase to the aggregated state, you'll see automated statements again. With tagging, we recognize that Istio Envoy provides access enforcement, information flow enforcement, and other features, like OPA for access control policy. We'll inspect the control statement, which comes from NIST 800-53. We can edit these in real time. We refer to these as Organization-Defined Parameters (ODPs), which are integral to the NIST test cases in NIST 800-53A. They're pre-built, allowing organizations to define their unique parameters.
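For reference, organization-defined parameters can be expressed in OSCAL as set-parameters on an implemented requirement. The sketch below uses a placeholder UUID, parameter ID, and value to show roughly how an ODP for AC-2, for example, could be pinned down; it is not taken from the platform's actual output.

```yaml
# Illustrative OSCAL fragment: setting an organization-defined parameter (ODP).
# UUID, parameter ID, and value are placeholders.
implemented-requirements:
  - uuid: 12345678-90ab-cdef-1234-567890abcdef
    control-id: ac-2
    description: >-
      Keycloak provides account management for the platform; accounts are
      disabled after the organization-defined period of inactivity.
    set-parameters:
      - param-id: ac-2_prm_example      # placeholder parameter ID
        values:
          - "35 days"                   # organization-defined value
```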
We can also devise test cases, as per client requests. All of this originates from NIST. Once we navigate through the aggregated state, we reach the initial System Security Plan (SSP), which can be exported as a Word document or any format the client desires. What's vital here is the ability to transfer tags and statements seamlessly. This represents true compliance automation, not just a checklist. We're integrating this declaratively into your CI/CD, using OSCAL statements. Provide us with your stack, like a Docker Compose file or Helm chart, and we'll dissect it in the platform, breaking down components by stage, aggregating them, and assisting in building that initial SSP.
Moving on, I won't delve deep into the controls to keep things lively. Another feature we offer is the Security Content Automation Protocol (SCAP). Many recognize it in relation to STIGs and system hardening. The Navy initiated the SCAP tool, an automated STIG scanner, which the Naval Information Warfare Center (NIWC) used to fund. It's discussed widely on platforms like Reddit. Over the last two years, Ignite has been in negotiations with the Navy for a license agreement. Consequently, you can see we have the Ubuntu STIG mapped to our compliance framework. This ensures all components and SSP elements align. You can directly link technical STIGs to specific components.
I also referenced IronBank, the container hardening repository. This is where the DoD sources its container images. They maintain a comprehensive catalog. Eventually, through our agreement, we aim to programmatically integrate these into our platform. The IronBank repository, as far as I know, is open source on GitLab. Their API for the Vulnerability Assessment Tracker (VAT) for IronBank, for instance, showcases vulnerabilities for components like the Istio operator. Our objective is to systematically pull this data through a JSON API and depict it in pipeline issues. It explicitly flags false positives and more. Another initiative we're focusing on with the Air Force involves pulling container vulnerabilities from IronBank and mapping them to compliance, akin to the SCAP tool approach.
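As a hedged sketch of "systematically pulling this data through a JSON API," a scheduled CI job could fetch findings and keep them as a pipeline artifact. The VAT_API_URL path, query parameter, token variable, and response shape below are assumptions for illustration, not a documented IronBank API.

```yaml
# Hypothetical scheduled GitLab CI job pulling container vulnerability data.
# Endpoint path, query parameters, and response fields are placeholders.
pull-ironbank-findings:
  stage: compliance           # assumes a compliance stage exists in the pipeline
  image: alpine:3.18
  rules:
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    - apk add --no-cache curl jq
    # Fetch findings for a single container (hypothetical endpoint and parameter).
    - >
      curl --fail -H "Authorization: Bearer $VAT_TOKEN"
      "$VAT_API_URL/findings?image=istio-operator"
      -o ironbank-findings.json
    # Summarize finding counts by state, assuming each finding carries a .state field.
    - jq '[.[] | .state] | group_by(.) | map({state: .[0], count: length})' ironbank-findings.json
  artifacts:
    paths:
      - ironbank-findings.json
```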
To conclude, our platform's demonstration emphasizes the Open Security Controls framework. The scorecard here is a metric used to rate containers, ranging from zero to ten; this is what IronBank employs in the background. They also utilize tools like Anchore and CORSIF to produce SBOMs.
Okay, so, some future initiatives. We'd love to collaborate with other teams that are interested in this work. We've done a lot of work before. As I mentioned, we've been around for about 12 years. We didn't initially come from a compliance background; we did TPM module development for Dell, which was our first project. We often venture outside our primary scope, but compliance and GRC, or integrated risk management, is our vision for the platform.
We're focused on optimizing FedRAMP small business assurance cases. We really look at container component analytics for the future of FedRAMP. Istio, Keycloak, Envoy, and other such containers aren't undergoing the detailed scrutiny of NIST controls or the granular IA work. I believe this will be the future of FedRAMP compliance. I mentioned our work with the 16th Air Force and the SCAP component development. You saw that in action. We're also going to open up our API. For those interested, this is our current API. We haven't fully released every microservice or module yet, mainly due to our agreement with the Air Force, but this is our API and our GitHub. If anyone's keen on collaborating, especially if you're passionate about GRC or if you're a developer, we welcome you. There isn't much work being done in this field regarding component analytics and assurance at the container level.
To summarize, I've provided a high-level overview of the context within the DOD, the challenges with their current processes, and their modernization efforts. The Air Force is driving a lot of this change. If anyone has questions about how to use Ignite or onboard with Platform One, feel free to reach out. We automate compliance processes for various regulations. We support a plethora of frameworks and regulations, which is crucial for organizations needing to adhere to diverse standards, including those outside the DOD. This is a brief overview of Ignite and how we're aiding in streamlining accreditation within the DOD. Any questions? We have about five minutes.
Yes, sir.
Our company assists the Air Force with ATO and continuous ATO. Are you involved in that?
Yes, we've undertaken several projects with the Navy, working on special projects concerning NNPI data. We've collaborated with software factories within the Navy and SpaceX as well. A lot of our focus now is on development work, like the OSCAL. I have an export of the SSP here in YAML format, which provides a detailed look into controls, like IA-2 with Istio request authentication. This is the kind of detail we provide, as we did for SpaceX.
How do you interface with Platform One, considering they seem to be leading in this domain?
To onboard with them, you must first get your containers approved to operate within Impact Level 2 for IronBank, which is NIPR. Once onboarded and your containers are hardened and authorized for Impact Level 2, you can then move to Impact Level 4. These are distinct authorization paths, but our containers are approved for Impact Level 2 IronBank.
Thank you, everyone.