Choosing an Infrastructure as Code Solution
Speaker: Lee Briggs
Summary
Lee Briggs discusses the intricacies of choosing an infrastructure-as-code (IaC) solution. He emphasizes the differences between imperative and declarative approaches, explaining the directed acyclic graph (DAG) that underpins most IaC tools. Briggs delves into the pros and cons of various authoring experiences, including domain-specific languages (DSLs), configuration languages, and programming languages, advocating for programming languages due to their flexibility and expressiveness.
Transcription
Today, I want to talk to you about choosing an infrastructure-as-code solution. You may have already decided on the infrastructure-as-code solution you're going to use. Even so, I'm going to discuss how you might think about what that choice looks like for your organization.
First of all, let me introduce myself. My name is Lee Briggs. Before my current role, which I'll discuss shortly, I spent 10 years as an infrastructure engineer. I've used every infrastructure-as-code solution you could think of. I even built one myself at one point. Additionally, I constructed a large platform for a Fortune 500 company. I've felt a lot of the pain that I believe is being addressed by cloud-native solutions now.
I have an account on Elon Musk's playground, Twitter, and I write fairly frequently on my blog.
Today, we're going to discuss infrastructure-as-code and configuration management. We'll define some terms, and I'll share my perspective on the differences between imperative and declarative approaches. We'll delve into how infrastructure-as-code tools work. Once we understand their workings, we can begin to view them from a different perspective. Lastly, we'll discuss how you might decide on a particular solution.
I should mention that my current role is in software sales. My intention here is not to sell you anything. If you feel compelled to purchase something by the end of this talk, know that it wasn't my intention. While I work in software sales and am a sales engineer, my goal is not to sell anything to anyone during this discussion.
What is infrastructure-as-code? I'm going to start straight off the bat with a hot take. I've put 'code' in inverted commas here because, in my personal opinion, the definition of 'code' is very flexible when it comes to infrastructure-as-code. When I started in the industry, infrastructure looked like this. This is not an actual photo of my home lab, but it's a fairly accurate representation. We spent lots and lots of time buying servers, building data centers, and putting servers in data centers. There's a bit of a groundswell around going back in this direction because cloud providers charge a lot, but there's a reason we moved away from this. The reason was that you could provision infrastructure at the click of a button. You didn't have to wait six weeks for a server to turn up in a data center. You didn't need expertise in networking. You didn't have to hire an army of people. You didn't have to deal with supply chain issues. Ordering and stacking data centers is hard. Cloud providers are here for a reason, and they're not going anywhere, regardless of the latest blog posts from the Ruby on Rails guy saying we're all going to leave the cloud in five years. It's just not going to happen.
Before we had all these API-driven ways of provisioning infrastructure, we'd rack a data center, install an operating system, and then manage all of that. We used configuration management tools like Puppet, Ansible, Chef, and CFEngine. Infrastructure-as-code solutions then emerged as a subset of those tools, with some slight differences between the two. Configuration management tools can be declarative or imperative, whereas infrastructure-as-code tools are always declarative. Configuration management tools generally interact at the operating system layer, whereas infrastructure-as-code tools interact with an API.
If you're a user of Civo Cloud, you can use most infrastructure-as-code tools to provision things in Civo. Most importantly, there's an underlying principle that drives infrastructure-as-code solutions, and that's the DAG. I'll talk more about the DAG in a moment, but it's a Directed Acyclic Graph created to represent everything you're managing. You store the results of that DAG in a state store. Almost every infrastructure-as-code tool does this. Some expose it to you, some don't, but they all do it. That's how you can differentiate between infrastructure-as-code and configuration management. As I said, all infrastructure-as-code tools are declarative. That doesn't mean you go to the cloud provider API and say, 'Give me this infrastructure.' I did see the Civo announcement earlier, which is a direction they're taking, and I'm a fan. Instead of managing all the resources yourself, the cloud provider will do it for you. I know people at AWS, and they won't be able to do this because it's a complex machine managed by many people. But newer cloud providers can say they want this declarative thing. This is the big value proposition of infrastructure-as-code tools. They allow you to say you want the world to look a certain way, and they figure out how to get there. That's important because infrastructure-as-code has two key components: the declarative mechanism and the DAG.
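A minimal sketch of those two components, the DAG and the state store, may help make them concrete. It uses Python's standard-library graphlib module; the resource names, the `plan` function, and the in-memory `state` set are all hypothetical stand-ins, not how any particular tool implements this.

```python
from graphlib import TopologicalSorter

# Hypothetical desired resources: each name maps to the names it depends on.
desired = {
    "network": set(),
    "firewall": {"network"},
    "cluster": {"network", "firewall"},
    "database": {"network"},
}


def plan(desired, state):
    """Return the resources still to be created, dependencies first."""
    order = TopologicalSorter(desired).static_order()
    return [name for name in order if name not in state]


# Hypothetical state store: what a previous run already created.
state = {"network"}

steps = plan(desired, state)
print(steps)
```

Only the resources missing from the state are planned, and each one comes after the things it depends on. A circular dependency would surface as a `graphlib.CycleError` when the order is computed, which is the "acyclic" part of the name.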
I argue with people online about what imperative and declarative mean. I've tried to differentiate between the two. When you decide the order of operations in your cloud provider, you're following an imperative pattern. If you've used the Civo API, AWS API, or Google Cloud API and their SDKs, you decide the order. In a declarative pattern, you say you want to create these things, and it happens. You don't decide the order, and you don't catch exceptions. That's hard to do on your own. I've tried to do this with the AWS Go SDK, and it's challenging. If you do it declaratively, the code is far more condensed.
The difference between using the AWS API declaratively and the AWS SDK imperatively in Python is clear. On one side, you say you want a bucket. On the other, you create the bucket and handle errors. That's an imperative operation.
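As a rough sketch of that contrast, the example below uses a hypothetical in-memory `FakeCloud` class instead of a real SDK, so the class, its methods, and the bucket names are all assumptions made for illustration. The imperative function picks the order and handles each error itself; the declarative `apply` function is only told what should exist.

```python
class FakeCloud:
    """In-memory stand-in for a cloud provider API (hypothetical)."""

    def __init__(self):
        self.buckets = set()

    def create_bucket(self, name):
        if name in self.buckets:
            raise ValueError(f"bucket {name!r} already exists")
        self.buckets.add(name)

    def delete_bucket(self, name):
        self.buckets.remove(name)


def imperative(cloud):
    # Imperative: the author chooses the order and handles failures by hand.
    for name in ["logs", "assets"]:
        try:
            cloud.create_bucket(name)
        except ValueError:
            pass  # the bucket is already there, so carry on


def apply(cloud, desired):
    """Declarative: make the cloud match `desired`, whatever exists now."""
    for name in sorted(desired - cloud.buckets):
        cloud.create_bucket(name)
    for name in sorted(cloud.buckets - desired):
        cloud.delete_bucket(name)


cloud = FakeCloud()
cloud.create_bucket("logs")     # already exists before we start
cloud.create_bucket("scratch")  # exists, but is no longer wanted
apply(cloud, {"logs", "assets"})
print(sorted(cloud.buckets))    # ['assets', 'logs']
```

Running `apply` a second time with the same set changes nothing, which is the property that lets a declarative tool be run repeatedly against the same description.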
When you create multiple resources in a cloud provider, each node in the graph represents something you're creating. This is the DAG in infrastructure-as-code. Every infrastructure-as-code tool builds these DAGs. To do that, they need some way for you to express what the DAG looks like. Many use configuration languages like YAML or JSON. It's easy because there's not much manipulation. The problem is configuration languages are hard to write and manage at scale. Not every environment you build will look the same. Your production and development environments will differ. Most organizations say their development environment is close to their production environment, but there's always a difference. You want to use the same code for both but with slight modifications. This creates complexity that can't be expressed in a configuration language alone. If you add layers, it gets painful. So, we created domain-specific languages to represent this. They express configuration. Their only purpose is to generate configuration. The clue is in the name: it's domain-specific.
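To illustrate the "same code, slight modifications" problem, here is a small sketch in Python rather than in any particular DSL. The `environment` function, its parameters, and the resource schema are hypothetical; the point is only that one definition generates a different configuration document for each environment.

```python
import json


def environment(name, *, replicas, instance_size, multi_az=False):
    """Build the configuration document for one environment (hypothetical schema)."""
    resources = {
        f"{name}-network": {"type": "network"},
        f"{name}-database": {
            "type": "database",
            "size": instance_size,
            "multi_az": multi_az,
            "depends_on": [f"{name}-network"],
        },
    }
    # The number of web servers is the kind of difference that a static
    # YAML or JSON document cannot express without copy and paste.
    for i in range(replicas):
        resources[f"{name}-web-{i}"] = {
            "type": "server",
            "size": instance_size,
            "depends_on": [f"{name}-network", f"{name}-database"],
        }
    return {"resources": resources}


dev = environment("dev", replicas=1, instance_size="small")
prod = environment("prod", replicas=3, instance_size="large", multi_az=True)
print(json.dumps(dev, indent=2))
```

Both documents come from the same definition, so a change to the shape of an environment is made once rather than in every copy.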
We know what we want to do. We want to build this DAG to send to the cloud provider API. So, how do we want the user experience to be? When choosing an infrastructure-as-code tool, you might Google different names and see a table comparing features. This doesn't help you decide. I don't know if having rollback changes is good for my infrastructure-as-code solution. I chose to look at it through two experiences: how to express the DAG and how to get it into the cloud provider API. You might want to think about which languages to use, how to abstract things, which cloud providers to support, and whether the process is easy or done by hand. There are different ways to execute this, which I'll go through.
Let's talk about the options. Here are the different authoring experiences available. One is domain-specific language-based authoring. The investment in learning a domain-specific language isn't useful outside that domain. It offers a lower learning curve because there are fewer features. The abstractions for sharing are specific to that implementation. It's not reusable elsewhere.
These are the DSL-based tools that are available right now. The most ubiquitous and commonly known is Terraform. Terraform is everywhere. I have lots of opinions, as you can probably tell, about why that is. But most people are familiar with Terraform as an infrastructure-as-code tool. Then, Microsoft, in its infinite wisdom, decided to build its own DSL specifically for its cloud, called Bicep, which is a play on the word ARM. Its entire job is to express ARM templates, and ARM stands for Azure Resource Manager. So, these are the DSL-based tools you have at your disposal.
Now, the other, simpler approach is a configuration language-based approach. This has a very limited scope. As we discussed earlier, expressing a DAG with a configuration language isn't great. To do complicated things, you either need to go to a DSL or build something on top of it. The fact is, it's really simple to get started with a configuration language. I could probably teach most people to write a configuration language-based document. That lower learning curve is helpful for getting started, but it doesn't provide a great authoring experience. You don't know if you've made a mistake until you send it to the cloud provider API. That's a slow feedback loop. It feels like an exercise in frustration because it takes a long time to get anything done. But that simpler learning curve is beneficial for many organizations. These are all the infrastructure-as-code tools that currently support a configuration language-based approach. Adding a configuration language-based approach is a great starting experience. Pulumi supports a YAML authoring mechanism, Azure Resource Manager allows for hand-written templates, Terraform can be expressed in JSON, Crossplane allows YAML, and CloudFormation supports YAML or a JSON-type configuration document. The fact that all these tools support a configuration language-based approach indicates it's a good place to start for those new to infrastructure-as-code.
Finally, the authoring mechanism I personally prefer is a programming language-based model. I believe almost every infrastructure-as-code tool will offer a programming language-based model at some point because it's incredibly expressive. A programming language allows you to manipulate the DAG easily. The fact that language-based abstractions exist means many people benefit from it. It's also more reusable. If you start building infrastructure with a programming language, you're learning the core foundations of that language. You can't get that from a DSL or a configuration language. Almost every developer environment will assist you when using programming languages. Using a programming language when expressing infrastructure feels intuitive. I spent 10 years feeling like I was battling the API, and now I get feedback right at the authoring time. I believe all infrastructure-as-code tools will eventually offer a programming language-based model. It's more flexible and a better way of operating. The only reason it isn't a first-class practice now is due to an industry shift in those who can express infrastructure. I'll discuss that more in a few moments. These are the tools I'm aware of that allow a programming language-based model to express infrastructure: Pulumi, AWS CDK, and Cloud Development Kit for Terraform. Here's a visual of what it looks like to...