Sharpen that Edge! How a Service Mesh enhances EdgeComputeOps
Speaker: Marino Wijay
Summary
In this talk, Marino Wijay discusses Edge Compute Ops and the significance of service mesh technology. He explores the evolution of edge computing, its various forms, and the challenges it poses. The talk highlights Ambient Mesh, a sidecar-less mode of Istio service mesh, as an ideal solution for edge computing due to its adaptability and security features.
Transcription
I actually thought I had 45 minutes to do this talk, and I had like probably about 35 slides when I found out it was 25 minutes. I was like, delete, delete, delete, delete, because yeah. So, welcome everyone! I'm here to talk about Edge Compute Ops. How many of you have done Edge Computing today?
This is a joke. Every one of you has done Edge Computing. You all have laptops, you have MacBooks. I will say that the journey to Edge Computing has been long. It's been very interesting, but it's constantly changing, because the landscape changes with the way our hardware gets better, faster, stronger, and the way our software develops. And with things like Kubernetes coming out, it changed how we can approach Edge Computing altogether. But before I begin, my name is Marino Wijay. I'm a Platform Developer Advocate at Solo. I'm also an ambassador at EddieHub, as well as an organizer of KubeHuddle. Please come check that out in May, it's in Toronto. And I run a variety of different things, but more specifically, and more recently, now that Michael's in the audience, I was inspired by him to kick off something called 70 Days of Service Mesh, because I wanted to dig deeper into the realms of how service mesh technology works and how it can apply to a variety of use cases around security, observability, and even network routing. But it all came from 90 Days of DevOps. So thank you, Michael. I appreciate that.
So, what is Edge Computing? Now, Edge Computing comes in various shapes, sizes, and forms. We start off with the cloud, because today, if I want to go provision a workload, what do I do? I log into my cloud provider like Civo and go click and deploy a cluster, or click and deploy a compute object, and maybe write artifacts to it down the line. Or I've tied that into my CI/CD system. But before clouds existed, we had data centers. We still do have data centers; they still exist, and people still operate them. There are a lot of controls in place, but these data centers have adopted a lot of the cloud principles: to be available, to be scalable, to quickly change whenever there's an issue, and if there are any sort of failures, quick swaps, and things just continue to work. But we started to realize challenges around how our applications perform as we try to access them, either at a cloud or at a data center.
So, one of the biggest concerns was latency. You know, my application is so far away that it's not going to provide the result I want in time. And so, these new concepts around Near Edge, Far Edge, and even Micro Edge started popping up. Now, the concept of Near Edge, you can think of it almost like a mini cloud provider or even a CDN, a Content Delivery Network, which is providing a set of services. And then you have this Far Edge that might live at a branch office, or in your house, or wherever you decide you want to deploy it. I'll talk about the Far Edge because that's where we're going to focus.
And then there's something called a Micro Edge which, if you're familiar with or have ever worked with Raspberry Pis or Jetson Nanos, provides capabilities around capturing telemetry data from sensors that are processing real-time data, for example temperature, air quality, whatever the conditions of the environment might be. Or maybe other things too, around the realms of, hey, I need to detect when people enter and exit my building so I can track that and form a security posture from a physical security standpoint.
But one of the interesting phenomena going on now, and this was mentioned much earlier on if you were at the keynote, is this movement away from the cloud and towards the edge, because you can do so much at the edge now. Now, how is this possible? So let's break down what an edge looks like. How many of you have ever played with, like, an Intel NUC, or something similar? Yeah, I mean, they're very tiny, powerful little computers. You could game on them, you could do a lot of different things. I use them for lab purposes because they offer up the most RAM, CPU, and even storage. But if you pair a few of them together on a little network, now you have this little edge compute cluster that you can do a lot with.
So about four years ago, when I was working at VMware, we picked up a company called VeloCloud, which offered up something called SD-WAN. Now, this was the hardware side of VMware that was so foreign to us, because even though we deploy on hardware, we never really sold it or dealt with it. But I saw an opportunity, because this device that you would deploy pretty much anywhere, you know, 100 sites, 500 sites, is an x86 platform or potentially an ARM platform. What does that mean? I can deploy virtual machines on them. What does that additionally mean? I can deploy Kubernetes onto that little platform. Now, I tried to chase that whole use case while I was at VMware, but they never really saw the value four years ago, because that's not where the direction was. Everyone was moving towards cloud, and the edge was not really a consideration. But then we started to see how companies like FedEx, UPS, even companies that run autonomous vehicles, take advantage of these edge compute solutions. So why can't we make them generally available? And now we can, because I can go to any store or online, pick up a NUC or something similar like a Raspberry Pi, and there are options where I can deploy Kubernetes on top, run it like it's a little edge node, and allow it to participate in my own private cloud, or even hybrid cloud.
And now I have my edge compute model, but we have to dig a little bit deeper, because it's not just about that. So, I don't know where Sérgio is. He's not here, dang it. A lot of my talk came from Sérgio Méndez's book on edge computing systems with Kubernetes. This diagram actually comes out of that book, because he calls out all the different facets you have to think about before you decide to go deploy an edge solution.
You have to think about why you want to deploy an edge solution. You have to think about if you have the right expertise, the right hardware, do you have network availability? Do you have redundant network availability? What kind of sensors do you want to put in place? Are you going to tie this into the cloud? Where are you going to store your data? Are you going to have a local database, or are you going to use some cloud database? There are a lot of considerations for setting up the edge, and so you have to start really planning this out.
Now, there is a great book that he wrote. I will call it out at the end. He, interestingly enough, puts this chapter right at the end of the book, primarily because he wants you to learn about all the different edge compute concepts before you even get to this point and decide, hey, is this an initiative I want to take on?
So let's dig a little bit deeper, right? Because there are several challenges we need to think about, and it really comes down to things like security, observability, and networking. So, with respect to observability, we have to think about the edge stack itself, the physical hardware. How does it work? How does it function? Does it have points where I can capture some level of telemetry to know when it's on and responding? Do I know when it's running out of CPU, memory, and disk? These are things I need to know so I can continue to run applications on top.
But at the same time, I've also deployed applications inside that stack which I need to monitor to make sure that they're always available. They're responding and providing results that I expected to. So you may be using tools like Prometheus and Grafana to achieve this, but there are many, many, many other tools out there that make this possible. But I want you to sit here and think about for a second where these observability points are and how you would start implementing this if you were to roll your own platform.
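As one hedged example of where those observability points could live: a minimal Prometheus scrape job against node exporters on each edge node. The job name and targets are hypothetical placeholders.

```yaml
# Sketch: scrape CPU, memory, and disk telemetry from edge
# nodes, assuming node-exporter runs on each at port 9100.
scrape_configs:
  - job_name: "edge-nodes"
    scrape_interval: 30s
    static_configs:
      - targets:
          - "edge-node-01:9100"    # hypothetical hosts
          - "edge-node-02:9100"
```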
Then security, right? Security is a very, very broad topic because you have to think about this physical box that is going to be deployed somewhere. So you have to assess the physical security of the environment. Who has access to that closet? Who has access to the office? Who can access the actual node, plug into it, and maybe even try to configure it? Who has access to the passwords to that node? These are all considerations that you have to think about before you can even start securing the node itself at the operating system level, as well as at the app level.
But once you get past that physical security side of it, you have to think about the kinds of apps you're going to deploy. Now, if you just pull random Docker images or container images from the world wide web without really thinking about it, and that image makes its way into your edge compute node and into your edge compute cloud network, well, guess what? A compromised image is going to laterally move across and impact other areas of your edge compute environment.
So there's the element of how do you ensure that you're pulling down the right images. Now, these principles, you know, will apply outside of the edge compute world, but these are still considerations in edge computing. In terms of the apps themselves, once you've deployed a few microservices, how do you ensure that you have identity? How do you ensure that you employ policy so that you can ensure only the elements that need to talk to each other do, using this zero trust posture or model?
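One way to express that zero trust posture, sketched here in Istio (which comes up later in this talk); the namespaces, labels, and service account are hypothetical:

```yaml
# Sketch: only the frontend's identity (its service account) may
# call the backend workload; all other traffic is denied.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: frontend-to-backend-only
  namespace: backend               # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
    - from:
        - source:
            principals: ["cluster.local/ns/frontend/sa/frontend"]
```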
And then we also have to think about the network stack for a second. The network stack is very, very critical because, even though we've deployed this box somewhere that is able to run containers and maybe our Bookinfo app, or whatever it might be, you also have to think about the network, because the network can be very complex. It has changing conditions, and we have to be able to accommodate that.
So, for instance, if you're going to deploy an edge node, you obviously need to connect it into some sort of LAN environment. That LAN environment is going to connect out to some firewall, or a pair of firewalls, that are going to filter inbound and outbound traffic. But you also need to consider connectivity. Are you going to have a single connection? Are you going to have dual connections? Are you going to have a single connection and then a backup connection that might be a satellite connection, which will impact the way your applications perform after a failover scenario?
So, there's a lot of considerations there. What do you do when you have to fail over to that slower circuit and now your applications are not responding? Do you depend on failing over to another environment altogether, or do you circuit break and just ignore that part of your application? So there's a lot of different considerations here as well.
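To make the circuit-breaking option concrete, here's a rough sketch using Istio (introduced shortly): outlier detection ejects failing endpoints so traffic can route elsewhere. The host and thresholds are illustrative.

```yaml
# Sketch: after repeated 5xx errors, eject the endpoint for a
# while instead of hammering the slow or dead path.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: backend-circuit-breaker
spec:
  host: backend.default.svc.cluster.local   # hypothetical host
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 30s
      baseEjectionTime: 60s
```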
But then, once you get past the physical networking part of it, now you're at the virtual layer. If we're just doing bare metal containers and Kubernetes, then we can forget about the SDN layer because we don't really need it at the moment. If we're doing containers straight up, we'll still need some orchestration layer like Kubernetes.
Kubernetes is great. There's also K3s. There's also Pine. But these offer up plugins, Container Network Interfaces, that allow us to provide that switching-like functionality for our pods. But the logic doesn't stop there, the intelligence doesn't stop there, primarily because, if we go back to the failure scenario where I have to fail over to a slower link, we'll need to be able to handle that at the application layer. And this is where a service mesh comes in.
Now, a service mesh like Istio, for example, is able to adapt to those changing network conditions. For example, there's too much latency on the wire, or the wire is oversaturated now, so there's not enough available bandwidth to keep sending traffic. A service mesh will see that and say, hey, let me respond to that changing network condition. And we can provide a response to make sure that your application continues to function. Even though it may operate a little bit slowly, it still functions, or it still operates the way you want it to.
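A minimal sketch of what "responding to a changing network condition" can look like in Istio: retries with per-try timeouts against a hypothetical backend service.

```yaml
# Sketch: retry transient failures with a bounded overall
# timeout so a degraded link still yields a response.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: backend-resilience
spec:
  hosts:
    - backend                      # hypothetical service
  http:
    - route:
        - destination:
            host: backend
      timeout: 10s
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure
```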
So, let's talk about a specific service mesh called Istio. Yeah, Lisa's like, yeah, that's my service mesh. So Istio is one of several service meshes out there. There are others like Linkerd, there's Kuma, there's Open Service Mesh, there's even Cilium Service Mesh. So if you're familiar with the Cilium CNI, they offer up a service mesh offering as well, using the Envoy proxy.
But Istio has been around for a long time. It's been doing so much, it's been battle-tested, and there are a lot of features that have been poured into it over the last four years. There came a point in time where we realized that there are so many other kinds of workloads that don't get to take advantage of what a service mesh can do, but suffer when we have to deploy things like the sidecar.
So quick sidebar here: what is the sidecar? Now, if you're familiar with network routing, we have this device called a router that effectively tells us where to go when we need to route traffic. If we can slim that down and allow it to do things like layer 4 and layer 7, we can stick it alongside a container and allow that sidecar to route on behalf of that container. It can do a bunch of other things, but it's effectively routing on behalf of that container. So the container doesn't have to go talk to the network and figure its way around the network; the sidecar does that.
But the challenge with the sidecar, and there's a very, very specific example I'll call out: when an application comes online in Kubernetes, you may experience a race condition between the sidecar and the container. The container may come online before the sidecar does, and because of that, the sidecar's routing doesn't take effect. That is a challenge for technologies and protocols like SQL, for example, or other things that come online early, and you can probably implement something like Istio's holdApplicationUntilProxyStarts setting in your manifest. But one thing to consider is that there might be a situation where you don't want to use a sidecar at all.
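For reference, a minimal sketch of that setting as a per-pod annotation, assuming Istio sidecar injection is in play; the workload name and image are hypothetical placeholders:

```yaml
# Sketch: delay the app container until the Istio sidecar is
# ready, avoiding the startup race described above.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: db-client              # hypothetical workload
spec:
  selector:
    matchLabels:
      app: db-client
  template:
    metadata:
      labels:
        app: db-client
      annotations:
        proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
    spec:
      containers:
        - name: app
          image: registry.example.com/db-client:latest   # hypothetical image
```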
So, in the last year, there were a couple of big leaguers that put their heads together and decided we need to take Istio in a slightly different direction to accommodate other styles of workloads. So they produced the sidecar-less mode of operation called Istio Ambient Mesh. I'm checking for time because I've got to make sure I have enough time.
Now, Istio Ambient Mesh, a new mode of operation, just means that, hey, you're still getting service mesh capabilities, we're just not deploying the sidecar, because we've taken that sidecar functionality and moved it off to a layer 4 and a layer 7 proxy. What does that look like?
So there's a little bit more details I want to get into, a lot of the meat, but I only have like I think seven more minutes or something, so I'll try to be quick. There's something interesting about the way we use these layer 4 and layer 7 proxies away from the sidecar. If you're familiar with tunneling mechanisms like IPSec VPN or VXLAN or Geneve, we're using these styles of technologies to provide that mesh-like functionality at the Kubernetes layer.
Now, breaking it down, we've created something called a ztunnel. A ztunnel is effectively a per-node pod that tunnels to another node in the same cluster. And what's going on is, when traffic or a request has to go from one application to another application across nodes, normally what would happen is the sidecar would facilitate that communication to another sidecar, and we would use certificates, right, for our identity to make this possible, so that we can have our encryption.
This slightly changes and shifts in ambient mode, where it's not the sidecar that's doing that job anymore. It's the ztunnel pod that assumes the identity of the app, allows it to form its end of the encryption, and then allows the other end to do the exact same thing. So we maintain the same service mesh functionality, which means encryption is there, and layer 4 authorization is still there as well.
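For context, and hedged since ambient was still experimental at the time of this talk: enrolling workloads in ambient mode is done with a namespace label rather than sidecar injection. The namespace name is hypothetical.

```yaml
# Sketch: opt a namespace into Istio Ambient Mesh. Its pods are
# then handled by the node-local ztunnel, with no sidecar.
apiVersion: v1
kind: Namespace
metadata:
  name: edge-apps              # hypothetical namespace
  labels:
    istio.io/dataplane-mode: ambient
```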
But what if we need layer 7 authorization? What if we need to say, hey, I have an app that needs to pass an HTTP GET command to this other app over on the other side, but HTTP DELETE commands aren't allowed? Well, I implement a policy for that: a layer 7 authorization policy in Istio.
I won't get into the details of that, but I also have to implement something called a waypoint proxy, which is what processes these layer 7 policies. The waypoint proxy is based on Envoy right now, and currently, as it stands, it's the one that accepts policies and accepts connections on behalf of that final application. And the reason why we have to deploy this is because there is somewhat of a need for layer 7. If there isn't any, we can pretty much bypass it, and the reason why we may not need it is because I care about speed, I need encryption, I need to protect my identities, but I don't really care about that layer 7 header information at this point.
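As a rough sketch of such a layer 7 rule, with hypothetical labels and namespace: since the policy only allows GET, a DELETE is implicitly denied, and in ambient mode it's the waypoint proxy that enforces this.

```yaml
# Sketch: allow only HTTP GET to the backend workload; DELETE
# and every other method is implicitly denied.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: get-only
  namespace: edge-apps         # hypothetical namespace
spec:
  selector:
    matchLabels:
      app: backend             # hypothetical workload
  action: ALLOW
  rules:
    - to:
        - operation:
            methods: ["GET"]
```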
So it really comes down to, do you need it, do you not need it, and it'll come down to what you're building out in your environment.
Now, why this becomes very important for edge computing is that we've slimmed down that sidecar capability to something so small it can just tunnel and offer encryption, much like you would see with IPsec. And so that's what we've done, and this is what makes Ambient Mesh a perfect fit for edge computing, or a perfect fit for service mesh in edge computing, because now I can have all of these different edge nodes deployed anywhere in the world, I can set up locality routing based off of geo, and I can also have telemetry there to say, hey, if one of my edge nodes actually went down, well, I have another destination to go to that provides the same results. But I can only do this if I have the right amount of resources to make it possible. If I were doing full-blown Istio, guess what, I would run out of memory and CPU before I even got to the point where I could deploy my applications.
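A hedged sketch of that locality routing idea, using Istio's locality load balancing; the host and region names are hypothetical, and note that locality failover requires outlier detection to be configured:

```yaml
# Sketch: prefer the local edge site, and fail over to another
# region when endpoints there are ejected as unhealthy.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: backend-locality
spec:
  host: backend.default.svc.cluster.local   # hypothetical host
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        failover:
          - from: region-a                  # hypothetical regions
            to: region-b
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 30s
      baseEjectionTime: 60s
```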
So, to cut it short, there are a lot of interesting use cases that you can get to with Edge Computing. I'm not going to get into all of them, but I actually pulled this list directly from Sérgio's book, because it was very complete and very detailed in terms of the kinds of things we can do. I think in the next year or two we're going to see massive utilization of edge compute, particularly in the Industry 4.0 era with manufacturing and even logistics. We're also going to see a lot of autonomous vehicles take advantage of edge computing, and even offerings like K3s to facilitate this. You may even see some Ambient Mesh in there as well.
But what if we wanted to do this with Civo Cloud? So, we need to think, okay. I made this talk before Civo announced their Civo Edge, so I wasn't sure what was going on. I'm sorry, I'm sorry, Kunal. But in my situation, what I did was deploy a management layer inside of Civo Cloud, and that management layer is supposed to facilitate network communications and drive network policy, and even security policy, out to all my edge locations. So that means when I deploy a Kubernetes instance on an edge node and I have Ambient Mesh, Ambient Mesh will attach and be seen inside of Gloo Mesh, which is something by Solo. I'm sorry, Solo plug, I have to because I work for Solo.
But once I have that, now I have full visibility into my entire edge compute network, and now I know where all my applications are flowing. But here's something very interesting, right? Even though you have this edge compute environment, a Kubernetes cluster still needs something: a load balancer. Now, the reason you need a load balancer is that you have to expose your applications. You have to find ways to connect to them, and at the edge, especially if you're at home, you're not going to have access to a load balancer. So, what do you do at this point?
Now, a very good friend in the community, Alex Ellis, created this amazing technology called inlets. How many of you have heard of inlets before? Yes, so to cut it short, inlets is a tunneling mechanism that allows you to deploy two endpoints. One endpoint gets deployed to the cloud provider of your choice; it could be Civo, it could be AWS, it could be anything. I chose Civo, and it was a lot of manual setup, by the way, and a lot of it doesn't work right now because of some other issues that I have with my own network.
But on one end you have this inlets exit server, which is where you connect in from. And then on the other end, on your local cluster, you have an inlets client. That inlets client will tunnel all of your connections through the exit server so you can access your application, whatever it is, through a public IP on that inlets exit server. So if you specify that you want a LoadBalancer service in Kubernetes, it's going to pull a public IP for you from that exit server. That becomes your entry point into your environment.
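To sketch that flow under one assumption, that the inlets-operator (rather than the manual setup described here) is watching the edge cluster: a plain LoadBalancer Service is what triggers the public IP from the exit server. The app name and ports are hypothetical.

```yaml
# Sketch: a LoadBalancer Service on the edge cluster. With an
# inlets tunnel in place, the external IP it receives is the
# cloud exit server's public IP.
apiVersion: v1
kind: Service
metadata:
  name: my-app               # hypothetical app
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
    - name: http
      port: 80
      targetPort: 8080
```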
Now, this is interesting, because you don't want to keep creating load balancers all the time. You can't do that; it's not going to scale well for the edge. So why not use something interesting like the Istio Ingress Gateway? Why? Because you can oversubscribe the Istio Ingress Gateway as many times as you want for all the different kinds of applications that you have.
Now, in my situation, what I would do is flow my user request to the public IP that I get, with the backend destination. That traffic gets tunneled through my inlets exit server, as you can see if you're following that little orange line. It'll tunnel down to my inlets client on my edge node, and that inlets client decrypts the traffic and passes it along to the Ingress Gateway. That Ingress Gateway routes it, using a VirtualService object in Istio, to the right resource. The request gets made, the response makes it back to me. So, that's how it works.
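A minimal sketch of that last hop: an Istio Gateway bound to the shared ingress gateway, plus a VirtualService routing the host to the backing service. The hostname, app name, and port are hypothetical.

```yaml
# Sketch: expose one hostname through the shared Istio Ingress
# Gateway and route it to the right in-cluster service.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: edge-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "app.example.com"          # hypothetical hostname
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: app-route
spec:
  hosts:
    - "app.example.com"
  gateways:
    - edge-gateway
  http:
    - route:
        - destination:
            host: my-app             # hypothetical service
            port:
              number: 80
```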
I mean, because of this, now I can deploy edge compute apps pretty easily and still expose them and access them. But there's also one additional component: I still need to use the same load balancer service to tie my edge compute instance up to my Gloo Mesh so that I can see all my edge compute deployments.
Now, unfortunately, when I was trying to make this all work, my demo kind of broke, because I found out that our current branch of Ambient Mesh, which is still experimental, doesn't have all the right pieces to make our Gloo Mesh agent work. So the data wasn't being sent up to the management plane, and we couldn't see a lot of it.
So what I decided to do was, okay, let me at least try to get something working to show you inlets. So I sat down and I wanted to show you all that I can SSH into my little remote node using inlets, because that's the other option you can use. You can set up a remote SSH tunnel if you wanted to, but that broke too, so I can't even access that.
So all I get to show you today is, if I can minimize this, the fact that I was trying to scale up my edge compute management cluster that was housing my Gloo Mesh. My little app is running on my edge compute node, which is actually at home right now, online, and I hope that I can somehow SSH into it.
But I wanted to just quickly point out that if you're able to do this with edge computing, you can do so much more. You can go a lot further. So I want to end here because I know we're at time. If there are any questions, I'm not going to take them now; come find me later on.
But please, check out the book by Sérgio Méndez. Where is he? He's back there, say hi to him. He wrote an excellent book on edge computing systems with Kubernetes. Go buy the book, it's an excellent read. And if you want to know more about Ambient Mesh, I have some books here for y'all and some plushies.
Alright, well thank you everyone. And I'm gonna get a selfie quickly.