Wrapping up December Adventure for this year, and related to my last post, here's some early prototyping I did over the holidays of a machine management tool. After publishing that post, I had a conversation with some friends in which it was difficult to convey exactly how this might work, so I'm writing this up partly to serve as an explainer for my thought process.
In this post I may assume some basic familiarity with object capabilities and Spritely Goblins, though I'll try to explain things to some degree as I go.
From a high level, what I'd like to accomplish is something like: you have a centralized infra management tool that's responsible for provisioning new machines. When you want to create a new machine, you send a message to that management tool, and the response you get back is a reference to some management interface for that new machine.
Crucially, this system must effectively avoid the trust bootstrapping problem; that is to say, the new instance must be able to securely connect to the management tool, and the management tool must have confidence that whatever just connected to it really is the new instance. Despite seeming simple, this has been surprisingly difficult to accomplish with common devops tools in my experience! It can be done, but it often either isn't done or takes a lot of work. So that's something I'd like to address right out of the gate.
Furthermore, I'd like to make it straightforward to connect services running on different machines to each other. Even when you have a single deployment tool deploying two different services, it's often more involved than you would hope to get those two services talking to each other. Consider the example of a web application and a database instance deployed specifically for that web application. You've deployed the web application onto one machine, and the database instance on another, and you need the web application to connect to the database. What do you need for this? Typically, at least:

- database credentials created for the web application on the database side, and
- the web application's configuration updated to point at the database using those credentials.
In my experience, both of those steps are often done semi-manually; often someone will manually generate database credentials to use, and while the configuration of the webapp will be config managed, it'll pull those static database credentials from some source (maybe it's a sops-encrypted file, maybe it's in a secrets manager somewhere, etc). When it is done automatically via something like Vault, this relies on that separate tool with its own policies, and bootstrapping client trust to Vault is itself a whole rabbit hole!
Wouldn't it be nice if your provisioning tool could take care of this kind of thing for you?
I've built a basic prototype of some of what I have in mind called gobs-of-machines. I'll probably work on it (and document it) more in the future, but for now I'd like to go over what I have so far.
The system I've built has a few parts, most with D&D-esque names. There's the boss, which is responsible for triggering the provisioning process for new machines and keeping track of the machines it's provisioned. There are provisioners, which are what would call out to a given cloud provider's API to provision a new machine with some specified user data. And then there's the hob, a service in two parts - one colocated with the boss, and another running on each deployed machine - which acts as the communication channel between the boss and each machine. (Named after the household spirit or hobgoblin, not the stove. Maybe I'll rename it if it gets too confusing, but hobgoblin is pretty verbose. Names, ya know!)
Starting with the boss, we'll create a Goblins object with a hashtable inside it. This will map a human-readable name for each machine to an object we can use to communicate with that machine.
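A quick note before the code: the snippets below lean on a handful of Goblins modules. Roughly, the imports look something like this (the module paths here are my best guess; the gobs-of-machines source has the authoritative list):

(use-modules (goblins)                    ;; spawn, spawn-vat, with-vat, $, <-, on
             (goblins actor-lib methods)  ;; the methods macro
             (goblins actor-lib cell)     ;; ^cell
             (goblins actor-lib ghash))   ;; ^ghash and friends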
(define (^boss bcom)
  (define machines (spawn ^ghash))
For the time being, since I haven't implemented any cloud provider provisioners, we'll use a dummy provisioner as a placeholder. In a real implementation we'd want a more sophisticated way to manage provisioners for multiple providers, but for now let's just create an instance of the dummy provisioner inside the boss object.
  (define dummy-provisioner (spawn ^dummy-provisioner))
We'll add a getter method to get a specific machine from the hashtable by name, and a list method to list all the machines that have been registered:
  (methods
   [(get-machine name)
    ($ machines 'ref name)]
   [(list-machines)
    (ghash-fold ghash-keys '() ($ machines 'data))]
For those coming from other object oriented programming languages, this will hopefully feel somewhat familiar. We're defining the constructor for an object (in Goblins's convention, the ^ at the beginning of names like ^boss denotes a constructor) and defining some methods for that object. There are a few Goblins-specific quirks to explain here:

- $ in the get-machine method stands for a synchronous method call on another Goblins object (with some transactional functionality that I won't get into yet).
- The list-machines method is a bit funky, because Goblins's ghash objects don't have a built-in method to list all the keys in the hashtable. So what I'm doing here is grabbing the underlying hashtable and pulling out the keys from it. (ghash-keys is a procedure whose implementation I've elided for brevity; check the source if you're curious.)

One thing worth noting about Goblins programming is that, unlike some other object oriented programming languages like Python, you should generally think of method calls as messages sent between objects. It's a very Alan Kay-style object system.
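Concretely, assuming we've spawned a boss (call it boss) and a machine named "web-1" has already been registered, calling these methods from within the boss's vat (more on vats in a bit) would look roughly like this sketch:

($ boss 'get-machine "web-1")   ;; => the object registered for "web-1"
($ boss 'list-machines)         ;; => ("web-1")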
Okay, now for the only particularly interesting part of the boss, the create-machine method:
   [(create-machine name)
    (define hob-server (spawn ^hob-server-presence))
    (on (<- dummy-provisioner 'new-machine hob-server) ;; TODO selectable provisioners
        (lambda (ret)
          ($ machines 'set name hob-server)))]))
This one adds a couple new things to go over. Within the body of the create-machine method, we're spawning one of the components of the hob service - the one that lives alongside the boss. We're then sending an asynchronous message to the provisioner (that's what <- does) and passing it a reference to the hob object that we created.
Async messages, as in some other languages like JavaScript, return promises that will be fulfilled later rather than waiting for the response to come back. In Goblins, you can trigger an action when a promise is resolved with on. In this case, when the promise is resolved (i.e. when the boss gets a response from the provisioner), we add an entry to the hashtable created above for the new server. Currently I have it waiting until after hearing back from the provisioner to avoid adding entries for new nodes if provisioning fails, but we could just as well create a "pending" entry in the hashtable and update it when provisioning succeeds.
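As an aside on that failure case, on also accepts a #:catch handler, so a variant that records failed provisioning attempts might look roughly like this (just a sketch, not what the prototype currently does):

(on (<- dummy-provisioner 'new-machine hob-server)
    (lambda (ret)
      ($ machines 'set name hob-server))
    #:catch
    (lambda (err)
      ;; hypothetical: keep a record of the failure instead of dropping it
      ($ machines 'set name 'provisioning-failed)))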
Now that we've seen the code that kicks off the provisioning process, let's take a look at what the provisioner itself does. For now I only have a dummy provisioner with one method to try out the idea:
(define (^dummy-provisioner bcom)
  (methods
   [(new-machine hob-server-presence)
    (display "dummy-provisioner: In a real provisioner, this would talk to a cloud provider\n")
    (let ((new-vat (spawn-vat)))
      (with-vat new-vat
        (define machine (spawn ^dummy-machine))
        ($ machine 'provision hob-server-presence)))]))
This is the method of the provisioner that we saw the boss call earlier.
Part of this again deserves some explanation, because it's Goblins-specific. Because we're not actually provisioning a new VM at this stage, I'm instead spawning a new "vat" to run some extra code in. A vat is Goblins's mechanism of concurrency; it's an event loop containing objects. Objects within the same vat can make synchronous method calls between each other, while objects in different vats can only send asynchronous method calls. Objects on separate machines can communicate with each other via CapTP, but they will necessarily be in different vats and so can only communicate asynchronously.
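To make the vat rules concrete, here's a small sketch of my own using ^cell (not code from gobs-of-machines); with-vat runs its body on the given vat's event loop:

(define vat-a (spawn-vat))
(define vat-b (spawn-vat))

(define alice (with-vat vat-a (spawn ^cell 'hello)))
(define bob   (with-vat vat-b (spawn ^cell 'hi)))

(with-vat vat-a
  ($ alice))                ;; same vat, so a synchronous $ call is fine

(with-vat vat-a
  ;; bob lives in vat-b, so from vat-a we can only message it asynchronously
  (on (<- bob)
      (lambda (val)
        (format #t "bob says: ~a\n" val))))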
Running part of this code in a separate vat is my attempt to simulate running on a separate machine, though of course there are a few things that would be different in a real provisioner:

- it would actually call out to a cloud provider's API to create a new machine, rather than just spawning a local vat, and
- it would need some way to get the reference to the hob-server over to the code running on that new machine.
The latter is trivial in the dummy provisioner case, when everything is running on the same machine, but Goblins gives us a trick up our sleeve that makes it possible across machines. If you have OCapN set up, you can serialize the hob-server reference as a sturdyref so you can send it over the network as a string. In a cloud provider that supports user data (Linode, for example), you can include the sturdyref in the user data for the new instance. The code running on the remote machine can then enliven the sturdyref, turning it back into a live reference, after which point objects on each machine can send asynchronous messages to each other as usual.
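To sketch what that might look like in code - and this is very much a sketch, with module and procedure names taken from my reading of the Goblins OCapN documentation rather than from working gobs-of-machines code (boss-vat, machine-vat, hob-client, and sturdyref-from-user-data are stand-in names):

(use-modules (goblins ocapn captp)
             (goblins ocapn ids)
             (goblins ocapn netlayer onion))

;; Boss side: register the hob-server over CapTP and turn the resulting
;; sturdyref into a string to include in the new instance's user data.
(with-vat boss-vat
  (define mycapn (spawn-mycapn (new-onion-netlayer)))
  (define hob-server-sref ($ mycapn 'register hob-server 'onion))
  (ocapn-id->string hob-server-sref))

;; New machine side: read the string back out of user data, enliven it,
;; and start talking to the boss's hob-server.
(with-vat machine-vat
  (define mycapn (spawn-mycapn (new-onion-netlayer)))
  (on ($ mycapn 'enliven (string->ocapn-id sturdyref-from-user-data))
      (lambda (hob-server)
        (<- hob-server 'register hob-client))))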
The dummy provisioning code that runs in the new vat doesn't do a whole lot; it looks like this:
(define (^dummy-machine bcom)
  (define hob-client (spawn ^cell #f))
  (methods
   [(provision hob-server-presence)
    (display "dummy-machine: Provisioning new machine\n")
    (let ((client (spawn ^hob-client-presence)))
      ($ hob-client client)
      (on (<- hob-server-presence 'register client)
          (lambda (ret)
            (display "dummy: Registered new client machine\n"))))]))
(TODO: talk about/look into whether or not the hob-client cell above is needed, and maybe remove it. We'll need to register the hob-client with a persistence mechanism to keep it alive in the long run, which should keep it from getting GC'd anyway, so maybe we don't need the machine object to stick around.)
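Putting the pieces together, driving the whole prototype from a REPL would look roughly like this (again a sketch; it assumes ^boss and friends, including ^hob-server-presence and ^hob-client-presence from the gobs-of-machines source, are all loaded):

(define main-vat (spawn-vat))
(define boss (with-vat main-vat (spawn ^boss)))

(with-vat main-vat
  ($ boss 'create-machine "web-1"))

;; ...a little later, once the provisioner's promise has resolved:
(with-vat main-vat
  ($ boss 'list-machines))   ;; => ("web-1")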