Admin interface

To make managing a production deployment easier, Geopoiesis provides an admin interface. It shows all workers and all runs waiting for their turn across all scopes, which makes it especially useful for engineers managing an installation with multiple scopes.

Workers view

Currently, the admin interface has two screens. The first screen is called Workers and it displays the current list of workers registered with the app's DynamoDB. Here is an example view with several workers ready to accept jobs:

A worker can be in one of five states:

  • idle, meaning it is ready to accept jobs from the queue;

  • busy, meaning it is currently processing a job;

  • draining, meaning it is currently processing a job but will not accept any new jobs;

  • drained, meaning that it is idle and will not accept any jobs;

  • dead, meaning that we haven't seen it in a while.
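
The five states answer two questions: is the worker running a job right now, and is it ready to pick up another one from the queue? The Python sketch below models that reading of the list. The enum and helper names are illustrative and are not part of Geopoiesis.

```python
from enum import Enum


class WorkerState(Enum):
    """Worker states from the list above (the class itself is hypothetical)."""
    IDLE = "idle"
    BUSY = "busy"
    DRAINING = "draining"
    DRAINED = "drained"
    DEAD = "dead"


def is_running_job(state: WorkerState) -> bool:
    # Both busy and draining workers are processing a job right now.
    return state in (WorkerState.BUSY, WorkerState.DRAINING)


def is_ready_for_job(state: WorkerState) -> bool:
    # Only an idle worker is described as ready to accept jobs from the queue.
    return state is WorkerState.IDLE


for state in WorkerState:
    print(state.value, is_running_job(state), is_ready_for_job(state))
```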

Draining workers

The admin interface allows you to drain individual workers. This functionality is useful in three scenarios. The first is scaling down the number of workers when you have no control over which one gets killed. This is a common case with cluster schedulers like Kubernetes or ECS. It is dangerous because one of your workers may be performing a sensitive operation on the Terraform state (e.g. terraform apply) when it gets killed. To prevent that, make sure that no workers are running jobs before the service is scaled down.

The second use case is change management. It is a similar situation, but this time caused by the need to update Geopoiesis itself with a new configuration, a new binary, a new version of Terraform, new environment variables or some combination of the above.

Last but not least, you may want to drain a worker for debugging purposes. If you see an individual worker misbehaving, you may want to inspect it while preventing it from causing further damage by taking more jobs. This approach is known as lame duck mode.

Note that for now it is not possible to undrain workers, that is, to make them accept jobs again. You will need to kill the worker instead. If you're using a cluster scheduler or an autoscaling group, you should have a replacement worker running shortly.
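
Draining can be read as a one-way state transition, which is also why there is no way back to accepting jobs. The Python sketch below follows that reading of the state descriptions. The function names are hypothetical, and the rule that every worker must be drained or dead before a scale-down is a conservative assumption rather than documented Geopoiesis behavior.

```python
from typing import Iterable


def drain(state: str) -> str:
    """Return the state a worker would move to when it is drained."""
    if state == "idle":
        return "drained"   # nothing is running, so it simply stops taking jobs
    if state == "busy":
        return "draining"  # the current job is allowed to finish first
    return state           # draining, drained and dead workers are unchanged


def finish_job(state: str) -> str:
    """Return the state a worker would move to once its current job ends."""
    if state == "draining":
        return "drained"   # no way back to idle: undraining is not supported
    if state == "busy":
        return "idle"
    return state


def safe_to_scale_down(states: Iterable[str]) -> bool:
    # Assumption: an idle worker could still pick up a job before it is
    # killed, so only drained or dead workers count as safe.
    return all(s in ("drained", "dead") for s in states)


workers = ["idle", "busy"]
workers = [drain(s) for s in workers]       # ["drained", "draining"]
print(safe_to_scale_down(workers))          # one job is still running
workers = [finish_job(s) for s in workers]  # ["drained", "drained"]
print(safe_to_scale_down(workers))
```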

Worker aliveness

Every worker periodically updates its lease in the database, which serves as a heartbeat mechanism. If a worker misses a few heartbeats, the garbage collector kicks in and cleans up after the presumably dead worker. The worker itself is also programmed to self-destruct after a few failed heartbeat attempts, which ensures eventual consistency between the Geopoiesis list of workers and reality. This is why you will likely see workers in the dead state for a brief period of a few seconds.

Since Geopoiesis manages worker aliveness, there is no need to introduce any external health checking mechanism.
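
The lease-based heartbeat described in this section can be sketched in a few lines of Python. This is only an illustration of the general technique. The heartbeat interval, the number of tolerated misses, and all names below are assumptions, since the actual values and data model are not stated here.

```python
from dataclasses import dataclass
from typing import List

HEARTBEAT_INTERVAL = 10.0  # seconds between lease renewals (hypothetical)
MAX_MISSED = 3             # "a few" missed heartbeats (hypothetical)


@dataclass
class Lease:
    worker_id: str
    renewed_at: float  # Unix timestamp of the last successful heartbeat


def is_presumed_dead(lease: Lease, now: float) -> bool:
    # A worker is presumed dead once its lease is older than the
    # tolerated number of heartbeat intervals.
    return now - lease.renewed_at > HEARTBEAT_INTERVAL * MAX_MISSED


def collect_garbage(leases: List[Lease], now: float) -> List[Lease]:
    # Garbage collector side: keep only leases of workers presumed alive.
    return [lease for lease in leases if not is_presumed_dead(lease, now)]


def should_self_destruct(failed_attempts: int) -> bool:
    # Worker side: give up after the same number of consecutive failed
    # heartbeat attempts, so both sides converge on the same view.
    return failed_attempts >= MAX_MISSED


leases = [Lease("worker-a", renewed_at=100.0), Lease("worker-b", renewed_at=65.0)]
print([lease.worker_id for lease in collect_garbage(leases, now=120.0)])
```

With these example numbers, worker-b last renewed its lease 55 seconds ago, which exceeds the 30-second tolerance, so only worker-a survives the cleanup.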

Worker metadata

If you look closely at any of the entries in the Workers view above, you will see that each has a unique ID. Also, you may notice that they share the same version (your-release-77). You can set both of these values yourself using environment variables:

The unique worker ID can be set using the GEOPOIESIS_WORKER_ID environment variable. In this particular example we are running Geopoiesis in AWS ECS, so we use the ECS task ID as the unique worker identifier. This lets us easily map the worker to the actual task where it's running.

The worker version can be set using the GEOPOIESIS_WORKER_VERSION environment variable. Setting it helps with change management, allowing you to monitor rollouts more easily. You may want to set it to something like the Git tag or commit hash which corresponds to your current Geopoiesis installation (presumably a Docker image). In the example above, however, we refer to a unique release ID set by a release manager.

You don't need to set the worker ID or version. If you don't populate their corresponding environment variables, the worker ID will be generated automatically and the worker version will be set to none.
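
The fallback behavior can be illustrated with a short Python sketch. The function is hypothetical and only mirrors the rules stated above. The format of the automatically generated ID is not documented here, so the UUID is an assumption, as is treating an empty variable the same as an unset one.

```python
import os
import uuid
from typing import Mapping, Optional, Tuple


def worker_metadata(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Resolve (worker ID, worker version) from environment variables."""
    env = os.environ if env is None else env
    # Assumption: the generated ID is a random UUID; the real format may differ.
    worker_id = env.get("GEOPOIESIS_WORKER_ID") or str(uuid.uuid4())
    # The version falls back to the literal string "none".
    version = env.get("GEOPOIESIS_WORKER_VERSION") or "none"
    return worker_id, version


# Only the version is set: the ID is generated.
print(worker_metadata({"GEOPOIESIS_WORKER_VERSION": "your-release-77"}))
# Nothing is set: generated ID and version "none".
print(worker_metadata({}))
```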

Setup

config.hcl
admin {
  domain = "admin.geopoiesis.io"

  identity {
    github {
      client_id {
        environment {
          variable = "ADMIN_GITHUB_CLIENT_ID"
        }
      }

      client_secret {
        environment {
          variable = "ADMIN_GITHUB_CLIENT_SECRET"
        }
      }

      permissions {
        user "marcinwyszynski" {
          write = true
        }
      }
    }
  }
}