Building blocks

In previous articles we worked with a single Geopoiesis process that acted as both an HTTP server and a worker performing Terraform runs. Less visibly, two more things were running behind the scenes: a garbage collector and a metrics reporter. This approach is great to start with, but it does not scale. If you need more workers to perform your runs, you shouldn't have to spin up more HTTP servers or garbage collectors.

For that reason, each of these components can be turned off by passing a flag to the Geopoiesis binary. This allows you to group individual Geopoiesis elements into services and scale them according to your needs. The suggested approach is to put the HTTP server, garbage collector and metrics reporter in one service, and the worker in another. You normally need just one instance of the HTTP server, which is extremely lightweight. The number of worker instances will depend on your workflow, and may fluctuate throughout the day.
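The suggested split can be sketched as two command lines. The four --no-* flags are the ones described in this article; the binary name "geopoiesis", the helper name command_for and the variable names are assumptions made for illustration, and any other arguments your deployment needs are left out.

```python
# Flags that switch individual building blocks off (taken from this article).
DISABLE_FLAGS = {
    "server": "--no-server",
    "worker": "--no-worker",
    "garbagecollector": "--no-garbagecollector",
    "reporter": "--no-reporter",
}


def command_for(enabled, binary="geopoiesis"):
    """Build the argv for a service that runs only the `enabled` components.

    `binary` is an assumed executable name; adjust it to your installation.
    """
    unknown = set(enabled) - set(DISABLE_FLAGS)
    if unknown:
        raise ValueError("unknown components: %s" % sorted(unknown))
    # Every component that is not explicitly enabled gets its --no-* flag.
    flags = [flag for name, flag in DISABLE_FLAGS.items() if name not in enabled]
    return [binary] + flags


# One lightweight service with everything except the worker...
web_service = command_for({"server", "garbagecollector", "reporter"})
# ...and a separately scaled service that runs the worker only.
worker_service = command_for({"worker"})

print(" ".join(web_service))
print(" ".join(worker_service))
```

Keeping the mapping in one place makes it harder to start a second garbage collector or reporter by accident when you add a new service.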

HTTP server

The HTTP server has three main roles. First, it serves the GraphQL API used by the frontend. Second, it exposes an endpoint for webhooks coming from VCS providers. Third, it serves static assets, which are bundled with the binary itself. There's also a /health endpoint that you can use for HTTP liveness and readiness checks. The HTTP server is extremely lightweight and needs around 64MB of RAM and a tiny fraction of a CPU under normal load.
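As a rough illustration of a liveness check against the /health endpoint, here is a minimal probe. The article does not say what the endpoint returns, so this sketch assumes that any 2xx status means "healthy"; the function name is_healthy, the base URL and the port in the usage line are all hypothetical.

```python
import urllib.error
import urllib.request


def is_healthy(base_url, timeout=2.0):
    """Return True if GET <base_url>/health answers with a 2xx status.

    Assumption: a 2xx response means the server is healthy. Connection
    errors, timeouts and non-2xx responses all count as unhealthy.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # HTTPError (4xx/5xx) is a subclass of URLError; timeouts are OSError.
        return False


if __name__ == "__main__":
    # Hypothetical address; use wherever your HTTP server actually listens.
    print(is_healthy("http://localhost:8080"))
```

Most orchestrators can perform this kind of HTTP check natively, so a script like this is mainly useful for ad hoc debugging.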

You can disable the server by passing the --no-server command-line flag when starting Geopoiesis.

Worker

The worker does the heavy lifting. It fetches your source code and runs the Terraform binary and your custom commands, which makes it the most resource-intensive building block of Geopoiesis. The main driver behind resource usage in Terraform is the size of your state, which needs CPU to be resolved and memory to be stored when refreshing. We advise starting with 1GB of RAM and a full core, then making adjustments based on telemetry. Also, if you're using custom commands in tasks and initialization hooks, make sure you're aware of their resource usage.

You can disable the worker by passing the --no-worker command-line flag when starting Geopoiesis.

Garbage collector

The garbage collector cleans up after crashed runs. Individual runs hold a number of resources, and there are countless things that can go wrong, including the process itself dying. Geopoiesis is designed to crash easily, so having a separate process to clean up the mess is the most effective solution.

You generally need just one such process, and it is not resource intensive. It doesn't do any heavy lifting; it only talks HTTP to AWS resources. Normally you'll probably want to keep it in the same service as your HTTP server, but if you want to have it separate, 64MB of RAM and a small fraction of a core should be more than enough.

You can disable the garbage collector by passing the --no-garbagecollector command-line flag when starting Geopoiesis.

Reporter

The reporter is the least crucial of all Geopoiesis building blocks. The only thing it does is monitor your worker utilization and the size of your queue, and send those metrics to CloudWatch. If you don't care about those metrics (hint: they can be useful when scaling your workers), you don't need to run the reporter.

You generally need just one such process, and it is not resource intensive. Normally you'll probably want to keep it in the same service as your HTTP server, but if you want to have it separate, 64MB of RAM and a small fraction of a core should be more than enough.
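To illustrate how worker utilization and queue size could feed a scaling decision, here is a deliberately simple sketch. It is not how Geopoiesis or CloudWatch scales anything; the function name desired_workers, its parameters and the default bounds are hypothetical, and it assumes each worker handles one run at a time.

```python
def desired_workers(busy_workers, queued_runs, min_workers=1, max_workers=10):
    """Suggest a worker count from the two kinds of metric described above.

    Assumptions (not from Geopoiesis itself): one worker handles one run at
    a time, so demand is the number of busy workers plus the queued runs.
    The result is clamped to [min_workers, max_workers].
    """
    if busy_workers < 0 or queued_runs < 0:
        raise ValueError("metrics must be non-negative")
    demand = busy_workers + queued_runs
    return max(min_workers, min(max_workers, demand))


# Two workers are busy and three runs are waiting: ask for five workers.
print(desired_workers(busy_workers=2, queued_runs=3))
```

In practice you would more likely express the same idea as an alarm and a scaling policy on the reported metrics, with a cooldown to avoid flapping.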

You can disable the reporter by passing the --no-reporter command-line flag when starting Geopoiesis.