Creating a Cluster with Elixir
The other day I asked a few candidates for a Go engineer position to implement a coding task: create a tiny service that monitors Bitcoin price fluctuations over a specified period of time with a specified frequency. For example, a user asks the service to monitor the Bitcoin price for a period of 5 minutes with a frequency of every 10 seconds. To keep things simple, the candidates only had to create a stub that returns a randomly generated response instead of actually calling an API. The service should be:
- suitable for working in a cluster, with the number of nodes growing or shrinking at any time
- able to run locally using either docker-compose or Kubernetes
- reliable, surviving minor unpredictable errors (e.g. network errors)
- resilient to server/node restarts
As a person who likes Elixir and does small side projects with it from time to time, I was surprised that actually building a cluster could be the biggest headache in that coding task. Candidates implemented clunky, complex solutions with lots of dependencies and middleware, which made them prone to failure and hard to maintain.
So I thought it would be a good opportunity to show how easily clustering can be done with Elixir, and maybe "sell" Elixir to my boss. So let's implement a clustered service.
Let’s go through the steps to create this application from scratch.
The first step is to generate an OTP application scaffold:
mix new --sup bitcoin_price_service
By the way, here is a good article that explains what an OTP application is: https://blog.appsignal.com/2018/09/18/elixir-alchemy-how-otp-applications-are-structured.html
So everything starts with the mix.exs file. This file contains our application specification: the starting module and the dependencies. We use libcluster and Horde to build our cluster. Nebulex is a wrapper for a distributed cache, but we use it as a substitute for a persistent repo/store in our app. Plug provides the functionality for the web endpoint.
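A minimal sketch of the deps section of mix.exs, assuming Cowboy as the HTTP server behind Plug; the version numbers are illustrative, not prescriptive:

defp deps do
  [
    {:libcluster, "~> 3.3"},   # connects Erlang VMs into a cluster
    {:horde, "~> 0.8"},        # distributed supervisor and registry
    {:nebulex, "~> 2.4"},      # distributed cache, our stand-in for a repo
    {:plug_cowboy, "~> 2.6"}   # Plug + Cowboy for the web endpoint
  ]
end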
The next thing you can see in that file is the starting module for our OTP application: BitcoinPriceService.Application. Every Elixir/Erlang application has a supervision tree, and this module is where we build it. BitcoinPriceService.ApplicationSupervisor is our main supervisor. It starts its children, monitors them, and restarts failed processes according to its configuration.

The ApplicationSupervisor spins up our cluster on start-up. The first thing our cluster needs is to connect the nodes and let them see and communicate with each other. For this we use Cluster.Supervisor, provided by libcluster. That module connects the Erlang virtual machines according to the configured strategy.

Once our machines are connected, we can start the distributed registry provided by Horde.Registry. That registry is replicated on all nodes and holds the names and PIDs of processes from different nodes. It is what lets a process on one node send messages to a process on another node.

Next comes Horde.DynamicSupervisor, which gives us a distributed supervisor. It spreads our processes out over the cluster, using the distributed registry. Horde.DynamicSupervisor uses a CRDT (conflict-free replicated data type) to sync data between nodes. This means our system is "eventually consistent" and can occasionally (unlikely, but possible) run into conflicts, although Horde resolves them automatically.

The supervisor and registry should receive the list of member nodes and keep it updated when a new node joins the cluster or an old node leaves. For that we can either use "members: :auto" or implement our own Registry, Supervisor, and NodeListener modules. We would need our own modules if we wanted to resolve Registry or Supervisor collisions manually, or to implement a process state handoff when moving a process to another node. We don't need that for this project, so we stick with the simplest solution. More details can be found in the documentation.
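A condensed sketch of how that supervision tree could look, pieced together from the description above; apart from the library modules and ClusterServiceSupervisor, the names here are assumptions:

defmodule BitcoinPriceService.Application do
  use Application

  @impl true
  def start(_type, _args) do
    topologies = Application.get_env(:libcluster, :topologies, [])

    children = [
      # Connects the Erlang VMs according to the configured strategy
      {Cluster.Supervisor, [topologies, [name: BitcoinPriceService.ClusterSupervisor]]},
      # Cluster-wide process registry, replicated on every node
      {Horde.Registry,
       [name: BitcoinPriceService.ClusterRegistry, keys: :unique, members: :auto]},
      # Distributed supervisor that spreads workers over the cluster
      {Horde.DynamicSupervisor,
       [name: ClusterServiceSupervisor, strategy: :one_for_one, members: :auto]},
      # Replicated in-memory store standing in for persistence (sketched below)
      BitcoinPriceService.Repo,
      # Plug-powered HTTP endpoint; the port is an arbitrary choice
      {Plug.Cowboy, scheme: :http, plug: BitcoinPriceService.Endpoint, options: [port: 4000]}
    ]

    opts = [strategy: :one_for_one, name: BitcoinPriceService.ApplicationSupervisor]
    Supervisor.start_link(children, opts)
  end
end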
After the cluster starts, we start a Nebulex-powered Repo, which stands in for persistent storage in our application. Nebulex can work with Redis, but to keep things simple we configure our Repo as a distributed, replicated in-memory store powered by the Erlang virtual machine. We also have two data structures, BitcoinPriceService.Monitor and BitcoinPriceService.Monitor.Price, to help organize our data, and the BitcoinPriceService.Monitor.Repo module contains wrappers for working with them.
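A minimal sketch of what that store could look like with Nebulex 2.x and its replicated adapter; the cache module name is an assumption, with the article's Monitor.Repo wrappers sitting on top of it:

defmodule BitcoinPriceService.Repo do
  # Replicated in-memory cache: every node holds a full copy of the data,
  # so a worker restarted on another node can still read its state
  use Nebulex.Cache,
    otp_app: :bitcoin_price_service,
    adapter: Nebulex.Adapters.Replicated
end

# Usage is plain key/value access, e.g.:
#   BitcoinPriceService.Repo.put(monitor.id, monitor)
#   BitcoinPriceService.Repo.get(monitor_id)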
Next is the BitcoinPriceService.Endpoint module. It provides a simple web server for the API endpoints. I tried to keep it as simple as possible, so it should be easy to understand without a detailed explanation. When we start a new monitor via the web API, we call the BitcoinPriceService.Service module. This module contains wrappers for spawning the workers that do the actual monitoring under ClusterServiceSupervisor, our Horde.DynamicSupervisor.
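A trimmed-down sketch of what the endpoint might look like; the routes, parameter names, and the Service function are assumptions, not the article's exact API:

defmodule BitcoinPriceService.Endpoint do
  use Plug.Router

  plug :match
  plug :dispatch

  # Start a new monitor; period and frequency arrive as query params
  post "/monitors" do
    conn = fetch_query_params(conn)
    %{"period" => period, "frequency" => frequency} = conn.query_params
    # Hypothetical wrapper that starts a worker under ClusterServiceSupervisor
    {:ok, id} = BitcoinPriceService.Service.start_monitor(period, frequency)
    send_resp(conn, 201, ~s({"id":"#{id}"}))
  end

  # Check a monitor's state ("processing" or "ready")
  get "/monitors/:id" do
    monitor = BitcoinPriceService.Repo.get(id)
    send_resp(conn, 200, ~s({"state":"#{monitor.state}"}))
  end
end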
The worker is implemented in the BitcoinPriceService.Service.Worker module. It takes a very naive approach to acquiring prices: it schedules the next request after the current request completes, or after a certain timeout since the last successful request when picking up a killed/failed process. So the resulting number of prices can vary if you kill a process or a node. We could schedule each request independently, but I don't think that's necessary here, because we only need this worker as a dummy workload to demonstrate how the cluster works. We also don't need to implement a state handoff, because we try to pick up persisted state from the Repo on worker start. On startup the worker triggers its ":start" callback immediately, which picks up the state of a failed process if there is one and schedules a ":process" call if appropriate. The ":process" callback does the actual work (in our case, a stub) and reschedules itself if appropriate. When ":start" or ":process" finds that the monitoring period has ended, it calls the ":ready" callback, which marks the persisted monitor data as finished.
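A simplified sketch of that flow, assuming the monitor is persisted before the worker starts and carries a "frequency" in seconds, an "ends_at" timestamp, a "prices" list, and a "state" field; these field names are illustrative, not the article's exact ones:

defmodule BitcoinPriceService.Service.Worker do
  use GenServer, restart: :transient

  # Placeholder alias for the Nebulex-backed store sketched earlier
  alias BitcoinPriceService.Repo

  def start_link(monitor_id) do
    GenServer.start_link(__MODULE__, monitor_id, name: via(monitor_id))
  end

  # Register in the cluster-wide Horde registry so the worker keeps a
  # stable name no matter which node it lands on
  defp via(id), do: {:via, Horde.Registry, {BitcoinPriceService.ClusterRegistry, id}}

  @impl true
  def init(monitor_id) do
    # Trigger the ":start" step immediately after the process boots
    send(self(), :start)
    {:ok, monitor_id}
  end

  @impl true
  def handle_info(:start, monitor_id) do
    # Pick up persisted state (instead of a state handoff) and decide
    # whether to keep processing or finish right away
    monitor = Repo.get(monitor_id)
    if expired?(monitor), do: send(self(), :ready), else: send(self(), :process)
    {:noreply, monitor_id}
  end

  def handle_info(:process, monitor_id) do
    # Stub the actual API call with a randomly generated price
    monitor = Repo.get(monitor_id)
    Repo.put(monitor_id, %{monitor | prices: [:rand.uniform(100_000) | monitor.prices]})

    if expired?(monitor) do
      send(self(), :ready)
    else
      # Schedule the next request only after the current one completes
      Process.send_after(self(), :process, monitor.frequency * 1_000)
    end

    {:noreply, monitor_id}
  end

  def handle_info(:ready, monitor_id) do
    # Mark the persisted monitor as finished and stop normally
    Repo.put(monitor_id, %{Repo.get(monitor_id) | state: "ready"})
    {:stop, :normal, monitor_id}
  end

  defp expired?(monitor),
    do: DateTime.compare(DateTime.utc_now(), monitor.ends_at) == :gt
end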
Now let's see how it works. We could start several nodes right in the console, but let's try to create a more realistic environment: build a production release and start it in a containerized environment with docker-compose. This requires some extra work, however. Before setting up Docker we need to initialize the release configuration:
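mix release.init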
This creates a /rel/ folder with the files needed to configure the Erlang virtual machines. We don't need to modify them in our case, but note that we do need to set a few environment variables:
RELEASE_NODE — contains the name of the node; it corresponds to our topology config for libcluster (see the sketch after this list).
RELEASE_DISTRIBUTION — should be set to `name` so our nodes run in distributed mode.
RELEASE_COOKIE — contains a secret that must be the same on all machines we want to connect together.
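For reference, a hedged sketch of what the libcluster topology might look like (say, in config/runtime.exs), assuming the Epmd strategy with statically listed nodes; the node and host names are placeholders that would have to match the RELEASE_NODE values set in docker-compose:

config :libcluster,
  topologies: [
    bitcoin_price_service: [
      # Epmd strategy: connect to an explicit list of node names
      strategy: Cluster.Strategy.Epmd,
      config: [
        hosts: [:"app@app1", :"app@app2", :"app@app3"]
      ]
    ]
  ]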
We should also keep in mind that we need a separate container to build the application package, which we then install and run in another container.
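A hedged sketch of such a multi-stage Dockerfile; the image tags, package names, and paths are illustrative:

# Build stage: compile the release inside a full Elixir image
FROM elixir:1.14-alpine AS build
WORKDIR /app
ENV MIX_ENV=prod
COPY . .
RUN mix local.hex --force && mix local.rebar --force && \
    mix deps.get --only prod && \
    mix release

# Runtime stage: copy the self-contained release into a slim image
FROM alpine:3.17
RUN apk add --no-cache libstdc++ ncurses-libs openssl
WORKDIR /app
COPY --from=build /app/_build/prod/rel/bitcoin_price_service ./
CMD ["bin/bitcoin_price_service", "start"]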
So let's spin up our cluster with Docker and see how it works.
docker network create cluster-net
docker-compose build
docker-compose up
We can see the nodes starting and connecting to each other, forming a cluster:
Let's start one monitor via the app1 web endpoint and confirm it's running ("state":"processing").
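Assuming the hypothetical routes sketched earlier, and each node's HTTP port published to the host (app1 on 4001 here), the requests could look like:

curl -X POST "http://localhost:4001/monitors?period=300&frequency=10"
curl "http://localhost:4001/monitors/<id>"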
We can see it running on node app2.
Normally we wouldn't kill a node, we'd shut it down gracefully, but let's hard-stop the app2 node and see what happens.
We can see the process jumped to node app1, even though we stopped the previous node ungracefully. Let's stop app1 too:
Now it's on node 3. Let's start the other two nodes, wait some time to let them sync, and kill node 3:
It jumped back to node 2.
And after some time we can confirm the monitoring task finished successfully ("state":"ready").
We can see that our cluster automatically reconnects and syncs nodes even after hard stops. It reconnects and synchronizes nodes even if we eventually restart every node in the cluster.
As you can see, even though the main purpose of this exercise was cluster configuration, the application contains very little code for the actual cluster infrastructure. And with that little code we built a perfectly working, powerful cluster that would require massive effort in many other programming languages.