The rendezvous architecture was introduced by Ted Dunning in his book Machine Learning Logistics. It aims to solve several real-world machine learning challenges:
- Meeting large-scale, changing data and changing goals.
- Isolating models for evaluation in customized, controlled environments.
- Managing multiple models in production at any given time.
- Having new models ready to replace production models smoothly, without interruptions to service, as situations change.
The heart of the rendezvous architecture is to treat all input data as streams:
- Put all requests into a stream; consumers (models) process input data when needed.
- Outputs of models are put into another stream.
- The rendezvous server works by maintaining a mailbox for each request it sees in the input stream. As each model reports its results to the scores stream, the rendezvous server reads them and inserts them into the corresponding mailbox.
- Based on the amount of time that has passed, the priority of each model, and possibly even a random number, the rendezvous server eventually chooses a result for each pending mailbox and packages it as a response to the return address in the original request.
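The mailbox mechanism above can be sketched with a small in-memory stand-in. This is an illustrative assumption, not the book's implementation: the `Mailbox` class and the priority rule are made up for this sketch, and the real server would read from actual streams.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """Collects results for one request, keyed by model name."""
    request_id: str
    deadline: float  # absolute time by which the server must answer
    results: dict = field(default_factory=dict)

    def deposit(self, model: str, score) -> None:
        # Each model reporting into the scores stream lands here.
        self.results[model] = score

    def choose(self, priority: list):
        """Pick the highest-priority result that has arrived so far;
        the server would call this once the deadline approaches."""
        for model in priority:
            if model in self.results:
                return model, self.results[model]
        return None  # nothing arrived in time


# Usage: the rendezvous server keeps one mailbox per request it sees.
box = Mailbox(request_id="req-1", deadline=time.time() + 0.5)
box.deposit("canary", 0.42)
box.deposit("production", 0.91)
print(box.choose(priority=["production", "canary"]))  # ('production', 0.91)
```

If the preferred model never reports before the deadline, the server falls back to the next model in the priority list, which is what keeps response times bounded.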
If you are interested in the details of the rendezvous architecture for machine learning, I highly recommend reading the book Machine Learning Logistics and the article Rendezvous Architecture for Data Science in Production by Jan Teichmann.
The `predict(flower: Flower)` endpoint receives a request and publishes the request body to a topic, `iris`. It then waits for the result to be published to another topic, `score`. `get_model_result` reads results from the `score` topic and keeps only the result from the latest model by comparing the `modelVersion` field. We also impose a time constraint, `timeout`, to ensure our service responds in time.
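The waiting logic can be sketched with a plain queue standing in for the `score` topic. The message field names (`requestId`, `modelVersion`, `prediction`) follow the text where possible but are otherwise assumptions, not the repo's exact schema:

```python
import queue
import time


def get_model_result(score_queue, request_id: str, timeout: float = 2.0):
    """Poll the score stream until `timeout` expires, keeping only the
    result with the largest `modelVersion` for this request."""
    deadline = time.monotonic() + timeout
    best = None
    while time.monotonic() < deadline:
        try:
            msg = score_queue.get(timeout=max(0.0, deadline - time.monotonic()))
        except queue.Empty:
            break
        if msg.get("requestId") != request_id:
            continue  # result belongs to some other request
        if best is None or msg["modelVersion"] > best["modelVersion"]:
            best = msg
    return best


# Usage: two models report for the same request; the newer version wins.
q = queue.Queue()
q.put({"requestId": "r1", "modelVersion": 1, "prediction": "setosa"})
q.put({"requestId": "r1", "modelVersion": 2, "prediction": "versicolor"})
print(get_model_result(q, "r1", timeout=0.2)["modelVersion"])  # 2
```

Returning `None` on timeout lets the caller decide whether to serve a fallback or an error.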
We have two models: the production model, which is the latest, and the canary model, which is the baseline. We're trying to recreate the real-world scenario where the production model is actively trained on the latest data and therefore provides better performance, while the canary model is usually the first model, intended as a scoring baseline to compare against the production model. By comparing these two models, we can usually detect distribution shifts.
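One simple way such a comparison could work is to track rolling score statistics for both models and flag a shift when they diverge. This `DriftMonitor` is a toy illustration I am adding, not something from the book or the repo:

```python
from collections import deque


class DriftMonitor:
    """Toy shift detector: keep rolling windows of canary and production
    scores and flag a shift when their means diverge past a threshold."""

    def __init__(self, window: int = 100, threshold: float = 0.2):
        self.canary = deque(maxlen=window)
        self.production = deque(maxlen=window)
        self.threshold = threshold

    def record(self, model: str, score: float) -> None:
        # model is "canary" or "production"
        getattr(self, model).append(score)

    def shifted(self) -> bool:
        if not self.canary or not self.production:
            return False
        gap = abs(sum(self.canary) / len(self.canary)
                  - sum(self.production) / len(self.production))
        return gap > self.threshold


# Usage: the two models disagree systematically, so a shift is flagged.
mon = DriftMonitor(window=5, threshold=0.2)
for s in [0.50, 0.52, 0.48]:
    mon.record("canary", s)
for s in [0.90, 0.88, 0.91]:
    mon.record("production", s)
print(mon.shifted())  # True
```

A real deployment would use a proper statistical test rather than a mean gap, but the plumbing, i.e., both models scoring the same stream so their outputs are directly comparable, is exactly what the rendezvous architecture provides.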
This model's `predict` function processes every record from the `iris` topic and puts the model result, along with relevant debug messages, into the `provenance` data field.
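A model worker's output message might look like the sketch below. The shape of the `provenance` field and the helper `predict_species` are assumptions for illustration; the repo's actual schema and model will differ:

```python
import time

MODEL_NAME = "production"  # illustrative
MODEL_VERSION = 2


def predict_species(msg: dict) -> str:
    # Stand-in for a real classifier: a trivial rule on petal length.
    return "setosa" if msg.get("petalLength", 0) < 2.5 else "versicolor"


def handle_message(msg: dict) -> dict:
    """Consume one record from the `iris` topic and build the score
    message, attaching debug information under `provenance`."""
    started = time.time()
    prediction = predict_species(msg)
    return {
        "requestId": msg["requestId"],
        "modelVersion": MODEL_VERSION,
        "prediction": prediction,
        "provenance": {
            "model": MODEL_NAME,
            "receivedAt": started,
            "latencyMs": (time.time() - started) * 1000,
            "input": msg,  # keep the raw input for later debugging
        },
    }


out = handle_message({"requestId": "r1", "petalLength": 1.4})
print(out["prediction"], out["provenance"]["model"])  # setosa production
```

Carrying the raw input and timing inside `provenance` is what later makes per-model debugging and latency analysis possible downstream.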
The decoy model accepts data like any other model but does not emit any result. Instead, it just archives the inputs that it sees.
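Since the decoy only archives, its worker is tiny. This sketch uses a JSON-lines file as the archive; the file-based approach is an assumption, as the repo may archive elsewhere:

```python
import json
import tempfile
from pathlib import Path


def decoy(msg: dict, archive: Path) -> None:
    """Accept a record like any other model, but emit no result:
    just append the raw input to a JSON-lines archive."""
    with archive.open("a") as f:
        f.write(json.dumps(msg) + "\n")
    # Deliberately no publish back to the score stream.


# Usage: archive two requests, then inspect the file.
path = Path(tempfile.mkdtemp()) / "decoy_archive.jsonl"
decoy({"requestId": "r1", "petalLength": 1.4}, path)
decoy({"requestId": "r2", "petalLength": 4.8}, path)
print(path.read_text().count("\n"))  # 2
```

Because the decoy sees exactly the same stream as the real models, its archive is a faithful record of production inputs, useful for replaying traffic against a new model before promoting it.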
`log_collector` archives every record we receive on every topic. We also put every result into Elasticsearch for analytics.
Clone my repo at https://github.com/munhouiani/rendezvous-arch.
Install Docker and Docker Compose.
Create and activate the conda environment:

```
conda env create -f env_mac.yaml
conda activate rendezvous_arch
```
Wait for several minutes until all services are ready.
Make an HTTP GET request to `http://localhost:8000/ping`; you should get a response confirming the service is up.
Make an HTTP POST request to the `predict` endpoint.
Create indices for the results in Elasticsearch.