## Stream-First Rendezvous Architecture

The rendezvous architecture was introduced by Ted Dunning in his book Machine Learning Logistics. It aims to solve several real-world machine learning challenges:

• Meeting large-scale, changing data and changing goals.
• Isolating models for evaluation in specifically customized, controlled environments.
• Managing multiple models in production at any given time.
• Readying new models to replace production models smoothly, without interruption to service, as situations change.

The heart of the rendezvous architecture is to treat all input data as streams:

• Put all requests into a stream; consumers (models) process input data as needed.
• Outputs of the models are put into another stream.
• The rendezvous server maintains a mailbox for each request it sees in the input stream. As each model reports results into the scores stream, the rendezvous server reads them and inserts them into the corresponding mailbox.
• Based on the amount of time that has passed, the priority of each model, and possibly even a random number, the rendezvous server eventually chooses a result for each pending mailbox and packages that result as a response to the return address in the original request.
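The mailbox logic above can be sketched in a few lines. This is a minimal in-memory illustration, not the book's implementation: a real deployment would read requests and scores from message streams (e.g. Kafka topics), and the model names and priorities here are assumptions.

```python
import time

# Illustrative priorities: higher wins (names are assumptions)
MODEL_PRIORITY = {"production": 2, "canary": 1}

class RendezvousServer:
    def __init__(self, timeout=0.5):
        self.timeout = timeout
        # request_id -> {"deadline": t, "scores": {model_name: result}}
        self.mailboxes = {}

    def on_request(self, request_id):
        # Open a mailbox for each request seen in the input stream
        self.mailboxes[request_id] = {
            "deadline": time.monotonic() + self.timeout,
            "scores": {},
        }

    def on_score(self, request_id, model, result):
        # Insert each model's result into the corresponding mailbox
        box = self.mailboxes.get(request_id)
        if box is not None:
            box["scores"][model] = result

    def resolve(self, request_id):
        # Choose a result once the highest-priority model has answered,
        # or once the deadline has passed; otherwise keep waiting.
        box = self.mailboxes[request_id]
        scores = box["scores"]
        best = max(MODEL_PRIORITY, key=MODEL_PRIORITY.get)
        if best in scores or time.monotonic() >= box["deadline"]:
            if not scores:
                return None  # nothing arrived in time
            winner = max(scores, key=lambda m: MODEL_PRIORITY.get(m, 0))
            del self.mailboxes[request_id]
            return scores[winner]
        return None  # keep waiting
```

The key property is that a lower-priority (canary) score never wins while the server can still reasonably expect the production model to answer.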

If you are interested in the details of the rendezvous architecture for machine learning, I highly recommend reading the book Machine Learning Logistics and Jan Teichmann's article Rendezvous Architecture for Data Science in Production.

## Implementation

### predict: The Front End

Once predict receives a request, it publishes the request body to a topic, iris. It then waits for the result to be published to another topic, score.

get_model_result reads results from the score topic and keeps only the result from the latest model, determined by comparing modelVersion. We also impose a time constraint, timeout, to ensure our service responds in time.
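A sketch of that selection logic, under stated assumptions: here `consumer` is any iterable yielding score dicts (in the real service it would be a Kafka consumer subscribed to the score topic), and the requestId/modelVersion field names follow the article but the exact schema is an assumption.

```python
import time

def get_model_result(consumer, request_id, timeout=1.0):
    """Keep only the score from the newest model version, within a time budget."""
    deadline = time.monotonic() + timeout
    best = None
    for msg in consumer:
        if time.monotonic() >= deadline:
            break  # respond in time even if some models are still silent
        if msg.get("requestId") != request_id:
            continue  # a score that belongs to some other request
        # keep the result from the newest model version seen so far
        if best is None or msg["modelVersion"] > best["modelVersion"]:
            best = msg
    return best
```

If the timeout expires before the production model answers, the caller still gets whatever the best available model produced, which is exactly the graceful degradation the rendezvous pattern is after.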

### Prediction Models

We have two models: the production model, which is the latest, and the canary model, which is the baseline. We're trying to recreate the real-world scenario in which a production model is actively trained with the latest data and therefore provides better performance, while the canary model is usually the first model, intended as a scoring baseline to compare against the production model. By comparing these two models, we can often detect distribution shifts.

The model's predict function processes every record from the iris topic and puts the model result and relevant debug messages into the provenance data field.
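The shape of one scored message might look like the following. This is an illustrative sketch, not the repo's code: the field names (requestId, modelVersion, provenance) follow the article, but the exact document layout and the `model` callable are assumptions.

```python
import time

def predict_record(model, model_name, model_version, record):
    """Score one record from the iris topic and attach provenance metadata."""
    features = record["features"]
    prediction = model(features)  # `model` is any callable returning a label
    return {
        "requestId": record["requestId"],
        "modelVersion": model_version,
        "result": prediction,
        "provenance": {  # debug metadata travels with the score
            "model": model_name,
            "scoredAt": time.time(),
            "features": features,
        },
    }
```

Carrying the inputs and model identity inside provenance is what later lets us compare production and canary scores for the same request.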

### Decoy Model

The decoy model accepts data like any other model but does not emit any result. Instead, it just archives the inputs that it sees.
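The decoy is the simplest consumer in the system; a minimal sketch (the class and method names are illustrative, not the repo's):

```python
class DecoyModel:
    """Consumes the same input stream as the real models but never scores."""

    def __init__(self):
        self.archive = []

    def predict(self, record):
        self.archive.append(record)  # keep an exact copy of the raw input
        return None  # deliberately emits no result to the score stream
```

Because the decoy sits on the same topic as the real models, its archive is an exact record of what the models actually saw, which is invaluable for debugging and for replaying traffic against new models.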

### Log Connector

The log_collector archives every record we receive, flushing once every timeout seconds.
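That batching behavior can be sketched as follows. This is an assumption-laden illustration: `sink` stands in for whatever the repo actually writes to (files, object storage), and the real collector runs against a stream rather than direct method calls.

```python
import time

class LogCollector:
    """Buffer incoming records and flush the batch every `timeout` seconds."""

    def __init__(self, sink, timeout=60.0):
        self.sink = sink  # any callable that archives a list of records
        self.timeout = timeout
        self.buffer = []
        self.last_flush = time.monotonic()

    def collect(self, record):
        self.buffer.append(record)
        if time.monotonic() - self.last_flush >= self.timeout:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)  # archive the whole batch at once
        self.buffer = []
        self.last_flush = time.monotonic()
```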

### ES Connector

We also push every result to Elasticsearch for analytics purposes.
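One way to picture the connector's job is as building the payload for Elasticsearch's _bulk API, which takes alternating action and document lines. The index name score-log matches the Kibana step below; the document shape is an assumption, and the repo may use a client library or Kafka Connect instead of raw _bulk calls.

```python
import json

def to_bulk_payload(results, index="score-log"):
    """Render score documents as an Elasticsearch _bulk request body."""
    lines = []
    for doc in results:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))  # source document line
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline
```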

## How to Run

Clone my repo here at https://github.com/munhouiani/rendezvous-arch

1. Install Docker and Docker Compose.

2. Create a conda environment.

3. Train the canary and production models.

4. Deploy the services.

Wait for several minutes until all services are ready.

5. Make an HTTP GET request to http://localhost:8000/ping. You should get:

6. Make an HTTP POST request to localhost:8000/predict with the body:

You should get:

7. Create indices for decoy-log and score-log in Kibana at http://localhost:5601.