Model API scaling and routing

Scale model deployments horizontally and vertically for optimal performance. You can also use model API routing to run test and production versions of a model API at the same time.

Scale horizontally

Scale horizontally for throughput-constrained model API endpoints, which typically serve many concurrent users. Consider horizontal scaling when downstream applications see long queues and long running times from your endpoint.

When you publish a model API, select the number of model API instances that you want to run at any given time. Domino automatically load-balances requests to the endpoint between these instances. A minimum of two instances (default) provides a high-availability setup. Domino supports up to 32 instances per model API.

Note	Domino admins use the com.cerebro.domino.modelmanager.instances.defaultNumber Central Configuration key to change the default number of instances.
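
To choose an instance count, a rough capacity estimate can help. The following is a minimal sketch, not a Domino feature: it applies Little's law (requests in flight = arrival rate × time per request) and clamps the result to the limits described above (two instances for high availability, 32 at most). The function name, its parameters, and the assumption that each instance handles one request at a time by default are hypothetical; measure your own endpoint before relying on the numbers.

```python
import math

MIN_HA_INSTANCES = 2   # from the text: two instances give a high-availability setup
MAX_INSTANCES = 32     # from the text: upper limit per model API


def estimate_instances(requests_per_second: float,
                       seconds_per_request: float,
                       concurrency_per_instance: int = 1) -> int:
    """Estimate an instance count with Little's law, clamped to [2, 32].

    concurrency_per_instance is an assumption about how many requests
    one instance can serve at once; it is not a documented Domino value.
    """
    if requests_per_second < 0 or seconds_per_request < 0:
        raise ValueError("rate and latency must be non-negative")
    if concurrency_per_instance < 1:
        raise ValueError("concurrency_per_instance must be at least 1")

    # Average number of requests in flight at any moment.
    in_flight = requests_per_second * seconds_per_request
    needed = math.ceil(in_flight / concurrency_per_instance)

    # Never go below the high-availability minimum or above the limit.
    return max(MIN_HA_INSTANCES, min(MAX_INSTANCES, needed))


if __name__ == "__main__":
    # 10 requests/s at 0.5 s each keeps about 5 requests in flight.
    print(estimate_instances(10, 0.5))
```

If the estimate reaches the 32-instance limit, horizontal scaling alone may not be enough for that load.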

Scale vertically

Scale vertically for resource-constrained model API endpoints, such as endpoints that run complex tasks and need more processing power. Consider vertical scaling when downstream applications see long-running jobs for complex processes.

When you publish a model API, select a Resource quota that determines the amount of RAM and CPU/GPU resources available to each model API instance.

Tip	The scaling settings are under Model APIs > <model name> > Settings > Deployment.

Note	If you change the scaling settings of a model API, you must restart it.

Route your model API

Domino supports basic and advanced routing modes to help you manage development and test deployments. To change routing modes, go to Settings > Deployment for each model API.

Basic mode

In basic mode, one exposed endpoint always points to the latest successfully deployed model API version. When you deploy a new version, the old version is shut down and replaced with the new one to maintain availability. Basic mode routes have this signature:

Latest: /models/<modelId>/latest/model

Advanced mode

In advanced mode, a promoted version and the latest version exist simultaneously. Advanced mode lets you point your clients to the promoted, production version, while giving you the ability to test with the latest version. When the latest version is ready for production, seamlessly switch it to the promoted version without downtime. Advanced mode routes have this signature:

Latest: /models/<modelId>/latest/model

Promoted: /models/<modelId>/labels/prod/model
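
The route signatures above can be assembled in client code. The following is a minimal sketch that only builds the URL strings. The base URL `https://domino.example.com` and the model ID `abc123` are placeholders, and the function name is hypothetical. Authentication and the request payload are not covered here.

```python
from urllib.parse import quote


def model_routes(base_url: str, model_id: str) -> dict[str, str]:
    """Build the latest and promoted routes for a model API.

    base_url is a placeholder for your Domino host (an assumption).
    The promoted route applies only when advanced routing mode is on.
    """
    # Escape the ID so unexpected characters cannot alter the path.
    root = f"{base_url.rstrip('/')}/models/{quote(model_id, safe='')}"
    return {
        "latest": f"{root}/latest/model",
        "promoted": f"{root}/labels/prod/model",
    }


if __name__ == "__main__":
    routes = model_routes("https://domino.example.com", "abc123")
    print(routes["latest"])
    print(routes["promoted"])
```

Pointing production clients at the promoted route and test clients at the latest route keeps the two traffic streams separate while you validate a new version.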

Next steps

Export models to SageMaker