Managing Domino compute resources

Hardware tiers

When launching a Domino execution, users specify a Hardware Tier that determines the resources available for the execution and the node type on which it should run.

../_images/hardware-tier-config.png

Resource requests and limits

At the Kubernetes level, Hardware Tiers specify the CPU, memory, and GPU resources for the pods that host Domino executions. The following properties configure the resource requests and limits for execution pods.

  • Cores Requested

    The number of requested CPUs.

  • Cores Limit

    The maximum number of CPUs. Recommended to be the same as the request.

    Unless the option Allow executions to exceed request when unused CPU is available is selected, Domino will automatically apply a CPU limit equal to the request.

  • Memory Requested (GiB)

    The amount of requested memory.

  • Memory Limit (GiB)

    The maximum amount of memory. Recommended to be the same as the request.

    Unless the option Allow executions memory limit to exceed request is selected, Domino will automatically apply a memory limit equal to the request.

    Warning

Allowing the memory limit to exceed the request should be used with caution, since it can make executions more likely to be evicted under memory pressure. See the Kubernetes documentation for more information about how memory requests and limits influence eviction decisions.

  • Number of GPUs

    The number of requested GPUs.

The request values for CPU cores, memory, and GPUs are thresholds used to determine whether a node has the capacity to host an execution pod; the requested resources are effectively reserved for the pod. The limit values control the amount of resources a pod can use above and beyond what it requested: if there is additional headroom on the node, the pod can use resources up to its limit.

However, when resources are under contention and a pod uses more than it requested, creating excess demand on the node, Kubernetes may evict the offending pod, and the associated Domino Run is terminated. For this reason, Domino strongly recommends setting the requests and limits to the same values.
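
For example, a Hardware Tier configured with 2 requested cores, a matching 2-core limit, 8 GiB of memory, and one GPU would translate into a pod resources block along these lines (a minimal sketch; the values are illustrative and the actual pod spec Domino generates contains many more fields):

    resources:
      requests:
        cpu: "2"              # Cores Requested
        memory: 8Gi           # Memory Requested (GiB)
        nvidia.com/gpu: 1     # Number of GPUs
      limits:
        cpu: "2"              # Cores Limit, same as the request (recommended)
        memory: 8Gi           # Memory Limit (GiB), same as the request
        nvidia.com/gpu: 1     # GPUs always have request == limit in Kubernetes

Note that Kubernetes does not allow requests and limits to differ for extended resources such as GPUs, so the equal-values recommendation is always enforced for GPUs.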

Keep in mind that your Hardware Tier's CPU, memory, and GPU requests must not exceed the available resources of the machines in the target node pool after accounting for overhead; otherwise, an execution using such a Hardware Tier will never start. For example, a Hardware Tier requesting all 16 cores of a 16-core node can never be scheduled, because the kubelet and system daemons reserve part of each node's capacity. If you need more resources than are available on existing nodes, you may need to add a new node pool with different specifications. This may mean adding individual nodes to a static cluster, or configuring new auto-scaling components that provision nodes with the required specifications and labels.

Node pools

Additionally, Hardware Tiers control the underlying machine type on which a Domino execution will run.

Nodes that have the same value for the dominodatalab.com/node-pool Kubernetes node label form a Node Pool. Executions with a matching value in the Node Pool field will then run on these nodes.

As an example, in the screenshot above, the large-k8s Hardware Tier is configured to use the default node pool.

The diagram below shows a cluster configured with two node pools for Domino, one named default and one named default-gpu. You can make additional node pools available to Domino by labeling them with the same scheme: dominodatalab.com/node-pool=<node-pool-name>. The arrows in this diagram represent Domino requesting that a node with a given label be assigned to an execution. Kubernetes will then assign the execution to a node in the specified pool that has sufficient resources.

../_images/node-pools.png

By default, Domino creates a node pool with the label dominodatalab.com/node-pool=default and all compute nodes Domino creates in cloud environments are assumed to be in this pool. Note that in cloud environments with automatic node scaling, you will configure scaling components like AWS Auto Scaling Groups or Azure Scale Sets with these labels to create elastic node pools.
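
For illustration, a node belonging to the default-gpu pool carries the label on its Kubernetes Node object, roughly as follows (a sketch; the node name is a made-up example):

    apiVersion: v1
    kind: Node
    metadata:
      name: gpu-node-1                             # example node name
      labels:
        dominodatalab.com/node-pool: default-gpu   # assigns this node to the pool

On a static cluster, you could add an existing node to a pool with kubectl label node gpu-node-1 dominodatalab.com/node-pool=default-gpu; in autoscaling environments, the label is applied by the scaling component as described above.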

Advanced hardware tier configuration

In addition to resource requests and node pools, there are several advanced Hardware Tier settings that allow Domino administrators even more control.

Maximum simultaneous executions

When it is necessary to limit the deployment-wide capacity accessible through a given Hardware Tier, but creating a dedicated node pool for that Hardware Tier is not practical, a Domino administrator can use the Maximum Simultaneous Executions setting. When this setting is used, Domino ensures that no more than the specified number of executions use the Hardware Tier at the same time. Additional executions beyond the limit are queued.

Overprovisioning

On cloud deployments enabled for autoscaling, new nodes are provisioned in response to capacity requests. This is a great mechanism for minimizing cost, but provisioning a new node can take several minutes, leaving data scientists waiting while it happens. The situation can be particularly painful in the morning, when a large number of users first log onto a system that has scaled down overnight.

Domino administrators can address this problem by overprovisioning a number of “warm” slots for popular Hardware Tiers. Domino automatically pre-provisions any nodes necessary to accommodate the specified number of overprovisioned executions for the Hardware Tier, minimizing the chance that a user has to wait for a new node to spin up. To keep costs under control, an administrator can instead apply overprovisioning on a schedule, covering the periods when a flurry of new users is expected.

This can be accomplished with the Overprovisioning pods and Overprovisioning schedule Hardware Tier settings.

Custom GPU resource names

By default, Domino will request GPU resources of type nvidia.com/gpu. This works well for most NVIDIA GPU-enabled devices, but when your deployment is backed by different GPU devices (e.g., NVIDIA MIG GPUs, AMD GPUs, AWS vGPUs, Xilinx FPGAs), you will need to use a different name for the GPU resources.

To do so, select the Use custom GPU resource name option and specify the appropriate GPU resource name corresponding to the name of the GPU devices being discovered and reported by Kubernetes.

For example, in the case of an NVIDIA A100 GPU configured in MIG Mixed Mode, you will be able to use resources like nvidia.com/mig-1g.5gb, nvidia.com/mig-2g.10gb, or nvidia.com/mig-3g.20gb.
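
In that configuration, the execution pod requests the MIG slice instead of a whole GPU; the resulting resources block would look roughly like this (a sketch, assuming the custom resource name nvidia.com/mig-1g.5gb):

    resources:
      requests:
        nvidia.com/mig-1g.5gb: 1   # one 1g.5gb MIG slice instead of nvidia.com/gpu
      limits:
        nvidia.com/mig-1g.5gb: 1   # extended resources require request == limit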

Increased shared memory

You can allow Hardware Tiers to exceed the default limit of 64MB for shared memory (/dev/shm). This is especially beneficial for applications that rely on shared memory for interprocess communication, such as multi-worker data loaders.

To enable this, select the Allow executions to exceed the default shared memory limit Hardware Tier setting.

Checking this option overrides the /dev/shm (shared memory) limit, and any shared memory consumption counts toward the overall memory limit of the Hardware Tier. Be sure to consider and incorporate the size of /dev/shm in any memory usage calculations for a Hardware Tier with this option enabled.

../_images/exceed-devshm-default.png

Warning

/dev/shm is considered part of the overall memory footprint of an execution container. When overriding /dev/shm to use more shared memory, it is possible to exceed the total memory of the container. Exceeding the container’s memory limit via /dev/shm will terminate the container.
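
On Kubernetes, raising the shared memory limit is conventionally done by mounting a memory-backed emptyDir volume over /dev/shm. The sketch below shows the general technique, not necessarily the exact spec Domino generates; the sizeLimit value is an illustrative assumption:

    # Fragment of a pod spec: tmpfs-backed /dev/shm
    volumes:
      - name: dshm
        emptyDir:
          medium: Memory      # tmpfs; consumption counts toward the pod's memory limit
          sizeLimit: 2Gi      # example cap on shared memory
    containers:
      - name: run
        volumeMounts:
          - name: dshm
            mountPath: /dev/shm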




Scaling compute capacity

The amount of compute power required for your Domino cluster will fluctuate over time as users start and stop executions. Domino relies on Kubernetes to find space for each execution on existing compute resources. In cloud autoscaling environments, if there’s not enough CPU or memory to satisfy a given execution request, the Kubernetes cluster autoscaler will start new compute nodes to fulfill that increased demand. In environments with static nodes, or in cloud environments where you have reached the autoscaling limit, the execution request will be queued until resources are available.

Autoscaling Kubernetes clusters will shut nodes down when they are idle for more than a configurable duration. This reduces your costs by ensuring that nodes are used efficiently, and terminated when not needed.

Cloud autoscaling resources have properties like the minimum and maximum number of nodes they can create. Set the node maximum to whatever you are comfortable with given the size of your team and the expected volume of workloads. All else equal, it is better to have a higher limit than a lower one: compute nodes are cheap to start up and shut down, while your data scientists’ time is very valuable. If the cluster cannot scale up any further, your users’ executions will wait in a queue until the cluster can service their request.
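
As a concrete example, the Kubernetes cluster autoscaler encodes these bounds as min:max:name triples per node group; in a deployment manifest for the autoscaler on AWS, the relevant fragment might look like this (a sketch; the Auto Scaling Group name is a made-up example):

    # Fragment of the cluster-autoscaler container spec
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --nodes=0:10:domino-default-compute-asg   # min:max:ASG name (example)
      - --scale-down-unneeded-time=10m            # idle duration before scale-down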




User executions quota

To prevent a single user from monopolizing a Domino deployment, an administrator can set a limit on the number of executions a user may have running concurrently. Once a user reaches this limit, any additional executions are queued. This applies to executions for Domino workspaces, jobs, and web applications, as well as any executions that make up an on-demand distributed compute cluster. For example, in the case of an on-demand Spark cluster, an execution slot is consumed for each Spark executor and for the master, so a cluster with one master and four executors consumes five slots.

See Important settings for details.




Execution queue limits

To prevent too many queued executions from overwhelming a Domino deployment, an administrator can set a global or per-user limit on the number of execution slots that can be queued. Once either limit is reached, any additional executions submitted are rejected and fail. This includes executions for Domino workspaces, jobs, and web applications.

See Important settings for details.




Common questions


How do I view the current nodes in my compute grid?

From the top menu bar in the admin UI, click Infrastructure. You will see both Platform and Compute nodes in this interface. Click the name of a node to get a complete description, including all applied labels, available resources, and currently hosted pods. This is the full kubectl describe for the node. Non-Platform nodes in this interface with a value in the Node Pool column are compute nodes that can be used for Domino executions by configuring a Hardware Tier to use the pool.

../_images/infrastructure-ui.png

How do I view details on currently active executions?

From the top menu of the admin UI, click Executions. This interface lists active Domino execution pods and shows the type of workload, the Hardware Tier used, the originating user and project, and the status for each pod. There are also links to view a full kubectl describe output for the pod and the node, and an option to download the deployment lifecycle log for the pod generated by Kubernetes and the Domino application.

../_images/executions-ui.png

How do on-demand Spark clusters show up in the active executions interface?

Each Spark node launched as part of an on-demand Spark cluster, including the master and workers, is displayed as a separate row in the executions interface, with complete information on the originating project and user, as well as the Hardware Tier.

../_images/spark-execution-admin.png

How do I create or edit a Hardware Tier?

From the top menu of the admin UI, click Advanced > Hardware Tiers, then on the Hardware Tiers page click New to create a new Hardware Tier or Edit to modify an existing Hardware Tier.

../_images/create-edit-hwt.png

Important settings

The following settings in the common namespace of the Domino central configuration affect compute grid behavior.

Deploying state timeout

  • Key: com.cerebro.computegrid.timeouts.sagaStateTimeouts.deployingStateTimeoutSeconds
  • Value: Number of seconds an execution pod in a deploying state will wait before timing out. Default is 60 * 60 (1 hour).

Preparing state timeout

  • Key: com.cerebro.computegrid.timeouts.sagaStateTimeouts.preparingStateTimeoutSeconds
  • Value: Number of seconds an execution pod in a preparing state will wait before timing out. Default is 60 * 60 (1 hour).

Maximum executions per user

  • Key: com.cerebro.domino.computegrid.userExecutionsQuota.maximumExecutionsPerUser
  • Value: Maximum number of executions each user may have running concurrently. If a user tries to run more than this, the excess executions will queue until existing executions finish. Default is 25.

Global execution queue limit

  • Key: com.cerebro.domino.computegrid.userExecutionsQuota.globalExecutionQueueLimit
  • Value: Maximum total number of executions that may be queued across all users. If users try to queue more than this, the excess executions will fail. Default is 1000.

User execution queue limit

  • Key: com.cerebro.domino.computegrid.userExecutionsQuota.userExecutionQueueLimit
  • Value: Maximum number of executions that may be queued per user. If a user tries to queue more than this, the excess executions will fail. Default is 100.

Quota state timeout

  • Key: com.cerebro.computegrid.timeouts.sagaStateTimeouts.userExecutionsOverQuotaStateTimeoutSeconds
  • Value: Number of seconds an execution pod that cannot be assigned due to user quota limitations will wait for resources to become available before timing out. Default is 24 * 60 * 60 (24 hours).