Cluster requirements

You can deploy Domino 4 into a Kubernetes cluster that meets the following requirements.




General requirements

  • Kubernetes version 1.16 at minimum, up to and including 1.19

  • Cluster permissions

    Domino needs permission to install and configure pods in the cluster via Helm. The Domino installer is delivered as a containerized Python utility that operates Helm through a kubeconfig that provides service account access to the cluster.

  • Three namespaces

    Domino creates three dedicated namespaces, one for Platform nodes, one for Compute nodes, and one for installer metadata and secrets.
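
    For illustration only, a minimal sketch of this namespace layout follows. The names shown (domino-platform, domino-compute, and domino-system) are assumptions for the sketch; the installer creates and names these namespaces itself, so you do not need to apply this manifest.

    apiVersion: v1
    kind: Namespace
    metadata:
      name: domino-platform   # hosts the Domino Platform services (assumed name)
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: domino-compute    # hosts user executions on Compute nodes (assumed name)
    ---
    apiVersion: v1
    kind: Namespace
    metadata:
      name: domino-system     # holds installer metadata and secrets (assumed name)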




Storage requirements

Storage classes

Domino requires at least two storage classes.

  1. Dynamic block storage

    Domino requires high performance block storage for the following types of data:

    • Ephemeral volumes attached to user executions
    • High-performance databases for Domino application object data

    This storage needs to be backed by a storage class with the following properties:

    • Supports dynamic provisioning
    • Can be mounted on any node in the cluster
    • SSD-backed recommended for fast I/O
    • Capable of provisioning volumes of at least 100GB
    • Underlying storage provider can support ReadWriteOnce semantics

    By default, this storage class is named dominodisk.

    In AWS, EBS is used to back this storage class. Consult this example configuration for a compatible EBS storage class:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: domino-compute-storage
    provisioner: kubernetes.io/aws-ebs
    parameters:
      type: gp2
      fsType: ext4
    

    In GCP, Compute Engine persistent disks are used to back this storage class. Consult this example configuration for a compatible GCEPD storage class:

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: dominodisk
    parameters:
      replication-type: none
      type: pd-standard
    provisioner: kubernetes.io/gce-pd
    reclaimPolicy: Delete
    volumeBindingMode: WaitForFirstConsumer
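
    For reference, the sketch below shows how a volume might request this storage class. The claim name and size are illustrative assumptions only; in practice Domino provisions these volumes automatically, so this is not something you need to create.

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: example-execution-volume   # hypothetical name for illustration
    spec:
      accessModes:
        - ReadWriteOnce                # block storage only needs single-node access
      storageClassName: dominodisk
      resources:
        requests:
          storage: 100Gi               # illustrative size; the class must support volumes of at least 100GB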
    

  2. Long-term shared storage

    Domino needs a separate storage class for long-term storage of:

    • Project data uploaded or created by users
    • Domino Datasets
    • Docker images
    • Domino backups

    This storage needs to be backed by a storage class with the following properties:

    • Dynamically provisions Kubernetes PersistentVolumes
    • Can be accessed in ReadWriteMany mode from all nodes in the cluster
    • Uses a VolumeBindingMode of Immediate

    In AWS, these storage requirements are handled by two separate classes: one backed by EFS for Domino Datasets, and one backed by S3 for project data, backups, and Docker images.

    In GCP, these storage requirements are handled by a Cloud Filestore volume mounted as NFS.

    By default, this storage class is named dominoshared.
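
    As an illustration only, the sketch below shows one way a ReadWriteMany-capable class could be defined in AWS using the EFS CSI driver. The class name and file system ID are placeholders, and the Domino installer may provision shared storage differently (for example, using S3 or EFS natively, as described below).

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: dominoshared
    provisioner: efs.csi.aws.com           # AWS EFS CSI driver
    parameters:
      provisioningMode: efs-ap             # dynamic provisioning via EFS access points
      fileSystemId: fs-0123456789abcdef0   # placeholder EFS file system ID
      directoryPerms: "700"
    reclaimPolicy: Retain
    volumeBindingMode: Immediate           # matches the Immediate binding requirement above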


Native

For shared storage, Domino allows (and in some cases requires) the use of the native cloud provider object store for a few resources and services:

  • Blob Storage. For AWS, the blob storage must be backed by S3 (see Blob storage). For other infrastructure, the dominoshared storage class is used.

  • Logs. For AWS, the log storage must be backed by S3 (see Blob storage). For others, the dominoshared storage class is used.

  • Backups. For all supported cloud providers, storage for backups is backed by the native blob store. For on-prem, backups are backed by the dominoshared storage class.

  • Datasets. For AWS, Datasets storage must be backed by EFS (see Datasets storage). For other infrastructure, the dominoshared storage class is used.

On-Prem

In on-prem environments, both dominodisk and dominoshared can be backed by NFS. In some cases, host volumes can be used, and are even preferred: host volumes are preferred for Git, Postgres, and MongoDB, since Postgres and MongoDB provide their own state replication. Host volumes can be used for Runs, but are not preferred, because Domino relies on files cached in block storage that can move between nodes. If host volumes are used for Runs, file caching should be disabled, and you should expect slower execution start up for large Projects.
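
As a rough sketch only, an NFS-backed PersistentVolume for on-prem shared storage might look like the following; the volume name, server address, export path, and capacity are placeholders.

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: dominoshared-nfs            # hypothetical volume name
    spec:
      capacity:
        storage: 1Ti                    # placeholder capacity
      accessModes:
        - ReadWriteMany                 # shared storage must be readable and writable from all nodes
      persistentVolumeReclaimPolicy: Retain
      storageClassName: dominoshared
      nfs:
        server: nfs.example.internal    # placeholder NFS server
        path: /exports/domino           # placeholder export path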

Summary

The following table summarizes the storage requirements and options.

| Service              | Type           | Comments                             | Storage Class | AWS | Azure              | GCP                 | On-Prem             |
|----------------------|----------------|--------------------------------------|---------------|-----|--------------------|---------------------|---------------------|
| Runs                 | Block storage  | Ephemeral                            | dominodisk    | EBS | Azure Disk Storage | GCP Persistent Disk | NFS or Host Volumes |
| Blob Storage         | Shared storage | Long-Term                            | dominoshared  | N/A | Azure Files        | GCP File Store      | NFS                 |
| Blob Storage         | Native         |                                      |               | S3  | N/A                | N/A                 | N/A                 |
| Logs                 | Shared storage | Long-Term                            | dominoshared  | N/A | Azure Files        | GCP File Store      | NFS                 |
| Logs                 | Native         |                                      |               | S3  | N/A                | N/A                 | N/A                 |
| Backups              | Shared storage | Long-Term                            | dominoshared  | N/A | N/A                | N/A                 | NFS                 |
| Backups              | Native         |                                      |               | S3  | Azure Blob Storage | GCP Cloud Storage   | N/A                 |
| Git                  | Block storage  | Long-Term                            | dominoshared  | EBS | Azure Disk Storage | GCP Persistent Disk | NFS or Host Volumes |
| Container Images     | Shared storage | Long-Term                            | dominoshared  | EBS | Azure Disk Storage | GCP Persistent Disk | NFS                 |
| Container Images     | Native         |                                      |               | S3  | Azure Blob Storage | GCP Cloud Storage   | NFS                 |
| Datasets             | Shared storage | Long-Term                            | dominoshared  | N/A | Azure Files        | GCP File Store      | NFS                 |
| Datasets             | Native         |                                      |               | EFS | N/A                | N/A                 | N/A                 |
| Postgres             | Block storage  | For Keycloak. Replication            | dominodisk    | EBS | Azure Disk Storage | GCP Persistent Disk | NFS or Host Volumes |
| MongoDB              | Block storage  | Replication                          | dominodisk    | EBS | Azure Disk Storage | GCP Persistent Disk | NFS or Host Volumes |
| Default for Services | Block storage  | Elasticsearch, RabbitMQ, Redis, etc. | dominodisk    | EBS | Azure Disk Storage | GCP Persistent Disk | NFS or Host Volumes |



Node pool requirements

Domino requires a minimum of two node pools: one to host the Domino Platform and one to host Compute workloads. Additional optional pools can be added to provide specialized execution hardware for some Compute workloads. An example node pool configuration for AWS is shown after the list below.

  1. Platform pool requirements
    • Boot Disk: 128GB
    • Min Nodes: 3
    • Max Nodes: 3
    • Spec: 8 CPU / 32GB
    • Labels: dominodatalab.com/node-pool: platform
    • Tags:
      • kubernetes.io/cluster/{{ cluster_name }}: owned
      • k8s.io/cluster-autoscaler/enabled: true #Optional for autodiscovery
      • k8s.io/cluster-autoscaler/{{ cluster_name }}: owned #Optional for autodiscovery

  2. Compute pool requirements

    • Boot Disk: 400GB

    • Recommended Min Nodes: 1

    • Max Nodes: Set as necessary to meet demand and resourcing needs

    • Recommended min spec: 8 CPU / 32GB

    • Enable Autoscaling: Yes

    • Labels: domino/build-node: true, dominodatalab.com/node-pool: default

    • Tags:

      • k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool: default
      • kubernetes.io/cluster/{{ cluster_name }}: owned
      • k8s.io/cluster-autoscaler/node-template/label/domino/build-node: true
      • k8s.io/cluster-autoscaler/enabled: true #Optional for autodiscovery
      • k8s.io/cluster-autoscaler/{{ cluster_name }}: owned #Optional for autodiscovery

  3. Optional GPU compute pool

    • Boot Disk: 400GB

    • Recommended Min Nodes: 0

    • Max Nodes: Set as necessary to meet demand and resourcing needs

    • Recommended min spec: 8 CPU / 16GB / one or more Nvidia GPU devices

    • Nodes must be pre-configured with the appropriate Nvidia driver and nvidia-docker2, and must have the default Docker runtime set to nvidia. For example, use the EKS GPU-optimized AMI.

    • Labels: dominodatalab.com/node-pool: default-gpu, nvidia.com/gpu: true

    • Tags:

      • k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool: default-gpu
      • kubernetes.io/cluster/{{ cluster_name }}: owned
      • k8s.io/cluster-autoscaler/enabled: true #Optional for autodiscovery
      • k8s.io/cluster-autoscaler/{{ cluster_name }}: owned #Optional for autodiscovery
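
The following sketch shows how the Platform and Compute pool requirements above might be expressed as eksctl node groups on AWS. This is an illustration under stated assumptions, not a required method; the cluster name, region, instance types, and maximum compute size are placeholders to adjust for your environment.

    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: domino-cluster              # placeholder cluster name
      region: us-west-2                 # placeholder region
    nodeGroups:
      - name: platform
        instanceType: m5.2xlarge        # 8 CPU / 32GB
        volumeSize: 128                 # boot disk in GB
        minSize: 3
        maxSize: 3
        labels:
          dominodatalab.com/node-pool: platform
        tags:
          kubernetes.io/cluster/domino-cluster: owned
          k8s.io/cluster-autoscaler/enabled: "true"            # optional for autodiscovery
          k8s.io/cluster-autoscaler/domino-cluster: owned       # optional for autodiscovery
      - name: compute
        instanceType: m5.2xlarge        # 8 CPU / 32GB
        volumeSize: 400                 # boot disk in GB
        minSize: 1
        maxSize: 10                     # placeholder; set to meet demand
        labels:
          dominodatalab.com/node-pool: default
          domino/build-node: "true"
        tags:
          k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool: default
          k8s.io/cluster-autoscaler/node-template/label/domino/build-node: "true"
          kubernetes.io/cluster/domino-cluster: owned
          k8s.io/cluster-autoscaler/enabled: "true"            # optional for autodiscovery
          k8s.io/cluster-autoscaler/domino-cluster: owned       # optional for autodiscovery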



Cluster networking

Domino relies on Kubernetes network policies to manage secure communication between pods in the cluster. Network policies are implemented by the network plugin, so your cluster must use a networking solution that supports NetworkPolicy, such as Calico.
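
As a quick illustration, applying a policy like the following in a test namespace (the namespace name here is hypothetical) should actually block ingress traffic on a conforming cluster; if traffic still flows, the network plugin does not enforce NetworkPolicy.

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-ingress
      namespace: domino-compute         # hypothetical namespace for illustration
    spec:
      podSelector: {}                   # selects every pod in the namespace
      policyTypes:
        - Ingress                       # with no ingress rules listed, all ingress is denied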




Ingress and SSL

Domino must be configured to serve from a specific FQDN, and DNS for that name should resolve to the address of an SSL-terminating load balancer with a valid certificate. The load balancer must forward incoming connections on ports 80 and 443 to port 80 on all nodes in the Platform pool. This load balancer must also support websocket connections.

Health checks for this load balancer should use HTTP on port 80 and check for 200 responses from a path of /health on the nodes.
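
For AWS, a sketch of these expectations expressed as a CloudFormation application load balancer is shown below. This is an illustration only, not Domino's required method; the VPC, subnet, and certificate identifiers are placeholders, and the Platform nodes would still need to be registered as targets (for example, by attaching the target group to the Platform node group's Auto Scaling group).

    Resources:
      PlatformTargetGroup:
        Type: AWS::ElasticLoadBalancingV2::TargetGroup
        Properties:
          Port: 80                                   # traffic is forwarded to port 80 on the nodes
          Protocol: HTTP
          TargetType: instance
          VpcId: vpc-0123456789abcdef0               # placeholder VPC ID
          HealthCheckProtocol: HTTP
          HealthCheckPort: "80"
          HealthCheckPath: /health                   # expect HTTP 200 from /health
          Matcher:
            HttpCode: "200"
      PlatformLoadBalancer:
        Type: AWS::ElasticLoadBalancingV2::LoadBalancer
        Properties:
          Type: application                          # application load balancers support websockets
          Scheme: internet-facing
          Subnets:
            - subnet-aaaa1111                        # placeholder subnet IDs
            - subnet-bbbb2222
      HttpsListener:
        Type: AWS::ElasticLoadBalancingV2::Listener
        Properties:
          LoadBalancerArn: !Ref PlatformLoadBalancer
          Port: 443
          Protocol: HTTPS                            # SSL terminates at the load balancer
          Certificates:
            - CertificateArn: arn:aws:acm:us-west-2:111111111111:certificate/placeholder
          DefaultActions:
            - Type: forward
              TargetGroupArn: !Ref PlatformTargetGroup
      HttpListener:
        Type: AWS::ElasticLoadBalancingV2::Listener
        Properties:
          LoadBalancerArn: !Ref PlatformLoadBalancer
          Port: 80
          Protocol: HTTP
          DefaultActions:
            - Type: forward
              TargetGroupArn: !Ref PlatformTargetGroup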




NTP

In order to support SSO protocols, TLS connections to external services, intra-cluster TLS when using Istio, and to avoid general interoperability issues, the nodes in your Kubernetes cluster should have a valid Network Time Protocol (NTP) configuration. This will allow for successful TLS validation and operation of other time-sensitive protocols.