Domino on OpenShift

Starting with Domino 4.3.1, the Domino platform can run on the OpenShift Container Platform (OCP) and OpenShift Kubernetes Engine (OKE). Domino supports OCP/OKE versions 4.4 and later.




Setting up an OpenShift cluster for Domino

This section describes how to configure an OpenShift Kubernetes Engine cluster for use with Domino.


Namespaces

No namespace configuration is necessary prior to installation. Domino will create three namespaces in the cluster during installation, according to the following specifications:

Namespace       Contains
platform        Durable Domino application, metadata, platform services required for platform operation
compute         Ephemeral Domino execution pods launched by user actions in the application
domino-system   Domino installation metadata and secrets
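
After the installation completes, you can confirm that these namespaces exist with oc. This is a minimal check, assuming the namespace names listed above:

oc get namespace platform compute domino-system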

Node pools

The OpenShift cluster must have worker nodes with the following specifications and distinct node labels, and it may include an optional GPU pool:

Pool                     Min-Max   vCPU   Memory   Disk   Labels
platform                 Min 3     8      32G      128G   dominodatalab.com/node-pool: platform
default                  1-20      8      32G      400G   dominodatalab.com/node-pool: default
                                                          domino/build-node: true
default-gpu (optional)   0-5       8      32G      400G   dominodatalab.com/node-pool: default-gpu
                                                          nvidia.com/gpu: true

More generally, the platform worker nodes need an aggregate minimum of 24 vCPUs and 96G of memory (for example, three platform nodes at 8 vCPU and 32G each). Spreading these resources across multiple nodes with proper failure isolation (for example, across availability zones) is recommended.

Managing nodes and node pools in OpenShift is done through Machine Management and the Machine API. For each node pool above, you will need to create a MachineSet. Be sure to provide the required Domino labels in the machine spec (the spec.template.spec.metadata.labels stanza). Also update the provider spec (the spec.template.spec.providerSpec stanza) for your infrastructure provider and sizing; for example, in AWS, updates may include, but are not limited to, the AMI ID, block device storage sizing, and availability zone placement.

The following is an example MachineSet for the platform node pool:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: firestorm-dxcpd
  name: firestorm-dxcpd-platform-us-west-1a
  namespace: openshift-machine-api
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: firestorm-dxcpd
      machine.openshift.io/cluster-api-machineset: firestorm-dxcpd-platform-us-west-1a
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: firestorm-dxcpd
        machine.openshift.io/cluster-api-machine-role: platform
        machine.openshift.io/cluster-api-machine-type: platform
        machine.openshift.io/cluster-api-machineset: firestorm-dxcpd-platform-us-west-1a
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/default: ""
          dominodatalab.com/node-pool: platform
      providerSpec:
        value:
          ami:
            id: ami-02b6556210798d665
          apiVersion: awsproviderconfig.openshift.io/v1beta1
          blockDevices:
            - ebs:
                iops: 0
                volumeSize: 120
                volumeType: gp2
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: firestorm-dxcpd-worker-profile
          instanceType: m5.2xlarge
          kind: AWSMachineProviderConfig
          metadata:
            creationTimestamp: null
          placement:
            availabilityZone: us-west-1a
            region: us-west-1
          publicIp: null
          securityGroups:
            - filters:
                - name: tag:Name
                  values:
                    - firestorm-dxcpd-worker-sg
          subnet:
            filters:
              - name: tag:Name
                values:
                  - firestorm-dxcpd-private-us-west-1a
          tags:
            - name: kubernetes.io/cluster/firestorm-dxcpd
              value: owned
          userDataSecret:
            name: worker-user-data

The following is an example MachineSet for the default (compute) node pool:

apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  labels:
    machine.openshift.io/cluster-api-cluster: firestorm-dxcpd
  name: firestorm-dxcpd-default-us-west-1a
  namespace: openshift-machine-api
spec:
  replicas: 3
  selector:
    matchLabels:
      machine.openshift.io/cluster-api-cluster: firestorm-dxcpd
      machine.openshift.io/cluster-api-machineset: firestorm-dxcpd-default-us-west-1a
  template:
    metadata:
      labels:
        machine.openshift.io/cluster-api-cluster: firestorm-dxcpd
        machine.openshift.io/cluster-api-machine-role: default
        machine.openshift.io/cluster-api-machine-type: default
        machine.openshift.io/cluster-api-machineset: firestorm-dxcpd-default-us-west-1a
    spec:
      metadata:
        labels:
          node-role.kubernetes.io/default: ""
          dominodatalab.com/node-pool: default
          domino/build-node: "true"
      providerSpec:
        value:
          ami:
            id: ami-02b6556210798d665
          apiVersion: awsproviderconfig.openshift.io/v1beta1
          blockDevices:
            - ebs:
                iops: 0
                volumeSize: 400
                volumeType: gp2
          credentialsSecret:
            name: aws-cloud-credentials
          deviceIndex: 0
          iamInstanceProfile:
            id: firestorm-dxcpd-worker-profile
          instanceType: m5.2xlarge
          kind: AWSMachineProviderConfig
          metadata:
            creationTimestamp: null
          placement:
            availabilityZone: us-west-1a
            region: us-west-1
          publicIp: null
          securityGroups:
            - filters:
                - name: tag:Name
                  values:
                    - firestorm-dxcpd-worker-sg
          subnet:
            filters:
              - name: tag:Name
                values:
                  - firestorm-dxcpd-private-us-west-1a
          tags:
            - name: kubernetes.io/cluster/firestorm-dxcpd
              value: owned
          userDataSecret:
            name: worker-user-data
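
Once the MachineSet manifests are written, create them with oc and confirm that the new machines and nodes come up with the expected labels. The following is a minimal sketch; the manifest file names are placeholders for wherever you saved the examples above:

# Create the MachineSets (file names are examples)
oc apply -f platform-machineset.yaml
oc apply -f default-machineset.yaml

# Watch the machines provision
oc get machinesets -n openshift-machine-api
oc get machines -n openshift-machine-api

# Confirm the resulting nodes carry the Domino node pool labels
oc get nodes -L dominodatalab.com/node-pool -L domino/build-node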

Node Autoscaling

For clusters running on an elastic cloud provider, node autoscaling (machine autoscaling) is achieved by creating ClusterAutoscaler and MachineAutoscaler resources.

The following is an example ClusterAutoscaler:

apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
name: "default"
spec:
podPriorityThreshold: -10
resourceLimits:
   maxNodesTotal: 20
   cores:
      min: 8
      max: 256
   memory:
      min: 4
      max: 256
   gpus:
      - type: nvidia.com/gpu
      min: 0
      max: 16
      - type: amd.com/gpu
      min: 0
      max: 4
scaleDown:
   enabled: true
   delayAfterAdd: 10m
   delayAfterDelete: 5m
   delayAfterFailure: 30s
   unneededTime: 10m

The following is an example MachineAutoscaler for the MachineSet created for the default node pool:

apiVersion: "autoscaling.openshift.io/v1beta1"
kind: "MachineAutoscaler"
metadata:
name: "firestorm-dxcpd-default-us-west-1a"
namespace: "openshift-machine-api"
spec:
minReplicas: 1
maxReplicas: 5
scaleTargetRef:
   apiVersion: machine.openshift.io/v1beta1
   kind: MachineSet
   name: firestorm-dxcpd-default-us-west-1a
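
As with the MachineSets, the autoscaler resources are created with oc. A minimal sketch, assuming the manifests above were saved to the file names shown:

oc apply -f clusterautoscaler.yaml
oc apply -f machineautoscaler-default.yaml

# Verify the autoscaler resources
oc get clusterautoscaler default
oc get machineautoscaler -n openshift-machine-api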

Networking

Domain

Domino must be configured to serve from a specific FQDN. To serve Domino securely over HTTPS, you will also need an SSL certificate that covers the chosen name.
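
For example, you can confirm that a certificate covers the chosen name by inspecting its subject alternative names. This is an illustrative check only; cert.pem is a placeholder for your certificate file:

openssl x509 -in cert.pem -noout -text | grep -A1 "Subject Alternative Name"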

Network Plugin

Domino relies on Kubernetes network policies to manage secure communication between pods in the cluster. By default, OpenShift uses the Cluster Network Operator to deploy the OpenShift SDN default CNI network provider plugin, which supports network policies, so no additional network plugin configuration should be required.
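
You can confirm which network plugin the cluster is running with a read-only query of the cluster network configuration:

oc get network.config.openshift.io cluster -o jsonpath='{.status.networkType}'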

Ingress

Domino uses the NGINX ingress controller maintained by the Kubernetes project alongside (but not replacing) the HAProxy-based ingress controller that OpenShift provides, and deploys it as a NodePort service. By default, the ingress listens on node ports 443 (HTTPS) and 80 (HTTP).
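
After Domino is installed, you can confirm which node ports the ingress controller is listening on. The exact service name and namespace may differ in your deployment, so the sketch below greps broadly:

oc get svc --all-namespaces | grep -i nginx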

Load Balancer

A load balancer should be set up to serve Domino at your DNS name. For example, in AWS, you will need to set up DNS so that your chosen name has a CNAME record pointing at an Elastic Load Balancer.

After you complete the installation process, you must configure the load balancer to balance across the platform nodes at the ports specified by your ingress.
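
For example, once DNS is configured you can confirm that the name resolves to the load balancer; the hostname below is a placeholder:

dig +short domino.example.com CNAME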

External Resources

If you plan to connect your cluster to other resources like data sources or authentication services, pods running on the cluster should have network connectivity to those resources.


Container Registry

Domino deploys its own container image registry instead of using OpenShift's built-in container image registry. During installation, the OpenShift cluster image configuration is modified to trust the Domino certificate authority (CA) so that OpenShift can run pods using Domino's custom-built images. In the images.config.openshift.io/cluster resource, you can find a reference to a ConfigMap that contains the Domino CA:

spec:
  additionalTrustedCA:
    name: domino-deployment-registry-config
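
To confirm this configuration after installation, you can inspect the cluster image configuration and the referenced ConfigMap, which lives in the openshift-config namespace:

oc get images.config.openshift.io cluster -o yaml
oc get configmap domino-deployment-registry-config -n openshift-config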

Checking your OpenShift cluster

If you've applied the configurations described above, your OpenShift cluster should be able to run the Domino cluster requirements checker without errors. If the checker runs successfully, you are ready to install Domino in the cluster.