Domino on GKE

Domino 4 can run on a Kubernetes cluster provided by the Google Kubernetes Engine (GKE).

Overview

When running on GKE, the Domino 4 architecture uses GCP resources to fulfill the Domino cluster requirements as follows:

  • Kubernetes control is managed by the GKE cluster
  • Domino uses one node pool of three n1-standard-8 worker nodes to host the Domino platform
  • Additional node pools host elastic compute for Domino executions, with optional GPU accelerators
  • Cloud Filestore is used to store user data, backups, logs, and Domino Datasets
  • A Cloud Storage bucket is used to store the Domino Docker Registry
  • The kubernetes.io/gce-pd provisioner is used to create persistent volumes for Domino executions



Setting up a GKE cluster for Domino

This section describes how to configure a GKE cluster for use with Domino.


Namespaces

No namespace configuration is necessary prior to install. Domino will create three namespaces in the cluster during installation, according to the following specifications:

Namespace       Contains
platform        Durable Domino application, metadata, platform services required for platform operation
compute         Ephemeral Domino execution pods launched by user actions in the application
domino-system   Domino installation metadata and secrets
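After installation completes, you can confirm that all three namespaces exist with kubectl (a quick check; output formatting varies by cluster):

```shell
# List the three namespaces Domino creates during installation
kubectl get namespace platform compute domino-system
```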

Node pools

The GKE cluster must have at least two node pools that produce worker nodes with the following specifications and distinct node labels. It may also include an optional GPU pool:

Pool                    Min-Max   Instance        Disk   Labels
platform                3-3       n1-standard-8   128G   dominodatalab.com/node-pool: platform
default                 1-20      n1-standard-8   400G   dominodatalab.com/node-pool: default
                                                         domino/build-node: true
default-gpu (optional)  0-5       n1-standard-8   400G   dominodatalab.com/node-pool: default-gpu

If you want to configure the default-gpu pool, you must add a GPU accelerator to the node pool. Read the GKE documentation on available accelerators and on deploying a DaemonSet that automatically installs the necessary drivers.
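Per the GKE documentation, the driver-installer DaemonSet for Container-Optimized OS nodes is applied from Google's container-engine-accelerators repository:

```shell
# Deploy NVIDIA's driver installer DaemonSet on COS-based GPU nodes
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/cos/daemonset-preloaded.yaml
```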

Additional node pools can be added with distinct dominodatalab.com/node-pool labels to make other instance types available for Domino executions. Read Managing the Domino compute grid to learn how these different node types are referenced by label from the Domino application.

Consult the Terraform snippets below for code representations of the required node pools.
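The node pool resources below attach to an existing GKE cluster. For reference, a minimal cluster resource might look like the following sketch; the name, location, and pool management settings are placeholders you would adapt to your environment:

```terraform
resource "google_container_cluster" "domino" {
  name     = $YOUR_CLUSTER_NAME
  location = $YOUR_CLUSTER_ZONE_OR_REGION

  # Node pools are managed by separate google_container_node_pool
  # resources, so remove the default pool on creation
  remove_default_node_pool = true
  initial_node_count       = 1

  # Domino requires enforcement of Kubernetes network policies
  # (see the "Network policy enforcement" section below)
  network_policy {
    enabled = true
  }
}
```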

Platform pool

resource "google_container_node_pool" "platform" {
  name     = "platform"
  location = $YOUR_CLUSTER_ZONE_OR_REGION
  cluster  = $YOUR_CLUSTER_NAME

  initial_node_count = 3
  autoscaling {
    max_node_count = 3
    min_node_count = 3
  }

  node_config {
    preemptible  = false
    machine_type = "n1-standard-8"

    labels = {
      "dominodatalab.com/node-pool" = "platform"
    }

    disk_size_gb    = 128
    local_ssd_count = 1
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  timeouts {
    delete = "20m"
  }
}

Default compute pool

resource "google_container_node_pool" "compute" {
  name     = "compute"
  location = $YOUR_CLUSTER_ZONE_OR_REGION
  cluster  = $YOUR_CLUSTER_NAME

  initial_node_count = 1
  autoscaling {
    max_node_count = 20
    min_node_count = 1
  }

  node_config {
    preemptible  = false
    machine_type = "n1-standard-8"

    labels = {
      "domino/build-node"            = "true"
      "dominodatalab.com/build-node" = "true"
      "dominodatalab.com/node-pool"  = "default"
    }

    disk_size_gb    = 400
    local_ssd_count = 1
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  timeouts {
    delete = "20m"
  }
}

Optional GPU pool

resource "google_container_node_pool" "gpu" {
  provider = google-beta
  name     = "gpu"
  location = $YOUR_CLUSTER_ZONE_OR_REGION
  cluster  = $YOUR_CLUSTER_NAME

  initial_node_count = 0

  autoscaling {
    max_node_count = 5
    min_node_count = 0
  }

  node_config {
    preemptible  = false
    machine_type = "n1-standard-8"

    guest_accelerator {
      type  = "nvidia-tesla-p100"
      count = 1
    }

    labels = {
      "dominodatalab.com/node-pool" = "default-gpu"
    }

    disk_size_gb    = 400
    local_ssd_count = 1

    workload_metadata_config {
      node_metadata = "GKE_METADATA_SERVER"
    }
  }

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  timeouts {
    delete = "20m"
  }
}

Network policy enforcement

Domino relies on Kubernetes network policies to manage secure communication between pods in the cluster. By default, the network plugin in GKE will not enforce these policies. To run Domino securely on GKE, you must enable enforcement of network policies.

Read the GKE documentation for instructions on enabling network policy enforcement for your cluster.
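For an existing cluster, enforcement is enabled in two steps with gcloud, as described in the GKE documentation; the cluster name is a placeholder:

```shell
# Step 1: enable the network policy add-on on the control plane
gcloud container clusters update $YOUR_CLUSTER_NAME \
    --update-addons=NetworkPolicy=ENABLED

# Step 2: enable network policy enforcement on the nodes
gcloud container clusters update $YOUR_CLUSTER_NAME \
    --enable-network-policy
```

Note that the second step recreates the cluster's node pools, so plan for a maintenance window on a running cluster.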


Dynamic block storage

The Domino installer will automatically create a storage class like the example below for use in provisioning GCE persistent disks as execution volumes. No manual setup is necessary for this storage class.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: dominodisk
parameters:
  replication-type: none
  type: pd-standard
provisioner: kubernetes.io/gce-pd
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
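As an illustration, a persistent volume claim against this class might look like the following; the claim name and requested size are hypothetical:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-execution-volume   # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: dominodisk
  resources:
    requests:
      storage: 20Gi
```

Because the class uses volumeBindingMode: WaitForFirstConsumer, the underlying GCE persistent disk is not provisioned until a pod that uses the claim is scheduled, which lets the disk be created in the same zone as the node.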

Shared storage

A Cloud Filestore instance must be provisioned with at least 10TB of capacity, and it must be configured to allow access from the cluster. You will provide the IP address and mount path of this instance to the Domino installer, and it will create an NFS storage class like the one below.

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  labels:
    app.kubernetes.io/instance: nfs-client-provisioner
    app.kubernetes.io/managed-by: Tiller
    app.kubernetes.io/name: nfs-client-provisioner
    helm.sh/chart: nfs-client-provisioner-1.2.6-0.1.4
  name: domino-shared
parameters:
  archiveOnDelete: "false"
provisioner: cluster.local/nfs-client-provisioner
reclaimPolicy: Delete
volumeBindingMode: Immediate
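Provisioning the Filestore instance itself can be done with gcloud; a sketch, where the instance, share, zone, and network names are placeholders:

```shell
# Create a 10TB Filestore instance on the cluster's VPC network
gcloud filestore instances create domino-shared \
    --zone=$YOUR_ZONE \
    --tier=STANDARD \
    --file-share=name=domino,capacity=10TB \
    --network=name=$YOUR_VPC_NETWORK

# Retrieve the IP address and share name to pass to the Domino installer
gcloud filestore instances describe domino-shared --zone=$YOUR_ZONE
```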

Docker registry storage

You will need one Cloud Storage Bucket accessible from your cluster to be used for storing the internal Domino Docker Registry.
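The bucket can be created with gsutil; the bucket name and region below are placeholders:

```shell
# Create a regional bucket for the internal Domino Docker Registry
gsutil mb -l $YOUR_REGION gs://$YOUR_REGISTRY_BUCKET/
```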


Domain

Domino will need to be configured to serve from a specific FQDN. To serve Domino securely over HTTPS, you will also need an SSL certificate that covers the chosen name. Record the FQDN for use when installing Domino. Once Domino is deployed into your cluster, you must set up DNS for this name to point to an HTTPS Cloud Load Balancer that has an SSL certificate for the chosen name, and forwards traffic to port 80 on your platform nodes.
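If you manage the domain in Cloud DNS, the A record can be created with gcloud once the load balancer's IP address is known; the zone name, FQDN, and IP below are placeholders:

```shell
# Point the chosen FQDN at the HTTPS load balancer's IP address
gcloud dns record-sets transaction start --zone=$YOUR_MANAGED_ZONE
gcloud dns record-sets transaction add $LOAD_BALANCER_IP \
    --name=domino.example.com. --type=A --ttl=300 \
    --zone=$YOUR_MANAGED_ZONE
gcloud dns record-sets transaction execute --zone=$YOUR_MANAGED_ZONE
```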


Checking your GKE cluster

If you’ve applied the configurations described above to your GKE cluster, it should be able to run the Domino cluster requirements checker without errors. If the checker runs successfully, you are ready for Domino to be installed in the cluster.