Domino on AKS

Domino 4 can run on a Kubernetes cluster provided by the Azure Kubernetes Service. When running on AKS, the Domino 4 architecture uses Azure resources to fulfill the Domino cluster requirements as follows:

[Diagram: Domino 4 architecture on AKS]
  • For a complete Terraform module for Domino-compatible AKS provisioning, see terraform-azure-aks on GitHub.
  • Kubernetes control is handled by the AKS control plane with managed Kubernetes masters
  • The AKS cluster’s default node pool is configured to host the Domino platform
  • Additional AKS node pools provide compute nodes for user workloads
  • An Azure storage account stores Domino blob data and datasets
  • The kubernetes.io/azure-disk provisioner is used to create persistent volumes for Domino executions
  • The Advanced Azure CNI is used for cluster networking, with network policy enforcement handled by Calico
  • Ingress to the Domino application is handled by an SSL-terminating Application Gateway that points to a Kubernetes load balancer
  • Domino recommends provisioning with Terraform for extended control and customizability of all resources. When setting up your Azure Terraform provider, please add a partner_id with a value of 31912fbf-f6dd-5176-bffb-0a01e8ac71f2 to enable usage attribution.
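The usage attribution described above is set in the Azure provider block. As a sketch (the features block is required by recent azurerm provider versions):

```hcl
provider "azurerm" {
  features {}

  # Enables Domino usage attribution for resources created by this provider
  partner_id = "31912fbf-f6dd-5176-bffb-0a01e8ac71f2"
}
```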



Setting up an AKS cluster for Domino

This section describes how to configure an AKS cluster for use with Domino.


Resource groups

You can provision the cluster, storage, and application gateway in an existing resource group. Note that when creating the cluster, Azure will automatically create an additional, separate resource group to contain the cluster components.
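If you do not have an existing resource group, one can be created with Terraform. The name and location below are placeholders:

```hcl
# Resource group to hold the AKS cluster, storage account, and application gateway
resource "azurerm_resource_group" "domino" {
  name     = "example_resource_group"
  location = "East US"
}
```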


Namespaces

No namespace configuration is necessary prior to install. Domino will create three namespaces in the cluster during installation, according to the following specifications:

Namespace       Contains
platform        Durable Domino application, metadata, platform services required for platform operation
compute         Ephemeral Domino execution pods launched by user actions in the application
domino-system   Domino installation metadata and secrets

Node pools

The AKS cluster must have at least two node pools that produce worker nodes with the following specifications and distinct node labels, and it may include an optional GPU pool:

Pool                    Min-Max  VM               Disk  Labels
platform                1-4      Standard_DS5_v2  128G  dominodatalab.com/node-pool: platform
default                 1-20     Standard_DS4_v2  128G  dominodatalab.com/node-pool: default
                                                        domino/build-node: true
default-gpu (optional)  0-5      Standard_NC6     128G  dominodatalab.com/node-pool: default-gpu
                                                        nvidia.com/gpu: true

The recommended architecture configures the cluster’s initial default node pool with the correct label and size to serve as the platform node pool. See the below cluster Terraform resource for a complete example.

resource "azurerm_kubernetes_cluster" "aks" {

  name                       = "example_cluster"
  enable_pod_security_policy = false
  location                   = "East US"
  resource_group_name        = "example_resource_group"
  dns_prefix                 = "example_cluster"
  private_cluster_enabled    = false

  default_node_pool {
    enable_node_public_ip = false
    name                  = "platform"
    node_count            = 4
    node_labels           = { "dominodatalab.com/node-pool" : "platform" }
    vm_size               = "Standard_DS5_v2"
    availability_zones    = ["1", "2", "3"]
    max_pods              = 250
    os_disk_size_gb       = 128
    node_taints           = []
    enable_auto_scaling   = true
    min_count             = 1
    max_count             = 4
  }

  network_profile {
    load_balancer_sku  = "Standard"
    network_plugin     = "azure"
    network_policy     = "calico"
    dns_service_ip     = "100.97.0.10"
    docker_bridge_cidr = "172.17.0.1/16"
    service_cidr       = "100.97.0.0/16"
  }

}

A separate node pool for Domino default compute should be added after the cluster is created. Note that this is not the initial cluster default node pool, but a separate node pool named default that is added to serve default Domino compute. See the below node pool Terraform resource for a complete example.

resource "azurerm_kubernetes_cluster_node_pool" "aks" {

  enable_node_public_ip = false
  kubernetes_cluster_id = "example_cluster_id"
  name                  = "default"
  node_count            = 1
  vm_size               = "Standard_DS4_v2"
  availability_zones    = ["1", "2", "3"]
  max_pods              = 250
  os_disk_size_gb       = 128
  os_type               = "Linux"
  node_labels = {
    "domino/build-node"            = "true"
    "dominodatalab.com/build-node" = "true"
    "dominodatalab.com/node-pool"  = "default"
  }
  node_taints           = []
  enable_auto_scaling   = true
  min_count             = 1
  max_count             = 20

}

Additional node pools can be added with distinct dominodatalab.com/node-pool labels to make other instance types available for Domino executions. Read Managing the Domino compute grid to learn how these different node types are referenced by label from the Domino application. When adding GPU node pools, keep in mind the Azure guidance and best practices on using GPU nodes in AKS.
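As a sketch, the optional GPU pool from the table above could be added with a node pool resource like the following. The cluster ID is a placeholder, and the pool is named defaultgpu because AKS Linux node pool names allow only lowercase alphanumeric characters; the dominodatalab.com/node-pool label still carries the default-gpu value that Domino references:

```hcl
resource "azurerm_kubernetes_cluster_node_pool" "gpu" {
  kubernetes_cluster_id = "example_cluster_id"
  name                  = "defaultgpu"   # AKS pool names: lowercase alphanumeric only
  vm_size               = "Standard_NC6"
  os_disk_size_gb       = 128
  os_type               = "Linux"
  node_labels = {
    "dominodatalab.com/node-pool" = "default-gpu"
    "nvidia.com/gpu"              = "true"
  }
  enable_auto_scaling = true
  min_count           = 0
  max_count           = 5
}
```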


Network plugin

The Domino-hosting cluster should use the Advanced Azure CNI with network policy enforcement by Calico. See the below network_profile configuration example.

network_profile {
  load_balancer_sku  = "Standard"
  network_plugin     = "azure"
  network_policy     = "calico"
  dns_service_ip     = "100.97.0.10"
  docker_bridge_cidr = "172.17.0.1/16"
  service_cidr       = "100.97.0.0/16"
}

Dynamic block storage

AKS clusters come equipped with several kubernetes.io/azure-disk backed storage classes by default. Domino requires use of premium disks for adequate input and output performance. The managed-premium class that is created by default can be used. Consult the following storage class specification as an example.

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  labels:
    kubernetes.io/cluster-service: "true"
  name: managed-premium
parameters:
  cachingmode: ReadOnly
  kind: Managed
  storageaccounttype: Premium_LRS
reclaimPolicy: Delete
volumeBindingMode: Immediate

Persistent blob and data storage

Domino uses one Azure storage account for both blob data and files. The configuration below shows the two required resources: the storage account itself and a blob container inside the account.

resource "azurerm_storage_account" "domino" {
  name                     = "example_storage_account"
  resource_group_name      = "example_resource_group"
  location                 = "East US"
  account_kind             = "StorageV2"
  account_tier             = "Standard"
  account_replication_type = "LRS"
  access_tier              = "Hot"
}

resource "azurerm_storage_container" "domino_registry" {
  name                  = "docker"
  storage_account_name  = "example_storage_account"
  container_access_type = "private"
}

Record the names of these resources for use when installing Domino.
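The installer also needs the storage account access key. As a sketch, assuming the azurerm_storage_account resource shown above, it can be exported from Terraform with an output block:

```hcl
# Exposes the storage account key for use in the Domino installer configuration
output "domino_storage_account_key" {
  value     = azurerm_storage_account.domino.primary_access_key
  sensitive = true
}
```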


Domain

Domino will need to be configured to serve from a specific FQDN. To serve Domino securely over HTTPS, you will also need an SSL certificate that covers the chosen name. Record the FQDN for use when installing Domino.


Checking your AKS cluster

If you’ve applied the configurations described above to your AKS cluster, it should be able to run the Domino cluster requirements checker without errors. If the checker runs successfully, you are ready for Domino to be installed in the cluster.




Example installer configuration

See below for an example configuration file for the Domino installer based on the provisioning examples above.

schema: '1.0'
name: domino-deployment
version: 4.1.9
hostname: domino.example.org
pod_cidr: '100.97.0.0/16'
ssl_enabled: true
ssl_redirect: true
request_resources: true
enable_network_policies: true
enable_pod_security_policies: true
create_restricted_pod_security_policy: true
namespaces:
  platform:
    name: domino-platform
    annotations: {}
    labels:
      domino-platform: 'true'
  compute:
    name: domino-compute
    annotations: {}
    labels: {}
  system:
    name: domino-system
    annotations: {}
    labels: {}
ingress_controller:
  create: true
  gke_cluster_uuid: ''
storage_classes:
  block:
    create: false
    name: managed-premium
    type: azure-disk
    access_modes:
    - ReadWriteOnce
    base_path: ''
    default: false
  shared:
    create: true
    name: dominoshared
    type: azure-file
    access_modes:
    - ReadWriteMany
    efs:
      region: ''
      filesystem_id: ''
    nfs:
      server: ''
      mount_path: ''
      mount_options: []
    azure_file:
      storage_account: 'example_storage_account'
blob_storage:
  projects:
    type: shared
    s3:
      region: ''
      bucket: ''
      sse_kms_key_id: ''
    azure:
      account_name: ''
      account_key: ''
      container: ''
    gcs:
      bucket: ''
      service_account_name: ''
      project_name: ''
  logs:
    type: shared
    s3:
      region: ''
      bucket: ''
      sse_kms_key_id: ''
    azure:
      account_name: ''
      account_key: ''
      container: ''
    gcs:
      bucket: ''
      service_account_name: ''
      project_name: ''
  backups:
    type: shared
    s3:
      region: ''
      bucket: ''
      sse_kms_key_id: ''
    azure:
      account_name: ''
      account_key: ''
      container: ''
    gcs:
      bucket: ''
      service_account_name: ''
      project_name: ''
  default:
    type: shared
    s3:
      region: ''
      bucket: ''
      sse_kms_key_id: ''
    azure:
      account_name: ''
      account_key: ''
      container: ''
    gcs:
      bucket: ''
      service_account_name: ''
      project_name: ''
    enabled: true
autoscaler:
  enabled: false
  cloud_provider: azure
  groups:
  - name: ''
    min_size: 0
    max_size: 0
  aws:
    region: ''
  azure:
    resource_group: ''
    subscription_id: ''
spotinst_controller:
  enabled: false
  token: ''
  account: ''
external_dns:
  enabled: false
  provider: aws
  domain_filters: []
  zone_id_filters: []
git:
  storage_class: managed-premium
email_notifications:
  enabled: false
  server: smtp.customer.org
  port: 465
  encryption: ssl
  from_address: domino@customer.org
  authentication:
    username: ''
    password: ''
monitoring:
  prometheus_metrics: true
  newrelic:
    apm: false
    infrastructure: false
    license_key: ''
helm:
  tiller_image: gcr.io/kubernetes-helm/tiller
  appr_registry: quay.io
  appr_insecure: false
  appr_username: '$QUAY_USERNAME'
  appr_password: '$QUAY_PASSWORD'
private_docker_registry:
  server: quay.io
  username: '$QUAY_USERNAME'
  password: '$QUAY_PASSWORD'
internal_docker_registry:
  s3_override:
    region: ''
    bucket: ''
    sse_kms_key_id: ''
  gcs_override:
    bucket: ''
    service_account_name: ''
    project_name: ''
  azure_blobs_override:
    account_name: 'example_storage_account'
    account_key: 'example_storage_account_key'
    container: 'docker'
telemetry:
  intercom:
    enabled: false
  mixpanel:
    enabled: false
    token: ''
gpu:
  enabled: false
fleetcommand:
  enabled: false
  api_token: ''
teleport:
  acm_arn: arn:aws:acm:<region>:<account>:certificate/<id>
  enabled: false
  hostname: teleport-domino.example.org