Domino can run on a Kubernetes cluster provided by AWS Elastic Kubernetes Service. When running on EKS, the Domino architecture uses AWS resources to fulfill the Domino cluster requirements as follows:
- Kubernetes control moves to the EKS control plane with managed Kubernetes masters.
- Domino uses a dedicated Auto Scaling Group (ASG) of EKS workers to host the Domino platform.
- ASGs of EKS workers host elastic compute for Domino executions.
- AWS S3 stores user data, the internal Docker registry, backups, and logs.
- AWS EFS stores Domino Datasets.
- The `ebs.csi.aws.com` provisioner creates persistent volumes for Domino executions.
- Calico is a network plugin that supports Kubernetes network policies.
Domino cannot be installed on EKS Fargate, since Fargate does not support stateful workloads with persistent volumes.

Instead of EKS Managed Node Groups, Domino recommends creating custom node groups to allow for additional control and customized Amazon Machine Images. Domino recommends that you use `eksctl`, Terraform, or CloudFormation to set up custom node groups.
All nodes in such a deployment have private IPs, and inter-node traffic is routed by an internal load balancer. Nodes in the cluster can have egress to the Internet through a NAT gateway.
All AWS services listed previously are required except GPU compute instances, which are optional.
Your annual Domino license fee will not include any charges incurred from using AWS services. You can find detailed pricing information for the Amazon services listed above at https://aws.amazon.com/pricing.
This section describes how to configure an Amazon EKS cluster for use with Domino. You must be familiar with the following AWS services:
- Elastic Kubernetes Service (EKS)
- Identity and Access Management (IAM)
- Virtual Private Cloud (VPC) Networking
- Elastic Block Store (EBS)
- Elastic File System (EFS)
- S3 Object Storage
Additionally, a basic understanding of Kubernetes concepts such as node pools, network CNI, storage classes, and autoscaling, as well as Docker, is useful when deploying the cluster.
Security considerations
You must create IAM policies in the AWS console to provision an EKS cluster. Domino recommends that you grant the least privilege when you create IAM policies, and grant elevated privileges only when necessary. See the AWS documentation on the grant-least-privilege concept.
Service quotas
Amazon maintains default service quotas for each of the services listed previously. Log in to the AWS Service Quotas console to check the default service quotas and manage your quotas.
VPC networking
If you plan to do VPC peering or set up a site-to-site VPN connection to connect your cluster to other resources like data sources or authentication services, configure your cluster VPC accordingly to avoid address space collisions.
Namespaces
You do not have to configure namespaces prior to install. Domino will create the following namespaces in the cluster during installation, according to the following specifications:
| Namespace | Contains |
|---|---|
| `domino-platform` | Durable Domino application, metadata, platform services required for platform operation |
| `domino-compute` | Ephemeral Domino execution pods launched by user actions in the application |
| `domino-system` | Domino installation metadata and secrets |
Node pools
The EKS cluster must have at least two ASGs that produce worker nodes with the following specifications and distinct node labels, and it might include an optional GPU pool:
| Pool | Min-Max | Instance | Disk | Labels |
|---|---|---|---|---|
| `platform` | 4-6 | m5.2xlarge | 128G | `dominodatalab.com/node-pool: platform` |
| `default` | 1-20 | m5.2xlarge | 400G | `dominodatalab.com/node-pool: default` |
| Optional: `default-gpu` | 0-5 | p3.2xlarge | 400G | `dominodatalab.com/node-pool: default-gpu` |
The `platform` ASG can run in one availability zone or across three availability zones. If you want Domino to run with some components deployed as highly available ReplicaSets, you must use three availability zones. Using two zones is not supported, as it results in an even number of nodes in a single failure domain. All compute node pools you use must have corresponding ASGs in any AZ used by other node pools; an isolated node pool in only one zone can cause volume affinity issues.
To run the `default` and `default-gpu` pools across multiple availability zones, you must duplicate ASGs in each zone with the same configuration, including the same labels, to ensure pods are delivered to the zone where the required ephemeral volumes are available.
To get suitable drivers onto GPU nodes, use the EKS-optimized AMI distributed by Amazon as the machine image for the GPU node pool.
You can add ASGs with distinct `dominodatalab.com/node-pool` labels to make other instance types available for Domino executions.
See Manage Compute Resources to learn how these different node types are referenced by label from the Domino application.
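For illustration, such an additional pool can be defined as one more entry under `nodeGroups` in an eksctl `ClusterConfig`; the pool name `large-memory` and the instance type below are assumptions, not Domino requirements:

```yaml
# Illustrative extra node group for an eksctl ClusterConfig.
# The "large-memory" pool name and r5.4xlarge instance type are examples only.
- name: domino-large-memory
  instanceType: r5.4xlarge
  minSize: 0
  maxSize: 5
  volumeSize: 400
  availabilityZones: ["us-west-2a"]
  labels:
    "dominodatalab.com/node-pool": "large-memory"
```

Executions can then target this pool by the `dominodatalab.com/node-pool` label value, as described in Manage Compute Resources.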
Network plugin
Domino relies on Kubernetes network policies to manage secure communication between pods in the cluster.
The network plugin implements network policies, so your cluster must use a networking solution that supports `NetworkPolicy`, such as Calico.
See the AWS documentation about installing Calico for your EKS cluster.
If you use the Amazon VPC CNI for networking, with Calico used only for NetworkPolicy enforcement, ensure the subnets you use for your cluster have CIDR ranges of sufficient size, as every deployed pod in the cluster is assigned an elastic network interface and consumes a subnet address. Domino recommends at least a /23 CIDR for the cluster.
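As a quick sanity check on subnet sizing, assuming one VPC address per pod, you can compare a subnet's usable address count against your expected pod count. A minimal sketch using Python's standard `ipaddress` module (the 5-address reservation per subnet is standard AWS VPC behavior):

```python
import ipaddress

# Compare the recommended /23 against a smaller /26 for illustration.
for cidr in ("10.0.0.0/23", "10.0.0.0/26"):
    net = ipaddress.ip_network(cidr)
    # AWS reserves 5 addresses in every subnet (network address, VPC
    # router, DNS, future use, and broadcast).
    usable = net.num_addresses - 5
    print(f"{cidr}: {usable} usable addresses")
# → 10.0.0.0/23: 507 usable addresses
# → 10.0.0.0/26: 59 usable addresses
```

A /23 yields roughly 500 usable addresses, which bounds the number of concurrently running pods when each pod consumes a subnet address.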
Dynamic block storage
The EKS cluster must be equipped with an EBS-backed storage class that Domino will use to provision ephemeral volumes for user executions. Both GP2 and GP3 volume types are supported. See the following for an example storage class specification:
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: dominodisk-gp3
parameters:
  type: gp3
provisioner: ebs.csi.aws.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
```
When using GP3, your IAM policy must allow additional permissions to operate on these ephemeral volumes. Use this example IAM policy as a reference.
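As a rough sketch, the EBS CSI driver typically needs EC2 permissions like the following to create, attach, tag, and resize volumes; the exact action list varies by driver version, so treat this as an assumption and consult the driver's documented policy:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateVolume",
        "ec2:DeleteVolume",
        "ec2:AttachVolume",
        "ec2:DetachVolume",
        "ec2:ModifyVolume",
        "ec2:DescribeVolumes",
        "ec2:DescribeVolumesModifications",
        "ec2:CreateTags"
      ],
      "Resource": "*"
    }
  ]
}
```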
Datasets storage
To store Datasets in Domino, you must configure an Elastic File System (EFS). Provision the EFS file system and configure an access point to allow access from the EKS cluster. Configure the access point with the following key parameters:
- Root directory path: `/domino`
- User ID: `0`
- Group ID: `0`
- Owner user ID: `0`
- Owner group ID: `0`
- Root permissions: `777`
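As a sketch, an access point with these parameters can be created with the AWS CLI; the file system ID below is a placeholder, and you should verify the flags against your CLI version:

```shell
# fs-0123456789abcdef0 is a placeholder; substitute your file system ID.
aws efs create-access-point \
  --file-system-id fs-0123456789abcdef0 \
  --posix-user "Uid=0,Gid=0" \
  --root-directory "Path=/domino,CreationInfo={OwnerUid=0,OwnerGid=0,Permissions=777}"
```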
Record the file system and access point IDs for use when you install Domino.
Blob storage
When running in EKS, Domino can use Amazon S3 for durable object storage.
Create the following S3 buckets:
- One bucket for user data
- One bucket for the internal Docker registry
- One bucket for logs
- One bucket for backups
Configure each bucket to permit read and write access from the EKS cluster. To do this, apply an IAM policy like the following to the nodes in the cluster:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": [
        "arn:aws:s3:::$your-logs-bucket-name",
        "arn:aws:s3:::$your-backups-bucket-name",
        "arn:aws:s3:::$your-user-data-bucket-name",
        "arn:aws:s3:::$your-registry-bucket-name"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:ListMultipartUploadParts",
        "s3:AbortMultipartUpload"
      ],
      "Resource": [
        "arn:aws:s3:::$your-logs-bucket-name/*",
        "arn:aws:s3:::$your-backups-bucket-name/*",
        "arn:aws:s3:::$your-user-data-bucket-name/*",
        "arn:aws:s3:::$your-registry-bucket-name/*"
      ]
    }
  ]
}
```
Record the names of these buckets for use when you install Domino.
Autoscaler access
If you intend to deploy the Kubernetes Cluster Autoscaler in your cluster, the instance profile used by your platform nodes must have the necessary AWS Auto Scaling permissions.
See the following example policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeInstanceTypes"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
```
See the following for a sample YAML configuration file you can use with eksctl, the official EKS command line tool, to create a Domino-compatible cluster.
After creating a cluster with this configuration, you must still create the EFS and S3 storage systems and configure them for access from the cluster as described previously.
```yaml
# $LOCAL_DIR/cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: domino-test-cluster
  region: us-west-2
nodeGroups:
  - name: domino-platform
    instanceType: m5.2xlarge
    minSize: 3
    maxSize: 3
    desiredCapacity: 3
    volumeSize: 128
    availabilityZones: ["us-west-2a"]
    labels:
      "dominodatalab.com/node-pool": "platform"
    tags:
      "k8s.io/cluster-autoscaler/enabled": "true" # Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" # Optional for autodiscovery <insert your cluster_name>
  - name: domino-default
    instanceType: m5.2xlarge
    minSize: 0
    maxSize: 10
    desiredCapacity: 1
    volumeSize: 400
    availabilityZones: ["us-west-2a"]
    labels:
      "dominodatalab.com/node-pool": "default"
      "domino/build-node": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default"
      "k8s.io/cluster-autoscaler/node-template/label/domino/build-node": "true"
      "k8s.io/cluster-autoscaler/enabled": "true" # Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" # Optional for autodiscovery <insert your cluster_name>
    preBootstrapCommands:
      - "cp /etc/docker/daemon.json /etc/docker/daemon_backup.json"
      - "echo -e '.bridge=\"docker0\" | .\"live-restore\"=false' > /etc/docker/jq_script"
      - "jq -f /etc/docker/jq_script /etc/docker/daemon_backup.json | tee /etc/docker/daemon.json"
      - "systemctl restart docker"
  - name: domino-gpu
    instanceType: p2.8xlarge
    minSize: 0
    maxSize: 5
    volumeSize: 400
    availabilityZones: ["us-west-2a"]
    ami: ami-0ad9a8dc09680cfc2
    labels:
      "dominodatalab.com/node-pool": "default-gpu"
      "nvidia.com/gpu": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default-gpu"
      "k8s.io/cluster-autoscaler/enabled": "true" # Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" # Optional for autodiscovery <insert your cluster_name>
availabilityZones: ["us-west-2a", "us-west-2b", "us-west-2c"]
```
See Install Configuration Reference for more information about autodiscovery.
The following shows a sample YAML configuration file to use with eksctl, the official EKS command line tool, to create a Domino-compatible cluster spanning multiple availability zones. To avoid issues with execution volume affinity, create duplicate groups in each AZ.
```yaml
# $LOCAL_DIR/cluster.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: domino-test-cluster
  region: us-west-2
nodeGroups:
  - name: domino-platform-a
    instanceType: m5.2xlarge
    minSize: 1
    maxSize: 3
    desiredCapacity: 1
    volumeSize: 128
    availabilityZones: ["us-west-2a"]
    labels:
      "dominodatalab.com/node-pool": "platform"
    tags:
      "k8s.io/cluster-autoscaler/enabled": "true" # Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" # Optional for autodiscovery <insert your cluster_name>
  - name: domino-platform-b
    instanceType: m5.2xlarge
    minSize: 1
    maxSize: 3
    desiredCapacity: 1
    volumeSize: 128
    availabilityZones: ["us-west-2b"]
    labels:
      "dominodatalab.com/node-pool": "platform"
    tags:
      "k8s.io/cluster-autoscaler/enabled": "true" # Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" # Optional for autodiscovery <insert your cluster_name>
  - name: domino-platform-c
    instanceType: m5.2xlarge
    minSize: 1
    maxSize: 3
    desiredCapacity: 1
    volumeSize: 128
    availabilityZones: ["us-west-2c"]
    labels:
      "dominodatalab.com/node-pool": "platform"
    tags:
      "k8s.io/cluster-autoscaler/enabled": "true" # Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" # Optional for autodiscovery <insert your cluster_name>
  - name: domino-default-a
    instanceType: m5.2xlarge
    minSize: 0
    maxSize: 3
    volumeSize: 400
    availabilityZones: ["us-west-2a"]
    labels:
      "dominodatalab.com/node-pool": "default"
      "domino/build-node": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default"
      "k8s.io/cluster-autoscaler/node-template/label/domino/build-node": "true"
      "k8s.io/cluster-autoscaler/enabled": "true" # Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" # Optional for autodiscovery <insert your cluster_name>
    preBootstrapCommands:
      - "cp /etc/docker/daemon.json /etc/docker/daemon_backup.json"
      - "echo -e '.bridge=\"docker0\" | .\"live-restore\"=false' > /etc/docker/jq_script"
      - "jq -f /etc/docker/jq_script /etc/docker/daemon_backup.json | tee /etc/docker/daemon.json"
      - "systemctl restart docker"
  - name: domino-default-b
    instanceType: m5.2xlarge
    minSize: 0
    maxSize: 3
    volumeSize: 400
    availabilityZones: ["us-west-2b"]
    labels:
      "dominodatalab.com/node-pool": "default"
      "domino/build-node": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default"
      "k8s.io/cluster-autoscaler/node-template/label/domino/build-node": "true"
      "k8s.io/cluster-autoscaler/enabled": "true" # Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" # Optional for autodiscovery <insert your cluster_name>
    preBootstrapCommands:
      - "cp /etc/docker/daemon.json /etc/docker/daemon_backup.json"
      - "echo -e '.bridge=\"docker0\" | .\"live-restore\"=false' > /etc/docker/jq_script"
      - "jq -f /etc/docker/jq_script /etc/docker/daemon_backup.json | tee /etc/docker/daemon.json"
      - "systemctl restart docker"
  - name: domino-default-c
    instanceType: m5.2xlarge
    minSize: 0
    maxSize: 3
    volumeSize: 400
    availabilityZones: ["us-west-2c"]
    labels:
      "dominodatalab.com/node-pool": "default"
      "domino/build-node": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default"
      "k8s.io/cluster-autoscaler/node-template/label/domino/build-node": "true"
      "k8s.io/cluster-autoscaler/enabled": "true" # Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" # Optional for autodiscovery <insert your cluster_name>
    preBootstrapCommands:
      - "cp /etc/docker/daemon.json /etc/docker/daemon_backup.json"
      - "echo -e '.bridge=\"docker0\" | .\"live-restore\"=false' > /etc/docker/jq_script"
      - "jq -f /etc/docker/jq_script /etc/docker/daemon_backup.json | tee /etc/docker/daemon.json"
      - "systemctl restart docker"
  - name: domino-gpu-a
    instanceType: p2.8xlarge
    minSize: 0
    maxSize: 2
    volumeSize: 400
    availabilityZones: ["us-west-2a"]
    ami: ami-0ad9a8dc09680cfc2
    labels:
      "dominodatalab.com/node-pool": "default-gpu"
      "nvidia.com/gpu": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default-gpu"
      "k8s.io/cluster-autoscaler/enabled": "true" # Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" # Optional for autodiscovery <insert your cluster_name>
  - name: domino-gpu-b
    instanceType: p2.8xlarge
    minSize: 0
    maxSize: 2
    volumeSize: 400
    availabilityZones: ["us-west-2b"]
    ami: ami-0ad9a8dc09680cfc2
    labels:
      "dominodatalab.com/node-pool": "default-gpu"
      "nvidia.com/gpu": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default-gpu"
      "k8s.io/cluster-autoscaler/enabled": "true" # Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" # Optional for autodiscovery <insert your cluster_name>
  - name: domino-gpu-c
    instanceType: p2.8xlarge
    minSize: 0
    maxSize: 2
    volumeSize: 400
    availabilityZones: ["us-west-2c"]
    ami: ami-0ad9a8dc09680cfc2
    labels:
      "dominodatalab.com/node-pool": "default-gpu"
      "nvidia.com/gpu": "true"
    tags:
      "k8s.io/cluster-autoscaler/node-template/label/dominodatalab.com/node-pool": "default-gpu"
      "k8s.io/cluster-autoscaler/enabled": "true" # Optional for autodiscovery
      "k8s.io/cluster-autoscaler/{{ cluster_name }}": "owned" # Optional for autodiscovery <insert your cluster_name>
availabilityZones: ["us-west-2a", "us-west-2b", "us-west-2c"]
```
See Install Configuration Reference for more information about autodiscovery.