Domino Data Importer





Overview

The Data Importer is a containerized tool that can be used to load data from a source Domino installation into a different target Domino installation. It can be used to load backups into a recovery environment, or to migrate data from one environment to another. The Data Importer is capable of loading all of the critical data stores that are automatically backed up by Domino:

  • Projects
  • Logs
  • Docker registry
  • Datasets
  • MongoDB
  • Domino Git
  • Postgres

The Data Importer also has functionality to perform necessary transformations and schema changes when migrating data across different Domino versions.

For migrations, the Data Importer performs incremental transfers: it can be run multiple times against the same source and target installations and will synchronize them.




Configuration

The Data Importer is controlled by a YAML configuration file called importer.yaml. See the bottom of this page for a complete example, and see below for detailed schema information.

Throughout this document, references to “source Domino” will mean the Domino installation that the data originated from, and “target Domino” is the installation you want to load the data into.

domino

The domino object contains configuration and credential information for the target Domino.

Key Type Example Description
install_configmap String fleetcommand-agent-config Name of the Kubernetes configmap that stores the target Domino installer configuration. This will exist in the default namespace.
install_secret String credential-store-domino-platform Name of the Kubernetes secret that stores credentials for the target Domino. This will exist in the system namespace.
system_namespace String domino-system Name of the Kubernetes namespace that contains Domino system resources. This name may be domino-system or $NAME-system.
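Assembled from the example values in the table above, a minimal domino block looks like the following; substitute the names from your own installation:

domino:
  install_configmap: "fleetcommand-agent-config"
  install_secret: "credential-store-domino-platform"
  system_namespace: "domino-system"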

configData

The configData object describes connection information to the source Domino, and specifies the method for import operations.


configData.remote_ssh

The remote_ssh object defines how the Data Importer will connect to the source Domino. This is only necessary for legacy Domino installs (Domino version < 4.0). The values used here will be used for all migrations that need remote SSH unless an explicit override is added to the migration configuration.

Key Type Example Description
bastion_host String (hostname) domino-bastion.domain.com Hostname of the bastion host in the source Domino. This is a machine from which you can SSH to the rest of the Domino infrastructure.
bastion_user String ubuntu User on the bastion_host that can be authenticated via SSH.
ssh_host String (hostname) domino-central.domain.com Hostname of the central host in the source Domino. This is where the Domino central server is running.
ssh_user String ubuntu User on the ssh_host that can be authenticated via SSH.
ssh_key_path String (file path) /opt/sshkeys/domino Filesystem path to an SSH key provided in configData.sshKeys that can be used to SSH to the bastion_host and ssh_host.
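Putting the example values together, a remote_ssh block for a legacy source might look like this (hostnames and users are placeholders):

configData:
  remote_ssh:
    bastion_host: domino-bastion.domain.com
    bastion_user: ubuntu
    ssh_host: domino-central.domain.com
    ssh_user: ubuntu
    ssh_key_path: /opt/sshkeys/domino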

configData.migrations

The migrations object should contain a list of migration objects that describe the data migrations to execute. These objects are treated as an _ordered_ list of migrations to perform. The services should be migrated in the following order:

  • k8s_secrets (Not required for legacy migration)
  • mongo
  • postgres (Not required for legacy migration)
  • git
  • logjam
  • blobs
  • registry
  • datasets

See the bottom of this page for complete examples.

Key Type Example Description
method Domino migration method mongo For each migration, specify one of the Domino migration methods.
name String mongo Name for the migration.
service String mongo One of mongo, git, datasets, logjam, registry, blobs, k8s_secrets, postgres
config Object migrate_legacy_users: true, reset_keycloak: true Configuration object with settings specific to the chosen method.
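For instance, a single migration entry for the mongo service, built from the example values above, would be written as:

configData:
  migrations:
  - method: mongo
    name: mongo
    service: mongo
    config:
      migrate_legacy_users: true
      reset_keycloak: true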

configData.sshKeys

The sshKeys object is used to provide the Data Importer with SSH keys for connecting to hosts that have data to import. When migrating from older Domino versions, it is typically necessary to supply an SSH key for the source Domino’s bastion server.

A YAML key with a given $NAME in this object will have a corresponding file written to /opt/sshkeys/$NAME inside the Data Importer container with the string literal contents provided. When configuring a method that requires an ssh_host and ssh_user, supply the /opt/sshkeys/$NAME path that points to a file containing the correct key for the target user and host.

Key Type Description
$NAME String literal (RSA private key block) Each of these keys produces a file at /opt/sshkeys/$NAME inside the Data Importer container with the supplied key contents. For example:

sshKeys:
  $NAME: |
    -----BEGIN RSA PRIVATE KEY-----
    $SSH_KEY_DATA
    -----END RSA PRIVATE KEY-----



Import methods


mongo

The mongo method uses mongorestore to load MongoDB data into the target Domino. This data can be retrieved from a legacy deployment over SSH to the central server through the bastion host, or it can be loaded from a local MongoDB backup in a .tar archive, like the ones produced by Domino automated backups.

The default mode is to connect to the ssh_host, dump MongoDB data, then transfer it into the container and automatically set backup_path to point to it. If you already have a .tar backup of MongoDB data, you can mount or pull it into the Data Importer container in /opt/scratch and provide a path to it in backup_path. If there is a user-defined value in backup_path, the SSH transfer is skipped and the file at the user-defined path is used.

By default, this process excludes central configuration, feature flag, scheduler lock, and cache collections; see excluded_collections below.

Configuration options

Key Default Description
bastion_host Defaults to the value of configData.remote_ssh.bastion_host Overrides the bastion_host value in configData.remote_ssh for just this method.
bastion_user Defaults to the value of configData.remote_ssh.bastion_user User on the bastion_host that can be authenticated via SSH.
ssh_host Defaults to the value of configData.remote_ssh.ssh_host Hostname of the central host in the source Domino.
ssh_user Defaults to the value of configData.remote_ssh.ssh_user Overrides the ssh_user value in configData.remote_ssh for just this method.
ssh_key_path Defaults to the value of configData.remote_ssh.ssh_key_path Filesystem path to an SSH key provided in configData.sshKeys that can be used to SSH to the bastion_host and ssh_host.
ssh_port 22 Port to use for SSH to the ssh_host.
backup_path Null Filesystem path to a .tar archive in the Data Importer container with MongoDB backups to load. This should typically be a .tar you have pulled into the Data Importer container at /opt/scratch/. Supplying a value for this option overrides the normal SSH mode of Mongo data retrieval.
migrate_legacy_users False Set to True if the source Domino is running a legacy version (version < 4.0).
reset_keycloak False If set to True, this deletes all users in the target Domino prior to migration of users from the source Domino.
excluded_collections ["domino.config", "domino.feature_flag_overrides", "domino.scheduler_locks", "domino.cache"] This advanced option can specify MongoDB collections to exclude from migration. By default, central configuration settings, feature flag settings, scheduler locks, and cache collections are excluded.
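As an illustration, a mongo migration that loads a local backup rather than pulling data over SSH might look like this (the archive name is hypothetical):

- method: mongo
  name: mongo
  service: mongo
  config:
    backup_path: /opt/scratch/mongo-backup.tar # supplying this skips the SSH dump and transfer
    migrate_legacy_users: false
    reset_keycloak: false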

s3_to_s3

This method syncs the contents of a source S3 bucket to a destination S3 bucket. It is suitable for migrating blobs and logs, and, when the Docker registry in the source Domino is backed by S3, registry data as well.

Key Default Description  
source_bucket_name None (Required) Name of the S3 bucket containing the desired data from the source Domino, for example deployment1-domino-project-data. A user-defined bucket name must be supplied.
dest_bucket_name Discovered automatically Name of the S3 bucket to copy data into. By default, the Data Importer discovers and uses the bucket used by the chosen service in the target deployment.
access_key None AWS access key to use for access to the buckets. This is not required if the worker node running the Data Importer has an AWS instance role that grants access to both buckets.  
secret_key None AWS secret key to use for access to the buckets. This is not required if the worker node running the Data Importer has an AWS instance role that grants access to both buckets.  
session_token None Session token where needed for AWS temporary credentials.  
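A sketch of an s3_to_s3 migration for blobs, with a placeholder source bucket; dest_bucket_name is omitted so that it is discovered automatically:

- method: s3_to_s3
  name: blobs
  service: blobs
  config:
    source_bucket_name: <blobs bucket name>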

disk_to_s3

This method syncs the contents of a filesystem directory on a remote ssh_host machine to an S3 bucket. This is suitable for migrating blob data stored in source on-premises Domino NFS systems to target AWS Domino S3 buckets, or for migrating Docker registry data from legacy source deployments into target deployments that use an S3-backed Docker registry.

Key Default Description
bastion_host Defaults to the value of configData.remote_ssh.bastion_host Overrides the bastion_host value in configData.remote_ssh for just this method.
bastion_user Defaults to the value of configData.remote_ssh.bastion_user User on the bastion_host that can be authenticated via SSH.
ssh_host Defaults to the value of configData.remote_ssh.ssh_host Hostname of the central host in the source Domino.
ssh_user Defaults to the value of configData.remote_ssh.ssh_user Overrides the ssh_user value in configData.remote_ssh for just this method.
ssh_key_path Defaults to the value of configData.remote_ssh.ssh_key_path Filesystem path to an SSH key provided in configData.sshKeys that can be used to SSH to the bastion_host and ssh_host.
ssh_port 22 Port to use for SSH to the ssh_host.
remote_path Uses service-defined defaults Filesystem path on the remote ssh_host containing files to copy to the dest_bucket_name.
dest_bucket_name Discovered automatically Name of the S3 bucket to copy data into. By default, the Data Importer discovers and uses the bucket used by the chosen service in the target deployment.
iam_access_key_id None AWS access key to use for access to the buckets. This is not required if the worker node running the Data Importer has an AWS instance role that grants access to both buckets.
iam_secret_key None AWS secret key to use for access to the buckets. This is not required if the worker node running the Data Importer has an AWS instance role that grants access to both buckets.
iam_session_token None Session token where needed for AWS temporary credentials.
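In the simplest case, a disk_to_s3 registry migration can rely entirely on the configData.remote_ssh values and the service-defined defaults, as in the migration example at the bottom of this page:

- method: disk_to_s3
  name: registry
  service: registry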

tar

This method extracts and loads data from a .tar archive on a local path in the Data Importer.

Key Default Description
source_path None (Required) Filesystem path to a .tar archive with the required data for the chosen service. This must be a path inside the Data Importer container.
dest_path Uses service-defined defaults Path that the archive at source_path is extracted to. This defaults to the correct path for the chosen service.
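As shown in the restore example at the bottom of this page, a tar migration for git data only needs a source_path:

- method: tar
  name: git
  service: git
  config:
    source_path: /opt/scratch/restore/git/20200307-0000.tar.gz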

rsync

This method transfers data from a remote host path to a local path via rsync. Both paths use service-defined defaults: the remote path will be the standard path to legacy data, and the local path will be a mounted volume for the correct destination service.

Key Default Description
bastion_host Defaults to the value of configData.remote_ssh.bastion_host Overrides the bastion_host value in configData.remote_ssh for just this method.
bastion_user Defaults to the value of configData.remote_ssh.bastion_user User on the bastion_host that can be authenticated via SSH.
ssh_host Defaults to the value of configData.remote_ssh.ssh_host Hostname of the central host in the source Domino.
ssh_user Defaults to the value of configData.remote_ssh.ssh_user Overrides the ssh_user value in configData.remote_ssh for just this method.
ssh_key_path Defaults to the value of configData.remote_ssh.ssh_key_path Filesystem path to an SSH key provided in configData.sshKeys that can be used to SSH to the bastion_host and ssh_host.
ssh_port 22 Port to use for SSH to the ssh_host.
remote_path Uses service-defined defaults Filesystem path to a directory on the remote host containing source data.
local_path Uses service-defined defaults Filesystem path in the local Data Importer container to sync data to.
backup_dir None If there is any existing data in the local_path, it will be erased by the migration. If you supply a local filesystem path (/opt/scratch/$SERVICE_NAME is recommended) the local data will be backed up there prior to migration.
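A sketch of an rsync migration for datasets that overrides the remote path and backs up any existing local data first (the remote path here is hypothetical; the backup_dir follows the recommendation above):

- method: rsync
  name: datasets
  service: datasets
  config:
    remote_path: /domino/datasets/ # hypothetical override; omit to use the service default
    backup_dir: /opt/scratch/datasets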

postgres

This method ingests a local .sql backup of Postgres data and loads it into the target Domino Postgres service. This method is not necessary for legacy migrations, which do not have Postgres data.

Key Default Description
backup_path None (Required) Filesystem path to a local .sql backup of Postgres data that you have mounted or pulled into the Data Importer container.
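For example, using the path from the restore example at the bottom of this page:

- method: postgres
  name: postgres
  service: postgres
  config:
    backup_path: /opt/scratch/restore/postgres/20200307-0000.sql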

k8s_secrets

This method loads a backup of the credential-store secret from the source Domino into the target Domino.
Key Default Description
backup_path None (Required) Filesystem path to a backup of the credential-store-domino-platform secret from the source Domino.
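A corresponding entry, as used in the restore example at the bottom of this page:

- method: k8s_secrets
  name: secrets
  service: k8s_secrets
  config:
    backup_path: /opt/scratch/restore/k8s_secrets/secrets.yaml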



Running the importer

After composing an importer.yaml, you can install the importer tool and load the configuration by running:

helm registry upgrade quay.io/domino/helm-domino-data-importer:beta --install -f ./importer.yaml \
--namespace domino-platform domino-data-importer

This will deploy the helm-domino-data-importer image into the cluster as a Kubernetes pod, and load the configuration into the running container at /opt/config/config.yaml.

Once the pod is running, you can attach to the pod by running:

kubectl -n domino-platform attach -it domino-data-importer-0

Then from inside the container, run ./importer to execute the import.

The tool will use the config provided. Scratch space is set up for you in /opt/scratch, and it persists after the pod is deleted or if the helm chart is deleted or purged. Any custom configs should go here, as should any files that need to be manually copied into the deployment, such as .tar backups.
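For example, a local backup archive can be copied into the scratch space with kubectl before running the import (the archive name is illustrative):

kubectl -n domino-platform cp ./20200307-0000.tar.gz domino-data-importer-0:/opt/scratch/restore/mongo/20200307-0000.tar.gz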

After the Data Importer finishes all migrations, you need to restart the frontend and dispatcher services in the target Domino to index the new data in relevant systems. This can be accomplished from the administration UI by clicking Advanced > Restart Services.




Example AWS migration configuration

domino:
  install_configmap: "fleetcommand-agent-<new install stagename>"
  install_secret: "credential-store-<new install stagename>-platform"
  system_namespace: "<new install stagename>-system"
configData:
  remote_ssh:
    bastion_host: <bastion hostname>
    bastion_user: <central instance username>
    ssh_host: <central instance hostname>
    ssh_key_path: /opt/sshkeys/domino
    ssh_user: <central instance username>
  migrations:
  - method: mongo
    name: mongo
    service: mongo
    config:
      migrate_legacy_users: true # migrates users from mongo to keycloak
      reset_keycloak: true # resets keycloak migrations
      keycloak_migrations_image: quay.io/domino/keycloak-realm-migration:latest # Need to override until upgraded version gets out there
  - config:
      remote_path: /domino/git/projectrepos/
    method: rsync
    name: git
    service: git
  - method: rsync
    name: datasets
    service: datasets
  - config:
      source_bucket_name: <logs bucket name>
    method: s3_to_s3
    name: logjam
    service: logjam
  - config:
      source_bucket_name: <blobs bucket name>
    method: s3_to_s3
    name: blobs
    service: blobs
  - method: disk_to_s3
    name: registry
    service: registry
  sshKeys:
    domino: |
      -----BEGIN RSA PRIVATE KEY-----
      <ssh key data here>
      -----END RSA PRIVATE KEY-----



Example AWS restore configuration

domino:
  install_configmap: "fleetcommand-agent-<new install stagename>"
  install_secret: "credential-store-<new install stagename>-platform"
  system_namespace: "<new install stagename>-system"
configData:
  migrations:
  - method: k8s_secrets
    name: secrets
    service: k8s_secrets
    config:
      backup_path: /opt/scratch/restore/k8s_secrets/secrets.yaml
  - config:
      migrate_legacy_users: false
      reset_keycloak: false
      backup_path: /opt/scratch/restore/mongo/20200307-0000.tar.gz
    method: mongo
    name: mongo
    service: mongo
  - config:
      backup_path: /opt/scratch/restore/postgres/20200307-0000.sql
    method: postgres
    name: postgres
    service: postgres
  - config:
      source_path: /opt/scratch/restore/git/20200307-0000.tar.gz
    method: tar
    name: git
    service: git
  - config:
      source_bucket_name: stagename-log-snaps
    method: s3_to_s3
    name: logjam
    service: logjam
  - config:
      source_bucket_name: stagename-blobs
    method: s3_to_s3
    name: blobs
    service: blobs
  - config:
      source_bucket_name: stagename-docker-registry
    method: s3_to_s3
    name: registry
    service: registry
  - method: rsync
    name: datasets
    service: datasets
    config:
      bastion_host: 1.2.3.4 # optional bastion host
      bastion_user: bastion_user
      ssh_host: 5.6.7.8 # host with datasets NFS mounted
      ssh_key_path: /opt/sshkeys/domino
      ssh_user: host_user
      remote_path: '/efs/datasets/mount/root' # /filecache should be inside this directory
  sshKeys:
    domino: |
      -----BEGIN RSA PRIVATE KEY-----
      <ssh key data here>
      -----END RSA PRIVATE KEY-----