Domino Datasets provide high-performance, versioned, and structured filesystem storage in Domino. With Domino Datasets, you can build several curated collections of data in one project, and share them with your fellow contributors across their projects.
A Domino Dataset is a collection of files that are available in user executions as a filesystem directory. A Dataset always reflects the most recent version of the data. You can modify the contents of a Dataset through the Domino UI or through workload executions at any time.
You can version the contents of a Domino Dataset by creating a Snapshot containing a read-only copy of the Dataset files at a given time. Snapshots are associated with the Dataset they version.
The following are the primary ways to interact with a Domino Dataset:
- Work with Datasets local to your project
- Read from a shared Dataset you have mounted to your project
Domino Datasets belong to Domino projects. Permission to read from and write to a Dataset is granted to project contributors, in the same way as for project files. A Dataset that belongs to a project is considered to be local to that project.
By default, a project can have up to five Datasets. See read-write datasets.
- Click Data from the project menu, then click Create New Dataset.
- Enter a name and optional description, then click Create Dataset.
- Upload data in your browser. To preserve the filesystem structure of your uploads, you can drag and drop directories and subdirectories. Additionally, you can pause and resume the upload as needed.
Note
The browser upload is suitable for up to 50 GB or 50,000 individual files. For larger uploads, Domino recommends that you use the Domino CLI for your upload. To do this, run the following command, adjusted for your dataset and file path:
For information about how to install and configure the Domino CLI, see this topic.
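Before choosing an upload method, it can help to compare a local directory against the browser limits quoted in the note. The following Python sketch is not the Domino CLI command; it only counts files and bytes on your machine. The function names are hypothetical, and reading "50 GB" as 50 × 10^9 bytes is an assumption.

```python
import os

# Limits quoted in the note above. Interpreting "50 GB" as decimal
# gigabytes (50 * 10**9 bytes) is an assumption of this sketch.
MAX_BROWSER_BYTES = 50 * 10**9
MAX_BROWSER_FILES = 50_000


def scan_upload(directory):
    """Return (file_count, total_bytes) for all regular files under directory."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(directory):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path):
                continue  # skip symlinks so broken links do not raise errors
            total_bytes += os.path.getsize(path)
            file_count += 1
    return file_count, total_bytes


def fits_browser_upload(file_count, total_bytes):
    """True when both quoted limits are respected."""
    return file_count <= MAX_BROWSER_FILES and total_bytes <= MAX_BROWSER_BYTES
```

For example, `fits_browser_upload(*scan_upload("./my-data"))` returning `False` suggests using the Domino CLI instead of the browser; `./my-data` is a placeholder path.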
You can modify the contents of a Dataset at any time. A simple way to do this is through the Domino Dataset page.
To version the contents of a Dataset, you can create a Snapshot. Snapshots are read-only, immutable states of the dataset. You can create multiple snapshots, but cannot modify existing snapshots.
- From the Datasets page of your project, click the name of the Dataset you want to version to open its overview page.
- Click Take Snapshot.
  By default, the Snapshot copies all files in the Dataset. Alternatively, you can select a subset of the files and folders to include in the Snapshot.
- When prompted, confirm to start the snapshot creation process.
  You can specify a tag that can be used to mount the snapshot under a friendly name in subsequent executions. Domino shows a preliminary estimate of how long the snapshot creation will take, based on basic heuristics. The estimate is refined once the process is underway.
Note
While a snapshot is in progress you can cancel it from the Dataset overview page. If you cancel a snapshot, any partial snapshot data will be automatically deleted.
From the Datasets page of your project, click the name of a dataset to open its Overview page. The page shows the Dataset name and description, along with buttons to rename the Dataset, mark it for deletion, or upload files to it. You can also take a snapshot from this page.
By default, the page shows the latest files and folders in the Dataset. If snapshots have been created, you can also select a particular snapshot and examine its contents.
For a snapshot, you can perform the following actions:
- Add Tag - Tags create a friendly path when mounting a snapshot inside executions. A Dataset owner can move a tag between different snapshots to provide a stable path to whichever snapshot holds the desired state of the data.
  Note: If more than one tag is used, the last added tag will be used for mounting purposes.
- Mark for Deletion - When a snapshot is no longer needed, you can mark it for deletion. Such snapshots will no longer be mounted in subsequent executions. The Snapshot will be flagged to a Domino administrator as ready for deletion, but will not be fully deleted until the administrator takes an additional action to delete it.
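The tag and deletion rules above can be summarized as a small model. This is an illustrative Python sketch of the behavior described on this page, not Domino's implementation; the function name and the assumption that tags are listed oldest-added first are hypothetical. The snapshot-ID mount follows the path conventions described later on this page.

```python
from typing import List, Sequence


def mount_names(
    snapshot_id: int,
    tags: Sequence[str] = (),
    marked_for_deletion: bool = False,
) -> List[str]:
    """Directory names under which one snapshot would appear in an execution.

    `tags` is assumed to be ordered from first added to last added.
    """
    if marked_for_deletion:
        # Snapshots marked for deletion are no longer mounted.
        return []
    names = [str(snapshot_id)]  # a snapshot is always mounted under its ID
    if tags:
        names.append(tags[-1])  # only the last added tag is used for mounting
    return names
```

Under this model, moving a tag from one snapshot to another changes which snapshot appears under the tagged name, while the numeric ID mounts stay where they are.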
If you no longer need the entire dataset, you can mark it for deletion. Similar to Snapshots, a Domino administrator must perform the final deletion. The primary difference is that marking a Dataset for deletion will remove not only the Dataset but also its associated snapshots.
Datasets and associated Snapshots from a project are automatically available in Domino executions (Workspaces, Jobs, Apps, and Launchers) at a predefined path that follows the conventions described below.
You no longer have to use a domino.yaml configuration file to control mounting behavior, as in previous Domino releases.
The following example configuration shows how Datasets and Snapshots translate into paths that are available in executions.
- Dataset called clapton (local to the project)
  - Snapshot 1 (tagged with tag1)
  - Snapshot 2 (not tagged)
- Dataset called mingus (local to the project)
  - Snapshot 1 (tagged with tag2)
  - Snapshot 2 (not tagged)
- Dataset called ella (shared from another project)
  - Snapshot 1 (tagged with tag3)
  - Snapshot 2 (not tagged)
- Dataset called davis (shared from another project)
  - Snapshot 1 (tagged with tag4)
  - Snapshot 2 (not tagged)
Paths when using Git-based projects with CodeSync
For a Git-based project with CodeSync, the Datasets and Snapshots above will be available under the following hierarchy:
/mnt
  |--/data
      |--/clapton                <== R/W dataset
      |--/mingus                 <== R/W dataset
      |--/snapshots              <== Snapshot folder organized by dataset
          |--/clapton            <== RO Snapshots for clapton dataset
              |--/tag1           <== Mounted under latest tag
              |--/1              <== Always mounted under the snapshot ID
              |--/2
          |--/mingus
              |--/tag2
              |--/1
              |--/2
  |--/imported
      |--/data
          |--/ella               <== RO shared dataset
          |--/davis              <== RO shared dataset
          |--/snapshots          <== Snapshot folder organized by dataset
              |--/ella           <== RO Snapshots for ella dataset
                  |--/tag3       <== Mounted under latest tag
                  |--/1          <== Always mounted under the snapshot ID
                  |--/2
              |--/davis
                  |--/tag4
                  |--/1
                  |--/2
The paths for all mounted Datasets and the root for any associated snapshots can always be seen in the Settings panel inside a Workspace or when launching an execution.
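As an illustration of the Git-based layout, the following Python sketch builds the expected paths for a Dataset or Snapshot. The helper names are hypothetical and are not part of any Domino library; the roots are taken from the hierarchy above, so confirm the actual paths in the Settings panel before relying on them.

```python
from pathlib import PurePosixPath
from typing import Union

# Roots taken from the Git-based hierarchy shown above.
LOCAL_ROOT = PurePosixPath("/mnt/data")
IMPORTED_ROOT = PurePosixPath("/mnt/imported/data")


def dataset_path(name: str, shared: bool = False) -> PurePosixPath:
    """Path of a Dataset: read/write if local, read-only if shared."""
    root = IMPORTED_ROOT if shared else LOCAL_ROOT
    return root / name


def snapshot_path(
    name: str, ref: Union[str, int], shared: bool = False
) -> PurePosixPath:
    """Path of a read-only Snapshot, addressed by tag or by snapshot ID."""
    root = IMPORTED_ROOT if shared else LOCAL_ROOT
    return root / "snapshots" / name / str(ref)
```

With the example configuration, `dataset_path("clapton")` yields `/mnt/data/clapton`, and `snapshot_path("ella", "tag3", shared=True)` yields `/mnt/imported/data/snapshots/ella/tag3`.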
Paths when using Domino File System projects
For a Domino File System-based project, the Datasets and Snapshots above will be available under the following hierarchy:
/domino
  |--/datasets
      |--/local                      <== local datasets and snapshots
          |--/clapton                <== R/W dataset
          |--/mingus                 <== R/W dataset
          |--/snapshots              <== Snapshot folder organized by dataset
              |--/clapton            <== RO Snapshots for clapton dataset
                  |--/tag1           <== Mounted under latest tag
                  |--/1              <== Always mounted under the snapshot ID
                  |--/2
              |--/mingus
                  |--/tag2
                  |--/1
                  |--/2
      |--/ella                       <== RO shared dataset
      |--/davis                      <== RO shared dataset
      |--/snapshots                  <== Shared datasets snapshots organized by dataset
          |--/ella                   <== RO Snapshots for ella dataset
              |--/tag3               <== Mounted under latest tag
              |--/1                  <== Always mounted under the snapshot ID
              |--/2
          |--/davis
              |--/tag4
              |--/1
              |--/2
The paths for all mounted Datasets and the root for any associated snapshots can always be seen in the Settings panel inside a Workspace or when launching an execution.
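Inside an execution, you can also inspect the mounts from code. The following Python sketch lists the names (tags and snapshot IDs) under which a Dataset's snapshots are mounted in the Domino File System layout above. The function names and the `root` parameter are hypothetical conveniences for this example, not a Domino API.

```python
from pathlib import Path
from typing import List

# Root taken from the Domino File System hierarchy shown above.
DATASETS_ROOT = Path("/domino/datasets")


def snapshot_dir(
    name: str, shared: bool = False, root: Path = DATASETS_ROOT
) -> Path:
    """Directory that holds the snapshot mounts for one Dataset."""
    # Local Datasets live under <root>/local; shared ones directly under <root>.
    base = root if shared else root / "local"
    return base / "snapshots" / name


def list_snapshot_mounts(
    name: str, shared: bool = False, root: Path = DATASETS_ROOT
) -> List[str]:
    """Sorted names (tags and IDs) of the mounted snapshots, or [] if none."""
    directory = snapshot_dir(name, shared, root)
    if not directory.is_dir():
        return []
    return sorted(entry.name for entry in directory.iterdir() if entry.is_dir())
```

The `root` argument exists only so the helper can be tried against a scratch directory outside Domino; inside an execution the default should match the hierarchy above.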
Domino 4.5+ brings several improvements to datasets. If you just upgraded from a version prior to Domino 4.5, the following information might be of particular interest.
Summary of changes
- Datasets are now always read/write and reflect the latest version of the files. You can freely manipulate the contents of a dataset from Domino Workspaces, Jobs, Apps, and Launchers.
- You can optionally create read-only snapshots associated with a dataset. This is now an explicit action.
- Datasets and associated Snapshots are automatically mounted for Domino executions. domino.yaml has been deprecated, and you no longer need to use it for dataset and snapshot mounting.
- Scratch spaces, which were previously meant for convenient read/write iterations, are also deprecated and are replaced with a new default per-project dataset.
Migration considerations
While the above improvements are significant, any datasets and snapshots created with a prior version of Domino will be migrated seamlessly according to the following rules:
- Datasets that did not have any snapshots previously will automatically become read/write.
- Datasets with one or more snapshots will have the most recent snapshot promoted to a dataset and will automatically become read/write.
- domino.yaml in existing projects will be ignored and datasets and snapshots will be mounted in executions based on the mounting rules described previously.
- Scratch spaces with data in them will be promoted to a dataset. The Domino username of the user who owned the scratch space will be used as the name of the Dataset. Scratch spaces that are empty at the time of upgrade will not be migrated.
- A new dataset with the same name as the project will be automatically created.
Warning