Data in Domino

This topic describes how Domino stores and handles data that users upload, import, or create in Domino. The following systems store user data in Domino:

  • Domino project files

  • Domino Datasets

Additionally, Domino supports connecting to many external data stores. Users can import data from external stores into Domino, export data from Domino to external stores, or run code in Domino that reads and writes from external stores without saving data in Domino itself.

About Domino project files

How is the data in project files stored?

Work in Domino happens in projects. Every Domino project has a corresponding collection of project files. While at rest, project files are stored in a durable object storage system, referred to as the Domino Blob Store.

Domino has native support for backing the Domino Blob Store with the following cloud storage services:

  • Amazon S3

  • Azure File Storage

  • Google Cloud Storage

Alternatively, the Domino Blob Store can be backed with a shared Kubernetes Persistent Volume from a compatible storage class. You can provide an NFS storage service, and Domino installation utilities can deploy the nfs-client-provisioner and configure a compatible storage class backed by the provided NFS system.
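
If you use an NFS-backed Persistent Volume, you can confirm that the expected shared storage class exists before pointing the Blob Store at it. The following sketch uses the official Kubernetes Python client; the storage class name dominoshared is only an illustrative assumption, so check your installer configuration for the actual name.

  # Sketch: list storage classes and check for the shared class assumed to back
  # the Domino Blob Store. "dominoshared" is an assumed name -- confirm the
  # actual class name in your installer configuration.
  from kubernetes import client, config

  config.load_kube_config()  # or config.load_incluster_config() inside the cluster

  storage_api = client.StorageV1Api()
  classes = storage_api.list_storage_class().items
  for sc in classes:
      print(f"{sc.metadata.name}: provisioner={sc.provisioner}")

  expected = "dominoshared"  # assumed name for the NFS-backed shared storage class
  if expected not in [sc.metadata.name for sc in classes]:
      print(f"Storage class '{expected}' not found; check the installer configuration.")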

Is project file data encrypted?

Domino supports server-side encryption with customer-provided keys (SSE-C) for Amazon S3.

Domino supports file system encryption on Amazon Elastic Block Store (EBS) volumes using the industry-standard AES-256 algorithm.

Domino also supports default encryption keys for:

  • Amazon S3

  • Azure File Storage

  • Google Cloud Filestore

Domino does not provide pre-write encryption for nfs-client-provisioner volumes.
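
To illustrate what SSE-C means at the storage layer, the following sketch uploads and reads back an S3 object with a customer-provided key using boto3. This is a general S3 example, not a Domino configuration step; the bucket name, object key, and key material are hypothetical.

  # Sketch: S3 server-side encryption with a customer-provided key (SSE-C).
  # Bucket and object names are hypothetical; Domino's own Blob Store writes
  # are configured at install time, not in user code.
  import os
  import boto3

  s3 = boto3.client("s3")
  bucket = "example-domino-blob-store"   # hypothetical bucket
  sse_key = os.urandom(32)               # 256-bit customer-provided key

  s3.put_object(
      Bucket=bucket,
      Key="projects/example/file.txt",
      Body=b"example contents",
      SSECustomerAlgorithm="AES256",
      SSECustomerKey=sse_key,            # boto3 base64-encodes the key and adds the MD5 header
  )

  obj = s3.get_object(
      Bucket=bucket,
      Key="projects/example/file.txt",
      SSECustomerAlgorithm="AES256",
      SSECustomerKey=sse_key,            # the same key must be supplied on every read
  )
  print(obj["Body"].read())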

How does data get stored in project files?

When a user starts a Run in Domino, the files from their project are fetched from the Domino Blob Store and loaded into the working directory of the Domino service filesystem. When the Run finishes, or when the user initiates a manual sync in an interactive Workspace session, any changes to the contents of the working directory are written back to Domino as a new revision of the project files. Domino’s versioning system tracks file-level changes and can provide rich file difference information between revisions.

Domino also has several features that make it easy for users to initiate a file sync. The following events in Domino can trigger a file sync and the creation of a new revision of a project’s files:

  • User uploads files from the Domino web application upload interface.

  • User authors or edits a file in the Domino web application file editor.

  • User syncs their local files to Domino from the Domino Command Line Interface.

  • User uploads files to Domino through the Domino API.

  • User executes code in a Domino Job that writes files to the working directory.

  • User writes files to the working directory during an interactive Workspace session, and then initiates a manual sync or chooses to commit those files when the session finishes.
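
The last two items correspond to ordinary file writes in user code: anything written under the Run's working directory is captured in the next revision when the Run finishes or is synced. A minimal sketch of a Job step (the output filename is arbitrary):

  # Sketch: a Domino Job step that writes results into the Run's working
  # directory. Files written here are synced back to the project as a new
  # revision when the Job finishes.
  import json
  import os

  results = {"rows_processed": 1000, "status": "ok"}

  # The Run starts in the project's working directory, so a relative path is enough.
  output_path = os.path.join(os.getcwd(), "results.json")
  with open(output_path, "w") as f:
      json.dump(results, f, indent=2)

  print(f"Wrote {output_path}; it will appear in the next project file revision.")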

By default, all revisions of project files that Domino creates are kept indefinitely, since project files are a component in the Domino Reproducibility Engine. You can always return to and work with past revisions of project files, except for files that have been subjected to a full delete by a system administrator.

Who can access the data in project files?

Users can read and write files in the projects they create, on which they are automatically granted the Owner role. Owners can add collaborators to their projects and grant them additional roles with associated file permissions.

The permissions available to each role are described in more detail in Sharing and collaboration.

Users can also inherit roles from membership in Domino Organizations.

Domino users with some administrative system roles are granted additional access to project files across the Domino deployment they administer. Learn more in Roles.

About Domino Datasets

How is the data in Domino Datasets stored?

When users have large quantities of data, including collections of many files and large individual files, Domino recommends storing the data in a Domino Dataset. Datasets are collections of Snapshots, where each Snapshot is an immutable image of a filesystem directory from the time when the Snapshot was created. These directories are stored in a network filesystem managed by Kubernetes as a shared Persistent Volume.

Domino has native support for backing Domino Datasets with the following cloud storage services:

  • Amazon EFS

  • Azure File Storage

  • Google Cloud Filestore

Alternatively, Domino Datasets can be backed with a shared Kubernetes Persistent Volume from a compatible storage class. You can provide an NFS storage service, and Domino installation utilities can deploy the nfs-client-provisioner and configure a compatible storage class backed by the provided NFS system.

Each Snapshot of a Domino Dataset is an independent state, and its membership in a Dataset is an organizational convenience for working on, sharing, and permissioning related data. Domino supports running scheduled Jobs that create Snapshots, enabling users to write or import data into a Dataset as part of an ongoing pipeline.
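
For instance, a scheduled Job can add a dated extract to a Dataset simply by writing into the Dataset's mounted directory, after which a Snapshot captures that state immutably. The sketch below assumes the Dataset is mounted at /domino/datasets/local/<dataset-name>; the mount path can differ by Domino version, so confirm the path shown for the Dataset in your deployment.

  # Sketch: a scheduled Job that writes a dated extract into a mounted Dataset
  # directory. The mount path below is an assumption -- confirm the actual path
  # shown for the Dataset in your Domino deployment.
  import datetime
  import pathlib

  dataset_dir = pathlib.Path("/domino/datasets/local/daily-extracts")  # assumed mount path
  dataset_dir.mkdir(parents=True, exist_ok=True)

  today = datetime.date.today().isoformat()
  outfile = dataset_dir / f"extract-{today}.csv"
  outfile.write_text("id,value\n1,42\n")

  print(f"Wrote {outfile}; take a Snapshot to capture this state immutably.")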

Dataset Snapshots can be permanently deleted by Domino system administrators. Snapshot deletion is designed as a two-step process to avoid data loss, where users mark Snapshots they believe can be deleted, and admins then confirm the deletion if appropriate. This permanent deletion capability makes Datasets the right choice for storing data in Domino that has regulatory requirements for expiration.

Who can access the data in Domino Datasets?

Datasets in Domino belong to projects, and access is granted to users according to the roles they hold on the containing project.

The permissions available to each role are described in more detail in Sharing and collaboration.

Users can also inherit roles from membership in Domino Organizations. Learn more in the Organizations overview.

Domino users with administrative system roles are granted additional access to Datasets across the Domino deployment they administer. Learn more in Roles.

Integrate Domino with other data stores and databases

Domino can be configured to connect to external data stores and databases. This process involves loading the required client software and drivers for the external service into a Domino environment, and loading any credentials or connection details into Domino environment variables. Users can then interact with the external service in their Runs.

Users can import data from the external service into their project files by writing the data to the working directory of the Domino service filesystem, and they can write data from the external service to Dataset Snapshots. Alternatively, you can construct workflows in Domino that save no data to Domino itself, but instead pull data from an external service, process it, and push the results back to an external service.
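
As a sketch of this pattern, the following code reads connection details from environment variables (the variable names are assumptions; use whatever names you configured in the Domino environment), pulls rows from an external PostgreSQL database with psycopg2, and writes the result into the working directory so it is captured as project files. Omitting the final write is how workflows keep no data in Domino.

  # Sketch: pull data from an external PostgreSQL database using credentials
  # stored as Domino environment variables, then save the result to the working
  # directory. The environment variable names and the query are assumptions.
  import csv
  import os

  import psycopg2

  conn = psycopg2.connect(
      host=os.environ["WAREHOUSE_HOST"],       # assumed variable names; use the ones
      dbname=os.environ["WAREHOUSE_DB"],       # you configured in the environment
      user=os.environ["WAREHOUSE_USER"],
      password=os.environ["WAREHOUSE_PASSWORD"],
  )

  with conn, conn.cursor() as cur:
      cur.execute("SELECT id, value FROM example_table LIMIT 100")  # hypothetical table
      rows = cur.fetchall()

  # Writing into the working directory makes the extract part of the next project
  # file revision; skip this step for workflows that keep no data in Domino.
  with open("extract.csv", "w", newline="") as f:
      writer = csv.writer(f)
      writer.writerow(["id", "value"])
      writer.writerows(rows)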

Learn more in the Data sources overview and Connect to External Data.

Track and audit data interactions in Domino

You can set up audit logs for user activity in the platform. These logs record events whenever users:

  • Create files

  • Edit files

  • Upload files

  • View files

  • Sync file changes from a Run

  • Mount Dataset Snapshots

  • Write Dataset Snapshots

This list is not exhaustive, and will expand as Domino adds new features and capabilities.

Contact support@dominodatalab.com for assistance with enabling, accessing, and processing these logs.
