Convert Legacy Data Sets to Domino Datasets

This article describes how to convert legacy Data Set workflows to use Domino Datasets. This is a two-step process: first move your data into a new Domino Dataset, then update all projects and artifacts that consume the data to retrieve it from the new location.

Migrate data from a legacy Data Set into a Domino Dataset

Legacy Data Sets are semantically similar to Domino Projects. If your deployment is running a version of Domino with the new Domino Datasets feature, you can create Domino Datasets inside legacy Data Sets. This allows for a simple migration path: all of the existing data is added to a single Domino Dataset owned by the legacy Data Set, and the entire file structure is preserved.

The long-term deprecation plan for legacy Data Sets is to transform them into ordinary Domino Projects, which will continue to contain and share any Domino Datasets you created in them.

To get started, you need to add a script to the contents of your legacy Data Set that can transfer all of your data into a Domino Dataset output mount. From the Files page of your legacy Data Set, click Add File.

Name the file migrate.sh, and paste in the example command provided below.

cp -R $DOMINO_WORKING_DIR/. /domino/datasets/output/main

This example migration script copies the contents of $DOMINO_WORKING_DIR, a default Domino environment variable that always points to the root of your project, to a Domino Dataset output mount path. The directory named main in the path above is derived from the name of the Domino Dataset that will be created to store the files from this legacy Data Set.
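The one-line command above is all the migration needs, but it can be worth making the script slightly more defensive. The following is a sketch, not the official script: the migrate function name is ours, and the demo at the bottom uses throwaway directories in place of $DOMINO_WORKING_DIR and the /domino/datasets/output/main mount so it can run anywhere.

```shell
#!/usr/bin/env bash
set -euo pipefail

# migrate copies the *contents* of src into dest -- the trailing /.
# is what preserves the file structure instead of nesting src inside dest.
migrate() {
  local src="$1" dest="$2"
  # Fail early with a clear message if the output mount was not attached.
  [ -d "$dest" ] || { echo "output mount $dest not found" >&2; return 1; }
  cp -R "$src"/. "$dest"
}

# Inside Domino the call would be:
#   migrate "$DOMINO_WORKING_DIR" /domino/datasets/output/main
# Here we use throwaway directories so the sketch is self-contained.
src=$(mktemp -d); dest=$(mktemp -d)
mkdir -p "$src/raw"
echo "x,y" > "$src/raw/sample.csv"
migrate "$src" "$dest"
```

After the migration Job completes, a recursive `diff -r` between the source and the Dataset mount is a quick way to confirm that every file copied intact.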

Click Save when finished.

Next, click Datasets from the project menu, then click Create New Dataset.


Be sure to name this Dataset to match the path to the output mount in the migration script. If you copied the command above and added it to your script without modification, you should name this Dataset main. You can supply an optional description, then click Upload Contents. On the upload page, click to expand the Create by Running Script section.


Double-check to make sure the listed Output Directory matches the path from your migration script, then enter the name of your script and click Start. A Job will be launched that mounts the new Dataset for output and executes your script. If the Job finishes successfully, you can return to the Datasets page from the project menu and click the name of your new Dataset to see its contents.


You now have all of the data from your legacy Data Set loaded into a Domino Dataset. This method preserves the file structure of the legacy Data Set, which is useful for the next step: updating consumers to use the new Dataset.

Update data consumers to use the new Domino Dataset

Potential consumers of your legacy Data Set are those users to whom you granted Project Importer, Results Consumer, or Contributor permissions. As the project Owner, you may also have other projects consuming the contents of your legacy Data Set. This same set of permissions will grant access to your new Domino Dataset.

A project consuming data from your legacy Data Set will import it as a project dependency, and it will be visible on the Other Projects tab of the Files page.

For example, suppose the global-power project imports the data-quick-start legacy Data Set. The contents of data-quick-start are then available in global-power Runs and Workspaces at the path shown in the Location column. Anywhere your code for batch Runs, scheduled Runs, or Apps refers to that path will need to be updated to point to the new Domino Dataset.

To determine the new path and set up access to the Domino Dataset, you need to mount the Dataset. With the consuming project open, click Datasets from the project menu, then click Mount Shared Dataset. The Dataset to Mount field is a dropdown menu that will show shared Datasets you have access to. In the above example, the main Dataset from the data-quick-start project will be mounted at the latest snapshot. Select the Dataset that you migrated your data into earlier, then click Mount.


When finished, you will see the Dataset you added listed under Shared Datasets. The Path column shows the path where the contents of the Dataset will be mounted in this project’s Runs and Workspaces.


Remember that if you used the migration script shown earlier, the file structure at that path will be identical to the file structure of the imported legacy Data Set location. All you need to do to access the same data is change the path to this new Domino Dataset mount.
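Because the file structure is preserved, updating a consumer usually amounts to swapping a single base path in front of unchanged relative paths. A minimal sketch, where both base paths and the power_plants.csv file are purely illustrative — take the real values from the Location and Path columns in the Domino UI:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Both base paths are illustrative -- read the real values from the
# Location and Path columns in the Domino UI.
OLD_BASE="/mnt/data-quick-start"                   # imported legacy Data Set
NEW_BASE="/domino/datasets/data-quick-start/main"  # mounted Domino Dataset

# Relative paths within the data are unchanged, so a consumer only
# swaps the base directory in front of them.
relative="raw/power_plants.csv"
old_path="$OLD_BASE/$relative"
new_path="$NEW_BASE/$relative"
echo "update references from $old_path to $new_path"
```

If your code reads the base directory from a single variable or configuration entry, the entire update is a one-line change.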

Be sure to contact other users who are consuming your legacy Data Set and provide them with information about the new Domino Dataset.

Copyright © 2022 Domino Data Lab. All rights reserved.