Connect to Amazon S3 from Domino

This topic describes how to connect to Amazon Simple Storage Service (S3) from Domino.

Amazon S3 is a cloud object store available as a service from AWS.

Domino recommends that you use a Domino data source to connect to Amazon S3 from Domino.

Create an Amazon S3 data source

  1. From the navigation pane, click Data.

  2. Click Create a Data Source.

  3. In the New Data Source window, from Select Data Store, select Amazon S3.

  4. Enter the Bucket.

  5. Enter the Region.

  6. Enter the Data Source Name.

  7. Optional: Enter a Description to explain the purpose of the data source to others.

  8. Click Next.

  9. Enter the credentials to authenticate to S3.

    By default, Domino supports basic authentication; the Domino secret store backed by HashiCorp Vault securely stores the credentials. If your administrator enabled it, IAM credential propagation might be available.

  10. Click Test Credentials.

  11. If the data source authenticates, click Next.

  12. Select who can view and use the data source in projects.

  13. Click Finish Setup.

Users who have Domino permissions to use the data source can enter their credentials and then use the Domino Data API to retrieve data with the connector.

See Retrieve data for more information.
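
For example, from a workspace you can retrieve an object through the data source in a few lines of Python. The following is a minimal sketch, not a definitive implementation: it assumes the dominodatalab-data package (which provides the domino_data module) is installed in your environment, and it uses a hypothetical data source named my-s3 that contains an object key some_data.csv. Check the Domino Data API documentation for the exact methods available in your version.

    # Minimal sketch: retrieve an object through a Domino data source.
    # Assumes the dominodatalab-data package is installed; "my-s3" and
    # "some_data.csv" are hypothetical names, so substitute your own.
    from domino_data.data_sources import DataSourceClient

    ds = DataSourceClient().get_datasource("my-s3")

    # Download the object to a local file
    ds.download_file("some_data.csv", "./some_data.csv")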

Alternate ways to connect to Amazon S3

Use one of the following methods to authenticate with S3 from Domino.

Both methods follow the common AWS environment variable naming conventions, so you don’t have to reference credentials explicitly in your code.

Set up credentials

  1. Use a short-lived credential file obtained through Domino’s AWS Credential Propagation feature.

    After your administrator configures this feature, Domino automatically populates any run or job with your AWS credentials file. Domino refreshes these credentials periodically for the duration of the workspace so that they don’t expire.

    Following common AWS conventions, the AWS_SHARED_CREDENTIALS_FILE environment variable contains the location of your credentials file, which is stored at /var/lib/domino/home/.aws/credentials.

    Learn more about using a credential file with the AWS SDK.

  2. Store your AWS access keys securely as environment variables.

    To connect to the S3 buckets that your AWS account can access, provide your AWS Access Key and AWS Secret Key to the AWS CLI. By default, AWS utilities look for these in your environment variables; the sketch after this list shows a quick way to verify that they are picked up.

    Set the following as Domino environment variables on your user account:

    • AWS_ACCESS_KEY_ID

    • AWS_SECRET_ACCESS_KEY

    See Environment Variables for Secure Credential Storage to learn more about Domino environment variables.
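
Whichever method you use, AWS SDKs resolve these credentials automatically. As a quick sanity check, you can ask boto3 (assuming it is installed in your environment) whether a default session can locate credentials:

    import boto3

    # boto3 resolves credentials from AWS_SHARED_CREDENTIALS_FILE or from
    # AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, among other sources
    session = boto3.Session()
    credentials = session.get_credentials()
    print("Credentials found" if credentials is not None else "No credentials found")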

Get a file from an S3-hosted public path

  1. If you have files in S3 that are set to allow public read access, use Wget from the OS shell of a Domino executor to fetch the files. The request for those files will look similar to the following:

    wget https://s3-<region>.amazonaws.com/<bucket-name>/<filename>

    This method is simple because it requires no authentication or authorization; for the same reason, do not use it with sensitive data. (A Python equivalent is sketched below.)
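
If you prefer to make the same request from Python instead of the shell, the standard library is enough. This sketch uses the same placeholder region, bucket, and file names as the wget command above, and it works only for objects that allow public read access:

    import urllib.request

    # Placeholder URL: substitute your region, bucket, and file name.
    # Works only for objects with public read access; do not use for sensitive data.
    url = "https://s3-<region>.amazonaws.com/<bucket-name>/<filename>"
    urllib.request.urlretrieve(url, "<filename>")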

AWS CLI

  1. Use the AWS CLI as a secure method to read S3 from the OS shell of a Domino executor. To make the AWS CLI work from your executor, install it in your environment and set up your credentials.

  2. Get the AWS CLI as a Python package from pip.

  3. Use the following Dockerfile instructions to install the CLI and automatically add it to your system PATH. You must have pip installed.

    USER root
    RUN pip install awscli --upgrade
    USER ubuntu
  4. After your Domino environment and credentials are set up, fetch the contents of an S3 bucket to your current directory by running:

    aws s3 sync s3://<bucket-name> .
  5. If you are using an AWS credential file with multiple profiles, you might need to specify the profile. (The default profile is used if none is specified.)

    aws s3 sync s3://<bucket-name> . --profile <profile name>

    See the official AWS CLI documentation on S3 for more commands and options.

Python and boto3

  1. To interact with AWS services from Python, Domino recommends boto3.

  2. If you’re using a Domino standard environment, boto3 will already be installed. If you want to add boto3 to an environment, use the following Dockerfile instructions.

    These instructions assume you already have pip installed.

    USER root
    RUN pip install boto3
    USER ubuntu
  3. To interact with S3 from boto3, see the official documentation. The following is an example for downloading a file where:

    • You have set up your credentials as instructed above.

    • Your account has access to an S3 bucket named my_bucket.

    • The bucket contains an object named some_data.csv.

      import boto3
      import io
      import pandas as pd
      
      # create new S3 client
      client = boto3.client('s3')
      
      # download some_data.csv from my_bucket and write to ./some_data.csv locally
      client.download_file('my_bucket', 'some_data.csv', './some_data.csv')

      Alternatively, for users using a credential file:

      import boto3

      # Specify your profile if your credential file contains multiple
      # profiles; otherwise, the default profile is used
      session = boto3.Session(profile_name='<profile name>')

      # Specify your bucket name
      users_bucket = session.resource('s3').Bucket('my_bucket')

      # Listing the bucket's objects should succeed
      for obj in users_bucket.objects.all():
         print(obj.key)

      # Download a file
      users_bucket.download_file('some_data.csv', './some_data.csv')

      This code does not pass credentials explicitly, since it assumes one of the following:

    • Credentials are automatically populated at /var/lib/domino/home/.aws/credentials, as specified in the AWS_SHARED_CREDENTIALS_FILE environment variable.

    • You have already set up credentials in the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

After running the previous code, a local copy of some_data.csv exists in the same directory as your Python script or notebook. You can load the data into a pandas DataFrame:

df = pd.read_csv('some_data.csv')

See part 1 of the Get Started (Python) tutorial for a more detailed example of working with CSV data in Python.
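
If you’d rather not write a local copy at all, you can read the object into memory and parse it directly; this is also where the io import in the example above becomes useful. The following is a sketch using the same hypothetical my_bucket and some_data.csv names:

    import io

    import boto3
    import pandas as pd

    # Fetch the object into memory instead of writing it to disk
    client = boto3.client('s3')
    obj = client.get_object(Bucket='my_bucket', Key='some_data.csv')
    df = pd.read_csv(io.BytesIO(obj['Body'].read()))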

R and aws.s3

  1. To interact with S3 from R, Domino recommends the cloudyr project’s aws.s3 package.

  2. If you’re using a Domino standard environment, aws.s3 will already be installed. To add aws.s3 to an environment, use the following Dockerfile instructions.

    USER root
    
    RUN R -e 'install.packages(c("httr","xml2"), repos="https://cran.r-project.org")'
    RUN R -e 'install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"))'
    
    USER ubuntu
  3. For basic instructions about using aws.s3, see the package README. The following is an example for downloading a file where:

    • You have set up the environment variables with credentials for your AWS account.

    • Your account has access to an S3 bucket named my_bucket.

    • The bucket contains an object named some_data.csv.

    # load the package
    library("aws.s3")
    
    # Set AWS_PROFILE only if your credential file contains multiple profiles;
    # otherwise, this line can be omitted
    Sys.setenv("AWS_PROFILE" = "<AWS profile>")
    
    # download some_data.csv from my_bucket and write to ./some_data.csv locally
    save_object("some_data.csv", file = "./some_data.csv", bucket = "my_bucket")
  4. After running the previous code, a local copy of some_data.csv exists in the same directory as your R script or notebook. Read from that local file to work with the data it contains.

    myData <- read.csv(file="./some_data.csv", header=TRUE, sep=",")
    View(myData)