To create an on-demand Spark cluster attached to a Domino Workspace, click New Workspace from the Workspaces menu. On the Launch New Workspace dialog, select the option to Attach Cluster. Specify the desired cluster settings and launch your workspace. After the workspace is up, it will have access to the Spark cluster you configured.
The Hardware Tier for your workspace will determine the compute resources available to your Spark driver process.
Similarly to workspaces, to create an on-demand Spark cluster attached to a Domino job, click Run from the Jobs menu. On the Start a Job dialog, select the option to Attach Cluster. Specify the desired cluster settings and launch your job. The job will have access to the Spark cluster you configured.
As your command, you can use any Python script that contains a PySpark job.
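As an illustration of such a script, the following is a minimal sketch of a PySpark job that estimates pi by random sampling. The file name estimate_pi.py, the application name, and the helper functions are hypothetical, and the sketch assumes that a Spark session created with default settings picks up the cluster attached to the job.

```python
import random


def is_inside_unit_circle(_):
    """Draw one random point in the unit square; return 1 if it falls inside the quarter circle."""
    x, y = random.random(), random.random()
    return 1 if x * x + y * y <= 1.0 else 0


def estimate_pi(hits, samples):
    """Turn a hit count into a Monte Carlo estimate of pi."""
    return 4.0 * hits / samples


def main(samples=1_000_000):
    try:
        # Imported lazily so the helpers above remain usable without Spark.
        from pyspark.sql import SparkSession
    except ImportError:
        print("PySpark is not installed in this environment; nothing to run.")
        return

    # Assumption: default settings are enough to reach the attached cluster.
    spark = SparkSession.builder.appName("estimate-pi").getOrCreate()
    try:
        # Distribute the sampling across the executors and add up the hits.
        hits = (
            spark.sparkContext.parallelize(range(samples))
            .map(is_inside_unit_circle)
            .sum()
        )
        print(f"Pi is roughly {estimate_pi(hits, samples):.5f}")
    finally:
        spark.stop()


if __name__ == "__main__":
    main()
```

If the script is saved as estimate_pi.py in your project, the job command would simply be estimate_pi.py.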
You can also submit jobs using spark-submit, but since it is not recognized automatically as one of the Domino supported job types, you will need to wrap it with a shell script unless you included a copy as spark-submit.sh as part of preparing your compute environment.
The following is an example of a simple wrapper, my-spark-submit.sh:
#!/usr/bin/env bash
# Quote "$@" so that arguments containing spaces are passed through unchanged.
spark-submit "$@"
Domino makes it simple to specify key settings when creating a Spark cluster.
- Number of Executors
The number of executors that will be available to your Spark application when the cluster starts. If Auto-scale workers is not enabled, this will always be the size of the cluster. The combined capacity of the executors will be available for your workloads.
When you instantiate a Spark context with the default settings, the spark.executor.instances Spark setting will be set to the number specified in the above dialog.
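One way to confirm this from a workspace or job is to read the setting back from the running Spark context. The following is a minimal sketch that assumes a session created with default settings. The helper parse_executor_instances is a hypothetical name introduced here for illustration.

```python
def parse_executor_instances(raw_value):
    """Convert the raw config string to an int, or None if the setting is absent."""
    if raw_value is None or str(raw_value).strip() == "":
        return None
    return int(raw_value)


def main():
    try:
        # Imported lazily so the helper above remains usable without Spark.
        from pyspark.sql import SparkSession
    except ImportError:
        print("PySpark is not installed in this environment; nothing to check.")
        return

    # Assumption: default settings are enough to reach the attached cluster.
    spark = SparkSession.builder.getOrCreate()
    try:
        # SparkConf.get returns the value as a string, or the default if unset.
        raw = spark.sparkContext.getConf().get("spark.executor.instances", None)
        print("spark.executor.instances =", parse_executor_instances(raw))
    finally:
        spark.stop()


if __name__ == "__main__":
    main()
```

If the printed value does not match the number entered in the dialog, check whether your code overrides the setting when it builds the session.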