Apache Spark is a fast and general-purpose cluster computing system that offers a unified analytics engine for large-scale data processing and machine learning.
Domino gives you flexibility in how you use Spark. You can dynamically provision an on-demand Spark cluster orchestrated by Domino, or you can connect to an existing Spark cluster outside of Domino.
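As an illustration of the second option, the following is a minimal PySpark sketch, assuming the existing cluster runs in Spark standalone mode and is reachable from your Domino execution. The hostname and application name are hypothetical placeholders, and `standalone_master_url` and `connect` are helper names invented for this example. Clusters managed by YARN or Kubernetes use a different master setting.

```python
def standalone_master_url(host: str, port: int = 7077) -> str:
    """Build a Spark standalone master URL (7077 is Spark's default master port)."""
    return f"spark://{host}:{port}"


def connect(host: str, app_name: str = "example-app"):
    """Create (or reuse) a SparkSession attached to the given standalone master."""
    # Imported lazily so the URL helper works even where pyspark is not installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(standalone_master_url(host))  # hypothetical host, supplied by the caller
        .appName(app_name)
        .getOrCreate()
    )
```

Calling `connect("spark-master.example.internal")` would then return a session bound to that master, provided `pyspark` is installed and the host is reachable.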
Spark clusters can use Spot instances to reduce infrastructure costs. We recommend using Spot instances only for the worker nodes, because they can recover after a failure. For the master node, always use on-demand nodes.
If AWS interrupts a Spot instance, an on-demand or scheduled job running on the Spark cluster may slow down. If this happens, the remediation is to change the job's hardware tier to one that uses a non-Spot node pool until AWS Spot instances of the requested type become available again.
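The remediation amounts to a simple fallback rule, sketched below. This is not a Domino API: the tier names `spark-spot` and `spark-on-demand`, the `JobConfig` type, and the `spot_available` flag are all hypothetical and stand in for whatever hardware tiers and availability check your deployment uses.

```python
from dataclasses import dataclass, replace

# Hypothetical hardware tier names; substitute the tiers defined in your deployment.
SPOT_TIER = "spark-spot"
ON_DEMAND_TIER = "spark-on-demand"


@dataclass(frozen=True)
class JobConfig:
    """Hypothetical description of a job and the hardware tier it runs on."""
    name: str
    hardware_tier: str


def with_fallback_tier(job: JobConfig, spot_available: bool) -> JobConfig:
    """Return a copy of the job on the Spot tier if capacity exists, else on-demand."""
    tier = SPOT_TIER if spot_available else ON_DEMAND_TIER
    return replace(job, hardware_tier=tier)
```

For example, `with_fallback_tier(JobConfig("nightly-etl", SPOT_TIER), spot_available=False)` yields the same job moved to the on-demand tier. Once Spot capacity returns, calling it again with `spot_available=True` switches the job back.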