The Message Passing Interface (MPI) is a communication protocol for distributed parallel computing. Domino validates the use of Open MPI, a popular open-source MPI distribution that is widely used in high-performance computing (HPC).
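In the message-passing model, a job runs as several numbered "ranks" that share no memory and cooperate only by sending messages to each other. The toy sketch below illustrates that idea in plain Python. It is not MPI and calls no MPI library. Each rank is simulated by a thread, and each rank's mailbox is a `queue.Queue`. The helper names `run_ranks` and `pass_token` are hypothetical. In a real Open MPI job, the ranks are separate processes, typically started with `mpirun`, and they communicate through the MPI library.

```python
import queue
import threading


def run_ranks(size, body):
    """Run body(rank, size, send, recv) on `size` simulated ranks and collect results.

    Toy model only: each rank is a thread and each rank's mailbox is a queue.Queue.
    Real MPI ranks are separate processes that communicate through the MPI library.
    """
    mailboxes = [queue.Queue() for _ in range(size)]
    results = [None] * size

    def send(dest, message):
        # Point-to-point send: deliver a message to the destination rank's mailbox.
        mailboxes[dest].put(message)

    def worker(rank):
        def recv():
            # Blocking receive from this rank's own mailbox.
            return mailboxes[rank].get()

        results[rank] = body(rank, size, send, recv)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def pass_token(rank, size, send, recv):
    """Pass a token around a ring; each rank adds its own rank number to it."""
    if rank == 0:
        send(1 % size, 0)  # rank 0 starts the ring with a token of 0
        return recv()      # and receives the final total when it comes back
    token = recv()
    send((rank + 1) % size, token + rank)
    return token


print(run_ranks(4, pass_token))  # [6, 0, 1, 3]
```

Each rank only sees the messages addressed to it. Rank 0 ends up with 0 + 1 + 2 + 3 = 6 because the token visited every other rank before returning.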
Open MPI has these features:
- Leading open-source MPI distribution
Open MPI provides low latency, high bandwidth, incremental parallelism, and flexibility.
- Support for machine learning in high-performance environments
MPI is the underlying communication mechanism for many higher-level machine learning training libraries. For example, Horovod commonly uses MPI to train models in high-performance environments.
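Training libraries such as Horovod combine gradients across workers with a collective "allreduce" operation, and Horovod builds on the ring-allreduce pattern. The sketch below simulates ring-allreduce (sum) in plain Python on lists, with no MPI or Horovod calls. `ring_allreduce` is a hypothetical helper for illustration only. In practice, the library runs this exchange over MPI (or another backend) on real tensors.

```python
def ring_allreduce(vectors):
    """Simulate ring-allreduce (sum) over one vector per rank; return each rank's result."""
    n = len(vectors)
    length = len(vectors[0])
    # Split indices into n contiguous chunks; chunk c covers [bounds[c][0], bounds[c][1]).
    bounds = [(c * length // n, (c + 1) * length // n) for c in range(n)]
    data = [list(v) for v in vectors]  # each rank's local buffer (copied)

    def exchange(chunk_for_rank, combine):
        # One ring step: every rank r sends one chunk to rank (r + 1) % n.
        # All messages are built before any are applied, as if sent simultaneously.
        messages = []
        for r in range(n):
            c = chunk_for_rank(r)
            lo, hi = bounds[c]
            messages.append(((r + 1) % n, c, data[r][lo:hi]))
        for dest, c, payload in messages:
            lo, _ = bounds[c]
            for i, value in enumerate(payload):
                data[dest][lo + i] = combine(data[dest][lo + i], value)

    # Phase 1, reduce-scatter: after n - 1 steps, rank r holds the full sum of chunk (r + 1) % n.
    for step in range(n - 1):
        exchange(lambda r: (r - step) % n, lambda mine, theirs: mine + theirs)

    # Phase 2, allgather: pass the finished chunks around so every rank gets all of them.
    for step in range(n - 1):
        exchange(lambda r: (r + 1 - step) % n, lambda mine, theirs: theirs)

    return data


grads = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]  # one gradient vector per rank
summed = ring_allreduce(grads)
print(summed)  # [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
averaged = [[x / len(grads) for x in row] for row in summed]  # divide by n to average
```

Each rank sends 2(n - 1) chunks of size L/n, or about 2(n - 1)L/n values in total for a vector of length L. This is why the per-rank communication cost of ring-allreduce stays nearly constant as more ranks are added.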
Domino can dynamically provision and orchestrate an MPI cluster directly on the infrastructure that backs the Domino deployment. This gives you quick access to an MPI cluster without needing an IT team to set it up.
Domino on-demand MPI clusters are suitable for the following workloads:
- Distributed multi-GPU training
Open MPI is well suited to distributed multi-GPU and multi-CPU training of TensorFlow, PyTorch, Keras, or MXNet models.
- High performance computing
MPI clusters have lower overhead than many other distributed computing systems and are highly customizable.