Using a custom framework
Distribute training across multiple nGraph backends
Distributed training is not officially supported in version 0.27; however, the following configuration options have worked for nGraph devices with mixed or limited success in testing.
In the previous section, we described the steps needed to create a “trainable” nGraph model. Here we demonstrate how to train a data parallel model by distributing the graph to more than one device.
Frameworks can implement distributed training with nGraph versions prior to 0.13:
Use -DNGRAPH_DISTRIBUTED_ENABLE=OMPI to enable distributed training with OpenMPI. This flag requires that OpenMPI already be installed on the system. If it is not present, install OpenMPI version 2.1.1 or later before running the compile.
Use -DNGRAPH_DISTRIBUTED_ENABLE=MLSL to enable the option for Intel® Machine Learning Scaling Library (MLSL) for Linux* OS.
The Intel® MLSL option applies to Intel® Architecture CPUs and Interpreter backends only. For all other backends, OpenMPI is presently the only supported option. We recommend the use of Intel MLSL for CPU backends to avoid an extra download step.
To deploy data-parallel training, the AllReduce op should be added after the steps needed to complete the backpropagation; the new code is highlighted below:
See the full code in the
Finally, to run the training using two nGraph devices, invoke
mpirun -np 2 dist_mnist_mlp
The essential operation for synchronizing gradients across all workers in
data-parallel training is allreduce, favored over parameter servers for its
simplicity and scalability. The AllReduce op is one of the nGraph Library’s
core ops. To enable gradient synchronization for a network, we simply inject
the AllReduce op into the computation graph, connecting the graph for the
autodiff computation and optimizer update (which then becomes part of the
nGraph graph). The nGraph Backend will handle the rest.
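To make the role of allreduce concrete, the following is a minimal in-process sketch of what gradient synchronization computes. It is not nGraph code and does not use MPI or MLSL. The helpers simulated_allreduce_sum and sgd_step are hypothetical names introduced only for illustration, and the sketch assumes a sum reduction followed by averaging over the number of workers, which is a common convention but is not stated in the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of nGraph): simulates a sum-allreduce across
// workers that all live in one process. worker_grads[w] is the local gradient
// computed by worker w. On return, every worker holds the element-wise sum of
// all workers' gradients.
void simulated_allreduce_sum(std::vector<std::vector<float>>& worker_grads)
{
    if (worker_grads.empty())
    {
        return;
    }
    const std::size_t n = worker_grads.front().size();

    // Reduce: accumulate the element-wise sum over all workers.
    std::vector<float> total(n, 0.0f);
    for (const auto& grad : worker_grads)
    {
        assert(grad.size() == n); // all workers must agree on the shape
        for (std::size_t i = 0; i < n; ++i)
        {
            total[i] += grad[i];
        }
    }

    // Broadcast: every worker receives the same reduced result.
    for (auto& grad : worker_grads)
    {
        grad = total;
    }
}

// Hypothetical helper: one SGD update using the average of the summed
// gradient. Assumes the reduction above was a sum over num_workers workers.
void sgd_step(std::vector<float>& weights,
              const std::vector<float>& summed_grad,
              std::size_t num_workers,
              float learning_rate)
{
    assert(weights.size() == summed_grad.size());
    assert(num_workers > 0);
    for (std::size_t i = 0; i < weights.size(); ++i)
    {
        weights[i] -= learning_rate * summed_grad[i] / static_cast<float>(num_workers);
    }
}
```

Because every worker ends up with the same reduced gradient, each one applies an identical update and the model replicas stay in sync without a central parameter server. In a real deployment the reduce and broadcast steps would be carried out by the communication library behind the AllReduce op rather than by a loop in one process.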