Distribute training across multiple nGraph backends


Distributed training is not officially supported in version 0.15; however, the following configuration options have worked for nGraph devices with mixed or limited success in testing.

In the previous section, we described the steps needed to create a “trainable” nGraph model. Here we demonstrate how to train a data-parallel model by distributing the graph to more than one device.
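As a rough illustration of what "data-parallel" means here: each worker holds a full copy of the model and processes its own slice of every batch. The sketch below is plain C++ and is not part of nGraph. The helper name shard_range and its parameters are hypothetical, chosen only for this example, and contiguous splitting is an assumption rather than something the nGraph example is known to do. It computes which contiguous range of samples a given worker would handle.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper (not an nGraph API): returns the half-open range
// [begin, end) of sample indices that worker `rank` processes when
// `total` samples are split across `world_size` workers.
// The first `total % world_size` workers each receive one extra sample.
std::pair<std::size_t, std::size_t>
    shard_range(std::size_t total, std::size_t world_size, std::size_t rank)
{
    assert(world_size > 0 && rank < world_size);
    const std::size_t base = total / world_size;  // samples every worker gets
    const std::size_t extra = total % world_size; // leftover samples to spread
    const std::size_t begin = rank * base + (rank < extra ? rank : extra);
    const std::size_t end = begin + base + (rank < extra ? 1 : 0);
    return {begin, end};
}
```

With 128 samples and two workers, worker 0 would take samples [0, 64) and worker 1 would take [64, 128).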

Frameworks can implement distributed training with nGraph versions prior to 0.13 by building nGraph with one of the following CMake options:

  • Use -DNGRAPH_DISTRIBUTED_ENABLE=OMPI to enable distributed training with OpenMPI. This flag requires OpenMPI to be installed on the system already; if it is not, install OpenMPI version 2.1.1 or later before compiling.

  • Use -DNGRAPH_DISTRIBUTED_ENABLE=MLSL to enable distributed training with the Intel® Machine Learning Scaling Library for Linux* OS (Intel® MLSL).

    The Intel® MLSL option applies only to the CPU backend (Intel® Architecture CPUs) and the Interpreter backend. For all other backends, OpenMPI is presently the only supported option. We recommend the use of Intel® MLSL for CPU backends to avoid an extra download step.

To run the training using two nGraph devices, launch the program through mpirun (the complete command for the example is given at the end of this section):

$ mpirun

To deploy data-parallel training, add the AllReduce op after the steps needed to complete the backpropagation; the new code is the group of four AllReduce calls below:

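    // Backpropagation. The second argument was cut off in this listing;
    // delta is assumed to be the scaled loss adjoint from the previous section.
    ngraph::autodiff::Adjoints adjoints(NodeVector{loss}, NodeVector{delta});
    auto grad_W0 = adjoints.backprop_node(W0);
    auto grad_b0 = adjoints.backprop_node(b0);
    auto grad_W1 = adjoints.backprop_node(W1);
    auto grad_b1 = adjoints.backprop_node(b1);

    // New for data-parallel training: combine each gradient across all devices.
    auto avg_grad_W0 = std::make_shared<op::AllReduce>(grad_W0);
    auto avg_grad_b0 = std::make_shared<op::AllReduce>(grad_b0);
    auto avg_grad_W1 = std::make_shared<op::AllReduce>(grad_W1);
    auto avg_grad_b1 = std::make_shared<op::AllReduce>(grad_b1);

    // Apply the combined gradients to produce the updated parameters.
    auto W0_next = W0 + avg_grad_W0;
    auto b0_next = b0 + avg_grad_b0;
    auto W1_next = W1 + avg_grad_W1;
    auto b1_next = b1 + avg_grad_b1;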

See the full code in the examples folder at /doc/examples/mnist_mlp/dist_mnist_mlp.cpp. To run the example with two processes, invoke:

$ mpirun -np 2 dist_mnist_mlp
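
To make the effect of the AllReduce step concrete, the following sketch simulates it inside a single process, with no MPI and no nGraph involved. It assumes the reduction is an element-wise sum, which is the common convention; the text above does not say whether the result is summed or averaged. The function name all_reduce_sum is hypothetical and is not an nGraph API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulates a sum-AllReduce in a single process (no MPI, no nGraph).
// per_worker[r] is the gradient vector computed by worker r; all vectors
// must have the same length. After the call, every worker's vector holds
// the element-wise sum over all workers.
void all_reduce_sum(std::vector<std::vector<float>>& per_worker)
{
    if (per_worker.empty())
    {
        return;
    }
    const std::size_t n = per_worker.front().size();

    // Reduce: accumulate every worker's contribution.
    std::vector<float> total(n, 0.0f);
    for (const auto& grad : per_worker)
    {
        assert(grad.size() == n);
        for (std::size_t i = 0; i < n; ++i)
        {
            total[i] += grad[i];
        }
    }

    // Broadcast: every worker receives the same reduced result.
    for (auto& grad : per_worker)
    {
        grad = total;
    }
}
```

Because every worker ends up with an identical combined gradient, applying the same update on each device keeps the model replicas in step, which is why the reduction is placed between backpropagation and the parameter update.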