Testing latency

Important

This tutorial was tested with earlier versions of the nGraph Compiler stack. It is not officially supported in the latest release, 0.27, although some configuration options may still work.

Many open-source DL frameworks provide a layer where experts in data science can make use of optimizations contributed by machine learning engineers. Having a common API benefits both groups: it simplifies deployment, and it makes it easier for ML engineers working on advanced deep learning hardware to bring highly optimized performance to a wide range of models, especially in inference.

One DL framework with ongoing work on graph optimizations is Apache MXNet*, where Intel has contributed work showing how to use the nGraph Compiler stack as an experimental backend. This approach opens up more kinds of graph optimizations than are available in the MXNet framework alone, for reasons outlined in our introduction documentation. Note that the MXNet bridge works with trained models only; it does not support distributed training.

Up to 45X faster compilation with nGraph backend

Tutorial: Testing inference latency of ResNet-50-V2 with MXNet

This tutorial covers compiling MXNet with nGraph’s CPU backend.

Begin by cloning MXNet from GitHub:

git clone --recursive https://github.com/apache/incubator-mxnet

To compile, run:

cd incubator-mxnet
make -j USE_NGRAPH=1

MXNet’s build system will automatically download, configure, and build the nGraph library, then link it into libmxnet.so. Once this is complete, we recommend creating a Python 3 virtual environment for testing and then installing MXNet into it:

python3 -m venv .venv
. .venv/bin/activate
cd python
pip install -e .
cd ../

Now we’re ready to use nGraph to run any model on a CPU backend. Building MXNet with nGraph automatically enables nGraph for your model scripts, so you shouldn’t need to do anything special. If you run into trouble, you can disable nGraph by setting the following environment variable to an empty value:

MXNET_SUBGRAPH_BACKEND=
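
The same setting can also be applied from inside a Python script rather than the shell. The following is only a minimal sketch: it assumes MXNet reads MXNET_SUBGRAPH_BACKEND from the process environment no earlier than import time, which this tutorial does not confirm, and the helper name disable_ngraph is purely illustrative.

```python
import os


def disable_ngraph(env=os.environ):
    """Set MXNET_SUBGRAPH_BACKEND to an empty string (hypothetical helper).

    Returns the previous value, or None if the variable was unset,
    so the caller can restore it later.
    """
    previous = env.get("MXNET_SUBGRAPH_BACKEND")
    env["MXNET_SUBGRAPH_BACKEND"] = ""
    return previous


previous_backend = disable_ngraph()
# Assumption: import mxnet only after this point so that it sees the setting.
```

Keeping the previous value makes it easy to switch the backend back on when comparing latency with and without nGraph in separate runs.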

If you do run into problems, please report them and we’ll address them as soon as possible.

Running ResNet-50-V2 Inference

To show a working example, we’ll demonstrate how MXNet may be used to run ResNet-50 inference. For simplicity, we’ll use the standard MXNet ResNet-50-V2 model from the Gluon model zoo, and we’ll test with batch_size=1. Note that the nGraph-MXNet bridge supports static graphs only (dynamic graphs are in the works), so for this example we begin by converting the Gluon model into a static graph. Also note that any model with a saved checkpoint can be considered a “static graph” in nGraph. For this example, we’ll presume that the model is pre-trained.

import time

import mxnet as mx
from mxnet.gluon.model_zoo import vision

# Batch size 1, 3 channels, 224x224 image
batch_shape = (1, 3, 224, 224)
input_data = mx.nd.zeros(batch_shape)

# Convert the Gluon model to a static graph: hybridize it, run one
# forward pass to build the cached graph, then export a checkpoint.
resnet_gluon = vision.resnet50_v2(pretrained=True)
resnet_gluon.hybridize()
resnet_gluon.forward(input_data)
resnet_gluon.export('resnet50_v2')

# Load the exported symbol and parameters (epoch 0)
resnet_sym, arg_params, aux_params = mx.model.load_checkpoint('resnet50_v2', 0)
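
The export and load_checkpoint calls above are tied together by a file-naming convention: to our understanding, MXNet writes and reads a checkpoint as a "<prefix>-symbol.json" file plus a "<prefix>-<epoch, zero-padded to four digits>.params" file. The sketch below only reproduces that naming so you can check that the expected files exist on disk; checkpoint_files is a hypothetical helper, not part of MXNet.

```python
def checkpoint_files(prefix, epoch=0):
    """Return the (symbol, params) file names for a checkpoint prefix and epoch.

    Hypothetical helper that mirrors the naming convention described above.
    """
    return f"{prefix}-symbol.json", f"{prefix}-{epoch:04d}.params"


symbol_file, params_file = checkpoint_files("resnet50_v2", 0)
print(symbol_file)   # resnet50_v2-symbol.json
print(params_file)   # resnet50_v2-0000.params
```

If load_checkpoint fails to find the model, comparing these names against the contents of the working directory is a quick first check.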

To load the model into nGraph, we simply bind the symbol into an Executor.

model = resnet_sym.simple_bind(ctx=mx.cpu(), data=batch_shape, grad_req='null')
model.copy_params_from(arg_params, aux_params)

At binding, the MXNet Subgraph API finds nGraph, determines how to partition the graph, and, in the case of ResNet, sends the entire graph to nGraph for compilation. This produces a single call to an NNVM NGraphSubgraphOp embedded with the compiled model. At this point, we can test the model’s performance.

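dry_run = 5        # warm-up iterations excluded from timing
num_batches = 100  # timed iterations
for i in range(dry_run + num_batches):
    if i == dry_run:
        # Start the clock once the warm-up iterations are done
        start_time = time.time()
    outputs = model.forward(data=input_data, is_train=False)
    # MXNet executes asynchronously; block until the results are ready
    for output in outputs:
        output.wait_to_read()
print("Average Latency = ", (time.time() - start_time) / num_batches * 1000, "ms")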
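
An average alone can hide occasional slow iterations, so it may be useful to record each iteration separately. The following is a minimal, framework-independent sketch of the same warm-up-then-measure pattern; measure_latency_ms and its parameters are hypothetical names, time.perf_counter is swapped in for time.time because it is a monotonic, higher-resolution clock, and a trivial stand-in workload replaces the model so the snippet runs without MXNet.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=5, iterations=100):
    """Time run_once() per call and return summary statistics in milliseconds."""
    # Warm-up calls are executed but not timed
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Approximate 95th percentile taken directly from the sorted samples
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "p95": samples[p95_index],
    }


# Stand-in workload; replace with a function that runs one inference
stats = measure_latency_ms(lambda: sum(range(10000)), warmup=2, iterations=20)
print(stats)
```

To use this with the model from this tutorial, run_once would wrap the model.forward call together with the wait_to_read loop, so that each sample covers a completed inference rather than just the asynchronous dispatch.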