This walk-through guides users through several key concepts for using the nervana graph. The corresponding jupyter notebook is found here.

Let’s begin with a very simple example: computing x+1 for several values of x using the ngraph API. We should think of the computation as being invoked from the host, but possibly taking place somewhere else, which we will refer to as the device.

The nervana graph currently uses a compilation model. Users first define the computations, then they are compiled and run. In the future, we plan an even more compiler-like approach, where an executable is produced that can later be run on various platforms, in addition to an interactive version.

Our first program will provide values for x and receive x+1 for each x provided.

The x+1 program

The complete program, which we will walk through, is:

from __future__ import print_function
import ngraph as ng
import ngraph.transformers as ngt

# Build the graph
x = ng.placeholder(axes=())
x_plus_one = x + 1

# Select a transformer
transformer = ngt.make_transformer()

# Define a computation
plus_one = transformer.computation(x_plus_one, x)

# Run the computation
for i in range(5):
    print(plus_one(i))

We begin by importing ngraph, the Python module for graph construction, and ngraph.transformers, the module for transformer operations.

import ngraph as ng
import ngraph.transformers as ngt

Next, we create an operational graph (op-graph) for the computation. Following TensorFlow terminology, we use placeholder to define a port for transferring tensors between the host and the device. Axes are used to tell the graph the tensor shape. In this example, x is a scalar so the axes are empty.

x = ng.placeholder(axes=())

The ngraph graph construction API uses functions to build a graph of Op objects. Each function may add operations to the graph, and will return an Op that represents the computation. Here, the Op returned is a TensorOp, which defines the Python “magic methods” for arithmetic (for example, __add__()).

x_plus_one = x + 1

Another bit of behind-the-scenes magic involves the Python integer 1, which is not an Op. When an argument to a graph constructor is not an Op, nervana graph attempts to convert it to an Op using ng.constant, the graph function for creating a constant. Thus, what is really happening is:

x_plus_one = ng.add(x, ng.constant(1))
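To make the operator-overloading mechanism concrete, here is a minimal sketch in plain Python. It is not ngraph's implementation: the names ToyOp, ToyPlaceholder, ToyConstant, ToyAdd, and as_op are hypothetical and only illustrate how an expression such as x + 1 can build a graph node, wrapping the plain integer in a constant along the way.

```python
class ToyOp(object):
    """Hypothetical stand-in for a graph node; not the ngraph Op class."""

    def __add__(self, other):
        # Build a new node instead of computing a value.
        return ToyAdd(self, as_op(other))

    def __radd__(self, other):
        return ToyAdd(as_op(other), self)


class ToyPlaceholder(ToyOp):
    def __init__(self, name):
        self.name = name

    def evaluate(self, feed):
        return feed[self.name]


class ToyConstant(ToyOp):
    def __init__(self, value):
        self.value = value

    def evaluate(self, feed):
        return self.value


class ToyAdd(ToyOp):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, feed):
        return self.left.evaluate(feed) + self.right.evaluate(feed)


def as_op(value):
    """Wrap plain Python values in a constant node; pass nodes through."""
    return value if isinstance(value, ToyOp) else ToyConstant(value)


x = ToyPlaceholder("x")
x_plus_one = x + 1  # builds ToyAdd(x, ToyConstant(1)); nothing is computed yet
results = [x_plus_one.evaluate({"x": i}) for i in range(5)]
print(results)
```

In this toy version the graph is interpreted directly by evaluate; in the walk-through, that role is played by the transformer.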

Once the op-graph is defined, we can compile it with a transformer. Here we use make_transformer to make a default transformer. We tell the transformer the function to compute, x_plus_one, and the associated parameter x. The current default transformer uses NumPy for execution.

# Select a transformer
transformer = ngt.make_transformer()

# Define a computation
plus_one = transformer.computation(x_plus_one, x)

The first time the transformer executes a computation, the graph is analyzed and compiled, and storage is allocated and initialized on the device. Once compiled, the computations are callable Python objects.
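The compile-on-first-call behavior can be pictured with a small sketch in plain Python. This is only an analogy under our own assumptions: LazyComputation and its compile_count attribute are hypothetical names, and the "compilation" here is just the construction of a Python function.

```python
class LazyComputation(object):
    """Hypothetical callable that defers its build step until first use."""

    def __init__(self, build):
        self._build = build        # function that produces the executable
        self._compiled = None      # filled in on the first call
        self.compile_count = 0

    def __call__(self, *args):
        if self._compiled is None:
            # One-time work: analysis, compilation, allocation.
            self._compiled = self._build()
            self.compile_count += 1
        return self._compiled(*args)


toy_plus_one = LazyComputation(lambda: (lambda x: x + 1))
outputs = [toy_plus_one(i) for i in range(5)]
print(outputs, toy_plus_one.compile_count)
```

The build step runs once, on the first call; the remaining calls reuse the compiled object.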

On each call to plus_one, the value of x is copied to the device, 1 is added, and the result is copied back from the device.

# Run the computation
for i in range(5):
    print(plus_one(i))

The Compiled x + 1 Program

The compiled code can be examined (it is currently written to the /tmp folder) to view the runtime device model. Here we show the code with some clarifying comments.

class Model(object):
    def __init__(self):
        self.a_AssignableTensorOp_0_0 = None
        self.a_AssignableTensorOp_0_0_v_AssignableTensorOp_0_0_ = None
        self.a_AssignableTensorOp_1_0 = None
        self.a_AssignableTensorOp_1_0_v_AssignableTensorOp_1_0_ = None
        self.a_AddZeroDim_0_0 = None
        self.a_AddZeroDim_0_0_v_AddZeroDim_0_0_ = None
        self.be = NervanaObject.be

    def alloc_a_AssignableTensorOp_0_0(self):
        self.update_a_AssignableTensorOp_0_0(np.empty(1, dtype=np.dtype('float32')))

    def update_a_AssignableTensorOp_0_0(self, buffer):
        self.a_AssignableTensorOp_0_0 = buffer
        self.a_AssignableTensorOp_0_0_v_AssignableTensorOp_0_0_ = np.ndarray(shape=(), dtype=np.float32,
            buffer=buffer, offset=0, strides=())

    def alloc_a_AssignableTensorOp_1_0(self):
        self.update_a_AssignableTensorOp_1_0(np.empty(1, dtype=np.dtype('float32')))

    def update_a_AssignableTensorOp_1_0(self, buffer):
        self.a_AssignableTensorOp_1_0 = buffer
        self.a_AssignableTensorOp_1_0_v_AssignableTensorOp_1_0_ = np.ndarray(shape=(), dtype=np.float32,
            buffer=buffer, offset=0, strides=())

    def alloc_a_AddZeroDim_0_0(self):
        self.update_a_AddZeroDim_0_0(np.empty(1, dtype=np.dtype('float32')))

    def update_a_AddZeroDim_0_0(self, buffer):
        self.a_AddZeroDim_0_0 = buffer
        self.a_AddZeroDim_0_0_v_AddZeroDim_0_0_ = np.ndarray(shape=(), dtype=np.float32,
            buffer=buffer, offset=0, strides=())

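    def allocate(self):
        pass  # body omitted in this listing

    def Computation_0(self):
        pass  # body omitted in this listing

    def init(self):
        pass  # no initializations in this example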

Tensors have two components:

- storage for their elements (using the convention a_ for the allocated storage of a tensor), and
- views of that storage (denoted as a_...v_).

The alloc_ methods allocate storage and then create the views of the storage that will be needed. The view creation is separated from the allocation because storage may be allocated in multiple ways.
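The relationship between storage and view can be tried out directly in NumPy. The sketch below reuses the same two calls that appear in the alloc_ and update_ methods above; the variable names storage and view are ours.

```python
import numpy as np

# Allocate storage: a one-element float32 buffer, as in the alloc_ methods.
storage = np.empty(1, dtype=np.dtype('float32'))

# Create a zero-dimensional view onto the same memory, as in the update_ methods.
view = np.ndarray(shape=(), dtype=np.float32,
                  buffer=storage, offset=0, strides=())

# Writes through the view are visible in the storage, and vice versa.
view[()] = 3.0
storage[0] += 1.0
print(storage, view)
```

Because the view is built from whatever buffer it is handed, the same update_ logic works no matter how that buffer was allocated.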

Each allocated storage can also be initialized to, for example, random Gaussian variables. In this example, there are no initializations, so the method init, which performs the one-time device initialization, is empty. Constants, such as 1, are copied to the device as part of the allocation process.

The method Computation_0 handles the plus_one computation. Clearly this is not the optimal way to add 1 to a scalar, so let’s look at a more complex example next in the Logistic Regression walk-through.

Logistic Regression

This example performs logistic regression. The corresponding jupyter notebook is found here.

We want to classify an observation \(x\) into one of two classes, denoted by \(y=0\) and \(y=1\). Using a simple linear model:

\[\hat{y} = \sigma(Wx)\]

we want to find the optimal values for \(W\). Here, we use gradient descent with a learning rate of \(\alpha\) and the cross-entropy as the error function.
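For reference, one standard way to write the cross-entropy error and the gradient descent step is given below. The per-sample index \(i\), the batch size \(N\), and the averaging over the batch are our notational assumptions; the code later in this example may normalize differently.

```latex
\[
L(W) = -\frac{1}{N} \sum_{i=1}^{N}
  \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right],
\qquad
W \leftarrow W - \alpha \frac{\partial L}{\partial W}
\]
```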


The nervana graph uses Axes to attach shape information to tensors. The identity of an Axis object is used to pair and specify dimensions in symbolic expressions. The function ng.make_axis will create an Axis object with an optionally supplied name argument. For example:

import ngraph as ng
import ngraph.transformers as ngt

my_axis = ng.make_axis(length=256, name='my_axis')

Alternatively, we can use a NameScope to set the names of the various axes. A NameScope is an object that sets the name of an object to that of its assigned attribute. So when we set ax.N to an Axis object, the name of the object is automatically set to ax.N. This is a convenient way to define axes, so we use this approach for the rest of this example.

ax = ng.make_name_scope("ax")
ax.N = ng.make_axis(length=128, batch=True)
ax.C = ng.make_axis(length=4)

We add batch as a property to ax.N to indicate that the axis is a batch axis. A batch axis is held out of the default set of axes reduced in reduction operations such as sums.
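As a rough NumPy analogy (NumPy has no notion of a batch axis, so the axis choice is spelled out by hand here), holding the batch axis out of a reduction means summing over the feature axis only and keeping one result per sample. The lengths 4 and 128 are those of ax.C and ax.N above.

```python
import numpy as np

C, N = 4, 128                       # lengths of ax.C and ax.N
data = np.ones((C, N), dtype=np.float32)

# Reduce over every axis except the last (batch) one: one sum per sample.
per_sample = data.sum(axis=0)

# Reducing over the batch axis as well has to be requested explicitly.
total = data.sum()

print(per_sample.shape, total)
```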

Building the graph

Our model has three placeholders: X, Y, and alpha, each of which needs to have axes defined. alpha is a scalar, so we pass in empty axes:

alpha = ng.placeholder(axes=())

X and Y are tensors for the input and output data, respectively. Our convention is to use the last axis for samples. The placeholders can be specified as:

X = ng.placeholder(axes=[ax.C, ax.N])
Y = ng.placeholder(axes=[ax.N])

We also need to specify the training weights, W. Unlike a placeholder, W should retain its value from computation to computation (for example, across mini-batches of training). Following TensorFlow, we call this a variable. We specify the variable with both Axes and also an initial value:

W = ng.variable(axes=[ax.C - 1], initial_value=0)

The nervana graph axes are agnostic to data layout on the compute device, so the ordering of the axes does not matter. As a consequence, when two tensors are provided to an operation such as ng.dot(), we need to indicate which axes correspond and should be matched together. We use “dual offsets” of +/- 1 to mark which axes should be matched during a multi-axis operation, which gives rise to the ax.C - 1 observed above. For more information, see the Axes section of the user guide.

Now we can estimate y as Y_hat and compute the average loss L:

Y_hat = ng.sigmoid(ng.dot(W, X))
L = ng.cross_entropy_binary(Y_hat, Y, out_axes=()) / ng.batch_size(Y_hat)

Here we use several ngraph functions, including ng.dot and ng.sigmoid. Since a tensor can have multiple axes, we need a way to mark which axes in the first argument of ng.dot are to act on which axes in the second argument.

Every axis is a member of a family of axes we call duals of the axis, and each axis in the family has a position. When you create an axis, its dual position is 0. dot pairs axes in the first and second arguments that are of the same dual family and have consecutive positions.

We want the variable W to act on the ax.C axis, so we want the axis for W to be in the position before ax.C, which we can obtain with ax.C - 1. We initialize W to 0.
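For comparison, NumPy pairs axes by position rather than by identity, so the contraction has to be stated explicitly. The following sketch, which uses only NumPy and random stand-in data, computes the same kind of product: the single axis of W is contracted with the C axis of X, leaving one value per sample.

```python
import numpy as np

rng = np.random.RandomState(0)
C, N = 4, 128
W = rng.randn(C)
X = rng.randn(C, N)

# Contract W's only axis with X's first (C) axis, stated by position.
Z = np.tensordot(W, X, axes=([0], [0]))      # shape (N,)

# The same contraction written with named indices.
Z_einsum = np.einsum('c,cn->n', W, X)

# Logistic sigmoid applied elementwise, one prediction per sample.
Y_hat = 1.0 / (1.0 + np.exp(-Z))
print(Z.shape, Y_hat.min(), Y_hat.max())
```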

Gradient descent requires computing the gradient, \(\frac{dL}{dW}\):

grad = ng.deriv(L, W)

The ng.deriv function computes the backprop using autodiff. We are almost done. The update step computes the new weight and assigns it to W:

update = ng.assign(W, W - alpha * grad / ng.tensor_size(Y_hat))
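To see what the derivative and update amount to numerically, here is a NumPy-only sketch. It assumes the mean binary cross-entropy of a sigmoid-linear model, writes its gradient in closed form, checks it against central finite differences, and takes one gradient descent step. The function names and the random data are ours, and the extra scaling used in the update above is left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mean_cross_entropy(W, X, Y):
    Y_hat = sigmoid(W.dot(X))
    return -np.mean(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))

def analytic_grad(W, X, Y):
    # dL/dW for the mean binary cross-entropy of a sigmoid-linear model.
    return X.dot(sigmoid(W.dot(X)) - Y) / X.shape[1]

rng = np.random.RandomState(0)
X = rng.randn(4, 128)
Y = (rng.rand(128) > 0.5).astype(np.float64)
W = 0.1 * rng.randn(4)

# Central finite differences as an independent check of the closed form.
eps = 1e-6
numeric = np.array([
    (mean_cross_entropy(W + eps * e, X, Y)
     - mean_cross_entropy(W - eps * e, X, Y)) / (2 * eps)
    for e in np.eye(4)
])
grad = analytic_grad(W, X, Y)

alpha = 0.5
W_new = W - alpha * grad            # one gradient descent step
print(mean_cross_entropy(W, X, Y), mean_cross_entropy(W_new, X, Y))
```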


Now we create a transformer and define a computation. We pass the ops whose results we want to retrieve, followed by the placeholders:

transformer = ngt.make_transformer()
update_fun = transformer.computation([L, W, update], alpha, X, Y)

Here, the computation returns three values, for L, W, and update, given inputs to fill the placeholders.

The input data is synthetically generated as a mixture of two Gaussian distributions in 4-d space. Our dataset consists of 10 mini-batches of 128 samples each, which we create with a convenience function:

import gendata

g = gendata.MixtureGenerator([.5, .5], (ax.C.length,))
XS, YS = g.gen_data(ax.N.length, 10)

Finally, we train the model across 10 epochs, printing the loss and updated weights:

for i in range(10):
    for xs, ys in zip(XS, YS):
        loss_val, w_val, _ = update_fun(5.0 / (1 + i), xs, ys)
        print("W: %s, loss %s" % (w_val, loss_val))
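For readers without ngraph or the gendata helper at hand, the same training loop can be approximated in NumPy alone. Everything in this sketch is an assumption made for illustration: make_batches is a hypothetical stand-in for the data generator (two unit-variance Gaussians with means of -1 and +1 in every coordinate), the gradient is written by hand, and the learning-rate schedule is chosen for the sketch rather than taken from the code above.

```python
import numpy as np

def make_batches(rng, n_features, batch_size, n_batches):
    """Hypothetical stand-in for gendata: two Gaussians with opposite means."""
    means = np.stack([-np.ones(n_features), np.ones(n_features)])
    XS, YS = [], []
    for _ in range(n_batches):
        ys = rng.randint(0, 2, size=batch_size)                # labels 0 or 1
        xs = means[ys].T + rng.randn(n_features, batch_size)   # shape (C, N)
        XS.append(xs)
        YS.append(ys.astype(np.float64))
    return XS, YS

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.RandomState(0)
XS, YS = make_batches(rng, n_features=4, batch_size=128, n_batches=10)

W = np.zeros(4)
losses = []
for i in range(10):
    alpha = 0.5 / (1 + i)           # decaying rate, chosen for this sketch
    for xs, ys in zip(XS, YS):
        y_hat = np.clip(sigmoid(W.dot(xs)), 1e-12, 1 - 1e-12)  # guard the logs
        loss = -np.mean(ys * np.log(y_hat) + (1 - ys) * np.log(1 - y_hat))
        W = W - alpha * xs.dot(y_hat - ys) / xs.shape[1]
        losses.append(loss)
print("W: %s, final loss %s" % (W, losses[-1]))
```

With W starting at zero, every prediction is 0.5, so the first recorded loss equals log 2; the loss then falls as W aligns with the direction separating the two clusters.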

Also see Part 2 of logistic regression, which walks users through adding additional variables, computations, and dimensions.

Logistic Regression Part 2

In this example, we extend the code from Part 1 with several important features:

- Instead of just updating the weight matrix W, we add a bias b and use the .variables() method to compactly update both variables.
- We attach an additional computation to the transformer to compute the loss on a held-out validation dataset.
- We switch from a flat C-dimensional feature space to a W x H feature space to demonstrate multi-dimensional logistic regression.

The corresponding jupyter notebook is found here.

import ngraph as ng
import ngraph.transformers as ngt
import gendata

The axes creation is the same as before, except we now add a new axis H to represent the new feature space.

ax = ng.make_name_scope(name="ax")

ax.W = ng.make_axis(length=4)
ax.H = ng.make_axis(length=1)  # new axis added
ax.N = ng.make_axis(length=128, batch=True)

Building the graph

Our model has three placeholders: X, Y, and alpha. Now, the input X has shape (W, H, N):

alpha = ng.placeholder(())
X = ng.placeholder([ax.W, ax.H, ax.N])  # now has shape (W, H, N)
Y = ng.placeholder([ax.N])

Similarly, the weight matrix is now multi-dimensional, with shape (W, H), and we add a new scalar bias variable.

W = ng.variable([ax.W - 1, ax.H - 1], initial_value=0).named('W')  # now has shape (W, H)
b = ng.variable((), initial_value=0).named('b')

Our predicted output now includes the bias b:

Y_hat = ng.sigmoid(ng.dot(W, X) + b)
L = ng.cross_entropy_binary(Y_hat, Y, out_axes=()) / ng.batch_size(Y_hat)
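In NumPy terms, the multi-dimensional product contracts both feature axes of the weights with the matching axes of the input, and the scalar bias is broadcast over the batch. The sketch below uses random stand-in data and our own variable names; it also shows that the result equals an ordinary dot product over the flattened feature space.

```python
import numpy as np

rng = np.random.RandomState(0)
n_w, n_h, n = 4, 1, 128             # lengths of ax.W, ax.H and ax.N
W = rng.randn(n_w, n_h)
b = 0.5
X = rng.randn(n_w, n_h, n)

# Contract both feature axes of W with the matching axes of X.
Z = np.tensordot(W, X, axes=([0, 1], [0, 1])) + b     # shape (N,)

# Equivalent: flatten the feature axes and use an ordinary dot product.
Z_flat = W.reshape(-1).dot(X.reshape(n_w * n_h, n)) + b

print(Z.shape, np.allclose(Z, Z_flat))
```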

For the parameter updates, instead of explicitly specifying the variables W and b, we can call L.variables() to retrieve all the variables that the loss function depends on:

print([var.name for var in L.variables()])

For complicated graphs, the variables() method makes it easy to iterate over all the variables an op depends on. Our new parameter update is then:

updates = [ng.assign(v, v - alpha * ng.deriv(L, v) / ng.batch_size(Y_hat))
           for v in L.variables()]

The ng.deriv function computes the gradient of L with respect to each variable. We then use ng.doall to combine the individual updates into a single op:

all_updates = ng.doall(updates)
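The idea of updating every parameter in one pass, without naming each one, can be sketched in NumPy with a dictionary of parameters. This is an analogy only: loss_and_grads is a hypothetical helper with hand-written gradients, whereas ngraph derives them automatically, and the data is random.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(params, X, Y):
    """Mean binary cross-entropy and its gradient for every parameter."""
    z = np.tensordot(params['W'], X, axes=([0, 1], [0, 1])) + params['b']
    y_hat = sigmoid(z)
    loss = -np.mean(Y * np.log(y_hat) + (1 - Y) * np.log(1 - y_hat))
    delta = (y_hat - Y) / Y.shape[0]
    grads = {'W': np.tensordot(X, delta, axes=([2], [0])),   # shape (W, H)
             'b': delta.sum()}
    return loss, grads

rng = np.random.RandomState(0)
X = rng.randn(4, 1, 128)
Y = (rng.rand(128) > 0.5).astype(np.float64)
params = {'W': np.zeros((4, 1)), 'b': 0.0}

alpha = 0.5
loss_before, grads = loss_and_grads(params, X, Y)
# One comprehension updates every parameter, whatever its name or shape.
params = {name: value - alpha * grads[name] for name, value in params.items()}
loss_after, _ = loss_and_grads(params, X, Y)
print(loss_before, loss_after)
```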


We have our update computation as before, but we also add an evaluation computation that computes the loss on a separate dataset without performing the updates:

transformer = ngt.make_transformer()

update_fun = transformer.computation([L, W, b, all_updates], alpha, X, Y)
eval_fun = transformer.computation(L, X, Y)

For convenience, we define a function that computes the average cost across the validation set.

def avg_loss(xs, ys):
    total_loss = 0
    for x, y in zip(xs, ys):
        loss_val = eval_fun(x, y)
        total_loss += loss_val
    # L is already averaged within each batch, so divide by the number of batches
    return total_loss / len(xs)

We then generate our training and evaluation sets and perform the updates. We emit the average loss on the validation set during training.

g = gendata.MixtureGenerator([.5, .5], (ax.W.length, ax.H.length))
XS, YS = g.gen_data(ax.N.length, 10)
EVAL_XS, EVAL_YS = g.gen_data(ax.N.length, 4)

print("Starting avg loss: {}".format(avg_loss(EVAL_XS, EVAL_YS)))
for i in range(10):
    for xs, ys in zip(XS, YS):
        loss_val, w_val, b_val, _ = update_fun(5.0 / (1 + i), xs, ys)
    print("After epoch %d: W: %s, b: %s, avg loss %s" % (i, w_val.T, b_val, avg_loss(EVAL_XS, EVAL_YS)))