# Walk-through

This walk-through guides users through several key concepts for using the nervana graph. The corresponding jupyter notebook is found here.

Let’s begin with a very simple example: computing `x+1` for several values of `x` using the `ngraph` API. We should think of the computation as being invoked from the *host*, but possibly taking place somewhere else, which we will refer to as *the device*.

The nervana graph currently uses a compilation model. Users first define the computations, then they are compiled and run. In the future, we plan an even more compiler-like approach, where an executable is produced that can later be run on various platforms, in addition to an interactive version.

Our first program will provide values for `x` and receive `x+1` for each `x` provided.

## The x+1 program

The complete program, which we will walk through, is:

```
from __future__ import print_function
import ngraph as ng
import ngraph.transformers as ngt
# Build the graph
x = ng.placeholder(axes=())
x_plus_one = x + 1
# Select a transformer
transformer = ngt.make_transformer()
# Define a computation
plus_one = transformer.computation(x_plus_one, x)
# Run the computation
for i in range(5):
    print(plus_one(i))
```

We begin by importing `ngraph`, the Python module for graph construction, and `ngraph.transformers`, the module for transformer operations.

```
import ngraph as ng
import ngraph.transformers as ngt
```

Next, we create an operational graph (op-graph) for the computation. Following TensorFlow terminology, we use `placeholder` to define a port for transferring tensors between the host and the device. `Axes` are used to tell the graph the tensor shape. In this example, `x` is a scalar, so the axes are empty.

```
x = ng.placeholder(axes=())
```

The `ngraph` graph construction API uses functions to build a graph of `Op` objects. Each function may add operations to the graph, and will return an `Op` that represents the computation. Here, the `Op` returned is a `TensorOp`, which defines the Python “magic methods” for arithmetic (for example, `__add__()`).

```
x_plus_one = x + 1
```

Another bit of behind-the-scenes magic occurs with the Python `1`, which is not an `Op`. When an argument to a graph constructor is not an `Op`, nervana graph will attempt to convert it to an `Op` using `ng.constant`, the graph function for creating a constant. Thus, what is really happening is:

```
x_plus_one = ng.add(x, ng.constant(1))
```

Once the op-graph is defined, we can compile it with a *transformer*. Here we use `make_transformer` to make a default transformer. We tell the transformer the function to compute, `x_plus_one`, and the associated parameter `x`. The current default transformer uses NumPy for execution.

```
# Select a transformer
transformer = ngt.make_transformer()
# Define a computation
plus_one = transformer.computation(x_plus_one, x)
```

The first time the transformer executes a computation, the graph is analyzed and compiled, and storage is allocated and initialized on the device. Once compiled, the computations are callable Python objects.
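As a hypothetical aside (the `x_plus_two` and `plus_two_fun` names below are made up for illustration), several computations can be defined on the same transformer before it compiles on first use, just as the logistic regression examples later define both an update and an evaluation computation:

```
# Hypothetical sketch: a second computation over the same placeholder,
# defined before the transformer compiles on its first call.
x_plus_two = x + 2
plus_two_fun = transformer.computation(x_plus_two, x)
print(plus_two_fun(3))  # expected to print 5.0
```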

On each call to `plus_one`, the value of `x` is copied to the device, 1 is added, and then the result is copied back from the device.

```
# Run the computation
for i in range(5):
    print(plus_one(i))
```

### The Compiled x + 1 Program

The compiled code can be examined (currently located in the `/tmp` folder) to view the runtime device model. Here we show the code with some clarifying comments.

```
class Model(object):
    def __init__(self):
        # Storage ("a_...") and view ("a_..._v_...") slots for each tensor.
        self.a_AssignableTensorOp_0_0 = None
        self.a_AssignableTensorOp_0_0_v_AssignableTensorOp_0_0_ = None
        self.a_AssignableTensorOp_1_0 = None
        self.a_AssignableTensorOp_1_0_v_AssignableTensorOp_1_0_ = None
        self.a_AddZeroDim_0_0 = None
        self.a_AddZeroDim_0_0_v_AddZeroDim_0_0_ = None
        self.be = NervanaObject.be

    # Each alloc_* method allocates storage and builds the views over it.
    def alloc_a_AssignableTensorOp_0_0(self):
        self.update_a_AssignableTensorOp_0_0(np.empty(1, dtype=np.dtype('float32')))

    def update_a_AssignableTensorOp_0_0(self, buffer):
        self.a_AssignableTensorOp_0_0 = buffer
        self.a_AssignableTensorOp_0_0_v_AssignableTensorOp_0_0_ = np.ndarray(
            shape=(), dtype=np.float32, buffer=buffer, offset=0, strides=())

    def alloc_a_AssignableTensorOp_1_0(self):
        self.update_a_AssignableTensorOp_1_0(np.empty(1, dtype=np.dtype('float32')))

    def update_a_AssignableTensorOp_1_0(self, buffer):
        self.a_AssignableTensorOp_1_0 = buffer
        self.a_AssignableTensorOp_1_0_v_AssignableTensorOp_1_0_ = np.ndarray(
            shape=(), dtype=np.float32, buffer=buffer, offset=0, strides=())

    def alloc_a_AddZeroDim_0_0(self):
        self.update_a_AddZeroDim_0_0(np.empty(1, dtype=np.dtype('float32')))

    def update_a_AddZeroDim_0_0(self, buffer):
        self.a_AddZeroDim_0_0 = buffer
        self.a_AddZeroDim_0_0_v_AddZeroDim_0_0_ = np.ndarray(
            shape=(), dtype=np.float32, buffer=buffer, offset=0, strides=())

    def allocate(self):
        # Allocate all device storage.
        self.alloc_a_AssignableTensorOp_0_0()
        self.alloc_a_AssignableTensorOp_1_0()
        self.alloc_a_AddZeroDim_0_0()

    def Computation_0(self):
        # The plus_one computation: add the input and the constant 1.
        np.add(self.a_AssignableTensorOp_0_0_v_AssignableTensorOp_0_0_,
               self.a_AssignableTensorOp_1_0_v_AssignableTensorOp_1_0_,
               out=self.a_AddZeroDim_0_0_v_AddZeroDim_0_0_)

    def init(self):
        # One-time device initialization; empty because there are no initializers.
        pass
```

Tensors have two components:

- storage for their elements (using the convention `a_` for the allocated storage of a tensor), and
- views of that storage (denoted as `a_...v_`).
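The generated `update_*` methods follow the same pattern as this plain NumPy sketch: a flat buffer holds the elements, and a separate zero-dimensional `ndarray` is a view onto it.

```
import numpy as np

# Storage ("a_...") is a flat buffer; the view ("a_...v_...") is an ndarray
# constructed over that buffer, exactly as in the generated update_* methods.
storage = np.empty(1, dtype=np.float32)
view = np.ndarray(shape=(), dtype=np.float32, buffer=storage, offset=0, strides=())
view[()] = 3.0
print(storage[0])  # 3.0 -- writes through the view land in the shared storage
```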

The `alloc_` methods allocate storage and then create the views of the storage that will be needed. The view creation is separated from the allocation because storage may be allocated in multiple ways.

Each allocated storage can also be initialized to, for example, random Gaussian variables. In this example, there are no initializations, so the method `init`, which performs the one-time device initialization, is empty. Constants, such as 1, are copied to the device as part of the allocation process.

The method `Computation_0` handles the `plus_one` computation. Clearly, this is not the optimal way to add 1 to a scalar, so let’s look at a more complex example next in the Logistic Regression walk-through.

## Logistic Regression

This example performs logistic regression. The corresponding jupyter notebook is found here.

We want to classify an observation \(x\) into one of two classes, denoted by \(y=0\) and \(y=1\). Using a simple linear model passed through a sigmoid, \(\hat{y} = \sigma(W x)\), we want to find the optimal values for \(W\). Here, we use gradient descent with a learning rate of \(\alpha\) and the cross-entropy as the error function.
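For reference, a standard statement of the binary cross-entropy loss and the gradient-descent update that the code below mirrors, averaged over the \(N\) samples in a batch, is

\[
L = -\frac{1}{N} \sum_{n=1}^{N} \left[ y_n \log \hat{y}_n + (1 - y_n) \log(1 - \hat{y}_n) \right],
\qquad
W \leftarrow W - \alpha \frac{\partial L}{\partial W}.
\]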

### Axes

The nervana graph uses `Axes` to attach shape information to tensors. The identity of `Axis` objects is used to pair and specify dimensions in symbolic expressions. The function `ng.make_axis` will create an `Axis` object with an optionally supplied `name` argument. For example:

```
import ngraph as ng
import ngraph.transformers as ngt
my_axis = ng.make_axis(length=256, name='my_axis')
```
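The created `Axis` carries its length and name, which later parts of this walk-through read back (for example, `ax.C.length` below). A minimal check, assuming the `length` and `name` attributes behave as they are used elsewhere in this document:

```
# Assumes Axis exposes .length and .name, as relied on later in this walk-through.
print(my_axis.length)  # 256
print(my_axis.name)    # 'my_axis'
```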

Alternatively, we can use a `NameScope` to set the names of the various axes. A `NameScope` is an object that sets the name of an object to that of its assigned attribute. So when we set `ax.N` to an `Axis` object, the `name` of the object is automatically set to `ax.N`. This is a convenient way to define axes, so we use this approach for the rest of this example.

```
ax = ng.make_name_scope("ax")
ax.N = ng.make_axis(length=128, batch=True)
ax.C = ng.make_axis(length=4)
```

We add `batch` as a property to `ax.N` to indicate that the axis is a batch axis. A batch axis is held out of the default set of axes reduced in reduction operations such as sums.
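As a hedged sketch of what this means in practice (assuming `ng.sum` with no explicit reduction axes reduces over the default, non-batch axes as described):

```
# Hedged sketch: with ax.N marked as a batch axis, a default reduction such as
# ng.sum is expected to reduce over ax.C only, producing one value per sample.
Z = ng.placeholder(axes=[ax.C, ax.N])
per_sample_sum = ng.sum(Z)  # expected to have axes [ax.N]
```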

### Building the graph

Our model has three placeholders: `X`, `Y`, and `alpha`, each of which needs to have axes defined. `alpha` is a scalar, so we pass in empty axes:

```
alpha = ng.placeholder(axes=())
```

`X` and `Y` are tensors for the input and output data, respectively. Our convention is to use the last axis for samples. The placeholders can be specified as:

```
X = ng.placeholder(axes=[ax.C, ax.N])
Y = ng.placeholder(axes=[ax.N])
```

We also need to specify the training weights, `W`. Unlike a placeholder, `W` should retain its value from computation to computation (for example, across mini-batches of training). Following TensorFlow, we call this a *variable*. We specify the variable with both `Axes` and an initial value:

```
W = ng.variable(axes=[ax.C - 1], initial_value=0)
```

The nervana graph axes are agnostic to data layout on the compute device, so the ordering of the axes does not matter. As a consequence, when two tensors are provided to an `ng.dot()` operation, for example, one needs to indicate which axes should be matched together. We use “dual offsets” of +/- 1 to mark which axes should be matched during a multi-axis operation, which gives rise to the `ax.C - 1` observed above. For more information, see the `Axes` section of the user guide.

Now we can estimate `y` as `Y_hat` and compute the average loss `L`:

```
Y_hat = ng.sigmoid(ng.dot(W, X))
L = ng.cross_entropy_binary(Y_hat, Y, out_axes=()) / ng.batch_size(Y_hat)
```

Here we use several ngraph functions, including `ng.dot` and `ng.sigmoid`. Since a tensor can have multiple axes, we need a way to mark which axes in the first argument of `ng.dot` are to act on which axes in the second argument.

Every axis is a member of a family of axes we call duals of the axis, and each axis in the family has a position. When you create an axis, its dual position is 0. `dot` pairs axes in the first and second arguments that are of the same dual family and have consecutive positions.
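As a hypothetical illustration with made-up axis names (`D` and `M` are not part of this walk-through), `D - 1` belongs to the dual family of `D` at the position just before it, so `ng.dot` is expected to pair the two:

```
# Hypothetical sketch of dual positions; D and M are made-up axes.
D = ng.make_axis(length=3, name='D')
M = ng.make_axis(length=5, name='M')
v = ng.variable(axes=[D - 1], initial_value=0)  # axis in D's dual family, one position before D
A = ng.placeholder(axes=[D, M])
u = ng.dot(v, A)  # pairs (D - 1) with D; the result is expected to have axes [M]
```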

We want the variable `W` to act on the `ax.C` axis, so we want the axis for `W` to be in the position before `ax.C`, which we can obtain with `ax.C - 1`. We initialize `W` to `0`.

Gradient descent requires computing the gradient, \(\frac{dL}{dW}\):

```
grad = ng.deriv(L, W)
```

The `ng.deriv` function computes the backprop using autodiff. We are almost done. The update step computes the new weight and assigns it to `W`:

```
update = ng.assign(W, W - alpha * grad / ng.tensor_size(Y_hat))
```

### Computation

Now we create a transformer and define a computation. We pass the ops whose results we want to retrieve, followed by the placeholders:

```
transformer = ngt.make_transformer()
update_fun = transformer.computation([L, W, update], alpha, X, Y)
```

Here, the computation will return three values corresponding to `L`, `W`, and `update`, given inputs to fill the placeholders.

The input data is synthetically generated as a mixture of two Gaussian distributions in 4-d space. Our dataset consists of 10 mini-batches of 128 samples each, which we create with a convenience function:

```
import gendata
g = gendata.MixtureGenerator([.5, .5], (ax.C.length,))
XS, YS = g.gen_data(ax.N.length, 10)
```

Finally, we train the model across 10 epochs, printing the loss and updated weights:

```
for i in range(10):
    for xs, ys in zip(XS, YS):
        loss_val, w_val, _ = update_fun(5.0 / (1 + i), xs, ys)
        print("W: %s, loss %s" % (w_val, loss_val))
```

Also see Part 2 of logistic regression, which walks users through adding additional variables, computations, and dimensions.

## Logistic Regression Part 2

In this example, we extend the code from Part 1 with several important features:

- Instead of just updating the weight matrix `W`, we add a bias `b` and use the `.variables()` method to compactly update both variables.
- We attach an additional computation to the transformer to compute the loss on a held-out validation dataset.
- We switch from a flat `C`-dimensional feature space to a `W x H` feature space to demonstrate multi-dimensional logistic regression.

The corresponding jupyter notebook is found here.

```
import ngraph as ng
import ngraph.transformers as ngt
import gendata
```

The axes creation is the same as before, except we now add a new axis `H` to represent the new feature space.

```
ax = ng.make_name_scope(name="ax")
ax.W = ng.make_axis(length=4)
ax.H = ng.make_axis(length=1) # new axis added
ax.N = ng.make_axis(length=128, batch=True)
```

### Building the graph

Our model has three placeholders: `X`, `Y`, and `alpha`. Now, the input `X` has shape `(W, H, N)`:

```
alpha = ng.placeholder(())
X = ng.placeholder([ax.W, ax.H, ax.N]) # now has shape (W, H, N)
Y = ng.placeholder([ax.N])
```

Similarly, the weight matrix is now multi-dimensional, with shape `(W, H)`, and we add a new scalar bias variable.

```
W = ng.variable([ax.W - 1, ax.H - 1], initial_value=0).named('W') # now has shape (W, H)
b = ng.variable((), initial_value=0).named('b')
```

Our predicted output now includes the bias `b`:

```
Y_hat = ng.sigmoid(ng.dot(W, X) + b)
L = ng.cross_entropy_binary(Y_hat, Y, out_axes=()) / ng.batch_size(Y_hat)
```

For the parameter updates, instead of explicitly specifying the variables `W` and `b`, we can call `L.variables()` to retrieve all the variables that the loss function depends on:

```
print([var.name for var in L.variables()])
```

For complicated graphs, the `variables()` method makes it easy to iterate over all of its dependent variables. Our new parameter update is then:

```
updates = [ng.assign(v, v - alpha * ng.deriv(L, v) / ng.batch_size(Y_hat))
           for v in L.variables()]
```

Since `updates` is a list of assignment ops, one per variable, we use `ng.doall` to group them into a single op that performs all of the updates:

```
all_updates = ng.doall(updates)
```

### Computation

We have our update computation as before, but we also add an evaluation computation that computes the loss on a separate dataset without performing the updates:

```
transformer = ngt.make_transformer()
update_fun = transformer.computation([L, W, b, all_updates], alpha, X, Y)
eval_fun = transformer.computation(L, X, Y)
```

For convenience, we define a function that computes the average cost across the validation set.

```
def avg_loss(xs, ys):
    total_loss = 0
    for x, y in zip(xs, ys):
        loss_val = eval_fun(x, y)
        total_loss += loss_val
    # Average the per-batch losses over the number of evaluation batches.
    return total_loss / len(xs)
```

We then generate our training and evaluation sets and perform the updates. We emit the average loss on the validation set during training.

```
g = gendata.MixtureGenerator([.5, .5], (ax.W.length, ax.H.length))
XS, YS = g.gen_data(ax.N.length, 10)
EVAL_XS, EVAL_YS = g.gen_data(ax.N.length, 4)
print("Starting avg loss: {}".format(avg_loss(EVAL_XS, EVAL_YS)))
for i in range(10):
    for xs, ys in zip(XS, YS):
        loss_val, w_val, b_val, _ = update_fun(5.0 / (1 + i), xs, ys)
    print("After epoch %d: W: %s, b: %s, avg loss %s" % (i, w_val.T, b_val, avg_loss(EVAL_XS, EVAL_YS)))
```