# Building graphs

Frontends (or users who need the flexibility of constructing Intel® Nervana™ graph `Ops` directly) can use a set of graph construction functions to build Intel Nervana graphs. We walk through the common patterns and arguments of these `Ops` here. We also discuss the underlying class structure of `Op`, which is not typically a concern for users or frontends but illustrates a hierarchical structure that can be helpful.

## Intel Nervana graph structure

### Data dependencies

An `Op`’s primary role is to function as a node in a directed acyclic computation graph. The `Op` class’s `args` attribute is a tuple containing all upstream dependencies that this `Op` operates upon. These act as the directed edges of the graph.

For example:

```
>>> x = ng.constant(0)
>>> y = ng.constant(1)
>>> mysum = ng.add(x, y)
>>> type(mysum)
ngraph.op_graph.op_graph.AddOp
>>> isinstance(mysum, ngraph.op_graph.op_graph.Op)
True
>>> mysum.args
(<AssignableTensorOp(<Const(0)>):4500972432>,
<AssignableTensorOp(<Const(1)>):4500974224>)
```

`mysum` then refers to an instance of the class `AddOp`, which is a subclass of `Op`. `mysum.args` is a tuple containing the `Ops` pointed to by the Python variables `x` and `y`.
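To make the idea of `args` as directed edges concrete, here is a minimal sketch in plain Python. The `Op` class and the `upstream` helper below are hypothetical stand-ins written for this illustration; they are not the ngraph classes, and only the `args` attribute mirrors what is described above.

```python
class Op:
    """Toy stand-in for a graph node (hypothetical, not ngraph's Op)."""

    def __init__(self, name, args=()):
        self.name = name
        # Upstream dependencies: these are the directed edges of the graph.
        self.args = tuple(args)


def upstream(op):
    """Return op and everything it depends on, dependencies first."""
    seen, order = set(), []

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for arg in node.args:
            visit(arg)
        order.append(node)

    visit(op)
    return order


x = Op("x")
y = Op("y")
mysum = Op("add", args=(x, y))

names = [node.name for node in upstream(mysum)]
print(names)
```

Following `args` recursively from any node reaches exactly the subgraph that node depends on, which is why a node can be treated as a handle to its whole upstream computation.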

### Initializers

In addition to `args`, there are two other types of edges in Intel Nervana graphs: initializers and non-data control dependencies. Each op has an attribute, `initializers`, which contains a (possibly empty) set of ops that must execute before any computations occur. To use our running example:

```
>>> mysum.initializers
set()
>>> x.initializers
{<InitTensorOp(InitTensorOp_1):4500973392>}
```

We see here that `mysum` doesn’t have any initializers because its value is only known at runtime. On the other hand, `x` is a constant, and can and must be initialized before any computations occur. Initializer subgraphs (the ops in `initializers` and all upstream ops) themselves contain `SetItem`, `Fill`, `Flatten`, `ConstantOp`, and other ops that manipulate a tensor to get it ready for computation.

### Non-data control dependencies

Finally, consider the following graph construction:

```
>>> x = ng.placeholder((), initial_value=0)
>>> a = ng.assign(x, 5)
>>> z = x + 1
```

Here we create a scalar placeholder `x`, an assignment of 5 to the placeholder `x`, and an addition of 1 to `x`.

It might not be clear whether `z` should equal `1` or `6` when evaluated. The subgraph for `z` does not include the assignment, so the result would be `1`. To include the assignment, we provide `ng.sequential`, which causes ops to be executed in the order listed, with the last op serving as the value, subject to the constraint that ops in a computation are only executed once.

To force the assignment, we would write:

```
>>> x = ng.placeholder((), initial_value=0)
>>> z = ng.sequential([
...     ng.assign(x, 5),
...     x + 1
... ])
```

Now `z` performs the assignment and then returns the value of `x + 1`.
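The difference between the two constructions can be sketched with a toy evaluator. Everything below (`Placeholder`, `Assign`, `AddConst`, `Sequential`, and their `evaluate` methods) is hypothetical and written only for this illustration; it is not how ngraph transformers execute a graph.

```python
class Placeholder:
    """Toy mutable storage cell (hypothetical)."""

    def __init__(self, initial_value):
        self.value = initial_value

    def evaluate(self):
        return self.value


class Assign:
    """Toy op with a side effect: writes new_value into target."""

    def __init__(self, target, new_value):
        self.target = target
        self.new_value = new_value

    def evaluate(self):
        self.target.value = self.new_value
        return self.new_value


class AddConst:
    """Toy op that adds a Python constant to its single argument."""

    def __init__(self, arg, const):
        self.arg = arg
        self.const = const

    def evaluate(self):
        return self.arg.evaluate() + self.const


class Sequential:
    """Runs ops in the order listed; the last op supplies the value."""

    def __init__(self, ops):
        self.ops = list(ops)

    def evaluate(self):
        result = None
        for op in self.ops:
            result = op.evaluate()
        return result


# Without a control dependency: the assignment exists, but it is not
# reachable from z_plain, so evaluating z_plain never runs it.
x = Placeholder(0)
a = Assign(x, 5)
z_plain = AddConst(x, 1)
plain_result = z_plain.evaluate()

# With explicit ordering: the assignment runs first, then the addition.
x2 = Placeholder(0)
z_seq = Sequential([Assign(x2, 5), AddConst(x2, 1)])
seq_result = z_seq.evaluate()

print(plain_result, seq_result)
```

In the first case only the ops reachable from `z_plain` run, so the assignment is skipped. In the second case the ordering is part of the graph itself.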

## General properties of ops

All operational graph ops are instances of the class `ngraph.op_graph.op_graph.Op`, which extends `ngraph.op_graph.names.ScopedNameableValue`. This provides `Ops` with automatically generated unique names.

In addition to the graph properties explained above (`args`), all ops have the following additional attributes:

- *axes*: The axes of the result of the computation. This only needs to be specified by the frontend or user during `Op` creation if the default result is not correct or is not inferrable for a particular `Op` type. The `axes` are also available as a gettable property.
- *name*: A string that can help identify the node during debugging, or when searching for a node in a set of nodes. Some frontends may also make use of the `name`. The `name` is a settable property.
- *metadata*: A dictionary of key, value string pairs that can be used to select/filter ops when manipulating them. For example, `stochastic=dropout` may be used to indicate groups of trainable variables in conjunction with dropout.

## Op hierarchy

Users and frontends do not typically need to worry about the implementation details of the various `Op` classes. This is why they are hidden behind graph construction functions.

## Ops influencing evaluation

During computation (which we cover in more detail in Transformers), the input and output values must be stored somewhere. To create a `placeholder` expression in the operational graph, we must import the operational backend symbols and then create the `placeholder`:

```
import ngraph as ng
ax_C = ng.make_axis(length=4, name='C')
ax_W = ng.make_axis(length=2, name='W')
ax_H = ng.make_axis(length=2, name='H')
ax_N = ng.make_axis(length=128, name='N')
x = ng.placeholder((ax_C, ax_W, ax_H, ax_N))
```

This `placeholder` creates an `AssignableTensorOp` that triggers the necessary storage to be allocated on the host device and triggers values to be transferred between the device and host. When the `Op` is used in a graph computation, it serves as a Python handle for the tensor stored on the device.

It is important to remember that `x` is a Python variable that holds an `Op`. Therefore, the following code

```
x = x + x
```

does not directly double the value of the tensor in the `placeholder`. Instead, the `__add__` method is called with both arguments pointing to the same `placeholder` object. This returns a new `Op` that is now stored as the Python variable `x`.

Consider the following example:

```
x1 = x + x
y = x1 * x1 - x
```

The intermediate value `x + x` is only computed once, since the same `Op` is used for both arguments of the multiplication in `y`.

Furthermore, in this computation, all the computations are automatically performed in place. If the computation is later modified such that the intermediate value `x + x` is needed, the op-graph automatically adjusts the computation’s implementation to make the intermediate result `x + x` available. This same flexibility exists with *NumPy* or *PyCUDA*, but those implementations always allocate tensors for the intermediate values, relying on Python’s garbage collector to clean them up. This means the peak memory usage will be higher and there will be more overhead.
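The claim that a shared `Op` is computed only once can be illustrated with a small memoizing evaluator. The `Node` class and `evaluate` function are hypothetical helpers for this sketch and say nothing about how ngraph schedules work or manages memory. The sketch only shows that a node reachable along two paths needs to run a single time.

```python
class Node:
    """Toy graph node holding a Python function and its inputs."""

    def __init__(self, fn, args=()):
        self.fn = fn
        self.args = tuple(args)


def evaluate(root, counts):
    """Evaluate root, running each distinct node at most once.

    counts maps id(node) to the number of times that node's function ran.
    """
    cache = {}

    def run(node):
        key = id(node)
        if key not in cache:
            inputs = [run(arg) for arg in node.args]
            counts[key] = counts.get(key, 0) + 1
            cache[key] = node.fn(*inputs)
        return cache[key]

    return run(root)


x = Node(lambda: 3.0)                          # stands in for the placeholder
x1 = Node(lambda a, b: a + b, (x, x))          # x1 = x + x
prod = Node(lambda a, b: a * b, (x1, x1))      # x1 * x1
y = Node(lambda a, b: a - b, (prod, x))        # y = x1 * x1 - x

counts = {}
result = evaluate(y, counts)
print(result, counts[id(x1)])
```

Because both arguments of the multiplication are the same Python object, the evaluator finds the cached result on the second visit instead of recomputing `x + x`.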

## Derivatives

Because `Ops` describe computations, we have enough information to compute derivatives, using the `deriv` function.

```
import ngraph as ng
ax_C = ng.make_axis(length=4, name='C')
ax_Y = ng.make_axis(length=4, name='Y')
ax_W = ng.make_axis(length=2, name='W')
ax_H = ng.make_axis(length=2, name='H')
ax_N = ng.make_axis(length=128, name='N')
x = ng.placeholder((ax_C, ax_W, ax_H, ax_N))
y0 = ng.placeholder((ax_Y, ax_N))
w = ng.variable((ax_C, ax_W, ax_H, ax_Y))
b = ng.variable((ax_Y,))
y = ng.tanh(ng.dot(w, x) + b)
c = ng.squared_L2(y - y0)
d = ng.deriv(c, w)
```

The Python variable `d` will hold an `Op` whose value is the derivative `dc/dw`. In this example, we knew which `Ops` contain the variables to be trained (for example, `w`). For a more general optimizer, we could search through all the subexpressions to look for the dependent variables. This is handled by the `variables` method, so `c.variables()` would return the list of `Ops` `[w, b]`.
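Searching the subexpressions for trainable variables amounts to a graph traversal. The sketch below assumes a toy `Op` class with a `trainable` flag and a free function `variables`; both are hypothetical and are not the ngraph implementation of the `variables` method. The toy graph mirrors the shape of the example above.

```python
class Op:
    """Toy graph node (hypothetical); trainable marks a variable."""

    def __init__(self, name, args=(), trainable=False):
        self.name = name
        self.args = tuple(args)
        self.trainable = trainable


def variables(op):
    """Collect the trainable nodes that op depends on, each once."""
    seen, found = set(), []

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for arg in node.args:
            visit(arg)
        if node.trainable:
            found.append(node)

    visit(op)
    return found


# Same structure as the example: c = squared_L2(tanh(dot(w, x) + b) - y0)
x = Op("x")
y0 = Op("y0")
w = Op("w", trainable=True)
b = Op("b", trainable=True)
y = Op("tanh", [Op("add", [Op("dot", [w, x]), b])])
c = Op("squared_L2", [Op("subtract", [y, y0])])

found = [v.name for v in variables(c)]
print(found)
```

The placeholders `x` and `y0` are visited but not returned, since only nodes flagged as trainable are collected.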

An important distinction to make here is that the `deriv` function does not perform symbolic or numeric differentiation. In fact, it does not compute anything at all. Its sole job is to construct another computational graph using the existing upstream graph of `c` and then return a handle to that new computational graph (`d`). Therefore, no computation takes place until a user evaluates a computation of `d` using a transformer.
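The separation between building a derivative graph and evaluating it can be sketched with a toy graph that supports only addition and multiplication. All names here (`Op`, `leaf`, `add`, `mul`, `topo_order`, `deriv`, `evaluate`, `ARITHMETIC_STEPS`) are hypothetical and written for this illustration. The backpropagation-style construction is one common approach and is an assumption, not a description of ngraph's `deriv`, which is covered in Autodiff.

```python
ARITHMETIC_STEPS = 0  # counts numeric work, to show that deriv() does none


class Op:
    """Toy graph node (hypothetical, not ngraph's Op)."""

    def __init__(self, kind, args=(), value=None):
        self.kind = kind          # "leaf", "add", or "mul"
        self.args = tuple(args)
        self.value = value


def leaf(value):
    return Op("leaf", value=value)


def add(a, b):
    return Op("add", (a, b))


def mul(a, b):
    return Op("mul", (a, b))


def topo_order(root):
    """List the nodes reachable from root, dependencies first."""
    seen, order = set(), []

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for arg in node.args:
            visit(arg)
        order.append(node)

    visit(root)
    return order


def deriv(root, wrt):
    """Build, without evaluating, a graph for d(root)/d(wrt)."""
    adjoint = {id(root): leaf(1.0)}
    for node in reversed(topo_order(root)):
        adj = adjoint.get(id(node))
        if adj is None or not node.args:
            continue
        a, b = node.args
        if node.kind == "add":
            pieces = [(a, adj), (b, adj)]
        else:  # "mul"
            pieces = [(a, mul(adj, b)), (b, mul(adj, a))]
        for target, piece in pieces:
            prev = adjoint.get(id(target))
            adjoint[id(target)] = piece if prev is None else add(prev, piece)
    return adjoint.get(id(wrt), leaf(0.0))


def evaluate(op):
    """Stand-in for a transformer: this is where arithmetic happens."""
    global ARITHMETIC_STEPS
    if op.kind == "leaf":
        return op.value
    a, b = (evaluate(arg) for arg in op.args)
    ARITHMETIC_STEPS += 1
    return a + b if op.kind == "add" else a * b


w = leaf(3.0)
c = add(mul(w, w), w)             # c = w * w + w
d = deriv(c, w)                   # d is only a handle to a new graph
steps_after_deriv = ARITHMETIC_STEPS
d_value = evaluate(d)             # numbers are produced only here
print(steps_after_deriv, d_value)
```

Calling `deriv` only creates new nodes that refer back to nodes of the original graph. The counter stays at zero until `evaluate` is called on the result.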

Note

The following functionality is likely to be supplanted by more composable abstractions involving op graph containers in the future.

In some cases, it is convenient for an op graph construction function to associate additional information with an `Op`. For example, the `softmax` function returns a `DivideOp`, but when that output value is then used in a cross-entropy calculation, the derivative computation would be numerically unstable if performed directly. To avoid this, the `softmax` function can indicate that the `DivideOp` is part of a `softmax` computation and can add a `deriv_handler` to the `DivideOp` to indicate the subgraphs that are useful in cross-entropy and derivative calculations.

More details about the mechanics of automatic differentiation and how `deriv` works are covered in Autodiff.