# Logistic Regression¶

This example performs logistic regression. The corresponding jupyter notebook is found here. We want to classify an observation \(x\) into one of two classes, denoted by \(y=0\) and \(y=1\). Using a simple linear model:

we want to find the optimal values for \(W\). Here, we use gradient descent with a learning rate of \(\alpha\) and the cross-entropy as the error function.

## Axes¶

The nervana graph uses `Axes`

to attach shape information to tensors.
The identity of `Axis`

objects are used to pair and specify dimensions
in symbolic expressions. The function `ng.make_axis`

will create an
`Axis`

object with an optionally supplied `name`

argument. For
example:

```
import ngraph as ng
import ngraph.transformers as ngt
N = ng.make_axis(length=128, name='N')
C = ng.make_axis(length=4)
```

We add `batch`

as a property to `N`

to indicate that the axis is a
batch axis. A batch axis is held out of the default set of axes reduced
in reduction operations such as sums.

## Building the graph¶

Our model has three placeholders: `X`

, `Y`

, and `alpha`

, each of
which need to have axes defined. `alpha`

is a scalar, so we pass in
empty axes:

```
alpha = ng.placeholder(axes=())
```

`X`

and `Y`

are tensors for the input and output data, respectively.
Our convention is to use the last axis for samples. The placeholders can
be specified as:

```
X = ng.placeholder(axes=[C, N])
Y = ng.placeholder(axes=[N])
```

We also need to specify the training weights, `W`

. Unlike a
placeholder, `W`

should retain its value from computation to
computation (for example, across mini-batches of training). Following
TensorFlow, we call this a *variable*. We specify the variable with both
`Axes`

and also an initial value:

```
W = ng.variable(axes=[C], initial_value=0)
```

Now we can estimate `y`

as `Y_hat`

and compute the average loss
`L`

:

```
Y_hat = ng.sigmoid(ng.dot(W, X))
L = ng.cross_entropy_binary(Y_hat, Y, out_axes=()) / ng.batch_size(Y_hat)
```

Here we use several ngraph functions, including `ng.dot`

and
`ng.sigmoid`

. Since a tensor can have multiple axes, we need a way to
mark which axes in the first argument of `ng.dot`

are to act on which
axes in the second argument. Please also note that the `W`

has been
defined with one axis, while `X`

has two axis. Every tensor component
along C axis in `X`

is being dot-producted with `W`

, and the `N`

results are stored in `Y_hat`

, that has only one axis, the `N`

axis.

Once `Y_hat`

has been computed (the whole batch computation was
defined above), we can move on and update the weights in `W`

. Gradient
descent requires computing the gradient, \(\frac{dL}{dW}\)

```
grad = ng.deriv(L, W)
```

The `ng.deriv`

function computes the backprop using autodiff. We are
almost done as we are now ready to update `W`

. The update step (which
is an Op that will be carried out at the time of real computation on the
device) computes the new weight and assigns it to `W`

:

```
update = ng.assign(W, W - alpha * grad / ng.tensor_size(Y_hat))
```

We will also need a way to generate input data. Below the input data,
`XS`

and `YS`

, is synthetically generated as a mixture of two
Gaussian distributions in 4-d space. We shape our entire dataset as 10
mini-batches of 128 samples each, which we create with a convenient
function:

```
import gendata
g = gendata.MixtureGenerator([.5, .5], (C.length,))
XS, YS = g.gen_data(N.length, 10)
```

## Computation¶

Now we create a transformer and define a computation for learning. In order to do so, we pass the ops from which we want to retrieve the results for, followed by the placeholders.

Here, the computation will return three values for the `L`

, `W`

, and
`update`

, given inputs to fill the placeholders, \(\alpha\)
(Learning Rate), X (inputs), Y (expected outputs).

```
from contextlib import closing
with closing(ngt.make_transformer()) as transformer:
update_fun = transformer.computation([L, W, update], alpha, X, Y)
for i in range(10):
for xs, ys in zip(XS, YS):
loss_val, w_val, _ = update_fun(5.0 / (1 + i), xs, ys)
print("W: %s, loss %s" % (w_val, loss_val))
```

Finally, we train the model across the 10 epochs, printing the loss and
updated weights. Please note that we are using a decreasing policy (with
the epoch number) for \(\alpha\). Also note that there is no need to
specify the outputs when invoking update_fun, as they were specified at
definition time. Now we need only to feed the inputs into the
`update_fun`

call:

Also see Part 2 of logistic regressions, which walks users through adding additional variables, computations, and dimensions.

### Walk-through¶

This walk-through guides users through several key concepts for using the nervana graph. The corresponding jupyter notebook is found here.

Let’s begin with a very simple example: computing `x+1`

for several
values of `x`

using the `ngraph`

API. We should think of the
computation as being invoked from the *host*, but possibly taking place
somewhere else, which we will refer to as *the device.*

The nervana graph currently uses a compilation model. Users first define the computations by building a graph of operations, then they are compiled and run. In the future, we plan an even more compiler-like approach, where an executable is produced that can later be run on various platforms, in addition to an interactive version.

Our first program will use ngraph to compute `x+1`

for each `x`

provided.

# The x+1 program¶

The complete program, which we will walk through, is:

```
from __future__ import print_function
from contextlib import closing
import ngraph as ng
import ngraph.transformers as ngt
# Build the graph
x = ng.placeholder(axes=())
x_plus_one = x + 1
# Select a transformer
with closing(ngt.make_transformer()) as transformer:
# Define a computation
plus_one = transformer.computation(x_plus_one, x)
# Run the computation
for i in range(5):
print(plus_one(i))
```

We begin by importing `ngraph`

, the Python module for graph
construction, and `ngraph.transformers`

, the module for transformer
operations.

```
import ngraph as ng
import ngraph.transformers as ngt
```

Next, we create a computational graph, which we refer to as ngraph, for
the computation. Following TensorFlow terminology, we use
`placeholder`

to define a port for transferring tensors between the
host and the device. `Axes`

are used to tell the graph the tensor
shape. In this example, `x`

is a scalar so the axes are empty.

```
x = ng.placeholder(axes=())
```

x can be thought as a dummy node of the ngraph, providing an entry point
for data into the computational graph. The `ngraph`

graph construction
API uses functions to build a graph of `Op`

objects, the ngraph. Each
function may add operations to the ngraph, and will return an `Op`

that represents the computation. Here below, using implicitly ngraph as
it will be made evident at the next step, we are adding an `Op`

to the
ngraph that takes as input the variable tensor x just defined, and the
constant number 1.

```
x_plus_one = x + 1
```

A bit of behind the scenes magic occurs with the Python number `1`

in
the expression above, which is not an `Op`

. When an argument to a
graph constructor is not an `Op`

, nervana graph will attempt to
convert it to an `Op`

using `ng.constant`

, the graph function for
creating a constant.

Thus, what it is really happening (when we are defining x_plus_one as above) is:

```
x_plus_one = ng.add(x, ng.constant(1))
```

For more information about the Op hierarchy please visit:
https://ngraph.nervanasys.com/docs/latest/building_graphs.html At this
point, our computational graph has been defined with only one function
to compute represented by x_plus_one. Once the ngraph is defined, we
can compile it with a *transformer*. Here we use `make_transformer`

to
make a default transformer. We tell the transformer the function to
compute, `x_plus_one`

, and the associated input parameters, only `x`

in our example. The constant needs not to be repeated here, as it is
part of the definition of the function to compute. The current default
transformer uses NumPy for execution.

```
# Select a transformer
with closing(ngt.make_transformer()) as transformer:
# Define a computation
plus_one = transformer.computation(x_plus_one, x)
# Run the computation
for i in range(5):
print(plus_one(i))
```

The first time the transformer executes a computation, the ngraph is
analyzed and compiled, and storage is allocated and initialized on the
device. Once compiled, the computations are callable Python objects
residing on the host. On each call to `plus_one`

the value of `x`

is
copied to the device, 1 is added, and then the result is copied back
from the device to the host.

## The Compiled x + 1 Program¶

The compiled code, to be executed on the device, can be examined
(currently located in `/tmp`

folder) to view the runtime device model.
Here we show the code with some clarifying comments.

```
class Model(object):
def __init__(self):
self.a_AssignableTensorOp_0_0 = None
self.a_AssignableTensorOp_0_0_v_AssignableTensorOp_0_0_ = None
self.a_AssignableTensorOp_1_0 = None
self.a_AssignableTensorOp_1_0_v_AssignableTensorOp_1_0_ = None
self.a_AddZeroDim_0_0 = None
self.a_AddZeroDim_0_0_v_AddZeroDim_0_0_ = None
self.be = NervanaObject.be
def alloc_a_AssignableTensorOp_0_0(self):
self.update_a_AssignableTensorOp_0_0(np.empty(1, dtype=np.dtype('float32')))
def update_a_AssignableTensorOp_0_0(self, buffer):
self.a_AssignableTensorOp_0_0 = buffer
self.a_AssignableTensorOp_0_0_v_AssignableTensorOp_0_0_ = np.ndarray(shape=(), dtype=np.float32,
buffer=buffer, offset=0, strides=())
def alloc_a_AssignableTensorOp_1_0(self):
self.update_a_AssignableTensorOp_1_0(np.empty(1, dtype=np.dtype('float32')))
def update_a_AssignableTensorOp_1_0(self, buffer):
self.a_AssignableTensorOp_1_0 = buffer
self.a_AssignableTensorOp_1_0_v_AssignableTensorOp_1_0_ = np.ndarray(shape=(), dtype=np.float32,
buffer=buffer, offset=0, strides=())
def alloc_a_AddZeroDim_0_0(self):
self.update_a_AddZeroDim_0_0(np.empty(1, dtype=np.dtype('float32')))
def update_a_AddZeroDim_0_0(self, buffer):
self.a_AddZeroDim_0_0 = buffer
self.a_AddZeroDim_0_0_v_AddZeroDim_0_0_ = np.ndarray(shape=(), dtype=np.float32,
buffer=buffer, offset=0, strides=())
def allocate(self):
self.alloc_a_AssignableTensorOp_0_0()
self.alloc_a_AssignableTensorOp_1_0()
self.alloc_a_AddZeroDim_0_0()
def Computation_0(self):
np.add(self.a_AssignableTensorOp_0_0_v_AssignableTensorOp_0_0_,
self.a_AssignableTensorOp_1_0_v_AssignableTensorOp_1_0_,
out=self.a_AddZeroDim_0_0_v_AddZeroDim_0_0_)
def init(self):
pass
```

Tensors have two components: - storage for their elements (using the
convention `a_`

for the allocated storage of a tensor) and - views of
that storage (denoted as `a_...v_`

).

The `alloc_`

methods allocate storage and then create the views of the
storage that will be needed. The view creation is separated from the
allocation because storage may be allocated in multiple ways.

Each allocated storage can also be initialized to, for example, random
Gaussian variables. In this example, there are no initializations, so
the method `init`

, which performs the one-time device initialization,
is empty. Constants, such as 1, are copied to the device as part of the
allocation process.

The method `Computation_0`

handles the `plus_one`

computation.
Clearly this is not the optimal way to add 1 to a scalar, so let’s look
at a more complex example next in the Logistic Regression walk-through.

# Logistic Regression Part 2¶

In this example, we extend the code from Part 1 with several important
features: - Instead of just updating the weight matrix `W`

, we add a
bias `b`

and use the `.variables()`

method to compactly update both
variables. - We attach an additional computation to the transformer to
compute the loss on a held-out validation dataset. - We switch from a
flat `C`

-dimensional feature space to a `W x H`

feature space to
demonstrate multi-dimensional logistic regression.

The corresponding jupyter notebook is found here.

```
import ngraph as ng
import ngraph.transformers as ngt
import gendata
```

The axes creation is conceptually the same as before, except we now add
a new axes `H`

to represent the new feature space.

```
ax_W = ng.make_axis(length=2)
ax_H = ng.make_axis(length=2) # new axis added.
ax_N = ng.make_axis(length=128, name='N')
```

## Building the graph¶

Our model, as in the previous example, has three placeholders: `X`

,
`Y`

, and `alpha`

. But now, the the input `X`

has shape
`(W, H, N)`

:

```
alpha = ng.placeholder(())
X = ng.placeholder([ax_W, ax_H, ax_N]) # now X has shape (W, H, N)
Y = ng.placeholder([ax_N])
```

Similarly, the weight matrix is now multi-dimensional, with shape
`(W, H)`

, and we add a new scalar bias variable. We want also to
specify that, for the weight matrix `W`

, both axes will be reduced
when computing the element-wise product and summation with the inputs
(so we add `-1`

to specify this)

```
W = ng.variable([ax_W, ax_H], initial_value=0).named('W') # now the Weight Matrix W has shape (W, H)
b = ng.variable((), initial_value=0).named('b')
```

Our predicted output will now be including the bias `b`

. Please note
there here the + operation implicitly broadcasts `b`

to the batch size
N, the size of the only axis of Y_hat:

```
Y_hat = ng.sigmoid(ng.dot(W, X) + b)
L = ng.cross_entropy_binary(Y_hat, Y, out_axes=())
```

For the parameter updates, instead of explicitly specifying the
variables `W`

and `b`

, we can call `L.variables()`

to retrieve all
the variables that the loss function depends on:

```
print([var.name for var in L.variables()])
```

For complicated ngraphs, the `variables()`

method makes it easy to
iterate over all its dependant variables. Our new parameter update is
then:

```
updates = [ng.assign(v, v - alpha * ng.deriv(L, v) / ng.batch_size(Y_hat))
for v in L.variables()]
```

Please note that this time we embedded the (call to the) gradient
computation inside the definition of the weight update computation. As
stated in the previous example, the `ng.deriv`

function computes the
backprop using autodiff. The update step computes the new weight and
assigns it to `W`

:

```
all_updates = ng.doall(updates)
```

## Computation¶

We have our update computation as before, but we also add an evaluation
computation that computes the loss on a separate dataset without
performing the updates. Since the evaluation computation does not
perform any update operation, we need not pass in the learning rate
`alpha`

For convenience, we define a function that computes the average cost across the validation set.

```
def avg_loss(xs, ys):
total_loss = 0
for x, y in zip(xs, ys):
loss_val = eval_fun(x, y)
total_loss += loss_val
return total_loss / x.shape[-1]
```

We then generate our training and evaluation sets and perform the updates with the same technique that we used in the previous example. We emit the average loss on the validation set during training. Please note that because the length of the axes W and H is 2 now (for both; before we had only one axis of lenght 4), the number of weights is the same as in the previous example

```
from contextlib import closing
with closing(ngt.make_transformer()) as transformer:
update_fun = transformer.computation([L, W, b, all_updates], alpha, X, Y)
eval_fun = transformer.computation(L, X, Y)
g = gendata.MixtureGenerator([.5, .5], (ax_W.length, ax_H.length))
XS, YS = g.gen_data(ax_N.length, 10)
EVAL_XS, EVAL_YS = g.gen_data(ax_N.length, 4)
print("Starting avg loss: {}".format(avg_loss(EVAL_XS, EVAL_YS)))
for i in range(10):
for xs, ys in zip(XS, YS):
loss_val, w_val, b_val, _ = update_fun(5.0 / (1 + i), xs, ys)
print("After epoch %d: W: %s, b: %s, avg loss %s" % (i, w_val.T, b_val, avg_loss(EVAL_XS, EVAL_YS)))
```