# BatchNormTraining

BatchNormTraining normalizes its input per channel, computing the mean and variance directly from the input.


## Description

### Inputs

| Name  | Element Type  | Shape                    |
| ----- | ------------- | ------------------------ |
| input | real          | $$(\bullet, C, \ldots)$$ |
| gamma | same as input | $$(C)$$                  |
| beta  | same as input | $$(C)$$                  |

### Attributes

| Name    | Type   | Notes                                                     |
| ------- | ------ | --------------------------------------------------------- |
| epsilon | double | Small bias added to the variance to avoid division by 0.  |

### Outputs

| Name           | Element Type  | Shape         |
| -------------- | ------------- | ------------- |
| normalized     | same as gamma | same as input |
| batch_mean     | same as gamma | $$(C)$$       |
| batch_variance | same as gamma | $$(C)$$       |

The batch_mean and batch_variance outputs are computed per channel from input.

## Mathematical Definition

The axes of the input fall into two categories: positional and channel, with channel being axis 1. For each position, there are $$C$$ channel values, each normalized independently.

Normalization of a channel sample is controlled by two values:

• the batch_mean $$\mu$$, and
• the batch_variance $$\sigma^2$$;

and by two scaling inputs: $$\gamma$$ (gamma) and $$\beta$$ (beta).

The values for $$\mu$$ and $$\sigma^2$$ come from computing the mean and variance of input.

$$
\begin{split}
\mu_c &= \mathop{\mathbb{E}}\left(\mathtt{input}_{\bullet, c, \ldots}\right)\\
\sigma^2_c &= \mathop{\mathtt{Var}}\left(\mathtt{input}_{\bullet, c, \ldots}\right)\\
\mathtt{normalized}_{\bullet, c, \ldots} &= \frac{\mathtt{input}_{\bullet, c, \ldots}-\mu_c}{\sqrt{\sigma^2_c+\epsilon}}\,\gamma_c+\beta_c
\end{split}
$$
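As a concrete reading of these formulas, here is a minimal standalone sketch (a hypothetical reference helper, not the nGraph kernel) that computes the three outputs for an NCHW tensor flattened in row-major order. N is the batch size, C the channel count, and HW the product of the remaining positional axes:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Reference computation of BatchNormTraining's three outputs for an NCHW
// tensor flattened in row-major order.
void batch_norm_training(const std::vector<float>& input,
                         const std::vector<float>& gamma,
                         const std::vector<float>& beta,
                         double epsilon,
                         std::size_t N, std::size_t C, std::size_t HW,
                         std::vector<float>& normalized,
                         std::vector<float>& batch_mean,
                         std::vector<float>& batch_variance)
{
    normalized.resize(input.size());
    batch_mean.assign(C, 0.0f);
    batch_variance.assign(C, 0.0f);
    const double count = static_cast<double>(N * HW);

    for (std::size_t c = 0; c < C; ++c)
    {
        // mu_c: mean over every positional element of channel c.
        double sum = 0.0;
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t i = 0; i < HW; ++i)
                sum += input[(n * C + c) * HW + i];
        const double mu = sum / count;

        // sigma^2_c: (biased) variance over the same elements.
        double var = 0.0;
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t i = 0; i < HW; ++i)
            {
                const double d = input[(n * C + c) * HW + i] - mu;
                var += d * d;
            }
        var /= count;

        batch_mean[c] = static_cast<float>(mu);
        batch_variance[c] = static_cast<float>(var);

        // normalized = (input - mu) / sqrt(var + epsilon) * gamma + beta
        const double inv_std = 1.0 / std::sqrt(var + epsilon);
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t i = 0; i < HW; ++i)
            {
                const std::size_t idx = (n * C + c) * HW + i;
                normalized[idx] = static_cast<float>(
                    (input[idx] - mu) * inv_std * gamma[c] + beta[c]);
            }
    }
}
```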

## Backprop

The gradients with respect to input, gamma, and beta are produced by the BatchNormTrainingBackprop op, which takes the forward inputs, the batch statistics computed during the forward pass, and the adjoint of the normalized output:

$$
[\overline{\texttt{input}}, \overline{\texttt{gamma}}, \overline{\texttt{beta}}] =
\mathop{\texttt{BatchNormTrainingBackprop}}(\texttt{input}, \texttt{gamma}, \texttt{beta}, \texttt{mean}, \texttt{variance}, \overline{\texttt{normed\_input}}).
$$
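nGraph wires these gradients up through generate_adjoints; for intuition, here is a standalone sketch of the standard batch-norm gradient formulas (again a hypothetical helper under the same layout assumptions as the forward sketch, not the library's kernel):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Standard batch-norm training gradients for an NCHW row-major tensor.
// delta is the adjoint of the normalized output.
void batch_norm_training_backprop(const std::vector<float>& input,
                                  const std::vector<float>& gamma,
                                  const std::vector<float>& mean,
                                  const std::vector<float>& variance,
                                  const std::vector<float>& delta,
                                  double epsilon,
                                  std::size_t N, std::size_t C, std::size_t HW,
                                  std::vector<float>& d_input,
                                  std::vector<float>& d_gamma,
                                  std::vector<float>& d_beta)
{
    d_input.resize(input.size());
    d_gamma.assign(C, 0.0f);
    d_beta.assign(C, 0.0f);
    const double count = static_cast<double>(N * HW);

    for (std::size_t c = 0; c < C; ++c)
    {
        const double inv_std = 1.0 / std::sqrt(variance[c] + epsilon);

        // d_beta_c = sum(delta); d_gamma_c = sum(delta * x_hat),
        // where x_hat is the input normalized before scaling.
        double sum_d = 0.0;
        double sum_d_xhat = 0.0;
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t i = 0; i < HW; ++i)
            {
                const std::size_t idx = (n * C + c) * HW + i;
                const double xhat = (input[idx] - mean[c]) * inv_std;
                sum_d += delta[idx];
                sum_d_xhat += delta[idx] * xhat;
            }
        d_beta[c] = static_cast<float>(sum_d);
        d_gamma[c] = static_cast<float>(sum_d_xhat);

        // d_input = gamma * inv_std *
        //           (delta - mean(delta) - x_hat * mean(delta * x_hat)),
        // i.e. the chain rule applied through mu and sigma^2 as well.
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t i = 0; i < HW; ++i)
            {
                const std::size_t idx = (n * C + c) * HW + i;
                const double xhat = (input[idx] - mean[c]) * inv_std;
                d_input[idx] = static_cast<float>(
                    gamma[c] * inv_std *
                    (delta[idx] - sum_d / count - xhat * sum_d_xhat / count));
            }
    }
}
```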

## C++ Interface

```cpp
class BatchNormTraining : public ngraph::op::Op
```

Batch normalization operation for training mode.

Public Functions

```cpp
const NodeTypeInfo& get_type_info() const
```

Returns the NodeTypeInfo for the node's class. During the transition to type_info, returns a dummy type_info for Node if the class has not been updated yet.

```cpp
BatchNormTraining(const Output<Node>& input,
                  const Output<Node>& gamma,
                  const Output<Node>& beta,
                  double epsilon)
```

Parameters (a construction sketch follows this list):

• input: Must have rank >= 2, with shape $$(\bullet, C, \ldots)$$.
• gamma: Per-channel scale for the normalized value; shape $$(C)$$.
• beta: Per-channel bias added to the scaled normalized value; shape $$(C)$$.
• epsilon: Avoids division by 0 if the input has 0 variance.
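For illustration, a minimal construction sketch using this constructor; the shapes and the make_bn_example name are hypothetical, and it assumes the umbrella ngraph/ngraph.hpp header:

```cpp
#include <memory>

#include "ngraph/ngraph.hpp"

using namespace ngraph;

// Build a BatchNormTraining node over an NCHW input with C = 3 channels.
// All shapes here are illustrative.
std::shared_ptr<op::BatchNormTraining> make_bn_example()
{
    auto input = std::make_shared<op::Parameter>(element::f32, Shape{8, 3, 224, 224});
    auto gamma = std::make_shared<op::Parameter>(element::f32, Shape{3});
    auto beta = std::make_shared<op::Parameter>(element::f32, Shape{3});
    const double epsilon = 1e-5; // small bias added to the variance

    return std::make_shared<op::BatchNormTraining>(input, gamma, beta, epsilon);
}
```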

```cpp
BatchNormTraining(double eps,
                  const Output<Node>& gamma,
                  const Output<Node>& beta,
                  const Output<Node>& input)
```

In this version of BatchNorm:

MEAN AND VARIANCE: computed directly from the content of 'input'.

OUTPUT VALUE: a tuple with the following structure:

• the normalization of 'input';
• the per-channel means of (pre-normalized) 'input';
• the per-channel variances of (pre-normalized) 'input'.

AUTODIFF SUPPORT: yes; 'generate_adjoints(...)' works as expected.

SHAPE DETAILS:

• gamma: must have rank 1, with the same span as input's channel axis.
• beta: must have rank 1, with the same span as input's channel axis.
• input: must have rank >= 2; the second dimension represents the channel axis and must have a span of at least 1.
• output 0 (normalized): shall have the same shape as 'input'.
• output 1 (batch_mean): shall have rank 1, with the same span as input's channel axis.
• output 2 (batch_variance): shall have rank 1, with the same span as input's channel axis.
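Because the op produces a tuple of three results, downstream ops consume them by output index. A short sketch, reusing the hypothetical make_bn_example (and the includes and using directive) from above:

```cpp
// The three outputs, indexed as in the Outputs table above.
auto bn = make_bn_example();
Output<Node> normalized(bn, 0);     // same shape as input
Output<Node> batch_mean(bn, 1);     // shape (C)
Output<Node> batch_variance(bn, 2); // shape (C)
```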

```cpp
void validate_and_infer_types()
```

Throws if the node is invalid.