BatchNormTraining

Compute mean and variance from the input.

Inputs

Name    Element Type    Shape
input   real            \((\bullet, C, \ldots)\)
gamma   same as input   \((C)\)
beta    same as input   \((C)\)


Attributes

Name      Type     Notes
epsilon   double   Small bias added to the variance to avoid division by 0.


Outputs

Name             Element Type    Shape
normalized       same as gamma   same as input
batch_mean       same as gamma   \((C)\)
batch_variance   same as gamma   \((C)\)

The batch_mean and batch_variance outputs are computed per-channel from input.

Mathematical Definition

The axes of the input fall into two categories: positional and channel, with channel being axis 1. For example, in an \((N, C, H, W)\) tensor, \(N\), \(H\), and \(W\) are positional axes and \(C\) is the channel axis. For each position, there are \(C\) channel values, each normalized independently.

Normalization of a channel sample is controlled by two values:

  • the batch_mean \(\mu\), and
  • the batch_variance \(\sigma^2\);

and by two scaling attributes: \(\gamma\) and \(\beta\).

The values for \(\mu\) and \(\sigma^2\) come from computing the mean and variance of input.

\[\begin{split}\mu_c &= \mathop{\mathbb{E}}\left(\mathtt{input}_{\bullet, c, \ldots}\right)\\ \sigma^2_c &= \mathop{\mathtt{Var}}\left(\mathtt{input}_{\bullet, c, \ldots}\right)\\ \mathtt{normalized}_{\bullet, c, \ldots} &= \frac{\mathtt{input}_{\bullet, c, \ldots}-\mu_c}{\sqrt{\sigma^2_c+\epsilon}}\gamma_c+\beta_c\end{split}\]
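To make these formulas concrete, here is a minimal reference sketch in C++ that computes the per-channel statistics and the normalized output for a contiguous NCHW tensor. It is an illustration only, not the nGraph kernel; the function name, the NCHW layout, and the pre-sized output vectors are assumptions of the example.

#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative reference implementation of the formulas above for an
// N x C x H x W tensor stored contiguously. The caller is assumed to have
// sized 'normalized' to N*C*H*W and 'batch_mean'/'batch_variance' to C.
void batch_norm_training_reference(const std::vector<float>& input,
                                   const std::vector<float>& gamma,
                                   const std::vector<float>& beta,
                                   std::vector<float>& normalized,
                                   std::vector<float>& batch_mean,
                                   std::vector<float>& batch_variance,
                                   std::size_t N, std::size_t C,
                                   std::size_t H, std::size_t W,
                                   double epsilon)
{
    const std::size_t spatial = H * W;
    for (std::size_t c = 0; c < C; ++c)
    {
        // mu_c: mean over every position of channel c.
        double sum = 0.0;
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t s = 0; s < spatial; ++s)
                sum += input[(n * C + c) * spatial + s];
        const double mean = sum / (N * spatial);

        // sigma^2_c: (biased) variance over the same positions.
        double sq_sum = 0.0;
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t s = 0; s < spatial; ++s)
            {
                const double d = input[(n * C + c) * spatial + s] - mean;
                sq_sum += d * d;
            }
        const double variance = sq_sum / (N * spatial);

        batch_mean[c] = static_cast<float>(mean);
        batch_variance[c] = static_cast<float>(variance);

        // normalized = (input - mu) / sqrt(sigma^2 + epsilon) * gamma + beta
        const double inv_std = 1.0 / std::sqrt(variance + epsilon);
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t s = 0; s < spatial; ++s)
            {
                const std::size_t i = (n * C + c) * spatial + s;
                normalized[i] = static_cast<float>(
                    (input[i] - mean) * inv_std * gamma[c] + beta[c]);
            }
    }
}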


Backprop

The adjoints of input, gamma, and beta are computed by BatchNormTrainingBackprop from the forward inputs, the saved batch statistics, and the adjoint of the normalized output:

\[\begin{split}[\overline{\texttt{input}}, \overline{\texttt{gamma}}, \overline{\texttt{beta}}] =\\ \mathop{\texttt{BatchNormTrainingBackprop}}(\texttt{input}, \texttt{gamma}, \texttt{beta}, \texttt{mean}, \texttt{variance}, \overline{\texttt{normed\_input}}).\end{split}\]

C++ Interface

class BatchNormTraining : public ngraph::op::Op

Batchnorm for training operation.

Subclassed by ngraph::op::gpu::BatchNormTrainingWithStats

Public Functions

const std::string &description() const

Get the string name for the type of the node, such as Add or Multiply. The class name must not contain spaces, as it is used for codegen.

Returns: a const reference to the node’s type name.

BatchNormTraining(const Output<Node> &input, const Output<Node> &gamma, const Output<Node> &beta, double epsilon)

  • input: Must have rank >= 2, with shape \((\bullet, C, \ldots)\).
  • gamma: Per-channel scaling for the normalized value. Shape \((C)\).
  • beta: Per-channel bias added to the scaled normalized value. Shape \((C)\).
  • epsilon: Avoids division by 0 if input has 0 variance.
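
For illustration, here is a minimal usage sketch of this constructor. It assumes the usual nGraph Parameter workflow and a version of the API in which Node::output(i) selects one of a node’s outputs; the shapes and variable names are examples only.

#include <memory>
#include "ngraph/ngraph.hpp"

using namespace ngraph;

int main()
{
    // A small example graph: batch-normalize an 8x3x32x32 NCHW tensor.
    auto input = std::make_shared<op::Parameter>(element::f32, Shape{8, 3, 32, 32});
    auto gamma = std::make_shared<op::Parameter>(element::f32, Shape{3});
    auto beta  = std::make_shared<op::Parameter>(element::f32, Shape{3});

    // epsilon = 1e-5 guards against division by zero for constant channels.
    auto bn = std::make_shared<op::BatchNormTraining>(input, gamma, beta, 1e-5);

    // The op yields three outputs, indexed as documented below.
    Output<Node> normalized     = bn->output(0);
    Output<Node> batch_mean     = bn->output(1);
    Output<Node> batch_variance = bn->output(2);

    return 0;
}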

BatchNormTraining(double eps, const Output<Node> &gamma, const Output<Node> &beta, const Output<Node> &input)

In this version of BatchNorm:

MEAN AND VARIANCE: computed directly from the content of ‘input’.

OUTPUT VALUE: A tuple with the following structure:

  • [0]: the normalization of ‘input’.
  • [1]: the per-channel means of (pre-normalized) ‘input’.
  • [2]: the per-channel variances of (pre-normalized) ‘input’.

AUTODIFF SUPPORT: yes: ‘generate_adjoints(…)’ works as expected.

SHAPE DETAILS:

  • gamma: must have rank 1, with the same span as input’s channel axis.
  • beta: must have rank 1, with the same span as input’s channel axis.
  • input: must have rank >= 2; the second dimension represents the channel axis and must have a span of at least 1.
  • output[0]: shall have the same shape as ‘input’.
  • output[1]: shall have rank 1, with the same span as input’s channel axis.
  • output[2]: shall have rank 1, with the same span as input’s channel axis.

void validate_and_infer_types()

Throws if the node is invalid.
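
As a rough illustration of what “invalid” means for this op, the following standalone sketch mirrors the constraints listed under SHAPE DETAILS; it is a hypothetical helper written for this document, not nGraph’s actual implementation.

#include <cstddef>
#include <stdexcept>
#include <vector>

// Illustrative shape check mirroring the SHAPE DETAILS constraints above.
void check_batch_norm_training_shapes(const std::vector<std::size_t>& input_shape,
                                      const std::vector<std::size_t>& gamma_shape,
                                      const std::vector<std::size_t>& beta_shape)
{
    if (input_shape.size() < 2)
        throw std::invalid_argument("input must have rank >= 2");

    const std::size_t channels = input_shape[1]; // the channel axis is axis 1
    if (channels < 1)
        throw std::invalid_argument("the channel axis must have a span of at least 1");

    if (gamma_shape.size() != 1 || gamma_shape[0] != channels)
        throw std::invalid_argument("gamma must have rank 1 with the channel span");

    if (beta_shape.size() != 1 || beta_shape[0] != channels)
        throw std::invalid_argument("beta must have rank 1 with the channel span");
}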