BatchNormTraining

BatchNormTraining  // Compute mean and variance from the input.

Description

Inputs

Name    Element Type     Shape
input   real             \((\bullet, C, \ldots)\)
gamma   same as input    \((C)\)
beta    same as input    \((C)\)

Attributes

Name      Type     Notes
epsilon   double   Small bias added to variance to avoid division by 0.

Outputs

Name             Element Type    Shape
normalized       same as gamma   same as input
batch_mean       same as gamma   \((C)\)
batch_variance   same as gamma   \((C)\)

The batch_mean and batch_variance outputs are computed per-channel from input.

Mathematical Definition

The axes of the input fall into two categories: positional and channel, with channel being axis 1. For each position, there are \(C\) channel values, each normalized independently. For example, an input of shape \((N, C, H, W)\) has \(N \times H \times W\) positions, each with \(C\) channel values.

Normalization of a channel sample is controlled by two values:

  • the batch_mean \(\mu\), and
  • the batch_variance \(\sigma^2\);

and by two per-channel inputs: the scale \(\gamma\) and the shift \(\beta\).

The values for \(\mu\) and \(\sigma^2\) come from computing the per-channel mean and variance of input:

\[\begin{split}\mu_c &= \mathop{\mathbb{E}}\left(\mathtt{input}_{\bullet, c, \ldots}\right)\\ \sigma^2_c &= \mathop{\mathtt{Var}}\left(\mathtt{input}_{\bullet, c, \ldots}\right)\\ \mathtt{normalized}_{\bullet, c, \ldots} &= \frac{\mathtt{input}_{\bullet, c, \ldots}-\mu_c}{\sqrt{\sigma^2_c+\epsilon}}\gamma_c+\beta_c\end{split}\]
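To make the definition concrete, the following is a minimal reference sketch of these semantics in C++ for an NCHW input stored as a flat row-major vector. It illustrates the math above only; it is not the nGraph kernel, and the function name and flat-vector layout are assumptions for the example.

#include <cmath>
#include <cstddef>
#include <vector>

// Reference semantics for BatchNormTraining on an NCHW input stored as a
// flat row-major vector. HW is the product of the spatial dimensions.
void batch_norm_training_reference(const std::vector<float>& input,
                                   const std::vector<float>& gamma,
                                   const std::vector<float>& beta,
                                   double epsilon,
                                   std::size_t N, std::size_t C, std::size_t HW,
                                   std::vector<float>& normalized,
                                   std::vector<float>& batch_mean,
                                   std::vector<float>& batch_variance)
{
    normalized.resize(N * C * HW);
    batch_mean.assign(C, 0.0f);
    batch_variance.assign(C, 0.0f);
    const double m = static_cast<double>(N * HW); // positions per channel

    for (std::size_t c = 0; c < C; ++c)
    {
        // mu_c: mean over every position of channel c.
        double sum = 0.0;
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t i = 0; i < HW; ++i)
                sum += input[(n * C + c) * HW + i];
        const double mu = sum / m;

        // sigma^2_c: biased (population) variance, matching Var above.
        double sq = 0.0;
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t i = 0; i < HW; ++i)
            {
                const double d = input[(n * C + c) * HW + i] - mu;
                sq += d * d;
            }
        const double var = sq / m;

        batch_mean[c] = static_cast<float>(mu);
        batch_variance[c] = static_cast<float>(var);

        // Normalize, then scale by gamma_c and shift by beta_c.
        const double inv_std = 1.0 / std::sqrt(var + epsilon);
        for (std::size_t n = 0; n < N; ++n)
            for (std::size_t i = 0; i < HW; ++i)
            {
                const std::size_t idx = (n * C + c) * HW + i;
                normalized[idx] = static_cast<float>(
                    (input[idx] - mu) * inv_std * gamma[c] + beta[c]);
            }
    }
}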

Backprop

\[\begin{split}[\overline{\texttt{input}}, \overline{\texttt{gamma}}, \overline{\texttt{beta}}]=\\ \mathop{\texttt{BatchNormTrainingBackprop}}(\texttt{input},\texttt{gamma},\texttt{beta},\texttt{mean},\texttt{variance},\overline{\texttt{normed_input}}).\end{split}\]
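Expanded per channel, the gradients this backprop produces follow the standard batch-norm chain rule. As a sketch for reference (writing \(g\) for \(\overline{\texttt{normed\_input}}\), \(\hat{x}\) for the normalized value before scaling, and \(m\) for the number of positions in channel \(c\)):

\[\begin{split}\overline{\texttt{beta}}_c &= \sum_{i=1}^{m} g_i\\ \overline{\texttt{gamma}}_c &= \sum_{i=1}^{m} g_i \hat{x}_i\\ \overline{\texttt{input}}_i &= \frac{\gamma_c}{m\sqrt{\sigma^2_c+\epsilon}}\left(m\, g_i - \sum_{j=1}^{m} g_j - \hat{x}_i \sum_{j=1}^{m} g_j \hat{x}_j\right)\end{split}\]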

C++ Interface

class BatchNormTraining : public ngraph::op::Op

Batchnorm for training operation.

Public Functions

const NodeTypeInfo &get_type_info() const

Returns the NodeTypeInfo for the node’s class. During transition to type_info, returns a dummy type_info for Node if the class has not been updated yet.

BatchNormTraining(const Output<Node> &input, const Output<Node> &gamma, const Output<Node> &beta, double epsilon)

Parameters
  • input: Must have rank >= 2, [., C, …]
  • gamma: Per-channel scale applied to the normalized value. [C]
  • beta: Per-channel bias added to the scaled normalized value. [C]
  • epsilon: Small bias added to the variance to avoid division by 0 when input has zero variance.
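A minimal usage sketch with this constructor (assuming the umbrella ngraph/ngraph.hpp header and op::Parameter for the graph inputs; the concrete shapes and the helper name are illustrative):

#include <memory>
#include <ngraph/ngraph.hpp>

using namespace ngraph;

// Build a small subgraph that applies BatchNormTraining to a 4-D input.
std::shared_ptr<Node> build_batch_norm_subgraph()
{
    // N=8, C=3, 224x224 spatial; gamma and beta span the channel axis.
    auto input = std::make_shared<op::Parameter>(element::f32, Shape{8, 3, 224, 224});
    auto gamma = std::make_shared<op::Parameter>(element::f32, Shape{3});
    auto beta  = std::make_shared<op::Parameter>(element::f32, Shape{3});

    double epsilon = 1e-5; // small bias added to the variance

    auto bn = std::make_shared<op::BatchNormTraining>(input, gamma, beta, epsilon);

    // The three outputs can be consumed individually:
    auto normalized     = bn->output(0); // same shape as input
    auto batch_mean     = bn->output(1); // shape {3}
    auto batch_variance = bn->output(2); // shape {3}

    return bn;
}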

BatchNormTraining(double eps, const Output<Node> &gamma, const Output<Node> &beta, const Output<Node> &input)

In this version of BatchNorm:

MEAN AND VARIANCE: computed directly from the content of ‘input’.

OUTPUT VALUE: A tuple with the following structure:

  • [0] - The normalization of ‘input’.
  • [1] - The per-channel means of (pre-normalized) ‘input’.
  • [2] - The per-channel variances of (pre-normalized) ‘input’.

AUTODIFF SUPPORT: yes: ‘generate_adjoints(…)’ works as expected.

SHAPE DETAILS:

  • gamma: must have rank 1, with the same span as input’s channel axis.
  • beta: must have rank 1, with the same span as input’s channel axis.
  • input: must have rank >= 2. The second dimension represents the channel axis and must have a span of at least 1.
  • output[0]: shall have the same shape as ‘input’.
  • output[1]: shall have rank 1, with the same span as input’s channel axis.
  • output[2]: shall have rank 1, with the same span as input’s channel axis.

void validate_and_infer_types()

Throws if the node is invalid.