.. avg_pool.rst:

#######
AvgPool
#######

.. code-block:: cpp

   AvgPool  // Average Pooling operation


Description
===========

Average pooling windows its input and produces an average for each window.

Inputs
------

+-----------------+----------------+--------------------------------+--------------------+
| Name            | Element Type   | Shape                          | Notes              |
+=================+================+================================+====================+
| ``data``        | Any            | :math:`(N,C,d_1,\ldots,d_n)`   | :math:`n>0, d_i>0` |
+-----------------+----------------+--------------------------------+--------------------+

Attributes
----------

+----------------------+-----------------+----------------------------------+
| Name                 | Type            | Notes                            |
+======================+=================+==================================+
| ``w``                | ``Shape[n]``    | Window shape. :math:`w_i\le d_i` |
+----------------------+-----------------+----------------------------------+
| ``s``                | ``Strides[n]``  | Window strides.                  |
+----------------------+-----------------+----------------------------------+
| ``p``                | ``Shape[n]``    | Padding below.                   |
+----------------------+-----------------+----------------------------------+
| ``q``                | ``Shape[n]``    | Padding above.                   |
+----------------------+-----------------+----------------------------------+
| ``i``                | ``Boolean``     | Include padding in average.      |
+----------------------+-----------------+----------------------------------+

Outputs
-------

+-----------------+-------------------------+--------------------------------+
| Name            | Element Type            | Shape                          |
+=================+=========================+================================+
| ``output``      | Any                     | :math:`(N,C,d'_1,\ldots,d'_n)` |
+-----------------+-------------------------+--------------------------------+

Average pooling takes as its input a batch tensor ``data`` of shape
:math:`(N,C,d_1,\ldots,d_n)`, where :math:`N` is the batch size and
:math:`C > 0` is the number of channels (sometimes called features).
The dimensions :math:`(d_1,\ldots,d_n)` correspond to the shape of an
:math:`n`-dimensional data item in a batch. For example, where :math:`n=2`,
the data may represent a two-dimensional image. It also takes five attributes:

1. *window shape*,
2. *window movement strides* (optional),
3. *padding below* (optional),
4. *padding above* (optional),
5. *include padding in average*

The shape of ``output`` is :math:`(N,C,d'_1,\ldots,d'_n)`, where
:math:`d'_i = \lceil \frac{p_i + d_i + q_i - w_i + 1}{s_i} \rceil`.

**Informal definition:**
If :math:`\textit{i}` is :math:`\textit{true}`, then averages are computed as
though the padding region contained regular elements of value zero.
If :math:`\textit{i}` is :math:`\textit{false}`, then averages are computed
using only the non-padding tensor elements that are present in each window.

*Example:* Consider two instances of this operator with the following
attributes: :math:`\textit{w} = (2,2)`, :math:`\textit{s} = (1,1)`,
:math:`\textit{p} = (1,1)`, and (in one instance)
:math:`\textit{i} = \textit{false}` or (in the other instance)
:math:`\textit{i} = \textit{true}`.

Consider how those two operator instances would handle this input tensor:

.. math::

   T_{\textit{in}} = \begin{bmatrix}
      1  & 3  & 5  & \ldots \\
      7  & 11 & 13 & \ldots \\
      17 & 19 & 23 & \ldots \\
      \vdots & \vdots & \vdots & \ddots
   \end{bmatrix}

Applying the padding indicated by the value of :math:`\textit{p}`, we have
the padded image of :math:`T_{\textit{in}}` as follows:

.. math::

   T_{\textit{in,padded}} = \begin{bmatrix}
      (0) & (0) & (0) & (0) & \ldots \\
      (0) & 1   & 3   & 5   & \ldots \\
      (0) & 7   & 11  & 13  & \ldots \\
      (0) & 17  & 19  & 23  & \ldots \\
      (0) & \vdots & \vdots & \vdots & \ddots
   \end{bmatrix}

Now consider how the two variations of this example's *AvgPool* operator will
compute the "average" value of the top-left window, which contains exactly
the elements:

.. math::

   \begin{bmatrix}
      (0) & (0) \\
      (0) & 1
   \end{bmatrix}

If :math:`\textit{i} = \textit{false}`, then the operator simply ignores the
padding elements.
It therefore computes the average of the single-element set :math:`\{ 1 \}`,
yielding :math:`1.0`.

If :math:`\textit{i} = \textit{true}`, then the operator computes the average
of the set :math:`\{ 0, 0, 0, 1\}`, yielding :math:`0.25`.

*Note:* This operator is ill-defined when *both* of the following conditions
hold: (1) :math:`\textit{i} = \textit{false}`, and (2) the operator's other
attribute values indicate that at least one window will contain only padding
elements.

**Formal definition:**
*In the absence of padding*, given an input data batch tensor
:math:`T_{\textit{in}}`, the output tensor is defined by the equation

.. math::

   T_{\textit{out}}[a,c,i_1,\ldots,i_n] =
   \frac{\sum_{j_1 = s_1 i_1, \ldots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \ldots, j_n = s_n i_n + w_n - 1}
   T_{\textit{in}}[a,c,j_1,\ldots,j_n]}{\prod_{k=1}^n w_k}

*In the presence of padding*, we do not always want to divide by the number
of elements in the window, since some of the output points are determined by
a window that is partly hanging beyond the edge of the tensor. In this case
we can define the output via a few intermediate steps, in which the indices
:math:`j_k` range over the padded tensor.

First define the *sum tensor* :math:`T_{\textit{sum}}`, with shape
:math:`(N,C,d'_1,\ldots,d'_n)`, as follows.

.. math::

   T_{\textit{sum}}[a,c,i_1,\ldots,i_n] =
   \sum_{j_1 = s_1 i_1, \ldots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \ldots, j_n = s_n i_n + w_n - 1}
   \textit{val}[a,c,j_1,\ldots,j_n]

where

.. math::

   \textit{val}[a,c,j_1,\ldots,j_n] =
   \begin{cases}
   T_{\textit{in}}[a,c,j_1 - p_1,\ldots,j_n - p_n]&\text{if for all } k, p_k \le j_k < p_k + d_k\\
   0&\text{otherwise}.
   \end{cases}

Second, define the *divisor tensor* :math:`T_{\textit{div}}`, with shape
:math:`(N,C,d'_1,\ldots,d'_n)`, as follows. It counts the non-padding
elements in each window.

.. math::

   T_{\textit{div}}[a,c,i_1,\ldots,i_n] =
   \sum_{j_1 = s_1 i_1, \ldots, j_n = s_n i_n}^{j_1 = s_1 i_1 + w_1 - 1, \ldots, j_n = s_n i_n + w_n - 1}
   \textit{valid}[a,c,j_1,\ldots,j_n]

where

.. math::

   \textit{valid}[a,c,j_1,\ldots,j_n] =
   \begin{cases}
   1&\text{if for all } k, p_k \le j_k < p_k + d_k\\
   0&\text{otherwise}.
   \end{cases}

Finally, define :math:`T_{\textit{out}}` as the result of elementwise dividing
:math:`T_{\textit{sum}}` by :math:`T_{\textit{div}}`. (If
:math:`\textit{i} = \textit{true}`, the padding elements count as regular
zeros, as in the informal definition, so the divisor is
:math:`\prod_{k=1}^n w_k` at every position instead.)
Note that at positions where :math:`T_{\textit{div}}` is zero, values may be
infinity or NaN. (This corresponds to a condition where the pooling window is
completely out of bounds, encompassing no valid values.)


Backprop
========


C++ Interface
=============

.. doxygenclass:: ngraph::op::AvgPool
   :project: ngraph
   :members:
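
The output-shape formula and the sum/divisor construction described in this
document can be sketched as a small standalone routine. This is only an
illustration under stated assumptions, not the nGraph implementation and not
part of the ``ngraph::op::AvgPool`` interface: the names ``pooled_extent`` and
``avg_pool_2d`` are hypothetical, the routine handles a single
two-dimensional plane (one batch item, one channel) stored row-major, and it
uses ``double`` elements for simplicity.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <vector>

   // Hypothetical helper: output extent along one axis,
   // d' = ceil((p + d + q - w + 1) / s), using integer arithmetic only.
   // For integers this equals floor((p + d + q - w) / s) + 1.
   std::size_t pooled_extent(std::size_t d, std::size_t w, std::size_t s,
                             std::size_t p, std::size_t q)
   {
       assert(s > 0 && w > 0 && p + d + q >= w);
       return (p + d + q - w) / s + 1;
   }

   // Hypothetical reference routine for one row-major plane of shape (d0, d1).
   // Window indices j0, j1 run over the padded plane; an index is "inside"
   // when p_k <= j_k < p_k + d_k, in which case it maps to in[j_k - p_k].
   std::vector<double> avg_pool_2d(const std::vector<double>& in,
                                   std::size_t d0, std::size_t d1,
                                   std::size_t w0, std::size_t w1,
                                   std::size_t s0, std::size_t s1,
                                   std::size_t p0, std::size_t p1,
                                   std::size_t q0, std::size_t q1,
                                   bool include_padding)
   {
       assert(in.size() == d0 * d1);
       const std::size_t o0 = pooled_extent(d0, w0, s0, p0, q0);
       const std::size_t o1 = pooled_extent(d1, w1, s1, p1, q1);
       std::vector<double> out(o0 * o1);

       for (std::size_t i0 = 0; i0 < o0; ++i0) {
           for (std::size_t i1 = 0; i1 < o1; ++i1) {
               double sum = 0.0;       // entry of the sum tensor
               std::size_t valid = 0;  // entry of the divisor tensor
               for (std::size_t j0 = s0 * i0; j0 < s0 * i0 + w0; ++j0) {
                   for (std::size_t j1 = s1 * i1; j1 < s1 * i1 + w1; ++j1) {
                       const bool inside = j0 >= p0 && j0 < p0 + d0 &&
                                           j1 >= p1 && j1 < p1 + d1;
                       if (inside) {
                           sum += in[(j0 - p0) * d1 + (j1 - p1)];
                           ++valid;
                       }
                   }
               }
               // With padding included, every window has w0 * w1 elements
               // (padding counts as zero); otherwise only valid ones count.
               const double divisor = include_padding
                                          ? static_cast<double>(w0 * w1)
                                          : static_cast<double>(valid);
               out[i0 * o1 + i1] = sum / divisor;  // 0/0 gives NaN
           }
       }
       return out;
   }

Applied to the top-left :math:`3 \times 3` corner of the example tensor
(values 1, 3, 5, 7, 11, 13, 17, 19, 23) with window :math:`(2,2)`, strides
:math:`(1,1)`, padding below :math:`(1,1)` and, as an added assumption,
padding above :math:`(0,0)`, the first output element is :math:`1.0` when
padding is excluded and :math:`0.25` when it is included, matching the
example in the informal definition.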