.. fusion/index.rst:
Pattern matcher
###############
* :ref:`overview`
* :ref:`passes_list`
* :ref:`more_detail`
* :ref:`passes_examples`
* :doc:`optimize-graphs`
.. _overview:
Generic graph optimizers: Optimization passes
=============================================
The pass manager infrastructure in nGraph makes it easy to reuse and mix the
generic optimization passes. It also permits you to roll your own device-specific
optimizations; that is, the same unified interface and APIs may be used to
cover both things.
Invoking these passes is fairly straightforward:
#. Create a "pass manager" object.
#. Populate it with the desired passes.
#. Pass to it a pointer to your unoptimized graph, and it’ll return a pointer
to an optimized graph.
nGraph Core includes a large library of hardware-agnostic passes -- passes useful
for almost any kind of hardware backend. Some of these passes should be familiar
to people who are comfortable with classical compiler designs. Others, like the
reshape/transpose elimination and sinking passes, are quite specific to deep
learning.
Let’s take a look at some of these passes.
.. _passes_list:
List of Passes
==============
* :ref:`algebraic_simpl`
* :ref:`common_subex_elim`
* :ref:`constant_fold`
* :ref:`reshape_transpose_elim`
* :ref:`reshape_transpose_sink`
.. _algebraic_simpl:
Algebraic Simplification
------------------------
The **Algebraic Simplification** pass implements what amounts to a "grab bag" of
algebraic simplification rules. It does some basic things like rewrite "zero
times x" to simply "zero", or "zero plus x" to plain "x".
It can also do a number of tricks more specific to deep learning. For example,
if we discover that a tensor is being sliced up by adjacent segments, only to
have those slices concatenated back together again, we can skip the slicing and
concatting altogether.
Or, if a tensor is being padded, but the actual width of the padding is zero
all around, we can skip the padding step entirely.
Several other transformations like this are implemented in the algebraic
simplification pass. And while none of these transformations might seem
particularly impressive on their own, when everything comes together the
results of this pass often yield improvement even on the initial graph straight
out of the bridge. This pass is also quite important as a "glue" pass that can
be used to clean up and/or re-simplify after other passes have done their own
tricks.
.. _common_subex_elim:
Common Subexpression Elimination
--------------------------------
.. _constant_fold:
Constant Folding
----------------
.. _core_fusion:
Core Fusion
-----------
.. _reshape_transpose_elim:
Reshape/Transpose Elimination
-----------------------------
The pass called **Reshape/Transpose Elimination** will find and optimize where
we can "push" two ``Transpose`` ops through a matrix multiplication. For example,
if you have two matrices (say, *foo* and *bar*), both of these matrices will be
transposed (to produce *foo.t* and *bar.t*, respectively), aftew which *foo.t*
and *bar.t* get multiplied together.
Often a more efficient way to implement this is to switch the order of the
arguments *foo* and *bar*, multiply them together, and then transpose the output
of the matmul. Effectively, this cuts two `Transpose` operations down to just
one, where the **Reshape/Transpose** elimination will do that rewrite for you.
Another common pattern can be optimized via nGraph is the case where two
transpositions cancel each other out. One example of this is taking the
"Transpose" of the transpose of a matrix, though actually a more common case is
when the graph is translating among different batch formats. We can often move
these operations around through a process called **Reshape sinking/swimming**,
and in cases where two transposes wind up canceling each other out, we can cut
them both out of the graph.
.. _reshape_transpose_sink:
``Reshape/Transpose Sinking``
-----------------------------
.. _elementzero_tensor_elim:
``Zero-Element Tensor Elimination``
-----------------------------------
.. _more_detail:
More detail
-----------
Let us first consider a simple example. A user would like to execute a graph
that describes the following arithmetic expression:
:math:`a + b * 1` or :math:`Add(a, Mul(b, 1))`
In the above expressions, `1` is an identity element; any element multiplied by
the identity element is equal to itself. This is the same as saying:
:math:`b * 1 = b`
The writer of an optimization pass which uses algebraic simplification would
probably want to first ``locate`` all multiplication expressions where
multiplicands are multiplied by `1` (for stage 1) and to then ``transform``,
``simplify``, or ``replace`` those expressions with just their multiplicands
(for stage 2).
To make the work of an optimization pass writer easier, the nGraph Library
includes facilities that enable the *finding* of relevant candidates using
pattern matching (via ``pattern/matcher.hpp``), and the *transforming* of the
original graph into a condensed version (via ``pass/graph_rewrite.hpp``).
Let's consider each in more detail and many ways they can help the graph
optimizer.
.. toctree::
:maxdepth: 1
graph-rewrite.rst
passes-that-use-matcher.rst
optimize-graphs.rst
.. _passes_examples:
Examples of Passes
==================
The effectiveness of these passes is more striking to look at in terms of an
actual input graph, such as one from the framework bridge.
*Figure 0* shows an excerpt from ``MobileNet v1``, a topology which makes heavy
use of group convolution.
.. _figure-mobilenet-gc:
.. figure:: ../graphics/mobilenet-group-conv.png
:width: 700px
:alt:
Figure 0: Each of these grouped convolution complexes -- the
operations within the rectangles on the left -- is very wide; each is too
wide to fit legibly on the illustration.
The group convolution fusion is able to replace each of those giant subgraphs
with a single CPU group convolution node. This ends up being a win in several
ways:
* sheer node count,
* mappability to MKL-DNN (which has an accelerated group convolution implementation),
* elimination of unnecessary temporaries, and so on.