Current Deep Learning models tend to be data intensive because they must be taught every type of invariance one expects at inference time. Humans, by contrast, are very efficient at recognising an already seen object under different lighting, rotation, and similar conditions.

One of the reasons behind the success of CNNs (Convolutional Neural Networks) is their weight sharing property. Weight sharing makes the predictions of a CNN invariant, and its output feature maps equivariant, to translation of the subject in the input image.

Let $f$ be a function with input $x$ and $T$ be a transformation function that transforms $x$.

Mathematically, invariance can be written as $f(T(x)) = f(x)$; invariance of predictions is generally expected from a well-functioning Deep Learning system. Equivariance can be written as $f(T(x)) = T(f(x))$: the feature maps of a transformed input are the same as the similarly transformed feature maps of the original input. To obtain invariant predictions, the network should either be intrinsically equivariant to the relevant input transformations or have them fed in extrinsically via data augmentation, which is rather data intensive.
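
To make the equivariance property concrete, here is a minimal NumPy/SciPy sketch (my own toy example, not from the paper) that numerically checks translation equivariance of a plain convolution: shifting the input and then filtering gives the same result as filtering and then shifting.

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 16))      # toy input "image"
w = rng.standard_normal((3, 3))        # toy filter

def f(x):
    # "same"-padded cross-correlation with periodic boundary, so a cyclic
    # shift of the input is an exact symmetry of the operation
    return correlate2d(x, w, mode="same", boundary="wrap")

def T(x):
    # translate the image by (2, 3) pixels (with wrap-around)
    return np.roll(x, shift=(2, 3), axis=(0, 1))

print(np.allclose(f(T(x)), T(f(x))))   # True: convolution is translation equivariant
```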

The authors of the paper Group Equivariant Convolutional Networks introduce a new type of architecture which takes the mathematical object "Group" as its inductive bias.

Symmetry

In mathematics, when we talk about symmetry we are talking about those transformations that leave the underlying object unchanged. In machine learning, we are interested in symmetries of the labels we are trying to predict, i.e. we want our model to predict "dog" for every rotated copy of a dog image. Since the label is in reality invariant under rotation, the label is said to have rotational symmetry. The mathematics of symmetry is called "Group Theory"; a group is simply the set of all symmetries of an object.
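
As a toy illustration of this idea (my own example, not from the paper), the four rotations of a square image by 0, 90, 180 and 270 degrees form such a group: composing any two of them gives another rotation in the set, and none of them changes the label.

```python
import numpy as np

x = np.arange(9).reshape(3, 3)   # stand-in for an image on a square grid

def rot(x, k):
    return np.rot90(x, k)        # rotate by k * 90 degrees

# Closure: composing two rotations gives another rotation in the set
for k in range(4):
    for m in range(4):
        assert np.array_equal(rot(rot(x, k), m), rot(x, (k + m) % 4))

# k = 0 is the identity and k, 4 - k cancel each other (inverses), so the
# 0/90/180/270-degree rotations are the rotational symmetries of the square
# grid; the class label ("dog") is unchanged under each of them.
```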

Group Convolutions

In a vanilla convolution the filter is translated across the image and an inner product is computed at each position. In a group convolution the filter is additionally transformed according to the symmetries of the group before the inner product is computed.

In the paper the authors consider a group called $p4$, which consists of all compositions of translations and rotations by 90 degrees about any center of rotation in a square grid. A convenient parameterization of this group in terms of three integers $r, u, v$ is

$$g(r, u, v) = \begin{bmatrix} \cos(r\pi/2) & -\sin(r\pi/2) & u \\ \sin(r\pi/2) & \cos(r\pi/2) & v \\ 0 & 0 & 1 \end{bmatrix},$$

where $0 \leq r < 4$ and $(u, v) \in \mathbb{Z}^{2}$. Now the filter not only translates but also rotates. For each of the four orientations of the filter, a vanilla convolution is performed and the result is stored in an output feature map. By doing this, the feature maps remain equivariant under rotation of the input image.
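
Below is a simplified sketch of this first ("lifting") $p4$ group convolution, using my own function names and NumPy/SciPy rather than the paper's implementation. The filter is rotated four times, each copy is correlated with the image, and the results are stacked as four orientation channels; rotating the input then rotates each feature map spatially and cyclically permutes the orientation channels.

```python
import numpy as np
from scipy.signal import correlate2d

def p4_lifting_conv(image, filt):
    # image: (H, W), filt: (k, k) -> output: (4, H, W), one channel per
    # 90-degree orientation of the filter
    return np.stack([
        correlate2d(image, np.rot90(filt, r), mode="same", boundary="wrap")
        for r in range(4)
    ])

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
filt = rng.standard_normal((3, 3))

out = p4_lifting_conv(image, filt)

# Equivariance check: a rotated input gives feature maps that are the
# spatially rotated originals with the orientation channels shifted by one.
out_rot = p4_lifting_conv(np.rot90(image), filt)
expected = np.roll(np.rot90(out, axes=(1, 2)), shift=1, axis=0)
print(np.allclose(out_rot, expected))   # True
```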

In the experiments, G-Conv performed better than vanilla convolution both with and without data augmentation. Data augmentation only helped the score when it included transformations that are not part of the group; augmenting with transformations already in the group is redundant, since the network accounts for them by construction.

This paper has opened a new line of research by adding mathematical structure to the filters of CNNs, which makes the learning process more data efficient.

References

  1. Group Equivariant Convolutional Networks, Cohen & Welling, ICML 2016
  2. http://scyfer.nl/2016/12/13/data-efficient-deep-learning-with-g-cnns/