An equivariant representation is a map from an input signal to a feature vector that commutes with a geometric transformation of the input: in plain terms, if the input is rotated, for example, we can predict exactly how the features transform.
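This commutation property can be illustrated with a toy sketch (not the paper's model): a map `phi` is equivariant when embedding a rotated input gives the same result as rotating the embedding. Below, a simple rotation-commuting map satisfies this check, while a pointwise nonlinearity does not; all names here are illustrative.

```python
import numpy as np

# Planar rotation by 60 degrees.
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# A map that commutes with any rotation (uniform scaling), hence equivariant.
phi = lambda pts: 2.0 * pts

# A pointwise nonlinearity, which in general does NOT commute with rotation.
relu = lambda pts: np.maximum(pts, 0.0)

x = np.random.default_rng(0).normal(size=(5, 2))  # 5 points in the plane

# Equivariance: rotate-then-embed equals embed-then-rotate.
assert np.allclose(phi(x @ R.T), phi(x) @ R.T)

# The pointwise nonlinearity breaks this commutation.
assert not np.allclose(relu(x @ R.T), relu(x) @ R.T)
```

The framework described below learns an embedding with this commutation property for 3D rotations acting on 2D views.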
Extracting 3D representations from 2D images that are equivariant to input transformations is challenging because a 2D image captures only a partial view of an object. However, the simultaneous availability of 3D models and images of the same class allows us to reconstruct an object given a single image from that class, a task called semantic reconstruction. Such reconstructions have so far been constrained to limited view variation. In this paper, we present a framework that learns a view-equivariant representation from a 2D view of an object. Because of its equivariance property, the representation enables the direct computation of relative orientation (see video) without any regression or the use of a 3D model as a mediator. Moreover, it allows us to learn novel views of an object from this embedding alone.