Disentangled Representations

The goal is to disentangle a complicated feature space into a simpler latent representation, where individual latent dimensions capture distinct factors of variation. The basic objective is a VAE-style loss with a weighted prior term:

\overbrace{\mathbb E_{p_\phi(\mathbf z \vert \mathbf x)}}^{\text{samples}} [ \underbrace{-\log q_{\theta} ( \mathbf x\vert \mathbf z)}_{\text{reconstruction loss}} ] + {\color{orange}\beta}\; \sum_i \underbrace{\text{KL} \left(p_\phi( \mathbf z_i\vert \mathbf x)\:\Vert\:prior(\mathbf z_i) \right)}_{\text{compactness prior loss}}

 

Code available at this repo

 

Each method below modifies this objective:

Beta-VAE (Higgins et al. 2017) - adds the hyperparameter β to weight the compactness prior term

Beta-VAE H (Burgess et al. 2018) - adds a hyperparameter C to control the capacity of the compactness prior term:

\textcolor{orange}{\beta}\; \left\vert \sum_i \underbrace{\text{KL} \left(p_\phi( \mathbf z_i\vert \mathbf x)\:\Vert\:prior(\mathbf z_i) \right)}_{\text{compactness prior loss}} - C \right\vert

Factor-VAE (Kim & Mnih, 2018) - adds a total correlation loss term, estimated with a discriminator

Beta-Total-Correlation VAE (Chen et al. 2018) - same objective as Factor-VAE, but computed without a discriminator

TRIM (Singh et al. 2020) - adds attributions on transformations of the input to learn simpler representations
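As a concrete reference, here is a minimal sketch of the Beta-VAE loss with the optional Burgess-style capacity C. This is an illustration rather than the authors' code; `enc` and `dec` are placeholder encoder/decoder networks:

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, enc, dec, beta=4.0, C=None):
    """Sketch of the beta-VAE objective; `enc` maps x -> (mu, log_var),
    `dec` maps z -> x_hat (both placeholder nn.Modules)."""
    mu, log_var = enc(x)                     # encoder p_phi(z|x)
    eps = torch.randn_like(mu)               # reparameterization noise
    z = mu + torch.exp(0.5 * log_var) * eps  # sampled latent code
    x_hat = dec(z)                           # decoder q_theta(x|z)

    # reconstruction loss (simple pixel-wise loss; could use something smarter)
    recon = F.mse_loss(x_hat, x, reduction="sum") / x.shape[0]

    # compactness prior loss: closed-form KL( N(mu, sigma^2) || N(0, I) ),
    # summed over latent dimensions, averaged over the batch
    kl = -0.5 * (1 + log_var - mu ** 2 - log_var.exp()).sum(dim=1).mean()

    if C is None:
        return recon + beta * kl             # beta-VAE (Higgins et al. 2017)
    return recon + beta * (kl - C).abs()     # capacity variant (Burgess et al. 2018)
```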

 

 

 

 

The underlying model is an autoencoder: the input \mathbf x is passed through the

\text{encoder}: p_\phi( \mathbf z\vert \mathbf x)

to produce a latent code \mathbf z (sampled using reparameterization noise \mathbf \epsilon), which is passed through the

\text{decoder}: q_{\theta} ( \mathbf x\vert \mathbf z)

to produce the reconstruction \mathbf{\hat x}.

In the objective above, the reconstruction loss encourages accurate reconstruction of the input (note: this could be done w/ something smarter than a pixel-wise loss). The compactness prior loss encourages points to be compactly placed in the latent space; in expectation over the data, it can be further divided into 3 terms (Chen et al. 2018), where p_\phi(\mathbf z) denotes the aggregate posterior:

\underbrace{I(\mathbf x; \mathbf z)}_{\text{mutual info}} + \underbrace{\sum_i \text{KL} \left(p_\phi( \mathbf z_i)\:\Vert\:prior(\mathbf z_i) \right)}_{\text{factorial prior loss}} + \underbrace{\text{KL} \left( p_\phi(\mathbf z) \:\Vert\: \prod_i p_\phi( \mathbf z_i) \right)}_{\text{total correlation loss}}

The mutual info term preserves information between the latent space + the input; the factorial prior loss encourages the latent space to be decoupled by matching each latent marginal to the prior; the total correlation loss encourages the latent variables to be independent. (InfoGAN (Chen et al. 2016) takes the opposite view of the first term: it encourages the mutual info between the input and the latent code to be high for a subset of the latent variables.)
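The total correlation term is the one Factor-VAE and Beta-TC-VAE target. It has a closed form for a multivariate Gaussian, which gives a quick numerical illustration of what it measures (this example is an added sketch, not part of the original figure):

```python
import numpy as np

def gaussian_tc(cov):
    """Total correlation TC(z) = sum_i H(z_i) - H(z); for a Gaussian with
    covariance `cov` this equals 0.5 * (sum_i log cov_ii - log det cov)."""
    cov = np.asarray(cov)
    return 0.5 * (np.sum(np.log(np.diag(cov))) - np.linalg.slogdet(cov)[1])

print(gaussian_tc(np.eye(2)))                  # 0.0   -> independent latents
print(gaussian_tc([[1.0, 0.9], [0.9, 1.0]]))   # ~0.83 -> strongly coupled latents
```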
(Nonlinear) ICA (Khemakhem et al. 2020) connects this setup to independent component analysis, which rests on the following assumptions:

\textbf{assumptions}\\ (1) \; \mathbf x\approx f(\mathbf z)\\ (2) \; \text{non-gaussianity of } \mathbf z\\ (3) \; \text{independence: } P(\mathbf z) = \prod_i P(\mathbf z_i)

Solving: maximize the non-gaussianity of \mathbf z, or minimize the mutual info between its components.
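For the classic linear case, scikit-learn's FastICA (which maximizes non-gaussianity) recovers independent sources from their mixtures; a small toy example, added here for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA

# two independent, non-gaussian sources
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.cos(3 * t))]

# observed data are linear mixtures: x = A z
A = np.array([[1.0, 0.5], [0.4, 1.0]])
X = S @ A.T

# unmix by maximizing non-gaussianity
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)  # recovered up to permutation + scale
```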


Related models:

VAE (Kingma & Welling, 2013) - the base model

ALAE (Pidhorskyi et al. 2020) - adversarial latent autoencoder

StyleGAN + StyleGAN2 (Karras et al. 2019) - disentangle by using the latent representation at different scales

Additional loss terms:

+ TRIM loss (Singh et al. 2020) - penalizes interpretations so that they are desirable (e.g. sparse, monotonic)

+ prediction loss (Singh et al. 2020) - if we are given a trained predictor, we can minimize its error rather than simply reconstructing the input (see the sketch below)
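One plausible reading of the prediction loss, sketched below: decode \mathbf{\hat x} and ask a frozen predictor to give the same output on \mathbf{\hat x} as on \mathbf x, instead of matching pixels. All module names are hypothetical placeholders, not the authors' implementation:

```python
import torch
import torch.nn.functional as F

def prediction_vae_loss(x, enc, dec, predictor, beta=1.0):
    """Swap the reconstruction term for a prediction-matching term,
    given a frozen, pre-trained `predictor` (hypothetical placeholder)."""
    mu, log_var = enc(x)
    z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
    x_hat = dec(z)

    with torch.no_grad():
        target = predictor(x)          # predictor's output on the real input

    # minimize the predictor's error on the decoded point
    pred_err = F.mse_loss(predictor(x_hat), target)

    # keep the compactness prior term
    kl = -0.5 * (1 + log_var - mu ** 2 - log_var.exp()).sum(dim=1).mean()
    return pred_err + beta * kl
```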