U-GAT-IT Architecture
Paper: U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation
Architecture:
Generator:
- Let x ∈ {X_s, X_t} represent a sample from the source or the target domain.
- Our translation model G_s→t consists of an encoder E_s, a decoder G_t, and an auxiliary classifier η_s, where η_s(x) represents the probability that x comes from X_s.
- Let E_s^k(x) be the k-th activation map of the encoder.
- Inspired by CAM, the auxiliary classifier is trained to learn the importance weight of the k-th feature map, w_s^k, using global average pooling and global max pooling.
- By exploiting the importance weights, we can calculate a set of domain-specific attention feature maps a_s(x) = w_s * E_s(x) = {w_s^k * E_s^k(x) | 1 ≤ k ≤ n}, where n is the number of encoded feature maps (see the sketch after this list).
- Then our translation model G_s→t becomes equal to G_t(a_s(x)).
- In the decoder G_t, the residual blocks are equipped with AdaLIN, whose parameters γ and β are dynamically computed by a fully connected layer from the attention map.
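Below is a minimal PyTorch sketch of the two generator-specific pieces described above. The module names, layer sizes, and the rho initialization are illustrative assumptions, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class CAMAttention(nn.Module):
    """CAM-style attention (illustrative): the weights of the auxiliary
    classifier eta_s, a linear layer on globally pooled features, double as
    the channel importance weights w_s^k for the encoder feature maps."""
    def __init__(self, channels):
        super().__init__()
        self.fc_gap = nn.Linear(channels, 1, bias=False)  # eta_s on avg-pooled features
        self.fc_gmp = nn.Linear(channels, 1, bias=False)  # eta_s on max-pooled features
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat):  # feat = E_s(x), shape (N, C, H, W)
        gap_logit = self.fc_gap(feat.mean(dim=(2, 3)))  # (N, 1)
        gmp_logit = self.fc_gmp(feat.amax(dim=(2, 3)))  # (N, 1)
        # a_s(x) = w_s * E_s(x): reweight each feature map by its importance
        gap_feat = feat * self.fc_gap.weight.view(1, -1, 1, 1)
        gmp_feat = feat * self.fc_gmp.weight.view(1, -1, 1, 1)
        attn = torch.relu(self.fuse(torch.cat([gap_feat, gmp_feat], dim=1)))
        return attn, torch.cat([gap_logit, gmp_logit], dim=1)  # logits feed L_cam

class AdaLIN(nn.Module):
    """Adaptive Layer-Instance Normalization (illustrative): a learned
    per-channel ratio rho, clipped to [0, 1], blends instance-normalized
    and layer-normalized activations; gamma and beta are produced elsewhere
    by fully connected layers applied to the attention feature map."""
    def __init__(self, channels, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.rho = nn.Parameter(torch.full((1, channels, 1, 1), 0.9))  # init is a guess

    def forward(self, x, gamma, beta):  # gamma, beta: (N, C) from the FC layers
        # Instance-norm statistics: per sample and channel, over H and W
        x_in = (x - x.mean(dim=(2, 3), keepdim=True)) / torch.sqrt(
            x.var(dim=(2, 3), keepdim=True, unbiased=False) + self.eps)
        # Layer-norm statistics: per sample, over C, H, and W
        x_ln = (x - x.mean(dim=(1, 2, 3), keepdim=True)) / torch.sqrt(
            x.var(dim=(1, 2, 3), keepdim=True, unbiased=False) + self.eps)
        rho = self.rho.clamp(0.0, 1.0)
        out = rho * x_in + (1.0 - rho) * x_ln
        return out * gamma.unsqueeze(-1).unsqueeze(-1) + beta.unsqueeze(-1).unsqueeze(-1)
```

Learning rho lets each decoder layer choose its own mix: instance normalization (rho → 1) preserves per-sample style statistics, while layer normalization (rho → 0) keeps more global structure.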
Discriminator:
- Let x ∈ {X_t, G_s→t(X_s)} represent a sample from the target domain or the translated source domain.
- The discriminator D_t consists of an encoder E_Dt, a classifier C_Dt, and an auxiliary classifier η_Dt.
- Unlike the other translation models, both η_Dt(x) and D_t(x) are now trained to discriminate whether x comes from X_t or G_s→t(X_s).
- Given a sample x, D_t(x) exploits the attention feature maps a_Dt(x) = w_Dt * E_Dt(x), where the importance weights w_Dt on the encoded feature maps E_Dt(x) are trained by η_Dt.
- Then the discriminator D_t(x) becomes equal to C_Dt(a_Dt(x)) (see the sketch after this list).
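A matching sketch of the discriminator flow, reusing the CAMAttention module from the generator sketch above. The encoder and classifier stacks here are placeholders, not the paper's exact multi-scale PatchGAN configuration:

```python
class DiscriminatorSketch(nn.Module):
    """Illustrative D_t = C_Dt(a_Dt(E_Dt(x))); layer sizes are placeholders."""
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.encoder = nn.Sequential(  # E_Dt: downsampling conv stack
            nn.Conv2d(in_ch, ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, 2 * ch, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
        )
        self.attn = CAMAttention(2 * ch)                      # a_Dt via eta_Dt's weights
        self.classifier = nn.Conv2d(2 * ch, 1, 4, padding=1)  # C_Dt: patch logits

    def forward(self, x):
        feat = self.encoder(x)             # E_Dt(x)
        feat, cam_logit = self.attn(feat)  # a_Dt(x) and eta_Dt logits
        return self.classifier(feat), cam_logit  # D_t(x), inputs to L_cam
```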
Loss Function:
Adversarial Loss: An adversarial loss is employed to match the distribution of the translated images to the target image distribution.
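Up to notation, the paper uses a least-squares form for this loss; for the s→t direction it reads \( L_{gan}^{s \to t} = E_{x \sim X_t}[(D_t(x))^2] + E_{x \sim X_s}[(1 - D_t(G_{s \to t}(x)))^2] \).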
Cycle Loss: To alleviate the mode collapse problem, we apply a cycle consistency constraint to the generator. Given an image x ∈ X_s, after the sequential translations of x from X_s to X_t and from X_t back to X_s, the image should be successfully translated back to the original domain.
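Concretely, with an L1 reconstruction penalty: \( L_{cycle}^{s \to t} = E_{x \sim X_s}[\,|x - G_{t \to s}(G_{s \to t}(x))|_1\,] \).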
Identity Loss: To ensure that the color distributions of the input image and output image are similar, we apply an identity consistency constraint to the generator. Given an image x ∈ X_t, after the translation of x using G_s→t, the image should not change.
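Again with an L1 penalty: \( L_{identity}^{s \to t} = E_{x \sim X_t}[\,|x - G_{s \to t}(x)|_1\,] \).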
CAM Loss: By exploiting the information from the auxiliary classifiers η_s and η_Dt, given an image x ∈ {X_s, X_t}, G_s→t and D_t get to know where they need to improve, or what makes the most difference between the two domains, in the current state.
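Up to notation, the generator-side term is a cross-entropy loss, \( L_{cam}^{s \to t} = -\,( E_{x \sim X_s}[\log \eta_s(x)] + E_{x \sim X_t}[\log(1 - \eta_s(x))] ) \), while the discriminator-side term takes the least-squares form \( L_{cam}^{D_t} = E_{x \sim X_t}[(\eta_{D_t}(x))^2] + E_{x \sim X_s}[(1 - \eta_{D_t}(G_{s \to t}(x)))^2] \).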
Full Objective: Finally, we jointly train the encoders, decoders, discriminators, and auxiliary classifiers to optimize the final objective:
\( \min_{G_{s \to t},\, G_{t \to s},\, \eta_s,\, \eta_t} \; \max_{D_s,\, D_t,\, \eta_{D_s},\, \eta_{D_t}} \; \lambda_1 L_{gan} + \lambda_2 L_{cycle} + \lambda_3 L_{identity} + \lambda_4 L_{cam} \)
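Here each L term sums its two directions (e.g. \( L_{gan} = L_{gan}^{s \to t} + L_{gan}^{t \to s} \)), and the paper sets λ1 = 1, λ2 = 10, λ3 = 10, λ4 = 1000. A hedged sketch of the generator-side objective for one direction, assuming the generator and discriminator return (output, cam_logits) pairs as in the sketches above:

```python
import torch
import torch.nn.functional as F

# Loss weights as reported in the paper
L1, L2, L3, L4 = 1.0, 10.0, 10.0, 1000.0

def generator_objective(x_s, x_t, G_s2t, G_t2s, D_t):
    """One-direction (s -> t) generator loss; the full model adds the mirrored
    t -> s terms, and the discriminators are updated in a separate step."""
    fake_t, cam_s = G_s2t(x_s)  # translated image + eta_s logits on X_s
    idt_t, cam_t = G_s2t(x_t)   # identity pass + eta_s logits on X_t
    cyc_s, _ = G_t2s(fake_t)    # cycle back to the source domain
    d_logit, _ = D_t(fake_t)

    gan = F.mse_loss(d_logit, torch.ones_like(d_logit))  # least-squares GAN term
    cycle = F.l1_loss(cyc_s, x_s)
    identity = F.l1_loss(idt_t, x_t)
    # eta_s should output "from X_s" for x_s and "not from X_s" for x_t
    cam = (F.binary_cross_entropy_with_logits(cam_s, torch.ones_like(cam_s))
           + F.binary_cross_entropy_with_logits(cam_t, torch.zeros_like(cam_t)))
    return L1 * gan + L2 * cycle + L3 * identity + L4 * cam
```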
Written on August 16th, 2019 by Karthik. Feel free to share!