All figures and tables are from the paper, unless marked as coming from another paper or website.
Fig 1. Qualitative results on CelebA.
Fig 2. Qualitative results on Tag2pix.
This paper was accepted to CVPR 2020.
The authors note that colorization has been successful on grayscale images, but sketch and outline images remain challenging because they contain no pixel intensity information.
Commonly used approaches to this problem rely on user hints or reference images. However, reference-based methods have progressed slowly, owing to the scarcity of datasets and the information discrepancy between a sketch and its reference.
Therefore, the authors address this problem in two ways: augmented-self reference generation and a spatially corresponding feature transfer (SCFT) module. They argue that these two methods allow the network to be optimized without manually annotated labels.
As of 2020-07-29, the official code for this model has not been released.
Fig 3. Overall workflow of the model
As illustrated in Fig. 3, I is the source color image, I_s is a sketch image extracted with an outline extractor, and I_r is a reference image obtained by applying a thin-plate-spline (TPS) transformation to I. The model extracts activation maps f_s and f_r from I_s and I_r using two independent encoders, E_s(I_s) and E_r(I_r).
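To make the two-encoder setup concrete, here is a minimal PyTorch sketch of two independent encoders producing f_s and f_r. The layer widths, depths, and input resolution are assumptions for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    """A minimal convolutional encoder; hypothetical layer sizes."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

E_s = ConvEncoder(in_ch=1)  # sketch encoder (grayscale sketch input)
E_r = ConvEncoder(in_ch=3)  # reference encoder (RGB reference input)
f_s = E_s(torch.randn(1, 1, 256, 256))  # activation map of I_s
f_r = E_r(torch.randn(1, 3, 256, 256))  # activation map of I_r
```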
To transfer information from I_r to I_s, the model uses the SCFT (Spatially Corresponding Feature Transfer) module, inspired by the self-attention mechanism. SCFT computes dense correspondences between all pixel pairs of I_r and I_s. Based on this visual mapping, context features that combine information from I_r and I_s are passed through the rest of the model to produce the final colored output.
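Below is a minimal sketch of this attention-style feature transfer, using standard scaled dot-product attention with sketch features as queries and reference features as keys/values, plus a residual combination. The projection design and combination rule are assumptions and may differ from the paper's exact formulation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCFT(nn.Module):
    """Attention-based feature transfer between sketch and reference
    activation maps. A sketch under assumptions, not the paper's exact design."""
    def __init__(self, dim):
        super().__init__()
        self.w_q = nn.Linear(dim, dim)  # projects sketch features to queries
        self.w_k = nn.Linear(dim, dim)  # projects reference features to keys
        self.w_v = nn.Linear(dim, dim)  # projects reference features to values
        self.scale = dim ** -0.5

    def forward(self, f_s, f_r):
        # f_s, f_r: (B, C, H, W) activation maps from E_s and E_r
        b, c, h, w = f_s.shape
        s = f_s.flatten(2).transpose(1, 2)  # (B, HW, C)
        r = f_r.flatten(2).transpose(1, 2)
        q, k, v = self.w_q(s), self.w_k(r), self.w_v(r)
        # dense correspondence between every sketch and reference position
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        context = attn @ v           # reference features aligned to the sketch
        out = s + context            # residual combination of both sources
        return out.transpose(1, 2).view(b, c, h, w)
```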
Fig 4. Appearance transform a(·) and TPS transformation s(·)
Appearance and spatial transformations are applied to generate I_r from I. The authors argue that, because I_r is generated from I itself, it is guaranteed to contain information useful for colorizing I_s.
Appearance transform a(·): the process of adding particular random noise to each RGB pixel. This prevents the model from memorizing color biases (e.g., apple → red). In addition, the authors argue that providing a different reference at each iteration forces the model to utilize both E_s and E_r. Note that a(I) is used as the ground truth I_gt.
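A minimal sketch of such an appearance transform. The noise distribution, its magnitude, and the choice of a per-channel shift (rather than per-pixel noise) are assumptions, not the paper's exact values:

```python
import torch

def appearance_transform(img):
    """Perturb the color of an RGB image in [0, 1] with random noise.
    A sketch: a random shift per RGB channel, magnitude chosen arbitrarily."""
    noise = torch.rand(3, 1, 1) * 0.6 - 0.3  # per-channel shift in [-0.3, 0.3]
    return (img + noise).clamp(0.0, 1.0)

I = torch.rand(3, 256, 256)
I_gt = appearance_transform(I)  # a(I) also serves as the ground truth I_gt
```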
TPS transformation s(·): after the appearance transform, a non-linear spatial transformation operator is applied to a(I). The authors state that this prevents the model from lazily copying colors from the same pixel positions in I, forcing it instead to identify semantically meaningful spatial correspondences even for a reference with a spatially different layout, e.g., a different pose.
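The following sketch approximates a non-linear spatial warp by randomly displacing a coarse control grid and smoothly interpolating the displacements; a true thin-plate-spline warp, as used in the paper, would interpolate the control points with radial basis functions instead. Grid size and warp strength are assumptions:

```python
import torch
import torch.nn.functional as F

def spatial_transform(img, grid_size=4, strength=0.1):
    """Non-linear spatial warp approximating a TPS transformation.
    Random offsets on a coarse control grid are smoothly upsampled
    and applied with grid_sample. A sketch, not the paper's exact warp."""
    c, h, w = img.shape
    # random displacement field defined on a coarse control grid
    flow = (torch.rand(1, 2, grid_size, grid_size) - 0.5) * 2 * strength
    flow = F.interpolate(flow, size=(h, w), mode='bicubic', align_corners=True)
    # identity sampling grid in normalized [-1, 1] coordinates
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing='ij')
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0)  # (1, H, W, 2)
    grid = grid + flow.permute(0, 2, 3, 1)             # displace the grid
    return F.grid_sample(img.unsqueeze(0), grid, align_corners=True).squeeze(0)

I = torch.rand(3, 256, 256)
I_r = spatial_transform(I)  # in the full pipeline, I_r = s(a(I))
```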
#gans #image-processing #deep-learning #image-colorization #cvpr-2020