All figures and tables are from the paper. (Figures taken from another paper or website are marked as such.)

Content

  1. Abstract
  2. Method
  3. Results and Experiments
  4. My Opinion

1. Abstract

Fig 1. Qualitative results using CelebA

Fig 2. Qualitative results using Tag2pix

This paper was accepted at CVPR 2020.

The authors note that colorization has been successful for grayscale images, but sketch or outline images are more challenging because they contain no pixel intensity information.

The two commonly used approaches to this problem are user hints and reference images.

However, for reference images, progress has been slow because datasets are scarce and there is an information discrepancy between the sketch and the reference.

Therefore, the authors address this problem in two ways, which they describe as follows:

  • we utilize an augmented-self reference which is generated from the original image by both color perturbation and geometric distortion. This reference contains most of the content of the original image itself, thereby providing full correspondence information for the sketch, which is also derived from the same original image
  • our model explicitly transfers the contextual representations obtained from the reference into the spatially corresponding positions of the sketch by an attention-based pixel-wise feature transfer module, which we term the spatially corresponding feature transfer (SCFT) module

The authors argue that these two components allow the network to be trained without manually annotated labels.

As of this writing (2020-07-29), the official code for this model has not been released.

2. Method

Fig 3. Overall workflow of the model

2–1. Overall Workflow

As illustrated in Fig. 3, I is the source color image, I_s is a sketch image extracted with an outline extractor, and I_r is a reference image obtained by applying a thin plate spline (TPS) transformation. The model encodes I_s and I_r into activation maps f_s and f_r using two independent encoders, E_s(I_s) and E_r(I_r).

To transfer information from I_r to I_s, the model uses the SCFT module, which is inspired by the self-attention mechanism. SCFT computes dense correspondences between all pixel pairs of I_r and I_s. Based on this visual mapping, the context features that combine information from I_r and I_s are passed through the rest of the network to produce the final colorized output.
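To make this attention step concrete, here is a minimal PyTorch sketch of such a feature-transfer module. It reflects my reading of the mechanism (scaled dot-product attention with queries from the sketch features and keys/values from the reference features, fused residually into f_s); the layer names, shapes, and scaling choice are assumptions, since the official code has not been released.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCFT(nn.Module):
    """Minimal sketch of an attention-based feature transfer step in the
    spirit of SCFT: queries come from the sketch features, keys/values
    from the reference features, and the transferred context is fused
    back into the sketch features."""
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)   # queries from sketch features f_s
        self.to_k = nn.Linear(dim, dim)   # keys from reference features f_r
        self.to_v = nn.Linear(dim, dim)   # values from reference features f_r
        self.scale = dim ** -0.5

    def forward(self, f_s, f_r):
        # f_s, f_r: (B, HW, dim) flattened activation maps from E_s and E_r
        q, k, v = self.to_q(f_s), self.to_k(f_r), self.to_v(f_r)
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (B, HW, HW) dense correspondence
        context = attn @ v        # reference information aligned to each sketch position
        return f_s + context      # fused features passed on to the decoder

# toy usage
scft = SCFT(dim=256)
f_s = torch.randn(2, 8 * 8, 256)   # flattened sketch features
f_r = torch.randn(2, 8 * 8, 256)   # flattened reference features
print(scft(f_s, f_r).shape)        # torch.Size([2, 64, 256])
```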

2–2. Augmented-Self Reference Generation

Image for post

Fig 4. Appearance transform a(·) and TPS transformation s(·)

Appearance and spatial transformations are applied to I to generate I_r. The authors argue that, since I_r is generated from I, it is guaranteed to contain information useful for colorizing I_s.

Appearance transform a(·): random noise is added to each RGB pixel. This prevents the model from simply memorizing color biases (e.g., apple → red). In addition, the authors argue that providing a different reference at each iteration forces the model to utilize both E_s and E_r. Note that a(I) is also used as the ground truth I_gt.
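Since the paper does not spell out the exact noise model, the snippet below is only a hedged illustration of the idea, assuming a random per-channel color offset; the function name and the max_shift range are my own.

```python
import numpy as np

def appearance_transform(img, max_shift=30, rng=None):
    """Illustrative a(.): add a random offset to each RGB channel of a
    uint8 image. The offset range (max_shift) is an assumption, not a
    value from the paper."""
    rng = np.random.default_rng() if rng is None else rng
    shift = rng.integers(-max_shift, max_shift + 1, size=(1, 1, 3))
    return np.clip(img.astype(np.int16) + shift, 0, 255).astype(np.uint8)

# i_gt = appearance_transform(i)   # a(I) also serves as the ground truth I_gt
```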

TPS transformation s(·): after the appearance transform, a non-linear spatial transformation is applied to a(I). The authors state that this prevents the model from lazily copying colors from the same pixel positions of I, forcing it instead to identify semantically meaningful spatial correspondences even when the reference has a spatially different layout, e.g., a different pose.
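As a rough illustration of such a warp, the sketch below jitters a coarse grid of control points and warps the image with OpenCV's thin plate spline shape transformer (available in opencv-contrib); the grid size and jitter strength are arbitrary assumptions, not values from the paper.

```python
import cv2
import numpy as np

def tps_transform(img, grid=3, jitter=0.05, rng=None):
    """Illustrative s(.): perturb a coarse grid of control points and warp
    the image with a thin plate spline. Grid size and jitter strength are
    arbitrary choices for this sketch."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    xs, ys = np.meshgrid(np.linspace(0, w - 1, grid), np.linspace(0, h - 1, grid))
    src = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    dst = (src + rng.uniform(-jitter, jitter, src.shape) * [w, h]).astype(np.float32)
    matches = [cv2.DMatch(i, i, 0.0) for i in range(len(src))]
    tps = cv2.createThinPlateSplineShapeTransformer()
    # warpImage applies the inverse mapping, so estimate dst -> src
    tps.estimateTransformation(dst.reshape(1, -1, 2), src.reshape(1, -1, 2), matches)
    return tps.warpImage(img)

# i_r = tps_transform(appearance_transform(i))   # I_r = s(a(I))
```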

