Deep Image Colorization

What is this project about?

The automatic colorization of grayscale images is a problem that has been drawing my attention for a long time. In this image-to-image translation problem, we want to infer the colors in an image based only on the patterns and textures that are recognizable in its colorless grayscale variant. Unfortunately, this arguably creative process is highly subjective, since one can think of many different colorizations for the same grayscale image. I approached this problem by fitting a regression model in the form of a deep convolutional neural network that maps the lightness information onto the colors in the image. During this project, I learned more about so-called color spaces and discovered a new deep learning framework (PyTorch) for this and future projects. (GitHub)

What is a digital image?

We have to clarify what we understand as a digital image. In contrast to us humans, computers rely on silicon-based hardware and digital circuits, which restricts them to finite and discrete representations of the real world. Therefore, a natural image captured by a camera is typically stored in a digital format composed of a few grid-based layers of numeric intensity values. Each of these grid layers (channels) carries a certain semantics, which depends on the underlying color space. A single position in these grids, given by a depth vector with numeric values from all the channels, is called a pixel. For instance, in the most commonly used color space (RGB), each of the channels encodes the light intensity values for one of the three light colors red, green, and blue, so that a single pixel is a three-dimensional vector. Unfortunately, the RGB color space has a few drawbacks when it comes to the colorization of grayscale images. Because of that, I followed the approach of many related papers and tackled this image translation problem in another, more suitable color space.
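The grid-of-channels view can be made concrete with a few lines of NumPy (a minimal sketch, not code from the project; the array values are arbitrary):

```python
import numpy as np

# A tiny 2x2 RGB image: a height x width x channels grid of 8-bit intensities.
# Each depth vector along the last axis is one pixel.
img = np.array([
    [[255, 0, 0], [0, 255, 0]],      # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
], dtype=np.uint8)

print(img.shape)  # (2, 2, 3): two grid dimensions plus the channel dimension
print(img[0, 0])  # the pixel at row 0, column 0: [255   0   0]
```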

What is a color space: RGB vs. LAB?

  • RGB: As mentioned above, in the RGB color space each of the three channels is associated with one of the three additive primary colors: red, green, and blue. According to basic color theory, three monochromatic lights of these colors can emit light according to the pixels’ intensity values in order to create every color in between. Hence, all of these channels affect the color as well as the lightness of the entire image. This is suboptimal for our goal of automatically colorizing images solely based on their pixels’ lightness values. We would have to transform the RGB image into a single-channel grayscale image and map it onto its three-channel ground-truth counterpart. Operating in another color space, one that I had never heard of before this project, felt a lot more natural and suited to this image translation problem.


RGB color space visualization: (1) original image, (2) only the red (R) channel, (3) only the green (G) channel, and (4) only the blue (B) channel.

  • LAB: In contrast to RGB, in the LAB color space the responsibilities for lightness and hue/saturation are divided between the three channels. The first channel (L) is given by the perceived lightness intensity values and contains no information about the hue or the saturation of the respective pixel. The two remaining channels (A and B) are responsible for this. Together, these two color channels span a two-dimensional plane, where each point refers to a certain hue and saturation combination. The first of these two channels specifies the amount of green (-) or red (+), while the second channel specifies the amount of blue (-) or yellow (+) present in the respective pixel. Without going into too many details, this color space is more elaborate and approximately ensures that the Euclidean distance between two colors resembles their actual perceptual distance. This is a strong argument for relying on mean-squared instead of mean-absolute regression during the training of the model. However, the arguably most convincing reason for working in this color space is that the lightness and color channels are completely separated, which conceptually simplifies the structure of the colorization model.


LAB color space visualization: (1) merged A and B color channels, (2) only the lightness (L) channel, (3) only the green-red color channel (A), and (4) only the blue-yellow color channel (B).

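The post does not show how it converts between the two color spaces; as a self-contained sketch, the standard sRGB → XYZ → LAB formula (D65 white point) can be written directly in NumPy:

```python
import numpy as np

def rgb2lab(rgb):
    """Convert an sRGB image with values in [0, 1] to LAB (D65 white point)."""
    # Undo the sRGB gamma curve to get linear light.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ via the standard D65 matrix.
    M = np.array([[0.4124, 0.3576, 0.1805],
                  [0.2126, 0.7152, 0.0722],
                  [0.0193, 0.1192, 0.9505]])
    xyz = lin @ M.T
    # Normalize by the D65 reference white.
    xyz = xyz / np.array([0.95047, 1.0, 1.08883])
    d = (6 / 29) ** 3
    f = np.where(xyz > d, np.cbrt(xyz), xyz / (3 * (6 / 29) ** 2) + 4 / 29)
    fx, fy, fz = f[..., 0], f[..., 1], f[..., 2]
    L = 116 * fy - 16    # lightness, in [0, 100]
    A = 500 * (fx - fy)  # green (-) to red (+)
    B = 200 * (fy - fz)  # blue (-) to yellow (+)
    return np.stack([L, A, B], axis=-1)

red = np.array([[[1.0, 0.0, 0.0]]])  # a single pure-red pixel
print(rgb2lab(red))                  # approx [[[53.2, 80.1, 67.2]]]
```

For pure red this gives roughly L ≈ 53, A ≈ +80, B ≈ +67, matching the channel semantics described above: strongly toward red on the A axis and toward yellow on the B axis.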

The network architecture

The basic neural network architecture that I used in this project is a so-called cascaded refinement network, which is composed of several refinement blocks, each operating on a certain image resolution. These blocks are chained together, starting at a very small resolution and getting increasingly larger until the final target resolution is reached. Between these refinement blocks, bilinear upsampling is used to reduce the number of learnable parameters and induce a kind of prior on the generative function. Each of these blocks receives as input a concatenation of a bilinearly downsampled version of the input lightness channel (L) and the upsampled version of the previous block’s output. The forward pass through this generator network is then recursively defined as a flow from the initial block, which only receives a downscaled version of the main input L, to the final refinement block, which produces the two AB color channels. Feeding the lightness channel to the network multiple times at different resolutions is supposed to help the network keep the shapes and textures of the grayscale image in mind and presumably allows it to focus more on the iterative refinement of its color choices. For more technical information about the exact structure of the generator blocks, I refer to my GitHub repository. This cascaded network was then trained in a supervised manner using the mean-squared-error loss function. Unfortunately, this basic generator network has its problems with standard mean-squared regression and in the majority of cases produced only low-saturation colorizations with little variety in colors.
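To make the cascade concrete, here is a minimal PyTorch sketch of the idea; the block layout, channel widths, and resolutions are illustrative assumptions, not the exact configuration from the repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinementBlock(nn.Module):
    """One refinement block of the cascade (hypothetical layer sizes)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.LeakyReLU(0.2),
        )
    def forward(self, x):
        return self.body(x)

class CascadedColorizer(nn.Module):
    def __init__(self, resolutions=(8, 16, 32, 64), width=32):
        super().__init__()
        self.resolutions = resolutions
        # First block sees only the downscaled L channel; later blocks also
        # receive the bilinearly upsampled features of the previous block.
        self.blocks = nn.ModuleList(
            RefinementBlock(1 if i == 0 else width + 1, width)
            for i in range(len(resolutions))
        )
        self.to_ab = nn.Conv2d(width, 2, 1)  # final 1x1 conv -> A and B channels

    def forward(self, L):
        feats = None
        for res, block in zip(self.resolutions, self.blocks):
            L_small = F.interpolate(L, size=(res, res), mode="bilinear",
                                    align_corners=False)
            if feats is None:
                x = L_small
            else:
                feats = F.interpolate(feats, size=(res, res), mode="bilinear",
                                      align_corners=False)
                x = torch.cat([L_small, feats], dim=1)
            feats = block(x)
        return self.to_ab(feats)

model = CascadedColorizer()
L = torch.randn(1, 1, 64, 64)                 # a batch of one lightness channel
ab = model(L)                                 # predicted AB color channels
loss = F.mse_loss(ab, torch.randn_like(ab))   # MSE regression on AB targets
print(ab.shape)  # torch.Size([1, 2, 64, 64])
```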

#pytorch #computer-science #deep-learning #computer-vision #deep learning


Vern Greenholt


Paper review: Reference-Based Sketch Image Colorization

All figures and tables come from the paper (marked if they come from another paper or website).


  1. Abstract
  2. Method
  3. Result and Experiments
  4. My Opinion



Fig 1. Qualitative results using CelebA.

Fig 2. Qualitative results using Tag2pix

This paper was accepted at CVPR 2020.

The authors note that colorization has been successful for grayscale images, but sketch or outline images are challenging because they do not include pixel intensity information.

Commonly used methods to solve this problem utilize user hints or reference images.

However, in the case of reference images, progress is still slow due to the scarcity of datasets and the information discrepancy between the sketch and the reference.

Therefore, the authors try to solve the above problems in two ways:

  • we utilize an augmented-self reference which is generated from the original image by both color perturbation and geometric distortion. This reference contains most of the contents of the original image itself, thereby providing full information of correspondence for the sketch, which is also from the same original image.
  • our model explicitly transfers the contextual representations obtained from the reference into the spatially corresponding positions of the sketch by the attention-based pixel-wise feature transfer module, which we term the spatially corresponding feature transfer (SCFT) module.

The authors argue that the above two methods can optimize the network without manually annotated labels.

Currently (2020-07-29), the official code for this model has not been released yet.

2. Method


Fig 3. Overall workflow of model

2–1. Overall Workflow

As illustrated in Fig. 3, I is a source color image, I_s is a sketch image extracted using an outline extractor, and I_r is a reference image obtained by applying a thin-plate-spline (TPS) transformation. The model extracts activation maps f_s and f_r from I_s and I_r using two independent encoders, E_s(I_s) and E_r(I_r).

To transfer information from I_r to I_s, the model uses the SCFT module, inspired by the self-attention mechanism. SCFT calculates dense correspondences between all pairs of I_r and I_s pixels. Based on the visual mapping obtained from SCFT, context features that combine information from I_r and I_s are passed through the rest of the model to produce the final colored output.
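A minimal PyTorch sketch of such an attention-based pixel-wise feature transfer (an illustration of the idea only; the paper's exact SCFT formulation may differ, e.g., in how the context feature is combined):

```python
import torch
import torch.nn.functional as F

def scft(f_s, f_r, Wq, Wk, Wv):
    """Every sketch pixel attends over all reference pixels.
    f_s, f_r: (B, C, H, W) activation maps; Wq/Wk/Wv: (C, C) projections."""
    B, C, H, W = f_s.shape
    q = (Wq @ f_s.flatten(2)).transpose(1, 2)  # (B, HW, C) queries from sketch
    k = (Wk @ f_r.flatten(2)).transpose(1, 2)  # (B, HW, C) keys from reference
    v = (Wv @ f_r.flatten(2)).transpose(1, 2)  # (B, HW, C) values from reference
    # Dense pixel-to-pixel correspondences: each row is a distribution
    # over all reference positions.
    attn = F.softmax(q @ k.transpose(1, 2) / C ** 0.5, dim=-1)
    ctx = (attn @ v).transpose(1, 2).reshape(B, C, H, W)  # transferred features
    return f_s + ctx  # context feature combining sketch and reference

C = 16
f_s, f_r = torch.randn(1, C, 8, 8), torch.randn(1, C, 8, 8)
Wq, Wk, Wv = (torch.randn(C, C) for _ in range(3))
out = scft(f_s, f_r, Wq, Wk, Wv)
print(out.shape)  # torch.Size([1, 16, 8, 8])
```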

2–2. Augmented-Self Reference Generation


Fig 4. Appearance transform a(·) and TPS transformation s(·)

Appearance and spatial transformation are performed to generate I_r from I. At this time, the authors argue that since I_r is generated from I, it is guaranteed to include data useful for colorizing I_s.

Appearance transform a(·): the process of adding particular random noise to each RGB pixel. The reason for doing this is to prevent the model from memorizing color bias (e.g., apple → red). In addition, the authors argue that by giving a different reference in each iteration, the model is forced to utilize both E_s and E_r. Here, a(I) is used as the ground truth I_gt.

TPS transformation s(·): after applying the appearance transform, a non-linear spatial transformation operator is applied to a(I). The authors say that this prevents the model from lazily copying the color at the same pixel position from I, while forcing the model to identify semantically meaningful spatial correspondences even for a reference image with a spatially different layout, e.g., different poses.
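The appearance transform can be sketched in a few lines of NumPy; the noise range and the per-channel shift are assumptions for illustration, and the TPS warp s(·) is only indicated:

```python
import numpy as np

rng = np.random.default_rng(0)

def appearance_transform(img):
    """a(.): perturb each RGB channel so the model cannot memorize color bias.
    img: float RGB array in [0, 1]; the noise range is an assumption."""
    shift = rng.uniform(-0.3, 0.3, size=(1, 1, 3))  # one random offset per channel
    return np.clip(img + shift, 0.0, 1.0)

I = rng.uniform(size=(8, 8, 3))  # the original color image
I_gt = appearance_transform(I)   # a(I) serves as the ground truth
# s(.) -- the TPS spatial warp -- would then be applied to a(I) to obtain I_r.
```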

#gans #image-processing #deep-learning #image-colorization #cvpr-2020 #deep learning

Agnes Sauer


All About Images: Types of Images

Everything we see around us is nothing but an image; we capture them using our mobile cameras. In signal processing terms, an image is a signal which conveys some information. First, I will tell you what a signal is and how many types there are. In the later part of this blog, I will tell you about images.

We are saying that an image is a signal. Signals carry some information, which may be useful information or random noise. In mathematics, a signal is a function which depends on independent variables; the variables responsible for altering the signal are called independent variables. We also have multidimensional signals. Here you will learn about only the three types of signals which are mainly used in cutting-edge techniques such as image processing, computer vision, machine learning, and deep learning.

  • 1D signal: a signal which has only one independent variable. Audio signals are the perfect example; they depend on time. For instance, if you change the time position of an audio clip, you will hear the sound at that particular time.
  • 2D signal: a signal which depends on two independent variables. An image is a 2D signal, as its information depends only on its length and width.
  • 3D signal: a signal which depends on three independent variables. Videos are the best example; a video is just the motion of images with respect to time. Here the image’s length and width are two independent variables, and time is the third one.
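In NumPy terms, the three signal types simply differ in the number of array axes (a minimal illustration with arbitrary sizes):

```python
import numpy as np

audio = np.zeros(16000)              # 1D signal: one axis (time samples)
image = np.zeros((480, 640))         # 2D signal: height x width
video = np.zeros((30, 480, 640))     # 3D signal: frames (time) x height x width

print(audio.ndim, image.ndim, video.ndim)  # 1 2 3
```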

Types of Images:

  • Analog images: these are natural images. The images which we see with our eyes, such as all physical objects, are analog images. They have continuous values, and their amplitude is infinite.
  • Digital images: by quantizing analog images we can produce digital images. Nowadays, most cameras produce digital images directly. In digital images, all values are discrete, and each location has a finite amplitude. We mostly use digital images for processing.


Every digital image is a group of pixels. Its coordinate system starts from the top-left corner.

Digital images consist of a stack of small rectangles. Each rectangle is called a pixel, the smallest unit in the image. Each pixel has a particular value, its intensity, which is produced by a combination of colors. We have millions of colors, but our eyes perceive only three colors and their combinations. These are called the primary colors: red, green, and blue.


Why only these three colors?

The reason is that the human eye has only three types of color receptors. Different combinations in the stimulation of these receptors enable the human eye to distinguish nearly 350,000 colors.

Let’s move on to our image topic:

As of now, we know that an image’s intensity values are a combination of red, green, and blue. Each pixel in a color image has these three color channels. Generally, we represent each color value in 8 bits, i.e., one byte.

Now you can work out how many bits are required for each pixel. We have 3 colors at each pixel, and each color value is stored in 8 bits, so each pixel requires 24 bits. Such a 24-bit color image can display 2**24 different colors.

Now you may have a question: how much memory does it require to store an RGB image of shape 256*256? I think no explanation is required; if you want a clear explanation, please comment below.
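The arithmetic from the last two paragraphs can be checked in a couple of lines of Python (assuming an uncompressed image with one byte per channel):

```python
# Bits per pixel and distinct colors for a 24-bit RGB image.
bits_per_pixel = 3 * 8              # 3 channels x 8 bits each
num_colors = 2 ** bits_per_pixel    # 16,777,216 distinct colors

# Memory for an uncompressed 256x256 RGB image, one byte per channel.
width = height = 256
memory_bytes = width * height * 3

print(bits_per_pixel)  # 24
print(memory_bytes)    # 196608 bytes = 192 KB
```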

#machine-learning #computer-vision #image-processing #deep-learning #image #deep learning

I am Developer


Laravel 7/6 Image Validation

In this image validation in Laravel 7/6 post, I will share with you how to validate an image and its file MIME type (jpeg, png, bmp, gif, svg, or webp) before uploading the image into the database and a server folder in a Laravel app.

#laravel image validation #image validation in laravel 7 #laravel image size validation #laravel image upload #laravel image validation max #laravel 6 image validation

Ahebwe Oscar


how to integrate CKEditor in Django


Welcome to my blog. In this article we learn how to integrate CKEditor in Django and enable the image upload button to add an image to the blog from local storage. When I added CKEditor to my project for the first time, it was very difficult for me, but now I can easily implement it, so you can learn and implement CKEditor in your project easily.


#django #add image upload in ckeditor #add image upload option ckeditor #ckeditor image upload #ckeditor image upload from local #how to add ckeditor in django #how to add image upload plugin in ckeditor #how to install ckeditor in django #how to integrate ckeditor in django #image upload in ckeditor #image upload option in ckeditor

Guide To Image Color Analyzer In Python

In the modern age, we store all image data in digital memory, which can be your computer, your mobile device, or your cloud space. Whenever a device stores an image, it breaks it into a very small mosaic of tiny box parts before storing it. These small box parts can be considered the pixels of the image. Therefore, as the size of the tiles increases, the resolution of the image decreases; the finer the tiles, the higher the resolution of the image.

The simplest way to explain pixels is that they consist of red, green, and blue. Pixels are the smallest units of information about any image and are arranged in a 2-dimensional grid.


If any of those three colors of a pixel is at full intensity, we can consider it to have a value of 255. A pixel value can range between 0 and 255; if an image is fully red, then its RGB value is (255, 0, 0), where red is 255 and green and blue are 0. Below is an example where we can see the formation of a full-color image.
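A small NumPy example of pixels at full intensity (illustrative values only):

```python
import numpy as np

# Pixel values live in [0, 255] per channel; mixing channels forms colors.
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (255, 0, 0)      # fully red pixel
img[0, 1] = (255, 255, 0)    # red + green at full intensity -> yellow
img[1, 1] = (255, 255, 255)  # all three channels at full intensity -> white

print(img[0, 0])  # [255   0   0]
```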

#developers corner #image color analyzer #opencv #opencv image processing #python #guide to image color analyzer in python