If you are familiar with generative adversarial networks (GANs) and their popular variants, the term Pix2Pix should not be unfamiliar at all. Pix2Pix is a type of conditional GAN (cGAN) that performs image-to-image translation. In the medical field, it is typically used to perform modality translation and, in some cases, organ segmentation.

Figure: “Voxels on My Mind” by Don Backos

Pix2Pix, the Basics

Similar to most GANs, Pix2Pix consists of a single generator network and a single discriminator network. The generator network is nothing but a U-Net, which is a type of deep convolutional neural network originally proposed to perform biomedical image segmentation. U-Net has the following architecture:

Figure: U-Net architecture (Department of Computer Science, University of Freiburg)

The U-Net contains three major components: the Encoder, the Bottleneck and the Decoder. Technical details of Pix2Pix’s generator U-Net, including the numbers of input/output channels, kernel sizes, strides and paddings, can be found in its original paper. In short, the Encoder provides a contracting path that convolves a given 2D image and reduces its spatial dimensions. The Bottleneck contains convolutional blocks with skip connections, and the Decoder provides an expanding path that upscales the encoded representation. Encoder and decoder layers that share the same spatial size are concatenated along the channel dimension. If you are interested in learning more about the U-Net specifically and how it performs image segmentation, Heet Sankesara has a great article about it.
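To make the contracting and expanding paths concrete, here is a minimal sketch that only traces tensor shapes (channels, height, width) through a U-Net-style network. It is not the Pix2Pix generator itself: the function name trace_unet_shapes, the depth of 4, the base width of 64 channels and the halving/doubling at every block are all assumptions chosen for illustration, so check the original paper for the real configuration.

```python
def trace_unet_shapes(in_channels=3, out_channels=3, size=256, base=64, depth=4):
    """Trace (channels, height, width) through a U-Net-style network.

    Illustrative assumptions: every encoder block halves the spatial size
    and every decoder block doubles it. This is a hypothetical layout,
    not the exact Pix2Pix generator configuration.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")

    shapes = [("input", (in_channels, size, size))]
    encoder = []  # (channels, size) of each encoder output, kept for skips

    # Encoder: each block halves the resolution and widens the channels.
    for level in range(depth):
        channels = base * 2 ** level
        size //= 2
        encoder.append((channels, size))
        shapes.append((f"enc{level + 1}", (channels, size, size)))

    # Bottleneck: works on the most compressed representation, same size.
    shapes.append(("bottleneck", (channels, size, size)))

    # Decoder: each block doubles the resolution, then concatenates the
    # encoder output of the same spatial size along the channel axis.
    for step, (skip_channels, skip_size) in enumerate(reversed(encoder[:-1]), 1):
        size *= 2
        assert size == skip_size, "skip connections need matching sizes"
        channels = skip_channels + skip_channels  # upsampled + skip features
        shapes.append((f"dec{step}+skip", (channels, size, size)))

    # Final block restores the input resolution and maps to output channels.
    size *= 2
    shapes.append(("output", (out_channels, size, size)))
    return shapes


for name, shape in trace_unet_shapes():
    print(f"{name:12s} {shape}")
```

Notice how the channel count doubles after every concatenation: that is the skip connection handing the decoder the fine spatial detail that the encoder would otherwise have compressed away.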

The discriminator of Pix2Pix is a standard convolutional network that ‘discriminates’ whether a given image is real (original training data) or fake (generated by the U-Net generator). The training objective of Pix2Pix is a minimax formulation that combines an adversarial loss with an L₁ reconstruction loss between the generated and real images. The original paper uses a cross-entropy adversarial loss, while some implementations substitute an L₂/MSE (least-squares) loss.
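The two-part objective can be sketched in a few lines of plain Python, using flat lists of numbers in place of image tensors. Treat this as a rough illustration under stated assumptions: the function names are hypothetical, the adversarial term is written as binary cross-entropy (a least-squares version would swap in a squared error instead), the generator uses the common non-saturating form, and the weight of 100 on the L₁ term is assumed here rather than taken from this article. A real implementation would use a deep learning framework.

```python
import math

LAMBDA_L1 = 100.0  # assumed weight on the L1 term; tune for your own data


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def bce(prediction, target, eps=1e-7):
    """Binary cross-entropy for one probability in [0, 1]."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def l1_loss(generated, real):
    """Mean absolute difference between generated and real pixel values."""
    return mean(abs(g - r) for g, r in zip(generated, real))


def generator_loss(d_on_fake, generated, real, lam=LAMBDA_L1):
    """Adversarial term (fool the discriminator) plus weighted L1 term."""
    adversarial = mean(bce(p, 1.0) for p in d_on_fake)
    return adversarial + lam * l1_loss(generated, real)


def discriminator_loss(d_on_real, d_on_fake):
    """Push scores on real images towards 1 and on fake images towards 0."""
    real_term = mean(bce(p, 1.0) for p in d_on_real)
    fake_term = mean(bce(p, 0.0) for p in d_on_fake)
    return 0.5 * (real_term + fake_term)


# Toy values standing in for discriminator outputs and pixel intensities.
d_on_real = [0.9, 0.8]
d_on_fake = [0.3, 0.2]
generated = [0.5, 0.1, 0.9]
real = [0.6, 0.0, 1.0]
print("generator loss:", generator_loss(d_on_fake, generated, real))
print("discriminator loss:", discriminator_loss(d_on_real, d_on_fake))
```

The generator is pulled in two directions at once: the adversarial term rewards outputs that the discriminator scores as real, while the L₁ term keeps those outputs close to the ground-truth image.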

One of the earliest applications of Pix2Pix was to generate cat pictures from drawings (for all you cool cats and kittens). However, it has also been extended to the medical imaging field to perform domain transfer between magnetic resonance (MR), positron emission tomography (PET) and computed tomography (CT) images.


Volumetric Medical Image Segmentation with Vox2Vox