Introduction

In my effort to better understand self-attention, I decided to dissect one of its use cases in a deep learning subtopic I'm currently interested in: Generative Adversarial Networks (GANs). As I dug into the Self-Attention GAN ("SAGAN") research paper, while following similar implementations in PyTorch and TensorFlow in parallel, I noticed how exhausting it can get to power through the formal, mathematically dense sections just to arrive at a clear intuition of the paper's contents. I get that formal papers are written that way for precision of language, but I do think there's a need for bite-sized versions that spell out the prerequisite knowledge and lay out the advantages and disadvantages candidly.

In this article, I'm going to attempt a computationally efficient interpretation of SAGAN, without reducing the accuracy too much, for the "hacky" people out there who just want to get started (wow, so witty).

So, here’s how I’m going to do it:

  • What do I need to know?
  • What is it? Who made it?
  • What does it solve? Advantages and Disadvantages?
  • Possible further studies?
  • Sources

What do I need to know?

  • Basic Machine Learning and Deep Learning concepts (Dense Layers, Activation Functions, Optimizers, Backpropagation, Normalization, etc.)
  • Vanilla GAN
  • Other GANs: Deep Convolutional GAN (DCGAN), Wasserstein GANs (WGAN)
  • Convolutional Neural Networks — Intuition, Limitations, and Relational Inductive Biases (just think of these as built-in assumptions)
  • Spectral Norms and the Power Iteration Method
  • Two Time-Scale Update Rule (TTUR)
  • Self-Attention (a minimal code sketch follows this list)
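
To make that last prerequisite a bit more concrete, here is a minimal self-attention block for image feature maps, written in PyTorch. It is only a sketch of the general idea used in SAGAN (1x1 convolutions for the query, key, and value projections, a softmax attention map over spatial positions, and a learnable gamma-scaled residual); it is not the authors' exact code, and the class and variable names are my own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelfAttention(nn.Module):
    """Minimal self-attention block for feature maps (a sketch, not the official SAGAN code)."""

    def __init__(self, in_channels):
        super().__init__()
        # 1x1 convolutions project the input into query, key, and value spaces.
        # in_channels should be at least 8 so the reduced channel count stays positive.
        self.query = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels // 8, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        # gamma starts at zero, so the block initially passes the input through unchanged.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.size()
        n = h * w  # number of spatial positions

        q = self.query(x).view(b, -1, n).permute(0, 2, 1)  # (b, n, c//8)
        k = self.key(x).view(b, -1, n)                      # (b, c//8, n)
        attn = F.softmax(torch.bmm(q, k), dim=-1)           # (b, n, n) attention map
        v = self.value(x).view(b, -1, n)                    # (b, c, n)

        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x                         # gamma-scaled residual connection
```

You can drop a module like this between convolutional layers of a generator or discriminator; because gamma starts at zero, the network behaves like the plain convolutional architecture early in training and gradually learns how much attention to mix in.
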

