CNN Deep Learning Models: Why Did the Model Interpret What It Interpreted?

Deep learning models can now achieve very high accuracy. The most critical piece for adopting computer vision algorithms at scale, whether for image classification, object detection, semantic segmentation, image captioning, or visual question answering, is understanding why a CNN model interpreted an image the way it did.

Explainability or interpretability of a CNN model is key to building trust and driving its adoption

Only when we understand why a model failed to identify a class or an object can we focus our efforts on addressing that failure. Deep learning models that are more explainable or interpretable help humans trust them, which leads to higher adoption rates.

A good explainable or interpretable model should highlight fine-grained details in the image to visually explain why a class was predicted by the model.

Several methods have been proposed to explain CNN models, including:

  • Guided backpropagation visualizes fine-grained details in the image. Its premise is that neurons act as detectors of particular image features, so when the gradient is backpropagated through ReLU layers, negative gradients are set to zero to highlight the pixels that are important in the image.
  • Class Activation Maps (CAM) are class-discriminative: they localize the image regions associated with a particular category or class. CAM requires the feature maps to directly precede the prediction layer, so it applies only to CNN architectures that perform global average pooling over convolutional feature maps immediately before the prediction layer; it does not generalize to other computer vision architectures.
  • Grad-CAM visualizations are class-discriminative and localize relevant image regions, but they do not highlight fine-grained pixel importance the way guided backpropagation does. Unlike CAM, however, Grad-CAM applies to a wide range of CNN architectures without requiring global average pooling or changes to the model.
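To make the three rules above concrete, here is a minimal, framework-free Python sketch. It assumes feature maps are plain nested lists (`feature_maps[k][i][j]` is channel k at position (i, j)) and that the gradients of the class score y^c (before the softmax) with respect to those maps have already been computed by some autodiff framework. All function names are hypothetical illustrations, not a library API.

```python
# Framework-free sketch of the ReLU rule behind guided backpropagation and the
# CAM / Grad-CAM weighting schemes. Hypothetical helper names; not a library API.
# feature_maps[k][i][j] is the activation of channel k at spatial position (i, j).


def relu_backward(forward_input, upstream_grad):
    """Plain backprop through ReLU: keep the gradient only where the forward input was positive."""
    return [g if x > 0 else 0.0 for x, g in zip(forward_input, upstream_grad)]


def guided_relu_backward(forward_input, upstream_grad):
    """Guided backprop through ReLU: additionally set negative gradients to zero."""
    return [g if x > 0 and g > 0 else 0.0 for x, g in zip(forward_input, upstream_grad)]


def class_activation_map(feature_maps, class_weights):
    """CAM: sum of feature maps weighted by the class weights w_k^c of the layer after global average pooling."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    return [
        [sum(w * fmap[i][j] for w, fmap in zip(class_weights, feature_maps)) for j in range(width)]
        for i in range(height)
    ]


def grad_cam(feature_maps, gradients):
    """Grad-CAM: weight each map by its spatially averaged gradient dy^c/dA^k, then apply ReLU."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    alphas = [sum(sum(row) for row in grad) / (height * width) for grad in gradients]
    return [
        [max(0.0, sum(a * fmap[i][j] for a, fmap in zip(alphas, feature_maps))) for j in range(width)]
        for i in range(height)
    ]


# Demo: two 2x2 feature maps feeding global average pooling and a linear class score
# y^c = sum_k w_k * mean(A^k), so dy^c/dA^k[i][j] = w_k / 4 at every position.
x, g = [1.0, -2.0, 3.0, 0.5], [0.4, 0.7, -0.2, -1.0]
print(relu_backward(x, g))         # [0.4, 0.0, -0.2, -1.0]
print(guided_relu_backward(x, g))  # [0.4, 0.0, 0.0, 0.0]

maps = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]
weights = [0.5, -2.0]
grads = [[[w / 4] * 2 for _ in range(2)] for w in weights]
print(class_activation_map(maps, weights))  # [[0.5, -1.0], [-0.5, 2.0]]
print(grad_cam(maps, grads))                # [[0.125, 0.0], [0.0, 0.5]]
```

In the demo the head is assumed to be global average pooling followed by a linear layer, so every gradient equals w_k / Z (Z being the number of spatial positions). Under that assumption Grad-CAM reduces to the positive part of CAM scaled by 1/Z, which is one way to see why Grad-CAM generalizes CAM to architectures where no such weights exist.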

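Guided Grad-CAM combines the strengths of two methods: Grad-CAM, which is class-discriminative and localizes relevant image regions, and Guided Backpropagation, which visualizes gradients with respect to the image, setting negative gradients to zero when backpropagating through ReLU layers to highlight important pixels. The two are fused by upsampling the coarse Grad-CAM heatmap to the input resolution and multiplying it element-wise with the Guided Backpropagation map.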
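The fusion step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: both maps are single-channel nested lists (a real guided-backprop map usually has one channel per colour), and nearest-neighbour upsampling stands in for the bilinear interpolation used in the Grad-CAM paper. The function names are hypothetical.

```python
# Sketch of the Guided Grad-CAM fusion step on single-channel maps.
# Assumption: nearest-neighbour upsampling replaces the paper's bilinear upsampling.


def upsample_nearest(heatmap, out_height, out_width):
    """Resize a coarse 2-D heatmap to (out_height, out_width) by repeating its cells."""
    in_height, in_width = len(heatmap), len(heatmap[0])
    return [
        [heatmap[i * in_height // out_height][j * in_width // out_width] for j in range(out_width)]
        for i in range(out_height)
    ]


def guided_grad_cam(guided_backprop_map, grad_cam_heatmap):
    """Multiply the pixel-level guided-backprop map element-wise by the upsampled Grad-CAM heatmap."""
    height, width = len(guided_backprop_map), len(guided_backprop_map[0])
    upsampled = upsample_nearest(grad_cam_heatmap, height, width)
    return [
        [guided_backprop_map[i][j] * upsampled[i][j] for j in range(width)]
        for i in range(height)
    ]


# Demo: a fine-grained 4x4 guided-backprop map and a coarse 2x2 Grad-CAM heatmap.
guided = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 4.0],
    [5.0, 0.0, 6.0, 0.0],
    [0.0, 7.0, 0.0, 8.0],
]
heatmap = [[0.0, 1.0], [0.5, 0.0]]
for row in guided_grad_cam(guided, heatmap):
    print(row)
# [0.0, 0.0, 2.0, 0.0]
# [0.0, 0.0, 0.0, 4.0]
# [2.5, 0.0, 0.0, 0.0]
# [0.0, 3.5, 0.0, 0.0]
```

Wherever the Grad-CAM heatmap is zero, the guided-backpropagation detail is suppressed entirely, so only fine-grained evidence inside the class-relevant region survives. That is what makes the combined map both high-resolution and class-discriminative.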

