Among the many disciplines in machine learning, computer vision has arguably seen some of the fastest growth. Today it offers a plethora of models to choose from, each with its own strengths. It is easy, then, to lose your way in this abyss. Fret not, for this foe can be defeated, like many others, using the power of mathematics and a touch of intuition.

Before we venture forth, it is important to have a basic knowledge of machine learning under your belt. To start with, we should understand the concept of convolution in general, and then we can narrow it down to its application in machine learning. In essence, convolution means that one function slides over the domain of another; at each position the two are multiplied and summed, so the result records how strongly they overlap, as if one function left its imprint on the other (Fig 1).

Fig 1: The function g(x) passes through the function f(x) in its domain, leaving an imprint. Source: Wikipedia
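To make the "imprint" idea concrete, here is a minimal sketch of a discrete one-dimensional convolution in plain Python. The function name `convolve_1d` and the sample values are illustrative assumptions, not something taken from a particular library. Every output entry is a sum of products of overlapping values from the two sequences.

```python
def convolve_1d(f, g):
    """Full discrete convolution: out[n] = sum over k of f[k] * g[n - k]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, f_val in enumerate(f):
        for j, g_val in enumerate(g):
            # f[i] and g[j] both contribute to output position i + j.
            out[i + j] += f_val * g_val
    return out


# Hypothetical sample data, chosen only for illustration.
signal = [1.0, 2.0, 3.0]
kernel = [0.0, 1.0, 0.5]

result = convolve_1d(signal, kernel)
print(result)
```

The output is longer than either input because the kernel is allowed to overlap the signal only partially at both ends.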

A computer can’t really “see” an image; all it can perceive are bits, 0s and 1s. To comply with this lingo, we can represent an image as a matrix of numbers, where each number corresponds to a pixel intensity (0–255). We can then perform convolution by taking a **filter window** that strides over the image progressively (Fig 2). Each filter, also called a kernel, holds a set of numbers (weights) that are multiplied element-wise with a portion of the image and summed to extract a specific piece of information. We employ multiple kernels to gather various aspects of the image. The end goal is to learn suitable kernel weights that best encode the data for our use case. This information-capture process is what grants a computer the ability to “see”.
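The sliding-window step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `slide_filter`, the tiny 4×4 image, and the hand-picked kernel are all hypothetical, and the kernel weights are fixed here, whereas a real network would learn them. Note that the sketch does not flip the kernel, which strictly speaking makes it a cross-correlation; that is the operation most deep learning libraries call "convolution".

```python
def slide_filter(image, kernel, stride=1):
    """Slide a kernel over a 2D image (no padding) and return the feature map."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1

    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            top, left = r * stride, c * stride
            # Multiply the kernel with the image patch element-wise and sum.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 grayscale image: dark left half, bright right half.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]

# Hand-picked kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = slide_filter(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is large only where the window straddles the boundary between the dark and bright halves, which is the sense in which a filter "extracts" one aspect of the image. A different kernel would highlight a different aspect, which is why several are used together.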


Starter’s pack for Computer Vision